[jira] [Commented] (SOLR-2592) Pluggable shard lookup mechanism for SolrCloud

Dan Rosher (JIRA) Fri, 14 Sep 2012 02:06:12 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455681#comment-13455681
 ]


Dan Rosher commented on SOLR-2592:
----------------------------------

To reiterate: the default is the HashShardPartitioner. The NamedShardPartitioner 
was supplied as an example, and it does what we needed.

HashShardPartitioner partitions by hash(id) % num_shards, much as the existing 
implementation does.
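
A minimal sketch of the hash(id) % num_shards rule, for illustration only. The 
class name, the method name and the use of String.hashCode() are assumptions 
made here; they are not taken from the attached patches or from Solr's own 
hashing code.

```java
// Hypothetical sketch; not the HashShardPartitioner from the patch.
final class HashPartitionSketch {

    /** Maps a document id to a shard index in the range [0, numShards). */
    static int shardFor(String id, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        // String.hashCode() stands in for whatever hash function is really used.
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(id.hashCode(), numShards);
    }
}
```

The same id always maps to the same shard, but nothing in the id's meaning 
(such as a month) influences which shard that is.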

NamedShardPartitioner sends the doc to a particular shard, so that e.g. shard 
Sep2012 ONLY contains docs with doc.shard=Sep2012. Docs with doc.shard=Oct2012 
would live in another shard. I think this works much the way Lance pointed out 
on 07/Jun/12 04:49.
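
As a rough sketch of that idea: the target shard is read straight from a field 
on the document rather than computed from a hash. The Map-based document and 
the class and method names are hypothetical; only the "shard" field name 
follows the doc.shard example above.

```java
import java.util.Map;

// Hypothetical sketch; not the NamedShardPartitioner from the patch.
final class NamedPartitionSketch {

    /** Returns the shard named by the document's "shard" field. */
    static String shardFor(Map<String, Object> doc) {
        Object name = doc.get("shard");
        if (name == null) {
            // Without the field there is no way to choose a named shard.
            throw new IllegalArgumentException("document has no 'shard' field");
        }
        return name.toString();
    }
}
```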

Michael - The problem for us with the patch you've submitted for the composite 
id is that it still uses hashing to determine the shard on which a doc resides. 
On the indexing side, hashes of the composites might mean that e.g. 
doc=1234_Sep2012 and doc=4567_Oct2012 end up in the same hash range and 
hence on the same shard; in the worst case, ALL docs might even end up on the 
same shard. 
On the searching side, again because hashing is used, it's not a simple task to 
determine which shard holds the docs for Sep2012, so a query would need 
to be sent everywhere. That would be less efficient, perhaps by a large margin, 
than sending the query directly to the right shard. 
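
The search-side difference can be sketched as follows. This is only an 
illustration of the argument, with hypothetical class and method names; it is 
not how either patch selects shards.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of which shards a query for one doc.shard value must visit.
final class QueryRoutingSketch {

    /** Hash routing: the filter value says nothing about placement, so ask every shard. */
    static List<String> shardsForHashRouting(List<String> allShards) {
        return allShards;
    }

    /** Named routing: only the shard named by the filter value can hold matches. */
    static List<String> shardsForNamedRouting(String shardValue, List<String> allShards) {
        return allShards.contains(shardValue)
                ? Collections.singletonList(shardValue)
                : Collections.<String>emptyList();
    }
}
```

With shards Aug2012, Sep2012 and Oct2012, a query for Sep2012 goes to three 
shards under hash routing and to one under named routing.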

With the NamedShardPartitioner, I think that since we know that all related docs 
live on the same shard, it should be more obvious how to split/merge shard 
indices if desired.

These are just two implementations, but since we abstract ShardPartitioner, a 
developer can write something that better suits their needs. 
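
To illustrate what such a custom implementation might look like, here is a 
small sketch that groups docs by the year at the end of doc.shard. The 
interface shape, the names and the year-based rule are all assumptions for 
illustration; the actual ShardPartitioner abstraction in the patch may look 
quite different.

```java
import java.util.Map;

/** Hypothetical extension point; the real ShardPartitioner may differ. */
interface PartitionerSketch {
    String shardFor(Map<String, String> doc);
}

/** Example custom strategy: one shard per year, taken from values like "Sep2012". */
final class YearPartitionerSketch implements PartitionerSketch {

    @Override
    public String shardFor(Map<String, String> doc) {
        String value = doc.get("shard");
        if (value == null || value.length() < 4) {
            throw new IllegalArgumentException("expected a 'shard' value ending in a year");
        }
        // Use the last four characters as the year, e.g. "Sep2012" -> "year2012".
        return "year" + value.substring(value.length() - 4);
    }
}
```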
                
> Pluggable shard lookup mechanism for SolrCloud
> ----------------------------------------------
>
>                 Key: SOLR-2592
>                 URL: https://issues.apache.org/jira/browse/SOLR-2592
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>    Affects Versions: 4.0-ALPHA
>            Reporter: Noble Paul
>            Assignee: Mark Miller
>         Attachments: dbq_fix.patch, pluggable_sharding.patch, 
> pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_r1373086.patch, 
> SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, 
> SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash, 
> attribute value etc.), it will be easy to narrow down the search to a smaller 
> subset of shards and, in effect, achieve more efficient search.  

