[jira] [Commented] (SOLR-2592) Pluggable shard lookup mechanism for SolrCloud

Dan Rosher (JIRA) Mon, 17 Sep 2012 03:30:12 -0700

    [ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456927#comment-13456927 ]


Dan Rosher commented on SOLR-2592:
----------------------------------

The idea of a shard.key is what I did with the supplied patch, e.g.

<shardPartitioner name="ShardPartitioner"
                  class="org.apache.solr.cloud.NamedShardPartitioner">
  <str name="shardField">date</str>
</shardPartitioner>

We could use any field, though: region, date, etc. It's NOT specifically about 
date partitioning; the choice is at the user's discretion.

The default is a HashPartition:

hash(id) % num_shards 
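
To make the contrast between the two routing rules concrete, here is a minimal 
Python sketch. It is illustrative only: crc32 is an assumed stand-in for 
whatever hash Solr really uses, the function names are hypothetical and not the 
patch's API, and it assumes the value of the configured field names the shard.

```python
import zlib

def hash_shard(doc_id: str, num_shards: int) -> int:
    """Default-style routing: hash(id) % num_shards.
    crc32 is an assumed stand-in, chosen only because it is deterministic."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

def named_shard(doc: dict, shard_field: str) -> str:
    """Field-based routing (assumption): the shard is named by the field value."""
    return str(doc[shard_field])

doc = {"id": "doc-42", "date": "Sep2012", "region": "EU"}
print(hash_shard(doc["id"], 4))    # some shard index in 0..3
print(named_shard(doc, "date"))    # Sep2012
print(named_shard(doc, "region"))  # EU
```

With the hash rule the shard cannot be predicted from what the query is about; 
with the field rule it can, provided the query knows the field value.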

Michael - for us, your suggestion on 15/Sep/12 02:36 still wouldn't address the 
issue of knowing exactly which shard a doc lives on. For our app (and I guess 
for most), the majority of queries are searches, and we'd need to send each 
query to every shard. But in our app I already know in advance which subset of 
the index I need to search, so to speed the query up I'd want to index docs 
that way too, so that I ONLY need to query a particular shard. If I know the 
subset in advance, anything with fq=... seems wasteful to me.

The downside of my implementation is that deletes and RealTimeGets would be 
slower, since the id alone is not enough to determine shard membership and 
they hence need to be sent everywhere. But I suspect that in most applications 
this is a welcome compromise, as most queries will be search ones.
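
The trade-off can be sketched as a fan-out count. This is a hedged 
illustration, not code from the patch: the shard names and both function names 
are made up for the example.

```python
SHARDS = ["Aug2012", "Sep2012", "Oct2012"]  # hypothetical shard names

def shards_for_search(shard_key=None):
    """A search that knows its shard key touches one shard; otherwise all."""
    if shard_key is None:
        return list(SHARDS)
    if shard_key not in SHARDS:
        raise ValueError(f"unknown shard: {shard_key}")
    return [shard_key]

def shards_for_id_lookup(doc_id):
    """Delete-by-id or RealTimeGet: the id says nothing about the shard field,
    so the request has to go to every shard."""
    return list(SHARDS)

print(len(shards_for_search("Sep2012")))     # 1
print(len(shards_for_id_lookup("doc-42")))   # 3
```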

Perhaps shard membership can be efficiently stored in a distributed bloom 
filter or something like that, to speed that up?
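
As a rough sketch of that idea, assuming one Bloom filter per shard that is 
consulted before an id-based request is fanned out. Everything here (class 
name, sizes, shard names) is hypothetical, and nothing in it is distributed; 
it only shows the membership test.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

filters = {"Aug2012": BloomFilter(), "Sep2012": BloomFilter()}
filters["Aug2012"].add("doc-1")
filters["Sep2012"].add("doc-2")

def candidate_shards(doc_id):
    """Shards whose filter does not rule the id out."""
    return [name for name, f in filters.items() if f.might_contain(doc_id)]

print(candidate_shards("doc-1"))
```

False positives only cost an extra request, but a plain Bloom filter cannot 
remove entries, so deleted ids would linger until the filter is rebuilt.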

All this aside, as a compromise I've thought that for us we can take this one 
level higher, i.e. instead of collections=docs and shard=Aug2012,Sep2012 etc. 
we can do collections=docs_Aug2012,docs_Sep2012. Then if we need to search 
across multiple dates, we can do this today, and still have hash-based 
sharding, by using collection=docs_Aug2012,docs_Sep2012,... in the query.
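
For example, the client could build that collection list from a date range. 
This is a small sketch under the naming scheme above; the helper name and the 
docs prefix are just for illustration.

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def monthly_collections(prefix, start, end):
    """Names like docs_Aug2012 for every month from start to end inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{MONTHS[month - 1]}{year}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names

value = ",".join(monthly_collections("docs", date(2012, 8, 1), date(2012, 10, 1)))
print(value)  # docs_Aug2012,docs_Sep2012,docs_Oct2012
```

The resulting string is what would go in the collection parameter of the query.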

Others might find this idea useful too.

                
> Pluggable shard lookup mechanism for SolrCloud
> ----------------------------------------------
>
>                 Key: SOLR-2592
>                 URL: https://issues.apache.org/jira/browse/SOLR-2592
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>    Affects Versions: 4.0-ALPHA
>            Reporter: Noble Paul
>            Assignee: Mark Miller
>         Attachments: dbq_fix.patch, pluggable_sharding.patch, 
> pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_r1373086.patch, 
> SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, 
> SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash, 
> attribute value etc) It will be easy to narrow down the search to a smaller 
> subset of shards and in effect can achieve more efficient search.  
