[
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456927#comment-13456927
]
Dan Rosher commented on SOLR-2592:
----------------------------------
The idea of a shard.key is what I did with the supplied patch, e.g.
<shardPartitioner name="ShardPartitioner"
                  class="org.apache.solr.cloud.NamedShardPartitioner">
  <str name="shardField">date</str>
</shardPartitioner>
Any field could be used, though: region, date, etc. It's NOT specifically about
date partitioning; the choice is at the user's discretion.
The default is a HashPartition:
hash(id) % num_shards
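To make the contrast concrete, here is a minimal Java sketch of the two routing
strategies. It is not the code from the patch: the class and method names
(RoutingSketch, hashShard, namedShard) are hypothetical, and String.hashCode()
merely stands in for whatever hash function Solr actually uses.

```java
import java.util.Map;

/** Hypothetical illustration only; not taken from the SOLR-2592 patch. */
final class RoutingSketch {

    /** Hash-style routing: hash(id) % num_shards. */
    static int hashShard(String id, int numShards) {
        // floorMod keeps the result in [0, numShards) even if hashCode() is negative.
        return Math.floorMod(id.hashCode(), numShards);
    }

    /** Field-based routing: the value of the configured shard field names the shard. */
    static String namedShard(Map<String, String> doc, String shardField) {
        String value = doc.get(shardField);
        if (value == null) {
            throw new IllegalArgumentException("document has no value for " + shardField);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> doc = Map.of("id", "doc-42", "date", "Aug2012");
        System.out.println("hash shard:  " + hashShard(doc.get("id"), 4));
        System.out.println("named shard: " + namedShard(doc, "date"));
    }
}
```

With hash routing the shard follows from the id alone; with field-based routing
it follows from the shard field, so a query that already knows the field value
can go straight to one shard.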
Michael - your suggestion on 15/Sep/12 02:36 still wouldn't address our issue
of knowing exactly which shard a doc lives on. For our app (and I guess for
most), most queries are searches, and normally we'd need to send each query to
every shard. But in our app I already know in advance which subset of the index
I need to search, so to speed the query up I'd want to index docs that way too,
so that I ONLY need to query a particular shard. If I know the subset in
advance, anything with fq=... seems wasteful to me.
The downside of my implementation is that deletes and RealTimeGets would be
slower, since the id alone is not enough to determine shard membership, and
hence such requests need to be sent everywhere. But I suspect that in most
applications this is a welcome compromise, as most queries will be searches.
Perhaps shard membership could be stored efficiently in a distributed bloom
filter or something like it, to speed that up?
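As a rough idea of what that could look like, here is a single-process Java
sketch with one Bloom filter per shard. Everything in it is an assumption made
for illustration (the ShardBloomSketch class, its method names, the sizing, and
the simple double-hashing scheme); a real version would have to be distributed
and use a stronger hash.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one Bloom filter per shard, keyed by document id. */
final class ShardBloomSketch {
    private final int bits;    // size of each filter in bits
    private final int hashes;  // number of bit positions set per id
    private final Map<String, BitSet> filters = new LinkedHashMap<>();

    ShardBloomSketch(int bits, int hashes) {
        this.bits = bits;
        this.hashes = hashes;
    }

    /** i-th bit position for an id, derived from two hashes (double hashing). */
    private int position(String id, int i) {
        int h1 = id.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1; // second hash, forced odd
        return Math.floorMod(h1 + i * h2, bits);
    }

    /** Record that the document with this id was indexed on the given shard. */
    void recordIndexed(String shard, String id) {
        BitSet filter = filters.computeIfAbsent(shard, s -> new BitSet(bits));
        for (int i = 0; i < hashes; i++) {
            filter.set(position(id, i));
        }
    }

    /** Shards that may hold the id: false positives possible, false negatives not. */
    List<String> candidateShards(String id) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, BitSet> entry : filters.entrySet()) {
            boolean maybe = true;
            for (int i = 0; i < hashes && maybe; i++) {
                maybe = entry.getValue().get(position(id, i));
            }
            if (maybe) {
                candidates.add(entry.getKey());
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        ShardBloomSketch index = new ShardBloomSketch(1 << 16, 3);
        index.recordIndexed("Aug2012", "doc-1");
        index.recordIndexed("Sep2012", "doc-2");
        System.out.println(index.candidateShards("doc-1"));
    }
}
```

A delete or RealTimeGet would then only go to the shards returned by
candidateShards(id) instead of to all of them. A plain Bloom filter cannot
forget entries, so ids of deleted docs keep matching; that costs an extra
request but never misses a shard.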
All this aside, as a compromise I've thought that for us we can take this one
level higher, i.e. instead of collections=docs and shard=Aug2012,Sep2012 etc. we
can do collections=docs_Aug2012,docs_Sep2012. Then if we need to search across
multiple dates, we can do this today, and still have hash-based sharding, by
using collection=docs_Aug2012,docs_Sep2012,... in the query.
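A small Java sketch of how a client might build such a query string. The class
and method names are hypothetical, and the docs_<period> naming is just the
convention described above; only the comma-separated collection parameter comes
from the text.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for the one-collection-per-period layout. */
final class CollectionQuerySketch {

    /** e.g. ("docs", "Aug2012") -> "docs_Aug2012". */
    static String collectionName(String base, String period) {
        return base + "_" + period;
    }

    /** Builds "q=...&collection=a,b,..." for the periods that need searching. */
    static String queryString(String q, String base, List<String> periods) {
        String collections = periods.stream()
                .map(period -> collectionName(base, period))
                .collect(Collectors.joining(","));
        // Only the user query is URL-encoded; the collection names are assumed safe.
        return "q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&collection=" + collections;
    }

    public static void main(String[] args) {
        System.out.println(queryString("title:solr", "docs", List.of("Aug2012", "Sep2012")));
    }
}
```

Each collection keeps the default hash-based sharding internally; the
date-based narrowing happens purely by choosing which collections to list.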
Others might find this idea useful too.
> Pluggable shard lookup mechanism for SolrCloud
> ----------------------------------------------
>
> Key: SOLR-2592
> URL: https://issues.apache.org/jira/browse/SOLR-2592
> Project: Solr
> Issue Type: New Feature
> Components: SolrCloud
> Affects Versions: 4.0-ALPHA
> Reporter: Noble Paul
> Assignee: Mark Miller
> Attachments: dbq_fix.patch, pluggable_sharding.patch,
> pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_r1373086.patch,
> SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch,
> SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash,
> attribute value etc.), it will be easy to narrow down the search to a smaller
> subset of shards and in effect achieve more efficient search.