[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Garski updated SOLR-2592:
---------------------------------

    Attachment: pluggable_sharding_V2.patch

Here is an update to my original patch that accounts for the requirement of 
hashing based on unique id and works as follows:

1. Configure a ShardKeyParserFactory in SolrConfig under 
config/shardKeyParserFactory. If none is configured, the default 
implementation, which shards on the document's unique id, is used. The 
default configuration is equivalent to:
{code:xml} 
<shardKeyParserFactory class="solr.ShardKeyParserFactory"/>
{code}

2. The ShardKeyParser has two methods: one parses a shard key out of the 
unique id, the other out of a delete by query. The default implementation 
returns the string value of the unique id when parsing the unique id, so the 
document is forwarded to the specific shard, and returns null when parsing a 
delete by query, so the delete is broadcast to the entire collection.

3. Queries can be directed to a subset of shards in the collection by 
specifying one or more shard keys in the request parameter 'shard.keys'.
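
To make item 2 concrete, here is a rough sketch of what a ShardKeyParser and 
a custom implementation might look like. The method names, the 
PrefixShardKeyParser class, and the "tenant!docId" id convention are 
assumptions made for illustration; they are not taken from the patch.

```java
// Hypothetical sketch: the interface and method names are assumptions,
// not the signatures used in pluggable_sharding_V2.patch.
interface ShardKeyParser {
    /** Returns the shard key for a document's unique id. */
    String parseShardKey(String uniqueId);

    /** Returns the shard key for a delete by query, or null to broadcast. */
    String parseShardKeyFromDeleteQuery(String deleteQuery);
}

/** Mirrors the default behavior described in item 2. */
class DefaultShardKeyParser implements ShardKeyParser {
    @Override
    public String parseShardKey(String uniqueId) {
        return uniqueId; // shard on the full string value of the unique id
    }

    @Override
    public String parseShardKeyFromDeleteQuery(String deleteQuery) {
        return null; // null means: send the delete to every shard
    }
}

/** Illustrative custom parser: ids look like "tenant!docId", shard on "tenant". */
class PrefixShardKeyParser implements ShardKeyParser {
    private static final char SEPARATOR = '!'; // assumed id convention

    @Override
    public String parseShardKey(String uniqueId) {
        int idx = uniqueId.indexOf(SEPARATOR);
        // Fall back to the whole id when no separator is present.
        return idx < 0 ? uniqueId : uniqueId.substring(0, idx);
    }

    @Override
    public String parseShardKeyFromDeleteQuery(String deleteQuery) {
        return null; // keep the broadcast behavior for deletes by query
    }
}
```

With a parser like PrefixShardKeyParser, all documents sharing a prefix would 
hash to the same shard, so a query that supplies that prefix in 'shard.keys' 
would only need to visit that shard.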

Notes:

There are no dedicated unit tests for this change yet; however, all current 
unit tests pass. Switching to hashing on the string value rather than the 
indexed value caused a test to fail, which is how I realized that the 
real-time get component requires support for hashing based on the document's 
unique id.

Because hashing uses string values rather than indexed values, the SolrJ 
client could direct queries to a specific shard; however, this is not yet 
implemented.

I put the hashing function in the oas.common.cloud.HashPartitioner class, 
which encapsulates the hashing and partitioning in one place. I can see a 
desire for pluggable collection partitioning, where a collection could be 
partitioned on time periods or some other criteria, but that is outside the 
scope of pluggable shard hashing.
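
For readers unfamiliar with hash-range partitioning, the following is a 
minimal sketch of the general idea: split the 32-bit hash space into one 
contiguous range per shard, then map a shard key to the range containing its 
hash. It is not the code from the patch. The HashRangeSketch class and its 
methods are hypothetical, and String.hashCode() is used only as a stand-in 
for whatever hash function the patch applies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of hash-range partitioning; not the patch's code.
class HashRangeSketch {
    /** An inclusive range of 32-bit hash values owned by one shard. */
    static final class Range {
        final int min;
        final int max;

        Range(int min, int max) {
            this.min = min;
            this.max = max;
        }

        boolean includes(int hash) {
            return hash >= min && hash <= max;
        }
    }

    /** Splits the full int range into numShards contiguous ranges. */
    static List<Range> partition(int numShards) {
        List<Range> ranges = new ArrayList<>(numShards);
        long step = (1L << 32) / numShards; // hash values per shard
        long start = Integer.MIN_VALUE;
        for (int i = 0; i < numShards; i++) {
            // The last range absorbs any remainder left by integer division.
            long end = (i == numShards - 1) ? Integer.MAX_VALUE : start + step - 1;
            ranges.add(new Range((int) start, (int) end));
            start = end + 1;
        }
        return ranges;
    }

    /** Returns the index of the shard whose range contains the key's hash. */
    static int shardFor(String shardKey, List<Range> ranges) {
        int hash = shardKey.hashCode(); // stand-in for the real hash function
        for (int i = 0; i < ranges.size(); i++) {
            if (ranges.get(i).includes(hash)) {
                return i;
            }
        }
        throw new IllegalStateException("No range covers hash " + hash);
    }
}
```

Because the mapping depends only on the string value of the key, a client 
holding the same range list could compute the target shard itself, which is 
what would allow a client to direct requests without consulting the index.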

                
> Pluggable shard lookup mechanism for SolrCloud
> ----------------------------------------------
>
>                 Key: SOLR-2592
>                 URL: https://issues.apache.org/jira/browse/SOLR-2592
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>    Affects Versions: 4.0
>            Reporter: Noble Paul
>         Attachments: pluggable_sharding.patch, pluggable_sharding_V2.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash, 
> attribute value, etc.), it will be easy to narrow down the search to a smaller 
> subset of shards and, in effect, achieve a more efficient search.


        
