[
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Garski updated SOLR-2592:
---------------------------------
Attachment: pluggable_sharding_V2.patch
Here is an update to my original patch that accounts for the requirement of
hashing based on unique id and works as follows:
1. Configure a ShardKeyParserFactory in SolrConfig under
config/shardKeyParserFactory. If none is configured, the default
implementation is used, which shards on the document's unique id. The
default configuration is equivalent to:
{code:xml}
<shardKeyParserFactory class="solr.ShardKeyParserFactory"/>
{code}
2. The ShardKeyParser has two methods: one parses a shard key out of the
unique id, the other out of a delete by query. The default implementation
returns the string value of the unique id, so that the document is forwarded
to a specific shard, and returns null for a delete by query, so that the
delete is broadcast to the entire collection.
3. Queries can be directed to a subset of shards in the collection by
specifying one or more shard keys in the request parameter 'shard.keys'.
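To make step 2 concrete, here is a rough sketch of what a shard key parser
with the default behaviour might look like, next to one custom variant. The
interface and method names below are assumptions for illustration only and are
not taken from the patch; the "tenant!docId" id format in the second parser is
likewise hypothetical.

```java
class ShardKeyParserSketch {

    /** Hypothetical interface; names and signatures are assumptions, not the patch's API. */
    interface ShardKeyParser {
        /** Returns the shard key to hash for a document's unique id. */
        String parseShardKey(String uniqueId);

        /** Returns the shard key for a delete by query, or null to broadcast. */
        String parseShardKeyFromQuery(String deleteQuery);
    }

    /** Mirrors the default behaviour described in step 2. */
    static class DefaultShardKeyParser implements ShardKeyParser {
        @Override
        public String parseShardKey(String uniqueId) {
            // The string value of the unique id is the shard key.
            return uniqueId;
        }

        @Override
        public String parseShardKeyFromQuery(String deleteQuery) {
            // null means: no single shard, broadcast to the whole collection.
            return null;
        }
    }

    /** Illustrative custom parser: ids shaped like "tenant!docId" shard on the tenant part. */
    static class PrefixShardKeyParser implements ShardKeyParser {
        @Override
        public String parseShardKey(String uniqueId) {
            int sep = uniqueId.indexOf('!');
            // Fall back to the whole id when there is no separator.
            return sep < 0 ? uniqueId : uniqueId.substring(0, sep);
        }

        @Override
        public String parseShardKeyFromQuery(String deleteQuery) {
            return null;
        }
    }

    public static void main(String[] args) {
        ShardKeyParser parser = new PrefixShardKeyParser();
        System.out.println(parser.parseShardKey("acme!doc42"));
        System.out.println(new DefaultShardKeyParser().parseShardKeyFromQuery("type:old"));
    }
}
```

With a parser like the second one, all documents sharing a prefix would hash to
the same shard, which is the kind of grouping that a 'shard.keys' query could
then target.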
Notes:
There are no distinct unit tests for this change yet; however, all current
unit tests pass. During the switch to hashing on the string value rather than
the indexed value, a failing test made me realize that the real-time get
component requires support for hashing based on the document's unique id.
Because hashing is done on string values rather than indexed values, the solrj
client could direct queries to a specific shard; however, this is not yet
implemented.
I put the hashing function in the oas.common.cloud.HashPartitioner class, which
encapsulates the hashing and partitioning in one place. I can see a desire for
pluggable collection partitioning, where a collection could be partitioned on
time periods or some other criteria, but that is outside the scope of
pluggable shard hashing.
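As a rough illustration of keeping hashing and partitioning together, the
sketch below hashes the string value of a shard key and maps it onto one of N
equally sized hash ranges. It is not the code from the patch: the class and
method names are made up, and String.hashCode() is only a stand-in for
whatever hash function the patch actually uses.

```java
class HashPartitionSketch {

    /**
     * Returns the index (0-based) of the shard whose hash range contains the
     * hash of the given shard key. Hypothetical helper, not the patch's API.
     */
    static int shardFor(String shardKey, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        // Stand-in hash: treat the 32-bit hashCode as an unsigned value.
        long unsignedHash = shardKey.hashCode() & 0xFFFFFFFFL;
        // Split the 32-bit hash space into numShards contiguous ranges.
        long rangeSize = (1L << 32) / numShards;
        int index = (int) (unsignedHash / rangeSize);
        // The last shard absorbs the remainder when 2^32 is not evenly divisible.
        return Math.min(index, numShards - 1);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"doc1", "doc2", "acme"}) {
            System.out.println(key + " -> shard " + shardFor(key, 4));
        }
    }
}
```

Because the input is a plain string, a client holding only the shard key could
run the same computation to pick a target shard without access to the index.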
> Pluggable shard lookup mechanism for SolrCloud
> ----------------------------------------------
>
> Key: SOLR-2592
> URL: https://issues.apache.org/jira/browse/SOLR-2592
> Project: Solr
> Issue Type: New Feature
> Components: SolrCloud
> Affects Versions: 4.0
> Reporter: Noble Paul
> Attachments: pluggable_sharding.patch, pluggable_sharding_V2.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash,
> attribute value, etc.), it will be easy to narrow down the search to a smaller
> subset of shards and, in effect, achieve a more efficient search.