[
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530162#comment-13530162
]
Shawn Heisey commented on SOLR-2592:
------------------------------------
I use the hot shard concept in Solr 3.5.0. For the cold shards, I split
documents using a MOD on the CRC32 hash of a MySQL bigint autoincrement field -
my MySQL query does the CRC32 and the MOD. That field's actual value is
translated to a tlong field in the schema. For the hot shard, I simply use a
split point on the actual value of that field. Everything less than or equal
to the split point goes to the cold shards, everything greater than the split
point goes to the hot shard. Multiple shards are handled by a single Solr
instance - seven shards live on two servers.
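The routing rule described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the names `cold_shard_index` and `route` are hypothetical, and the count of six cold shards is an assumption (the comment only says there are seven shards in total). It also assumes the hash is taken over the decimal string form of the id, which is how MySQL's CRC32() treats a numeric argument.

```python
import zlib

# Assumption: six cold shards plus one hot shard. The comment only
# states that seven shards exist in total.
NUM_COLD_SHARDS = 6


def cold_shard_index(doc_id: int, num_cold_shards: int = NUM_COLD_SHARDS) -> int:
    """Return CRC32 of the id's decimal string, modulo the cold shard count."""
    return zlib.crc32(str(doc_id).encode("ascii")) % num_cold_shards


def route(doc_id: int, split_point: int) -> str:
    """Pick a shard name: ids above the split point go to the hot shard,
    everything less than or equal to it is hashed across the cold shards."""
    if doc_id > split_point:
        return "hot"
    return f"cold{cold_shard_index(doc_id)}"
```

With this shape, raising the split point changes only which ids count as hot. The cold shard for a given id never changes, because it depends solely on the hash.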
This arrangement requires that I do a daily "distribute" process where I index
(from MySQL) data between the old split point and the new split point to the
cold shards, then delete that data from the hot shard. Full reindexes are done
with the dataimport handler and controlled by SolrJ; everything else (including
the distribute) is done directly with SolrJ.
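The daily distribute step might look roughly like the sketch below. It is an in-memory simulation only: plain dicts stand in for the MySQL table and the Solr cores, and the names `distribute` and `cold_shard_index` are hypothetical. A real version would issue the adds and the delete through SolrJ.

```python
import zlib


def cold_shard_index(doc_id, num_cold_shards):
    """CRC32 of the id's decimal string, modulo the number of cold shards."""
    return zlib.crc32(str(doc_id).encode("ascii")) % num_cold_shards


def distribute(source_rows, hot, cold_shards, old_split, new_split):
    """Move ids in (old_split, new_split] from the hot shard to the cold shards.

    source_rows: dict of id -> document, standing in for the MySQL table.
    hot:         dict of id -> document, standing in for the hot shard.
    cold_shards: list of dicts, one per cold shard.
    Returns the new split point so the caller can record it.
    """
    if new_split < old_split:
        raise ValueError("new split point must not be below the old one")

    moving = [i for i in source_rows if old_split < i <= new_split]

    # Step 1: index the range into the cold shards from the source data,
    # not from the hot shard.
    for doc_id in moving:
        target = cold_shards[cold_shard_index(doc_id, len(cold_shards))]
        target[doc_id] = source_rows[doc_id]

    # Step 2: only after the cold copies exist, delete the same range
    # from the hot shard.
    for doc_id in [i for i in hot if old_split < i <= new_split]:
        del hot[doc_id]

    return new_split
```

Indexing before deleting means a document is briefly present in both places rather than briefly missing. Whether that ordering suits a given deployment is a judgment call.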
How much of that could be automated and put server-side with the features added
by this issue? If I have to track shard and core names myself in order to do
the distribute, then I will have to decide whether the other automation I would
gain is worth switching to SolrCloud.
If I could avoid the client-side distribute indexing and have Solr shuffle the
data around itself, that would be awesome, but I'm not sure that's possible,
and it may be somewhat complicated by the fact that I have a number of unstored
fields that I search on.
At some point I will test performance on an index where I do not have a hot
shard, where the data is simply hashed between several large shards. This
entire concept was implemented for fast indexing of new data - because Solr 1.4
did not have NRT features.
> Custom Hashing
> --------------
>
> Key: SOLR-2592
> URL: https://issues.apache.org/jira/browse/SOLR-2592
> Project: Solr
> Issue Type: New Feature
> Components: SolrCloud
> Affects Versions: 4.0-ALPHA
> Reporter: Noble Paul
> Assignee: Yonik Seeley
> Fix For: 4.1
>
> Attachments: dbq_fix.patch, pluggable_sharding.patch,
> pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch,
> SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch,
> SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch,
> SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash,
> attribute value, etc.), it will be easy to narrow down the search to a smaller
> subset of shards and, in effect, achieve a more efficient search.