Yonik Seeley created SOLR-13399:
-----------------------------------

             Summary: compositeId support for shard splitting
                 Key: SOLR-13399
                 URL: https://issues.apache.org/jira/browse/SOLR-13399
             Project: Solr
          Issue Type: New Feature
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Yonik Seeley


Shard splitting does not currently have a way to automatically take into 
account the actual distribution (number of documents) in each hash bucket 
created by using compositeId hashing.

We should probably add a parameter *splitByPrefix* to the *SPLITSHARD* command 
that would look at the number of docs sharing each compositeId prefix and use 
that to create roughly equal sized buckets by document count rather than just 
assuming an equal distribution across the entire hash range.

Like normal shard splitting, we should bias against splitting within hash 
buckets unless necessary (since that leads to larger query fanout.) . Perhaps 
this warrants a parameter that would control how much of a size mismatch is 
tolerable before resorting to splitting within a bucket. 
*allowedSizeDifference*?

To more quickly calculate the number of docs in each bucket, we could index the 
prefix in a different field.  Iterating over the terms for this field would 
quickly give us the number of docs in each (i.e lucene keeps track of the doc 
count for each term already.)  Perhaps the implementation could be a flag on 
the *id* field... something like *indexPrefixes* and poly-fields that would 
cause the indexing to be automatically done and alleviate having to pass in an 
additional field during indexing and during the call to *SPLITSHARD*.  This 
whole part is an optimization though and could be split off into its own issue 
if desired.

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to