David Smiley created SOLR-17373:
-----------------------------------

             Summary: Shard splitByPrefix should not do so if it would be too 
imbalanced/inefficient
                 Key: SOLR-17373
                 URL: https://issues.apache.org/jira/browse/SOLR-17373
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
            Reporter: David Smiley


Shard split "splitByPrefix" exists to reduce the number of shards that a 
typical prefix is in, thus reducing query fanout in distributed search (assuming 
the route param is used), and it can isolate indexing activity as well.  
Sometimes this can result in a very imbalanced (inefficient) shard split that 
may even quickly lead to another split back-to-back!  (Imagine splitting off 
less than 1% of the docs.)  Here we propose that if the split would split off 
less than roughly 20% of the docs, it's too inefficient.  Instead, split in the 
middle of the largest key prefix.
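A minimal sketch of the proposed rule, assuming the shard's prefixes are available as sorted, non-overlapping hash ranges with doc counts, and that docs within one prefix are spread roughly evenly over that prefix's hash sub-range. The names `PrefixSplitSketch`, `PrefixBucket`, `chooseSplitPoint` and `MIN_SPLIT_FRACTION` are hypothetical and not Solr's actual shard-split code; the 0.20 threshold is just the "roughly 20%" floated above. It first tries the prefix boundary that best balances doc counts, and falls back to the midpoint of the largest prefix's hash range when the smaller side would get less than the threshold.

```java
import java.util.List;

public class PrefixSplitSketch {

  /** Hypothetical: one route-key prefix covering hashes [min, max] and holding docCount docs. */
  record PrefixBucket(int min, int max, long docCount) {}

  /** Hypothetical threshold: each sub-shard should get at least this fraction of the docs. */
  static final double MIN_SPLIT_FRACTION = 0.20;

  /**
   * Returns the last hash (inclusive) of the lower sub-shard.
   * Assumes buckets is non-empty, sorted by hash, and non-overlapping.
   */
  static int chooseSplitPoint(List<PrefixBucket> buckets) {
    long total = buckets.stream().mapToLong(PrefixBucket::docCount).sum();

    // 1) splitByPrefix-style: pick the prefix boundary that best balances doc counts.
    long lower = 0;
    long bestDiff = Long.MAX_VALUE;
    long bestLower = 0;
    int bestBoundary = -1;
    for (int i = 0; i < buckets.size() - 1; i++) {
      lower += buckets.get(i).docCount();
      long diff = Math.abs(total - 2 * lower);
      if (diff < bestDiff) {
        bestDiff = diff;
        bestLower = lower;
        bestBoundary = i;
      }
    }
    if (bestBoundary >= 0) {
      long smaller = Math.min(bestLower, total - bestLower);
      if (smaller >= MIN_SPLIT_FRACTION * total) {
        return buckets.get(bestBoundary).max();
      }
    }

    // 2) Too imbalanced (or only one prefix): split the middle of the largest prefix.
    PrefixBucket largest = buckets.get(0);
    for (PrefixBucket b : buckets) {
      if (b.docCount() > largest.docCount()) {
        largest = b;
      }
    }
    // Widen to long so signed hash ranges near Integer.MIN_VALUE/MAX_VALUE don't overflow.
    return (int) Math.floorDiv((long) largest.min() + largest.max(), 2L);
  }

  public static void main(String[] args) {
    // Balanced enough: split on the prefix boundary after the first prefix (40 vs 60 docs).
    System.out.println(chooseSplitPoint(List.of(
        new PrefixBucket(0, 99, 40),
        new PrefixBucket(100, 199, 30),
        new PrefixBucket(200, 299, 30)))); // prints 99

    // Boundary split would only split off 1%: split the middle of the 99-doc prefix instead.
    System.out.println(chooseSplitPoint(List.of(
        new PrefixBucket(0, 99, 99),
        new PrefixBucket(100, 199, 1)))); // prints 49
  }
}
```

Splitting inside a prefix gives up the "one prefix per shard" benefit for that one prefix, which is the trade-off the issue accepts in exchange for a balanced split. A prefix that occupies a single hash value cannot be split this way; the sketch then just returns that hash.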

Note: it's also been observed that a prefix might be so sparsely 
represented that those docs are likely marked deleted as part of a 
previous shard split (if the "link" split method was used).  Thus this 
inefficiency can have a cascading effect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
