David Smiley created SOLR-17373:
-----------------------------------
Summary: Shard splitByPrefix should not do so if it would be too
imbalanced/inefficient
Key: SOLR-17373
URL: https://issues.apache.org/jira/browse/SOLR-17373
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Components: SolrCloud
Reporter: David Smiley
Shard split "splitByPrefix" exists to reduce the number of shards that a
typical prefix is in, thus reducing query fan-out in distributed search
(assuming the route param is used), and it can isolate indexing activity as
well. Sometimes this can result in a very imbalanced (inefficient) shard split
that may even quickly lead to another split back-to-back (imagine splitting
off less than 1% of the docs). Here we propose that if the split would only
split off < 20% of docs or so, then it's too inefficient. Instead, split the
middle of the largest key prefix.
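
The proposed rule could be sketched roughly as follows. This is an
illustration only, not Solr's implementation: the function and class names,
the input shape (per-prefix doc counts listed in hash order), and the
assumption that docs are spread evenly within a prefix's hash range are all
hypothetical. The 20% threshold is the approximate figure from the proposal.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed threshold, taken from the "< 20% of docs or so" in the proposal.
MIN_FRACTION = 0.20


@dataclass
class SplitDecision:
    strategy: str            # "prefix_boundary" or "mid_largest_prefix"
    prefix: str              # boundary: first prefix of the right half;
                             # fallback: the prefix whose range is bisected
    smaller_fraction: float  # estimated share of docs on the smaller side


def choose_split(prefix_counts: List[Tuple[str, int]],
                 min_fraction: float = MIN_FRACTION) -> SplitDecision:
    """Pick a split point for one shard (hypothetical helper).

    prefix_counts: (prefix, doc_count) pairs, assumed to be in hash order.
    """
    total = sum(count for _, count in prefix_counts)
    if not prefix_counts or total <= 0:
        raise ValueError("need at least one prefix with docs")

    # Find the prefix boundary that gives the most balanced split.
    best_index: Optional[int] = None
    best_fraction = 0.0
    running = 0
    for i in range(1, len(prefix_counts)):
        running += prefix_counts[i - 1][1]
        fraction = min(running, total - running) / total
        if fraction > best_fraction:
            best_index, best_fraction = i, fraction

    # Balanced enough: keep the prefix-boundary split.
    if best_index is not None and best_fraction >= min_fraction:
        return SplitDecision("prefix_boundary",
                             prefix_counts[best_index][0], best_fraction)

    # Too imbalanced: bisect the hash range of the largest prefix instead.
    largest_index = max(range(len(prefix_counts)),
                        key=lambda i: prefix_counts[i][1])
    largest_prefix, largest_count = prefix_counts[largest_index]
    before = sum(count for _, count in prefix_counts[:largest_index])
    # Assumes docs are evenly spread across the prefix's hash range.
    left = before + largest_count / 2
    fraction = min(left, total - left) / total
    return SplitDecision("mid_largest_prefix", largest_prefix, fraction)
```

For example, with counts like `[("a!", 5), ("b!", 990), ("c!", 5)]` every
prefix boundary splits off only 0.5% of the docs, so the sketch falls back to
bisecting `b!`. With `[("a!", 400), ("b!", 600)]` the boundary split is kept.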
Note: it's also been observed that a prefix might be so poorly represented
that its docs are likely ones marked deleted as part of a previous shard
split (if the "link" split method was used). Thus this inefficiency can have
a cascading effect.