[ https://issues.apache.org/jira/browse/SOLR-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903409#comment-16903409 ]
Hoss Man commented on SOLR-13399:
---------------------------------

I would assume it's related to the (numSubShards) changes in SplitShardCmd? At first glance, that code path looks like it's specific to SPLIT_BY_PREFIX, but apparently your previous commit has it defaulting to "true"? (see SplitShardCmd.java L212)

{noformat}
$ git show 19ddcfd282f3b9eccc50da83653674e510229960 -- core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java | cat
commit 19ddcfd282f3b9eccc50da83653674e510229960
Author: yonik <yo...@apache.org>
Date:   Tue Aug 6 14:09:54 2019 -0400

    SOLR-13399: ability to use id field for compositeId histogram

diff --git a/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java b/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
index 4d623be..6c5921e 100644
--- a/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
+++ b/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
@@ -212,16 +212,14 @@ public class SplitShardCmd implements OverseerCollectionMessageHandler.Cmd {
     if (message.getBool(CommonAdminParams.SPLIT_BY_PREFIX, true)) {
       t = timings.sub("getRanges");
-      log.info("Requesting split ranges from replica " + parentShardLeader.getName() + " as part of slice " + slice + " of collection "
-          + collectionName + " on " + parentShardLeader);
-
       ModifiableSolrParams params = new ModifiableSolrParams();
       params.set(CoreAdminParams.ACTION, CoreAdminParams.CoreAdminAction.SPLIT.toString());
       params.set(CoreAdminParams.GET_RANGES, "true");
       params.set(CommonAdminParams.SPLIT_METHOD, splitMethod.toLower());
       params.set(CoreAdminParams.CORE, parentShardLeader.getStr("core"));
-      int numSubShards = message.getInt(NUM_SUB_SHARDS, DEFAULT_NUM_SUB_SHARDS);
-      params.set(NUM_SUB_SHARDS, Integer.toString(numSubShards));
+      // Only 2 is currently supported
+      // int numSubShards = message.getInt(NUM_SUB_SHARDS, DEFAULT_NUM_SUB_SHARDS);
+      // params.set(NUM_SUB_SHARDS, Integer.toString(numSubShards));
       {
         final ShardRequestTracker shardRequestTracker = ocmh.asyncRequestTracker(asyncId);
@@ -236,7 +234,7 @@ public class SplitShardCmd implements OverseerCollectionMessageHandler.Cmd {
       NamedList shardRsp = (NamedList)successes.getVal(0);
       String splits = (String)shardRsp.get(CoreAdminParams.RANGES);
       if (splits != null) {
-        log.info("Resulting split range to be used is " + splits);
+        log.info("Resulting split ranges to be used: " + splits + " slice=" + slice + " leader=" + parentShardLeader);
         // change the message to use the recommended split ranges
         message = message.plus(CoreAdminParams.RANGES, splits);
       }
{noformat}

(I could be totally off base though -- I don't really understand 90% of what this test is doing, and the place where it fails doesn't seem to be trying to split into more than 2 subshards, so even if the SplitShardCmd changes I pointed out are buggy, I'm not sure why they would cause this particular failure.)

> compositeId support for shard splitting
> ---------------------------------------
>
>                 Key: SOLR-13399
>                 URL: https://issues.apache.org/jira/browse/SOLR-13399
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Yonik Seeley
>            Assignee: Yonik Seeley
>            Priority: Major
>             Fix For: 8.3
>
>         Attachments: SOLR-13399.patch, SOLR-13399.patch,
> SOLR-13399_testfix.patch, SOLR-13399_useId.patch,
> ShardSplitTest.master.seed_AE04B5C9BA6E9A4.log.txt
>
>
> Shard splitting does not currently have a way to automatically take into
> account the actual distribution (number of documents) in each hash bucket
> created by using compositeId hashing.
> We should probably add a parameter *splitByPrefix* to the *SPLITSHARD*
> command that would look at the number of docs sharing each compositeId prefix
> and use that to create roughly equal-sized buckets by document count rather
> than just assuming an equal distribution across the entire hash range.
> Like normal shard splitting, we should bias against splitting within hash
> buckets unless necessary (since that leads to larger query fanout). Perhaps
> this warrants a parameter that would control how much of a size mismatch is
> tolerable before resorting to splitting within a bucket.
> *allowedSizeDifference*?
> To more quickly calculate the number of docs in each bucket, we could index
> the prefix in a different field. Iterating over the terms for this field
> would quickly give us the number of docs in each (i.e., Lucene already keeps
> track of the doc count for each term). Perhaps the implementation could be a
> flag on the *id* field... something like *indexPrefixes* and poly-fields that
> would cause the indexing to be automatically done and alleviate having to
> pass in an additional field during indexing and during the call to
> *SPLITSHARD*. This whole part is an optimization though and could be split
> off into its own issue if desired.
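
For readers following the thread, the *splitByPrefix* idea in the quoted description can be sketched roughly as below. This is only an illustration, not the code in SplitShardCmd or in any SOLR-13399 patch: the class name `PrefixSplitSketch`, the method `chooseSplitIndex`, and the sample counts are hypothetical. It assumes the per-prefix document counts have already been collected and are ordered by hash range. It picks the bucket boundary that makes the two sub-shards as close as possible in document count, and it never splits inside a bucket (the proposed *allowedSizeDifference* tolerance is not modeled).

```java
// Hypothetical sketch: choose a split point on a compositeId bucket boundary
// so that the two resulting sub-shards hold roughly equal numbers of documents.
final class PrefixSplitSketch {

    /**
     * @param docCounts document count per prefix bucket, in hash-range order
     *                  (assumed to be precomputed; must contain at least 2 buckets)
     * @return index i such that buckets [0, i) go to the first sub-shard
     *         and buckets [i, n) go to the second
     */
    static int chooseSplitIndex(long[] docCounts) {
        if (docCounts.length < 2) {
            throw new IllegalArgumentException("need at least two buckets to split on a boundary");
        }
        long total = 0;
        for (long count : docCounts) {
            total += count;
        }

        long left = 0;                 // docs assigned to the first sub-shard so far
        int bestIndex = 1;
        long bestDiff = Long.MAX_VALUE;
        for (int i = 1; i < docCounts.length; i++) {
            left += docCounts[i - 1];
            // |right - left| where right = total - left
            long diff = Math.abs(total - 2 * left);
            if (diff < bestDiff) {     // ties keep the earliest boundary
                bestDiff = diff;
                bestIndex = i;
            }
        }
        return bestIndex;
    }

    public static void main(String[] args) {
        long[] counts = {10, 20, 30, 40}; // made-up per-prefix doc counts
        int split = chooseSplitIndex(counts);
        System.out.println("split before bucket " + split);
    }
}
```

With the made-up counts above, an even split of the hash range would put 30 documents in one sub-shard and 70 in the other, while the boundary chosen by document count gives 60 and 40.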