Hi, I wanted to try out the (relatively) new replica placement strategy and how it plays with shard splitting. So I set up a 4-node cluster and created a collection with 1 shard and 2 replicas (each created on a different node).
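For reference, this is roughly how I set it up (a sketch using the Collections API; the host, port, and collection name "test" are placeholders for my actual setup):

```shell
# Create a collection with 1 shard and 2 replicas on a 4-node cluster
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=test&numShards=1&replicationFactor=2&maxShardsPerNode=1"
```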
When I issue a SPLITSHARD command (without any rules set on the collection), the split finishes successfully and the state of the cluster is:

n1: s1_r1 (INACTIVE), s1_0_r1, s1_1_r1
n2: s1_r2 (INACTIVE), s1_0_r2
n3: s1_1_r2
n4: empty

So far this is as expected: since the shard split occurred on n1, the two sub-shards were created there, and Solr then filled in the missing replicas on nodes 2 and 3. Also, the source shard s1 was set to INACTIVE, and I did not delete it (in this test).

Then I tried the same thing, curious whether, if I set the right rule, one of the sub-shards' replicas would move to the 4th node, so that I end up with a "balanced" cluster. So I created the collection with the rule "shard:**,replica:<2,node:*", which according to the ref guide should leave me with no more than one replica per shard on any node. Per my understanding, I should end up with either 2 nodes each holding one replica of each shard, 3 nodes holding a mixture of replicas, or 4 nodes each holding exactly one replica. However, while observing the cluster status I noticed that the two newly created sub-shards were marked ACTIVE and leader, while the two others were marked DOWN.
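For completeness, this is the shape of the commands I used for the rule-based attempt (again a sketch; host, port, and collection name are placeholders, and the rule value is URL-encoded since it contains "<" and "*"):

```shell
# Create the collection with a placement rule
curl -G "http://localhost:8983/solr/admin/collections" \
  --data-urlencode "action=CREATE" \
  --data-urlencode "name=test" \
  --data-urlencode "numShards=1" \
  --data-urlencode "replicationFactor=2" \
  --data-urlencode "rule=shard:**,replica:<2,node:*"

# Then split the single shard
curl "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=test&shard=shard1"
```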
Turning on INFO logging I found this:

Caused by: java.lang.NullPointerException
    at org.apache.solr.cloud.rule.Rule.getNumberOfNodesWithSameTagVal(Rule.java:168)
    at org.apache.solr.cloud.rule.Rule.tryAssignNodeToShard(Rule.java:130)
    at org.apache.solr.cloud.rule.ReplicaAssigner.tryAPermutationOfRules(ReplicaAssigner.java:252)
    at org.apache.solr.cloud.rule.ReplicaAssigner.tryAllPermutations(ReplicaAssigner.java:203)
    at org.apache.solr.cloud.rule.ReplicaAssigner.getNodeMappings0(ReplicaAssigner.java:174)
    at org.apache.solr.cloud.rule.ReplicaAssigner.getNodeMappings(ReplicaAssigner.java:135)
    at org.apache.solr.cloud.Assign.getNodesViaRules(Assign.java:211)
    at org.apache.solr.cloud.Assign.getNodesForNewReplicas(Assign.java:179)
    at org.apache.solr.cloud.OverseerCollectionMessageHandler.addReplica(OverseerCollectionMessageHandler.java:2204)
    at org.apache.solr.cloud.OverseerCollectionMessageHandler.splitShard(OverseerCollectionMessageHandler.java:1212)

I also tried the rule "replica:<2,node:*", which yielded the same NPE. I'm running 5.4.1, and I couldn't find whether this is something that was already fixed in 5.5.0/master. So the question is: is this a bug, or did I misconfigure the rule? And as a side question, is there any rule I can configure so that the split shards are distributed evenly across the cluster? Or will SPLITSHARD currently always result in the created shards existing on the origin node, leaving it my responsibility to move them elsewhere?

Shai