Hi

I wanted to try out the (relatively) new rule-based replica placement
strategy and see how it plays with shard splitting. So I set up a 4-node
cluster and created a collection with 1 shard and 2 replicas (each created
on a different node).
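For reference, the setup was done via the Collections API, roughly like
this (the collection and config names below are placeholders, not the
exact ones I used):

```shell
# Create a 1-shard, 2-replica collection on the 4-node cluster.
# "test" and "myconf" are placeholder names.
curl "http://localhost:8983/solr/admin/collections?action=CREATE\
&name=test&numShards=1&replicationFactor=2\
&collection.configName=myconf"
```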

When I issue a SPLITSHARD command (without any rules set on the
collection), the split finishes successfully and the state of the cluster
is:

n1: s1_r1 (INACTIVE), s1_0_r1, s1_1_r1
n2: s1_r2 (INACTIVE), s1_0_r2
n3: s1_1_r2
n4: empty

So far this is as expected: since the shard split occurred on n1, the two
sub-shards were created there, and Solr then filled in the missing replicas
on nodes 2 and 3. The source shard s1 was set to INACTIVE, and I did not
delete it (in this test).
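The split itself was issued with the stock Collections API call, something
like (collection name is again a placeholder):

```shell
# Split shard1 of the collection; no placement rules are set at this point.
curl "http://localhost:8983/solr/admin/collections?action=SPLITSHARD\
&collection=test&shard=shard1"
```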

Then I tried the same thing, curious whether, if I set the right rule, one
of the sub-shards' replicas would move to the 4th node, so I'd end up with
a "balanced" cluster. So I created the collection with the rule
"shard:**,replica:<2,node:*", which per the ref guide should leave no more
than one replica per shard on any node. By my understanding, I should end
up with either 2 nodes each holding one replica of each shard, 3 nodes
holding a mixture of replicas, or 4 nodes each holding exactly one replica.
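Concretely, the rule was passed as a request parameter at collection
creation, along these lines (names are placeholders; note the '<' has to
be URL-encoded as %3C when sent via curl):

```shell
# Create the collection with a rule restricting each node to fewer
# than 2 replicas of any shard. "test" and "myconf" are placeholders.
curl "http://localhost:8983/solr/admin/collections?action=CREATE\
&name=test&numShards=1&replicationFactor=2\
&collection.configName=myconf\
&rule=shard:**,replica:%3C2,node:*"
```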

However, while observing the cluster status I noticed that the two newly
created sub-shard replicas were marked ACTIVE and leader, while the other
two were marked DOWN. Turning on INFO logging I found this:

Caused by: java.lang.NullPointerException
    at org.apache.solr.cloud.rule.Rule.getNumberOfNodesWithSameTagVal(Rule.java:168)
    at org.apache.solr.cloud.rule.Rule.tryAssignNodeToShard(Rule.java:130)
    at org.apache.solr.cloud.rule.ReplicaAssigner.tryAPermutationOfRules(ReplicaAssigner.java:252)
    at org.apache.solr.cloud.rule.ReplicaAssigner.tryAllPermutations(ReplicaAssigner.java:203)
    at org.apache.solr.cloud.rule.ReplicaAssigner.getNodeMappings0(ReplicaAssigner.java:174)
    at org.apache.solr.cloud.rule.ReplicaAssigner.getNodeMappings(ReplicaAssigner.java:135)
    at org.apache.solr.cloud.Assign.getNodesViaRules(Assign.java:211)
    at org.apache.solr.cloud.Assign.getNodesForNewReplicas(Assign.java:179)
    at org.apache.solr.cloud.OverseerCollectionMessageHandler.addReplica(OverseerCollectionMessageHandler.java:2204)
    at org.apache.solr.cloud.OverseerCollectionMessageHandler.splitShard(OverseerCollectionMessageHandler.java:1212)

I also tried the rule "replica:<2,node:*", which yielded the same NPE. I'm
running 5.4.1, and I couldn't find whether this is something that was
already fixed in 5.5.0/master. So the question is: is this a bug, or did I
misconfigure the rule?

And as a side question, is there any rule I can configure so that the
split shards are distributed evenly across the cluster? Or will SPLITSHARD
currently always place the created shards on the origin node, leaving it
my responsibility to move them elsewhere?
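In case it helps frame the question: the only way I can see to rebalance
manually on 5.x is ADDREPLICA on the empty node followed by DELETEREPLICA
on the crowded one, something like the sketch below (the node name and
replica name are made up for illustration; the real ones come from
CLUSTERSTATUS):

```shell
# Add a replica of sub-shard shard1_1 on the empty node n4
# ("n4:8983_solr" is an illustrative node name).
curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA\
&collection=test&shard=shard1_1&node=n4:8983_solr"

# ...then drop the copy that landed on the origin node n1
# ("core_node5" is an illustrative replica name).
curl "http://localhost:8983/solr/admin/collections?action=DELETEREPLICA\
&collection=test&shard=shard1_1&replica=core_node5"
```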

Shai
