Re: Shard splitting and replica placement strategy
Opened https://issues.apache.org/jira/browse/SOLR-8728 with a test which reproduces the exception.

On Wed, Feb 24, 2016 at 3:49 PM Shai Erera wrote:
> Thanks Noble, I'll try to reproduce in a test then. Does the rule I've set
> sound right to you though?
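For anyone following along, the setup being reproduced boils down to two Collections API calls: a CREATE with the placement rule, followed by a SPLITSHARD. A rough sketch of the request URLs — the base address, collection name, and shard name are placeholders of mine, not taken from the thread:

```python
from urllib.parse import urlencode, parse_qs

BASE = "http://localhost:8983/solr/admin/collections"  # placeholder address

# Create a 1-shard, 2-replica collection with the placement rule under test.
create_params = {
    "action": "CREATE",
    "name": "test",            # hypothetical collection name
    "numShards": 1,
    "replicationFactor": 2,
    "rule": "shard:**,replica:<2,node:*",
}
create_url = BASE + "?" + urlencode(create_params)

# Split the collection's single shard into two sub-shards.
split_params = {
    "action": "SPLITSHARD",
    "collection": "test",
    "shard": "shard1",         # default name of the single shard
}
split_url = BASE + "?" + urlencode(split_params)

print(create_url)
print(split_url)
```

The rule value contains `:`, `<`, `,` and `*`, so letting urlencode percent-encode it (rather than pasting it into a URL by hand) avoids one easy source of misconfiguration.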
Re: Shard splitting and replica placement strategy
Thanks Noble, I'll try to reproduce in a test then. Does the rule I've set sound right to you though?

On Wed, Feb 24, 2016, 15:19 Noble Paul wrote:
> Whatever it is, there should be no NPE. Could be a bug.
Re: Shard splitting and replica placement strategy
Whatever it is, there should be no NPE. Could be a bug.

--
Noble Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
Shard splitting and replica placement strategy
Hi,

I wanted to try out the (relatively) new replica placement strategy and how it plays with shard splitting. So I set up a 4-node cluster and created a collection with 1 shard and 2 replicas, each created on a different node.

When I issue a SPLITSHARD command (without any rules set on the collection), the split finishes successfully and the state of the cluster is:

n1: s1_r1 (INACTIVE), s1_0_r1, s1_1_r1
n2: s1_r2 (INACTIVE), s1_0_r2
n3: s1_1_r2
n4: empty

So far as expected: since the shard split occurred on n1, the two sub-shards were created there, and then Solr filled in the missing replicas on nodes 2 and 3. Also, the source shard s1 was set to INACTIVE, and I did not delete it (in the test).

Then I tried the same, curious whether, if I set the right rule, one of the sub-shards' replicas would move to the 4th node, so I'd end up with a "balanced" cluster. So I created the collection with the rule "shard:**,replica:<2,node:*", which per the ref guide means I should end up with no more than one replica per shard on every node. Per my understanding, I should end up with either 2 nodes each holding one replica of each shard, 3 nodes holding a mixture of replicas, or 4 nodes each holding exactly one replica.

However, while observing the cluster status I noticed that the two created sub-shards are marked as ACTIVE and leader, while the two others are marked DOWN. Turning on INFO logging I found this:

Caused by: java.lang.NullPointerException
    at org.apache.solr.cloud.rule.Rule.getNumberOfNodesWithSameTagVal(Rule.java:168)
    at org.apache.solr.cloud.rule.Rule.tryAssignNodeToShard(Rule.java:130)
    at org.apache.solr.cloud.rule.ReplicaAssigner.tryAPermutationOfRules(ReplicaAssigner.java:252)
    at org.apache.solr.cloud.rule.ReplicaAssigner.tryAllPermutations(ReplicaAssigner.java:203)
    at org.apache.solr.cloud.rule.ReplicaAssigner.getNodeMappings0(ReplicaAssigner.java:174)
    at org.apache.solr.cloud.rule.ReplicaAssigner.getNodeMappings(ReplicaAssigner.java:135)
    at org.apache.solr.cloud.Assign.getNodesViaRules(Assign.java:211)
    at org.apache.solr.cloud.Assign.getNodesForNewReplicas(Assign.java:179)
    at org.apache.solr.cloud.OverseerCollectionMessageHandler.addReplica(OverseerCollectionMessageHandler.java:2204)
    at org.apache.solr.cloud.OverseerCollectionMessageHandler.splitShard(OverseerCollectionMessageHandler.java:1212)

I also tried with the rule "replica:<2,node:*", which yielded the same NPE. I'm running 5.4.1 and I couldn't find whether this is something that was already fixed in 5.5.0/master. So the question is: is this a bug, or did I misconfigure the rule?

And as a side question, is there any rule I can configure so that the split shards are distributed evenly in the cluster? Or will SPLITSHARD currently always result in the created shards existing on the origin node, leaving it my responsibility to move them elsewhere?

Shai
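The reading of "replica:<2,node:*" above — no node may hold two or more replicas of the same shard — can be sketched as a quick check. This is only my illustration of that interpretation, not Solr's actual rule engine:

```python
from collections import Counter

def satisfies_rule(placement):
    """Check 'replica:<2,node:*' applied to every shard: count replicas
    per (node, shard) pair and require each count to stay below 2."""
    counts = Counter()
    for node, replicas in placement.items():
        for shard in replicas:
            counts[(node, shard)] += 1
    return all(c < 2 for c in counts.values())

# Post-split state from the message above (active shards only): n1 holds
# one replica each of two *different* sub-shards, which the rule permits.
post_split = {
    "n1": ["s1_0", "s1_1"],
    "n2": ["s1_0"],
    "n3": ["s1_1"],
    "n4": [],
}

# A hypothetical placement that would violate the rule: both replicas
# of sub-shard s1_0 land on n1.
bad = {
    "n1": ["s1_0", "s1_0"],
    "n2": ["s1_1"],
    "n3": ["s1_1"],
    "n4": [],
}

print(satisfies_rule(post_split))  # True
print(satisfies_rule(bad))         # False
```

Note that under this reading the post-split layout (both sub-shards leader-hosted on n1) already satisfies the rule, so the rule alone would not force a replica onto the empty 4th node.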