Re: SolrCloud shard leader elections - Altering zookeeper sequence numbers

2015-01-13 Thread Daniel Collins
Is it important where your leader is? If you just want to minimize
leadership changes during a rolling restart, then you could restart in the
opposite order (S3, S2, S1). That would give only 1 transition, but the
end result would be a leader on S2 instead of S1 (not sure if that's
important to you or not). I know it's not a fix, but it might be a
workaround until the whole leadership-moving work is done?




Re: SolrCloud shard leader elections - Altering zookeeper sequence numbers

2015-01-13 Thread Zisis Tachtsidis
Daniel Collins wrote
 Is it important where your leader is? If you just want to minimize
 leadership changes during a rolling restart, then you could restart in the
 opposite order (S3, S2, S1). That would give only 1 transition, but the
 end result would be a leader on S2 instead of S1 (not sure if that's
 important to you or not). I know it's not a fix, but it might be a
 workaround until the whole leadership-moving work is done?

I think that rolling-restarting the machines in the opposite order
(S3, S2, S1) will result in S3 being the leader. It's a valid approach, but
wouldn't I then have to revert to the original order (S1, S2, S3) to achieve
the same result in the following rolling restart? That adds operational
cost and complexity that I want to avoid.


Erick Erickson wrote
 Just skimming, but the problem here that I ran into was with the
 listeners. Each _Solr_ instance out there is listening to one of the
 ephemeral nodes (the one in front). So deleting a node does _not_
 change which ephemeral node the associated Solr instance is listening
 to.

 So, for instance, when you delete S2...n-01 and re-add it, S2 is
 still looking at S1...n-00 and will continue looking at
 S1...n-00 until S1...n-00 is deleted.

 Deleting S2...n-01 will wake up S3 though, which should now be
 looking at S1...n-00. Now you have two Solr listeners looking at
 the same ephemeral node. The key is that deleting S2...n-01 does
 _not_ wake up S2, just any Solr instance that has a watch on the
 associated ephemeral node.

Thanks for the info, Erick. I wasn't aware of this linked-list structure of
listeners between the zk nodes. Based on what you've said, though, I've
changed my implementation a bit, and it seems to be working at first glance.
Of course it hasn't been proven reliable yet, but it looks promising.

My original attempt:
S1: -n_00 (no code running here)
S2: -n_04 (code deleting zknode -n_01 and creating -n_04)
S3: -n_03 (code deleting zknode -n_02 and creating -n_03)

has been changed to:
S1: -n_00 (no code running here)
S2: -n_03 (code deleting zknode -n_01 and creating -n_03 using EPHEMERAL_SEQUENTIAL)
S3: -n_02 (no code running here)

Once S1 is shut down, S3 becomes the leader, since S3 now watches S1,
according to what you've said.
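
To make the new S2 step concrete, here's roughly what it looks like as a
single ZooKeeper multi transaction. This is just a sketch, not my exact
code: requeue(), electionPath, oldNode and nodePrefix() are made-up names,
and it has to run against the ZooKeeper session of the Solr node being
re-queued, so that the recreated ephemeral node stays tied to that
instance's session.

    import java.util.Arrays;
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.Op;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class Requeue {
        // Atomically drop this node's old election entry and re-register it;
        // ZooKeeper assigns the next (highest) sequence number to the new node.
        static void requeue(ZooKeeper zk, String electionPath, String oldNode)
                throws Exception {
            List<Op> ops = Arrays.asList(
                Op.delete(electionPath + "/" + oldNode, -1),        // -1 = any version
                Op.create(electionPath + "/" + nodePrefix(oldNode), // keep the same prefix
                    new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
                    CreateMode.EPHEMERAL_SEQUENTIAL));              // ZK appends the sequence
            zk.multi(ops);  // both ops succeed or neither does
        }

        // Election node names look like "<prefix>n_<sequence>"; keep everything
        // up to and including "n_" so the entry stays attributed to the same replica.
        static String nodePrefix(String node) {
            return node.substring(0, node.lastIndexOf('_') + 1);
        }
    }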

The original reason I pursued this quest to minimize leadership changes was
that a leader change _could_ lead to data loss in some scenarios. I'm not
entirely sure, though, and you could correct me on this, but let me explain
myself.

If you have incoming indexing requests during a rolling restart, could there
be a case where, during the current leader's shutdown, the leader-to-be node
does not have enough time to sync with the leader that is shutting down? In
that case everyone will now sync to the new leader, thus missing some
updates. I've seen an installation where the replicas had different index
sizes, and the differences deteriorated over time.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-shard-leader-elections-Altering-zookeeper-sequence-numbers-tp4178973p4179147.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud shard leader elections - Altering zookeeper sequence numbers

2015-01-13 Thread Erick Erickson
SolrCloud is intended to work in the rolling restart case...

Index size, segment counts, and segment names can (and will)
be different on different replicas of the same shard without
anything being amiss. Hard commits happen at different
times across the replicas in a shard. Merging logic kicks in
and may (and in all probability eventually will) pick different
segments to merge, with varying numbers of deleted docs
that get purged, etc.

The numFound reported on a q=*:*&distrib=false query, or the numDocs shown
in the admin screen for the cores of the replicas in question, should be
identical though, provided
1> you've issued a hard commit with openSearcher=true _or_
 a soft commit, and
2> you haven't been indexing, or haven't issued a commit
 as in 1>, since you started looking.
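
If you want to script that check rather than eyeball the admin screen, a
SolrJ sketch along these lines would do; the hosts and core name below are
made up, so point it at your actual replica cores:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class ReplicaCounts {
        public static void main(String[] args) throws Exception {
            // Hypothetical core URLs for the three replicas of shard1.
            String[] replicas = {
                "http://s1:8983/solr/collection1",
                "http://s2:8983/solr/collection1",
                "http://s3:8983/solr/collection1"};
            SolrQuery q = new SolrQuery("*:*");
            q.set("distrib", "false");  // count the local core only, no fan-out
            q.setRows(0);               // we only need numFound
            for (String url : replicas) {
                HttpSolrServer server = new HttpSolrServer(url);
                System.out.println(url + " numFound="
                    + server.query(q).getResults().getNumFound());
                server.shutdown();
            }
        }
    }

After a hard commit with openSearcher=true (or a soft commit) and with no
further indexing, the three numbers should match.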

Best,
Erick



Re: SolrCloud shard leader elections - Altering zookeeper sequence numbers

2015-01-12 Thread Erick Erickson
Just skimming, but the problem here that I ran into was with the
listeners. Each _Solr_ instance out there is listening to one of the
ephemeral nodes (the one in front). So deleting a node does _not_
change which ephemeral node the associated Solr instance is listening
to.

So, for instance, when you delete S2...n-01 and re-add it, S2 is
still looking at S1...n-00 and will continue looking at
S1...n-00 until S1...n-00 is deleted.

Deleting S2...n-01 will wake up S3 though, which should now be
looking at S1...n-00. Now you have two Solr listeners looking at
the same ephemeral node. The key is that deleting S2...n-01 does
_not_ wake up S2, just any Solr instance that has a watch on the
associated ephemeral node.
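
For reference, that watch layout is essentially the standard ZooKeeper
leader-election recipe: each candidate watches only the node directly in
front of it. A simplified sketch of the idea (this is not the actual Solr
code, and it sorts the node names lexicographically just for brevity):

    import java.util.Collections;
    import java.util.List;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class PredecessorWatch {
        // Watch the election node directly in front of ours. Only the deletion
        // of THAT node wakes us up; deleting any other node in the queue does not.
        static void watchPredecessor(ZooKeeper zk, String electionPath, String myNode)
                throws Exception {
            List<String> nodes = zk.getChildren(electionPath, false);
            Collections.sort(nodes);   // sequence order (lexicographic here, for brevity)
            int me = nodes.indexOf(myNode);
            if (me <= 0) {
                return;                // nothing in front of us: leader (or not registered)
            }
            String predecessor = nodes.get(me - 1);
            zk.exists(electionPath + "/" + predecessor, new Watcher() {
                public void process(WatchedEvent event) {
                    // Fires when the predecessor goes away; re-check the queue here.
                }
            });
        }
    }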

To understand how it all works, the code you want is in
LeaderElector.checkIfIamLeader. Be aware that the sortSeqs call sorts the
nodes by
1> sequence number
2> string comparison.

This has the unfortunate characteristic of a secondary sort by
session ID, so two nodes with the same sequence number can sort before
or after each other depending on which one gets a session ID higher or lower
than the other.
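
A rough sketch of that sort behavior (again, not the actual Solr source;
the node names are invented):

    import java.util.Arrays;
    import java.util.Comparator;

    public class SortSeqsSketch {
        public static void main(String[] args) {
            // Two entries with the SAME sequence number but different session IDs:
            String[] nodes = {"9f3e1a-n_0000000003", "1a2b3c-n_0000000003"};
            Arrays.sort(nodes, new Comparator<String>() {
                public int compare(String a, String b) {
                    int bySeq = Integer.compare(seq(a), seq(b)); // 1> sequence number
                    return bySeq != 0 ? bySeq : a.compareTo(b);  // 2> string comparison
                }
            });
            // "1a2b3c..." sorts first purely because its session ID is lower.
            System.out.println(Arrays.toString(nodes));
        }

        static int seq(String node) {
            return Integer.parseInt(node.substring(node.lastIndexOf('_') + 1));
        }
    }

The tie between the two -n_0000000003 entries is broken purely by the
session-ID prefix, which is effectively random with respect to restart order.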

This is quite tricky to get right. I once created a patch for 4.10.3
by applying these issues in this order (some minor tweaks required):
SOLR-6115, SOLR-6512, SOLR-6577, SOLR-6513, SOLR-6517, SOLR-6670,
SOLR-6691.

Good luck!
Erick





SolrCloud shard leader elections - Altering zookeeper sequence numbers

2015-01-12 Thread Zisis Tachtsidis
SolrCloud uses ZooKeeper sequence flags to keep track of the order in which
nodes register themselves as leader candidates. The node with the lowest
sequence number wins as leader of the shard.
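
In ZooKeeper terms the mechanism boils down to something like the sketch
below; the path layout and node prefix are illustrative, and Solr's real
implementation differs in the details:

    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class WhoIsLeader {
        // Register a candidate; ZooKeeper appends a monotonically increasing
        // 10-digit sequence and deletes the node when the session dies.
        static String register(ZooKeeper zk, String electionPath, String prefix)
                throws Exception {
            return zk.create(electionPath + "/" + prefix + "-n_", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        }

        // The candidate with the lowest trailing sequence number is the leader,
        // regardless of the session-ID prefix on the node name.
        static String leader(ZooKeeper zk, String electionPath) throws Exception {
            List<String> candidates = zk.getChildren(electionPath, false);
            return Collections.min(candidates, new Comparator<String>() {
                public int compare(String a, String b) {
                    return Integer.compare(seq(a), seq(b));
                }
            });
        }

        static int seq(String node) {
            return Integer.parseInt(node.substring(node.lastIndexOf('_') + 1));
        }
    }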

What I'm trying to do is keep the leader re-assignments to a minimum
during a rolling restart. To this end, I change the zk sequence numbers
on the SolrCloud nodes when all nodes of the cluster are up and active. I'm
using Solr 4.10.0 and I'm aware of SOLR-6491, which has a similar purpose,
but I'm trying to do it from the outside, using the existing APIs, without
editing Solr source code.

== TYPICAL SCENARIO ==
Suppose we have 3 Solr instances S1, S2, S3. They are started in that
order, and the zk sequences assigned are as follows:
S1: -n_00 (LEADER)
S2: -n_01
S3: -n_02

In a rolling restart we'll get S2 as leader (after S1 shutdown), then S3
(after S2 shutdown), and finally S1 (after S3 shutdown): 3 changes in total.

== MY ATTEMPT ==
By using SolrZkClient and the ZooKeeper multi API, I found a way to get rid
of the old zknodes that participate in a shard's leader election and write
new ones to which we can assign a sequence number of our liking.

S1: -n_00 (no code running here)
S2: -n_04 (code deleting zknode -n_01 and creating -n_04)
S3: -n_03 (code deleting zknode -n_02 and creating -n_03)

In a rolling restart I'd expect to have S3 as leader (after S1 shutdown), no
change (after S2 shutdown), and finally S1 (after S3 shutdown), that is, 2
changes. This count is constant no matter how many servers are added to the
SolrCloud cluster, while in the first scenario the number of re-assignments
equals the number of Solr servers.

The problem occurs when S1 (LEADER) is shut down. The election that takes
place still selects S2 as leader; it's as if the new sequence numbers are
ignored. When I go to /solr/#/~cloud?view=tree, the new sequence numbers are
listed under /collections, and based on them S3 should have become the
leader. Do you have any idea why the new state is not acknowledged during
the election? Is something cached? Or, to put it bluntly, do I have any
chance down this path? If not, what are my options? Is it possible to apply
all the patches under SOLR-6491 in isolation and continue from there?

Thank you. 

Extra info which might help follows.
1. Some logging related to leader elections after S1 has been shut down:
S2 - org.apache.solr.cloud.SyncStrategy  Leader's attempt to sync with
     shard failed, moving to the next candidate
S2 - org.apache.solr.cloud.ShardLeaderElectionContext  We failed sync,
     but we have no versions - we can't sync in that case - we were
     active before, so become leader anyway
S3 - org.apache.solr.cloud.LeaderElector  Our node is no longer in line
     to be leader

2. And some sample code showing how I perform the ZK re-sequencing:

   // Read the current election zknodes for a specific collection/shard
   solrServer.getZkStateReader().getZkClient().getSolrZooKeeper()
       .getChildren("/collections/core/leader_elect/shard1/election", true);
   // Node deletion (one Op of the multi transaction)
   Op.delete(path, -1);
   // Node creation (ZooKeeper appends the new sequence number)
   Op.create(createPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
       CreateMode.EPHEMERAL_SEQUENTIAL);
   // Perform the operations atomically
   solrServer.getZkStateReader().getZkClient().getSolrZooKeeper().multi(opsList);
   solrServer.getZkStateReader().updateClusterState(true);




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-shard-leader-elections-Altering-zookeeper-sequence-numbers-tp4178973.html
Sent from the Solr - User mailing list archive at Nabble.com.