[jira] [Commented] (ZOOKEEPER-2076) Improve Leader Change Mechanism

2017-03-31 Thread Alexander Shraer (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951381#comment-15951381
 ] 

Alexander Shraer commented on ZOOKEEPER-2076:
-

Sure, [~atris], thanks for taking this on. BTW, perhaps both items in the
description are too much for a single JIRA; we could tackle one of them here
and leave the other one for separate JIRA(s).

> Improve Leader Change Mechanism
> ---
>
> Key: ZOOKEEPER-2076
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2076
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.5.0
>Reporter: Alexander Shraer
>Assignee: Atri Sharma
>
> When a leader is removed during a reconfiguration, ZOOKEEPER-107 uses a 
> mechanism where the old leader nominates the new one. Although it reduces the 
> time for a new leader to be elected, it still takes too long. This JIRA is 
> for two things:
> 1. Improve the mechanism, e.g., avoid loading snapshots, etc. during the 
> handoff.
> 2. Make it a first-class citizen & export it as a client API. We get 
> questions about this once in a while: how do I cause a different leader to
> be elected? Currently the response is to either kill or reconfigure the current
> leader.
> Anyone interested in working on this?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ZOOKEEPER-2076) Improve Leader Change Mechanism

2017-03-31 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950735#comment-15950735
 ] 

Flavio Junqueira commented on ZOOKEEPER-2076:
-

Go for it, [~atris], I've assigned it to you.



[jira] [Commented] (ZOOKEEPER-2076) Improve Leader Change Mechanism

2017-03-31 Thread Atri Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950539#comment-15950539
 ] 

Atri Sharma commented on ZOOKEEPER-2076:


Hi Folks,

[~shralex], is this still valid?

If nobody is working on this, I can take it up.



[jira] [Commented] (ZOOKEEPER-2076) Improve Leader Change Mechanism

2014-12-02 Thread Alexander Shraer (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231859#comment-14231859
 ] 

Alexander Shraer commented on ZOOKEEPER-2076:
-

See Figure 8 here: http://www.cs.technion.ac.il/~shralex/zkreconfig.pdf
I think what I did was simply run an ensemble locally, invoke a reconfig that
removes the leader, and look at the log, which includes timestamps. You can add
logging if needed, but the current logging should probably be enough to tell
when the old leader terminates and when the new one is established, and so to
measure the total time. (I don't really remember whether this is how I did it,
since it was more than 3 years ago, but this is where I'd suggest starting.)

Exactly, I believe it's possible to do some things more efficiently, but I
really haven't thought this through and am not familiar with the current
mechanism in detail. My current implementation of leader handoff is just an
optimization that usually reduces the number of rounds required in FLE to 1.

I also suspect that one could skip FLE completely: have the servers try to
connect to the new leader directly, and go back to FLE only if that fails. I'm
not sure this is worth doing; it depends on where the time is currently spent.
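A minimal sketch of the "connect first, run FLE only on failure" idea, written as a self-contained Java model rather than against ZooKeeper's classes. `HandoffFastPath`, `findLeader`, `tryConnect`, and `runElection` are hypothetical names: `tryConnect` stands in for whatever connect-and-verify step a real implementation would need, and `runElection` for a full FastLeaderElection round.

```java
import java.util.function.LongPredicate;
import java.util.function.LongSupplier;

/**
 * Hypothetical model of the "connect first, elect only on failure" idea.
 * None of these names are ZooKeeper APIs.
 */
final class HandoffFastPath {

    /** Which leader was chosen and whether a full election was needed. */
    static final class Result {
        final long leaderId;
        final boolean usedElection;

        Result(long leaderId, boolean usedElection) {
            this.leaderId = leaderId;
            this.usedElection = usedElection;
        }
    }

    /**
     * @param designatedLeaderId id nominated by the old leader, or -1 if none
     * @param tryConnect  stand-in for "connect to this server and confirm it leads"
     * @param runElection stand-in for a full FastLeaderElection round
     */
    static Result findLeader(long designatedLeaderId,
                             LongPredicate tryConnect,
                             LongSupplier runElection) {
        if (designatedLeaderId >= 0 && tryConnect.test(designatedLeaderId)) {
            // Fast path: the nominated leader answered, so skip the election.
            return new Result(designatedLeaderId, false);
        }
        // Fallback: no nomination, or the nominated server is unreachable.
        return new Result(runElection.getAsLong(), true);
    }

    public static void main(String[] args) {
        Result fast = findLeader(3, id -> id == 3, () -> 1);
        Result slow = findLeader(3, id -> false, () -> 1);
        System.out.println(fast.leaderId + " " + fast.usedElection); // 3 false
        System.out.println(slow.leaderId + " " + slow.usedElection); // 1 true
    }
}
```

In this sketch the election stays as the fallback, so a failed direct connection costs one extra attempt before the servers go back to FLE.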




[jira] [Commented] (ZOOKEEPER-2076) Improve Leader Change Mechanism

2014-12-02 Thread Hongchao Deng (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231958#comment-14231958
 ] 

Hongchao Deng commented on ZOOKEEPER-2076:
--

My understanding is that leader election will take a long time regardless of
how close the followers are:
1. The leader takes a snapshot in lead():
https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/server/quorum/Leader.java#L418-418
2. The learner takes a snapshot on receiving NEWLEADER:
https://github.com/fengjingchao/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/server/quorum/Learner.java#L486-486

While the first one is unnecessary, the second one introduces bugs. I think the
best solution is to fix the snapshot-taking problem, but that is out of scope
here.

Any ideas on exposing an API for suggestedLeader?



[jira] [Commented] (ZOOKEEPER-2076) Improve Leader Change Mechanism

2014-12-01 Thread Hongchao Deng (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230735#comment-14230735
 ] 

Hongchao Deng commented on ZOOKEEPER-2076:
--

Hi [~shralex].

Can you explain the first point:
bq. 1. Improve the mechanism, e.g., avoid loading snapshots, etc. during the 
handoff.



[jira] [Commented] (ZOOKEEPER-2076) Improve Leader Change Mechanism

2014-12-01 Thread Hongchao Deng (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230740#comment-14230740
 ] 

Hongchao Deng commented on ZOOKEEPER-2076:
--

I'm wondering what the details are. :)



[jira] [Commented] (ZOOKEEPER-2076) Improve Leader Change Mechanism

2014-12-01 Thread Alexander Shraer (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230747#comment-14230747
 ] 

Alexander Shraer commented on ZOOKEEPER-2076:
-

I don't have a clear idea of what's needed for part 1. But when I measured the
latency of the leader handoff, it still took about 1 second, even though it
should be nearly immediate. I think this can be improved. The idea of part 1 is
to find out where this time is spent and improve it if possible.
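One way to find out where the handoff time goes is to wrap each stage in a timer and compare per-stage totals. The sketch below is a generic Java helper, not part of ZooKeeper; `PhaseTimer` is a hypothetical name, the phase names in `main` (`election`, `snapshot`, `sync`) are illustrative placeholders, and the `Thread.sleep` calls only simulate work.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that accumulates wall-clock time per named phase. */
final class PhaseTimer {
    // Insertion order = order in which phases were first timed.
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    /** Runs one phase and adds its elapsed time to that phase's total. */
    void time(String phase, Runnable body) {
        long start = System.nanoTime();
        try {
            body.run();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Per-phase totals in milliseconds, in first-seen order. */
    Map<String, Long> millisByPhase() {
        Map<String, Long> out = new LinkedHashMap<>();
        nanosByPhase.forEach((phase, nanos) ->
                out.put(phase, TimeUnit.NANOSECONDS.toMillis(nanos)));
        return out;
    }

    private static void simulateWork(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        // Placeholder phases; a real measurement would wrap actual handoff steps.
        timer.time("election", () -> simulateWork(20));
        timer.time("snapshot", () -> simulateWork(50));
        timer.time("sync", () -> simulateWork(10));
        timer.millisByPhase().forEach((phase, ms) ->
                System.out.println(phase + ": " + ms + " ms"));
    }
}
```

`System.nanoTime` is used rather than wall-clock timestamps because it is monotonic and meant for measuring elapsed intervals within one JVM.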



[jira] [Commented] (ZOOKEEPER-2076) Improve Leader Change Mechanism

2014-12-01 Thread Hongchao Deng (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230749#comment-14230749
 ] 

Hongchao Deng commented on ZOOKEEPER-2076:
--

Would you mind pointing out the code where the handoff happens?



[jira] [Commented] (ZOOKEEPER-2076) Improve Leader Change Mechanism

2014-12-01 Thread Alexander Shraer (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230767#comment-14230767
 ] 

Alexander Shraer commented on ZOOKEEPER-2076:
-

In Leader.java, look for designatedLeader: this is where the old leader
chooses and announces its replacement.
Then Follower.java and Observer.java receive this message (also look for
designatedleader) and call QuorumPeer.processReconfig(), which takes
suggestedLeaderId as a parameter. Notice the updateVote there.
Then the follower throws an exception (because it's a major change) and
QuorumPeer goes into the LOOKING state,
which invokes FastLeaderElection. This is where the initial vote set earlier is
used: all servers initially vote for the designated leader, so the election
should converge quickly.
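The effect of seeding the initial vote can be illustrated with a simplified, hypothetical model of FastLeaderElection-style voting. Votes are ordered by zxid and then server id, loosely mirroring FLE's ordering but ignoring epochs, timeouts, and asynchronous delivery; `SeededVotes`, `Vote`, and `roundsToAgreement` are illustrative names, not ZooKeeper code. In each round every server broadcasts its vote, adopts the best vote it has seen, and checks whether a majority of the received votes match its proposal.

```java
/**
 * Simplified, hypothetical model of FastLeaderElection-style voting,
 * used to compare self-votes against votes seeded with a designated leader.
 * Epochs, timeouts, and asynchronous delivery are deliberately ignored.
 */
final class SeededVotes {

    static final class Vote {
        final long zxid; // zxid associated with the proposal
        final long sid;  // proposed leader's server id

        Vote(long zxid, long sid) {
            this.zxid = zxid;
            this.sid = sid;
        }

        // Higher zxid wins; ties go to the higher server id.
        boolean betterThan(Vote other) {
            return zxid > other.zxid || (zxid == other.zxid && sid > other.sid);
        }
    }

    static boolean majorityFor(Vote[] received, long sid) {
        int count = 0;
        for (Vote v : received) {
            if (v.sid == sid) count++;
        }
        return count > received.length / 2;
    }

    /** Broadcast rounds until every server sees a majority for its own proposal. */
    static int roundsToAgreement(Vote[] initial) {
        Vote[] votes = initial.clone();
        int n = votes.length;
        for (int round = 1; round <= n + 1; round++) {
            Vote[] received = votes.clone(); // what everyone broadcast this round
            boolean everyoneSeesQuorum = true;
            for (int i = 0; i < n; i++) {
                Vote best = votes[i];
                for (Vote v : received) {
                    if (v.betterThan(best)) best = v;
                }
                votes[i] = best; // adopt the best proposal seen so far
                if (!majorityFor(received, best.sid)) everyoneSeesQuorum = false;
            }
            if (everyoneSeesQuorum) return round;
        }
        return -1; // not expected in this model
    }

    public static void main(String[] args) {
        long zxid = 0x100000005L; // all three servers are synced
        Vote[] selfVotes = { new Vote(zxid, 1), new Vote(zxid, 2), new Vote(zxid, 3) };
        // The old leader nominated server 2, and every server starts by voting for it.
        Vote[] seeded = { new Vote(zxid, 2), new Vote(zxid, 2), new Vote(zxid, 2) };
        System.out.println("self votes: " + roundsToAgreement(selfVotes)); // 2
        System.out.println("seeded votes: " + roundsToAgreement(seeded));  // 1
    }
}
```

In this toy setting, three synced servers that start by voting for themselves need two rounds (one to switch to the highest id, one to observe the quorum), while seeded votes agree after one round, in line with the handoff usually reducing FLE to a single round.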





[jira] [Commented] (ZOOKEEPER-2076) Improve Leader Change Mechanism

2014-12-01 Thread Hongchao Deng (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230963#comment-14230963
 ] 

Hongchao Deng commented on ZOOKEEPER-2076:
--

Hi [~shralex].

Could you do me a favor with two things?
1. Share how you measure the time, so I can do the same.
2. I wonder if you could comment out the line
https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/server/quorum/Leader.java#L418-418
and measure the time again. That line takes a snapshot and writes it to *DISK*.

Ultimately, I wonder whether there is a way to do a simpler election, since we
already know the servers are synced.
