[ 
https://issues.apache.org/jira/browse/SLING-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123504#comment-15123504
 ] 

Timothee Maret edited comment on SLING-5435 at 1/29/16 2:21 PM:
----------------------------------------------------------------

bq. If those "processes" don't exist, then it sounds like there is nothing to 
stop a faster leader election implementation that is not slowed down by the 
latency required to ensure a repository reaches a consistent state.

As I wrote in my previous comment, I think that all current consumers of 
{{TopologyEventListener}} in Sling have a legitimate reason to wait on the 
repository. However, my point is that not every {{TopologyEventListener}} 
needs to wait on the repository; I have also shared a list of use cases. 
Thus, instead of imposing the repository wait on all listeners, we could make 
it configurable per listener, so that a listener can opt out of waiting for 
the repository sync. The implementation proposal has been discussed above as 
well.
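
As a rough illustration of the per-listener opt-out described above, here is a 
minimal sketch. It is not the implementation proposal discussed in this issue 
and it does not use the Sling Discovery API: {{SketchTopologyListener}}, 
{{SketchTopologyDispatcher}}, the {{waitForRepositorySync}} flag and the 
String event are all hypothetical stand-ins, chosen only to keep the example 
self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a topology listener; the event is reduced to a String. */
interface SketchTopologyListener {
    void handleTopologyEvent(String event);
}

/** Hypothetical dispatcher: each listener chooses at registration whether to wait for the repository sync. */
final class SketchTopologyDispatcher {
    private final List<SketchTopologyListener> immediate = new ArrayList<>();
    private final List<SketchTopologyListener> afterSync = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    void register(SketchTopologyListener listener, boolean waitForRepositorySync) {
        (waitForRepositorySync ? afterSync : immediate).add(listener);
    }

    /** Called when the election result is known: opt-out listeners are notified at once. */
    void topologyChanged(String event) {
        for (SketchTopologyListener listener : immediate) {
            listener.handleTopologyEvent(event);
        }
        pending.add(event); // held back for listeners that wait on the repository
    }

    /** Called once the repository is consistent: waiting listeners receive the held-back events in order. */
    void repositorySynced() {
        for (String event : pending) {
            for (SketchTopologyListener listener : afterSync) {
                listener.handleTopologyEvent(event);
            }
        }
        pending.clear();
    }
}

final class TopologySketchDemo {
    public static void main(String[] args) {
        SketchTopologyDispatcher dispatcher = new SketchTopologyDispatcher();
        dispatcher.register(e -> System.out.println("no repository wait: " + e), false);
        dispatcher.register(e -> System.out.println("after repository sync: " + e), true);
        dispatcher.topologyChanged("TOPOLOGY_CHANGED"); // only the first listener fires here
        dispatcher.repositorySynced();                  // now the second listener fires
    }
}
```

With this split, listeners that do depend on repository state keep today's 
behaviour, while the others are no longer held back by the sync latency.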

I propose to keep this issue focused on that goal. Alternatively, I could 
open a separate issue for it.

bq. If that's really the case, then this issue can be closed and replaced by a 
new issue titled something like "Implement leader election using RAFT over the 
network" in the same way that systems like etcd perform elections.

I have already opened SLING-5423 to track that and I am working on it.
In general, a discovery implementation that is "too fast" can be made "slow 
enough" using the piece of code [~egli] mentioned earlier in this thread.


> Decouple processes that depend on cluster leader elections from the cluster 
> leader elections.
> ---------------------------------------------------------------------------------------------
>
>                 Key: SLING-5435
>                 URL: https://issues.apache.org/jira/browse/SLING-5435
>             Project: Sling
>          Issue Type: Improvement
>          Components: General
>            Reporter: Ian Boston
>
> Currently there are many processes in Sling that must complete before a Sling 
> Discovery cluster leader election is declared complete. These processes 
> include things like transferring all Jobs from the old leader to the new 
> leader and waiting for the data to become visible on the new leader. This 
> adds overhead to the leader election process, which forces a higher than 
> desirable timeout for elections and heartbeat. That timeout precludes the 
> use of more efficient election and distributed consensus algorithms as 
> implemented in etcd, ZooKeeper or implementations of RAFT.
> If the election could be declared complete, leaving individual components to 
> manage their own post-election operations (i.e. decoupling those processes 
> from the election), then faster elections or alternative Discovery 
> implementations such as the one implemented on etcd could be used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
