[jira] [Updated] (SLING-3750) Delay discovery-service readiness until first vote has finished, to avoid leader being overthrown
[ https://issues.apache.org/jira/browse/SLING-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carsten Ziegeler updated SLING-3750:
    Fix Version/s: Discovery Impl 1.0.14 (was: Discovery Impl 1.0.12)

> Delay discovery-service readiness until first vote has finished, to avoid leader being overthrown
>
>                 Key: SLING-3750
>                 URL: https://issues.apache.org/jira/browse/SLING-3750
>             Project: Sling
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: Discovery Impl 1.0.8
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>            Priority: Critical
>             Fix For: Discovery Impl 1.0.14
>
> The current implementation of discovery.impl has a subtle problem at startup.
> Consider the following scenario, with two instances starting simultaneously:
> * two (Sling) instances start at roughly the same time
> * the goal is to write a service that runs on only one of the two, ever
> * to achieve that, a TopologyEventListener is used to get hold of the
>   latest TopologyView and derive whether the local instance is the leader
> * currently, upon registration of a TopologyEventListener, a TOPOLOGY_INIT
>   event is sent out immediately with the current TopologyView
> * right after startup, though (hence before the first voting has finished),
>   discovery.impl considers itself to be in so-called "isolated" mode, creates a
>   topology that contains only itself, and makes itself leader (since every
>   cluster must have a leader)
> * that means both instances receive that isolated view in the TOPOLOGY_INIT
>   and are marked as leader (which is right in a sense, as they don't know
>   about any other instance yet, but also wrong, as it is not yet an
>   established view)
> * at the same time, they both start voting, then find out about each other
>   and establish a view where one of the two is marked as leader; hence for
>   the other of the two a 'coup d'etat' happens (the leader is overthrown even
>   though the instance did not crash).
>
> This is clearly problematic and should be avoided.
> The suggested way to avoid this is to delay both the point at which the
> discovery.impl service is registered with OSGi (by making it a @Component
> only and registering it as a service explicitly after the first voting) and
> the sending of TOPOLOGY_INIT, until that first voting has finished.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
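The race and the proposed fix described in SLING-3750 can be illustrated with a small, self-contained sketch. It contrasts a "naive" discovery service that sends TOPOLOGY_INIT immediately with the isolated view (where the local instance is its own leader) against a "delayed" one that holds back TOPOLOGY_INIT until the first vote has finished. All types here (`View`, `Listener`, `NaiveDiscovery`, `DelayedDiscovery`, `LeaderTracker`) are hypothetical, simplified stand-ins, not the real `org.apache.sling.discovery` API, and the vote outcome (instance "a" wins) is simply assumed. The OSGi service-registration half of the fix is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the startup race; NOT the Sling discovery API.
public class DelayedInitSketch {

    enum EventType { TOPOLOGY_INIT, TOPOLOGY_CHANGED }

    // A view as seen from one instance: who we are, and who the leader is.
    record View(String localId, String leaderId) {
        boolean localIsLeader() { return localId.equals(leaderId); }
    }

    interface Listener { void handle(EventType type, View view); }

    // Records, per event received, whether the local instance was leader.
    static class LeaderTracker implements Listener {
        final List<Boolean> history = new ArrayList<>();
        public void handle(EventType type, View view) { history.add(view.localIsLeader()); }
    }

    // Current behaviour: INIT goes out at once with the isolated view.
    static class NaiveDiscovery {
        private final String localId;
        private final List<Listener> listeners = new ArrayList<>();
        private View current;

        NaiveDiscovery(String localId) {
            this.localId = localId;
            this.current = new View(localId, localId); // isolated: leader of itself
        }

        void addListener(Listener l) {
            listeners.add(l);
            l.handle(EventType.TOPOLOGY_INIT, current); // premature INIT
        }

        void voteFinished(String leaderId) {
            current = new View(localId, leaderId);
            for (Listener l : listeners) l.handle(EventType.TOPOLOGY_CHANGED, current);
        }
    }

    // Proposed behaviour: listeners are queued until the first vote is done.
    static class DelayedDiscovery {
        private final String localId;
        private final List<Listener> listeners = new ArrayList<>();
        private View established; // null until the first vote has finished

        DelayedDiscovery(String localId) { this.localId = localId; }

        synchronized void addListener(Listener l) {
            listeners.add(l);
            // Late listeners get INIT right away, but only with an established view.
            if (established != null) l.handle(EventType.TOPOLOGY_INIT, established);
        }

        synchronized void voteFinished(String leaderId) {
            boolean first = established == null;
            established = new View(localId, leaderId);
            EventType type = first ? EventType.TOPOLOGY_INIT : EventType.TOPOLOGY_CHANGED;
            for (Listener l : listeners) l.handle(type, established);
        }
    }

    public static void main(String[] args) {
        // Instance "b"; the (assumed) first vote elects "a" as leader.
        NaiveDiscovery naiveB = new NaiveDiscovery("b");
        LeaderTracker naiveTracker = new LeaderTracker();
        naiveB.addListener(naiveTracker);
        naiveB.voteFinished("a");
        System.out.println("naive b leader history:   " + naiveTracker.history);

        DelayedDiscovery delayedB = new DelayedDiscovery("b");
        LeaderTracker delayedTracker = new LeaderTracker();
        delayedB.addListener(delayedTracker);
        delayedB.voteFinished("a");
        System.out.println("delayed b leader history: " + delayedTracker.history);
    }
}
```

Running `main` prints `[true, false]` for the naive service (instance "b" is first told it is leader, then overthrown) and `[false]` for the delayed one (a single TOPOLOGY_INIT carrying the established view). The cost of the delayed design is that listeners see nothing until the first vote completes, which is exactly the trade-off the issue proposes.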
[jira] [Updated] (SLING-3750) Delay discovery-service readiness until first vote has finished, to avoid leader being overthrown
[ https://issues.apache.org/jira/browse/SLING-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Egli updated SLING-3750:
    Fix Version/s: Discovery Impl 1.0.12 (was: Discovery Impl 1.0.10)