[ https://issues.apache.org/jira/browse/SLING-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126058#comment-15126058 ]
Stefan Egli commented on SLING-5435:
------------------------------------
[~cziegeler], re
bq. It might happen that the old leader did a change, which the new leader
doesn't see yet and therefore the new leader might try a similar operation
That was the original thinking behind doing the sync, yes.
A certain class of TopologyEventListeners (job handling, for example) can be
written in a way that is resilient to repository delays, namely by forcing a
'repository sync' for a particular node: the listener makes a change that
results in a conflict even if the old leader's changes are delayed.
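
To illustrate the 'force a conflict' idea, here is a minimal in-memory sketch.
It is not Sling, Jackrabbit or Oak code: {{VersionedNode}}, {{State}} and
{{JobTakeover}} are hypothetical names, and the compare-and-set merely stands in
for whatever conflict detection the repository provides on commit. The point is
only that a claim written on top of a stale view fails instead of silently
duplicating the old leader's work.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical in-memory stand-in for a repository node; not a Sling or Oak API. */
final class VersionedNode {

    /** Immutable snapshot: the owner value and the revision it was written at. */
    static final class State {
        final long revision;
        final String owner;

        State(long revision, String owner) {
            this.revision = revision;
            this.owner = owner;
        }
    }

    private final AtomicReference<State> head = new AtomicReference<>(new State(0L, null));

    /** Returns the latest committed state. */
    State read() {
        return head.get();
    }

    /** Commits on top of 'base'; returns false (a conflict) if 'base' is no longer the head. */
    boolean commit(State base, String newOwner) {
        return head.compareAndSet(base, new State(base.revision + 1, newOwner));
    }
}

/** Hypothetical listener-side helper that claims a job by writing to its node. */
final class JobTakeover {

    /** Attempts the claim against the snapshot this instance has seen so far. */
    static boolean tryClaim(VersionedNode job, VersionedNode.State seen, String instanceId) {
        return job.commit(seen, instanceId);
    }
}
```

If the new leader's snapshot predates a change by the old leader, {{tryClaim}}
returns false and the listener has to re-read the node before deciding what to
do. A listener that never writes to a shared node gets no such protection.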
However, I think that this is not possible in all cases - leaving aside the fact
that when writing a TopologyEventListener you might not be aware of these
subtleties.
Additionally though, there might be 'derivative cases' - such as for example
the Sling Scheduler - which only check the 'leader' flag to then behave
leader-like or not. What that 'leader-like' implies is unclear and can perhaps
not always be 'guarded' via a repository-synched flag (which tries to result in
a conflict in these delay-cases).
I think it helps the typical implementor of a TopologyEventListener write
leader-failover-stable code if the listener receives TOPOLOGY_CHANGED only
after a repo-sync. Leaving this to each and every listener implementor might
result in duplicated code as well as less stable code.
[~marett], re
bq. For those use cases, with the given discovery API and implementations, it
is already possible to avoid the delay by polling the DiscoveryService instead
of handling the TopologyEventListener events. However, I believe it is a rather
discouraged thing.
I disagree. You cannot poll DiscoveryService to get {{isCurrent}} true *before*
the TopologyEventListeners get informed. In fact, these two things are coupled:
{{TopologyView.isCurrent()}} and the {{TOPOLOGY_CHANGED}} event are
synchronized. That is, {{isCurrent()}} only returns true once discovery starts
sending out {{TOPOLOGY_CHANGED}} events.
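
A toy model of that coupling, for illustration only: {{ToyDiscovery}} and its
method names are hypothetical and do not mirror the actual discovery
implementation. It merely shows why a poller cannot get ahead of the listeners
when the 'current' flag and the event dispatch are driven by the same step.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the isCurrent / TOPOLOGY_CHANGED coupling; not the Sling API. */
final class ToyDiscovery {

    /** Push variant: called once the new view has been established. */
    interface Listener {
        void onTopologyChanged(boolean viewIsCurrent);
    }

    private final List<Listener> listeners = new ArrayList<>();
    private boolean current = true;

    synchronized void addListener(Listener listener) {
        listeners.add(listener);
    }

    /** Poll variant: what a caller checking the view would observe. */
    synchronized boolean isCurrent() {
        return current;
    }

    /** A membership change was detected; the sync is still pending. */
    synchronized void topologyChanging() {
        current = false;
    }

    /** The sync has completed: the flag flips and the listeners are informed in one step. */
    synchronized void syncCompleted() {
        current = true;
        for (Listener listener : listeners) {
            listener.onTopologyChanged(current);
        }
    }
}
```

Between {{topologyChanging()}} and {{syncCompleted()}} a poller only ever sees
false, so in this model polling yields nothing earlier than the event does.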
bq. LEADER_CHANGED
Some comments that come to mind:
* Alternative suggestion for the name: {{TOPOLOGY_CHANGED_UNSYNCHED}}
* If we introduce this event, it would have to be kept backwards-compatible, i.e.
after this event you must also get a {{TOPOLOGY_CHANGED}} event.
* I still see the risk of breaking client code with this, as some might do
things like "{{if (event.getType() != TOPOLOGY_CHANGING){}}}" or similar - in
which case the new event type might trigger execution where that was not
intended.
* We should think about how we could keep the API symmetric: currently there
are two fully equivalent variants: polling via
{{DiscoveryService.getTopology()}} or push via {{TopologyEventListener}}. The
new event is so far only available via push.
* If we do it the other way, by making this a property of
{{TopologyEventListener}} ("{{unsynchronized=true}}"), we make it more
backwards compatible, as only those listeners are affected that set this
property. However it would still result in an asymmetric API and I think we
should do something about it in both cases.
* Perhaps one possibility for providing this info in the poll variant could be:
{{getUnsynchronizedTopology()}} (sounds somewhat ugly, though).
Overall I tend to think that going via a listener property introduces less
friction, but a new event type would certainly also be possible...
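
To make the compatibility risk from the list above concrete, here is a small
sketch. The {{EventType}} enum is a hypothetical copy of the existing event
types plus the proposed {{TOPOLOGY_CHANGED_UNSYNCHED}} (a name that is only a
suggestion so far), and {{ListenerChecks}} is an invented helper, not existing
client code.

```java
/** Hypothetical copy of the event types, extended with the proposed new value. */
enum EventType {
    TOPOLOGY_INIT,
    TOPOLOGY_CHANGING,
    TOPOLOGY_CHANGED,
    PROPERTIES_CHANGED,
    TOPOLOGY_CHANGED_UNSYNCHED // proposed only; does not exist in the API
}

/** Two styles of type check a listener might use. */
final class ListenerChecks {

    /** Negated check: every type other than TOPOLOGY_CHANGING passes, including new ones. */
    static boolean fragileShouldAct(EventType type) {
        return type != EventType.TOPOLOGY_CHANGING;
    }

    /** Positive check: only the explicitly listed types pass, so new types are ignored. */
    static boolean explicitShouldAct(EventType type) {
        return type == EventType.TOPOLOGY_INIT || type == EventType.TOPOLOGY_CHANGED;
    }
}
```

A listener written in the first style would start acting on the unsynchronized
event without having opted in, whereas the second style keeps its old behaviour.
An opt-in listener property avoids the question because existing listeners
never see the new event at all.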
> Decouple processes that depend on cluster leader elections from the cluster
> leader elections.
> ---------------------------------------------------------------------------------------------
>
> Key: SLING-5435
> URL: https://issues.apache.org/jira/browse/SLING-5435
> Project: Sling
> Issue Type: Improvement
> Components: General
> Reporter: Ian Boston
>
> Currently there are many processes in Sling that must complete before a Sling
> Discovery cluster leader election is declared complete. These processes
> include things like transferring all Jobs from the old leader to the new
> leader and waiting for the data to appear visible on the new leader. This
> introduces an additional overhead to the leader election process which
> introduces a higher than desirable timeout for elections and heartbeat. This
> higher than desirable timeout precludes the use of more efficient election
> and distributed consensus algorithms as implemented in Etcd, Zookeeper or
> implementations of RAFT.
> If the election could be declared complete leaving individual components to
> manage their own post election operations (ie decoupling those processes from
> the election), then faster election or alternative Discovery implementations
> such as the one implemented on etcd could be used.