[
https://issues.apache.org/jira/browse/ARTEMIS-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Francesco Nigro updated ARTEMIS-2713:
-------------------------------------
Component/s: Broker
Affects Version/s: 2.11.0
> Master failback can trigger a useless quorum vote on slave failover
> -------------------------------------------------------------------
>
> Key: ARTEMIS-2713
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2713
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 2.11.0
> Reporter: Francesco Nigro
> Priority: Major
>
> A shared-nothing replicated master-slave pair using check-for-live-server on
> the master and allow-failback on the slave can trigger one or more useless
> quorum votes during master restart.
> Whether the issue occurs depends on the timing of some messages exchanged
> between the pair. Specifically, the slave, while restarting as a backup,
> performs these operations:
> # async send STOP_CALLED on the connection with the master used to send the
> replica files (i.e. the replication connection)
> # close all the connections with the master except the replication
> connection (sending a DISCONNECT to the closing ones)
> # async send FAIL_OVER on the replication connection (waiting 5 seconds
> before giving up and moving on)
> # close the replication connection
> The master, while restarting as live, could receive the DISCONNECT before
> STOP_CALLED and so believe that the slave isn't going down intentionally:
> this will make it fire vote-retries quorum votes.
> Such a quorum vote (in the happy path) will be positive and will make the
> master fail over anyway, because the slave has already moved on and (ideally)
> the other brokers have had "enough time" to update their topologies too.
> Although performing an additional quorum vote isn't a bad thing per se, it
> could create an unnecessarily long time window spent waiting for the
> observing cluster to update its topologies, slowing down an operation that
> is instead supposed to complete quickly (in the happy path).
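
The ordering race described above can be illustrated with a small,
self-contained sketch. This is not Artemis code: the class, the enum and the
decision rule are hypothetical simplifications that only encode what the
description states, namely that a DISCONNECT seen before STOP_CALLED makes
the master assume an unintentional shutdown and start a quorum vote.

```java
import java.util.Arrays;
import java.util.List;

public class FailbackOrderingSketch {

    // Packet names taken from the issue description; the enum itself is hypothetical.
    public enum Packet { STOP_CALLED, DISCONNECT, FAIL_OVER }

    /**
     * Simplified model (an assumption, not the broker's real logic):
     * the master starts a quorum vote if it observes a DISCONNECT
     * before it has observed STOP_CALLED.
     */
    public static boolean triggersQuorumVote(List<Packet> arrivalOrder) {
        boolean stopAnnounced = false;
        for (Packet packet : arrivalOrder) {
            if (packet == Packet.STOP_CALLED) {
                stopAnnounced = true;
            } else if (packet == Packet.DISCONNECT && !stopAnnounced) {
                // The slave looks like it went away unexpectedly.
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arrival order matches the send order: no vote is needed.
        System.out.println(triggersQuorumVote(
            Arrays.asList(Packet.STOP_CALLED, Packet.DISCONNECT, Packet.FAIL_OVER))); // false

        // DISCONNECT overtakes the async STOP_CALLED: a useless vote is fired.
        System.out.println(triggersQuorumVote(
            Arrays.asList(Packet.DISCONNECT, Packet.STOP_CALLED, Packet.FAIL_OVER))); // true
    }
}
```

The two packets can be reordered because they travel on different
connections: STOP_CALLED is sent asynchronously on the replication
connection, while DISCONNECT is sent on the other connections being closed.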
--
This message was sent by Atlassian Jira
(v8.3.4#803005)