[
https://issues.apache.org/jira/browse/ZOOKEEPER-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Li Wang updated ZOOKEEPER-4785:
-------------------------------
Summary: Txn loss due to race condition Learner.syncWithLeader() when
follower DIFF sync with leader (was: Txn loss due to race condition when
follower DIFF sync with leader)
> Txn loss due to race condition Learner.syncWithLeader() when follower DIFF
> sync with leader
> -------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-4785
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4785
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.8.0, 3.7.1, 3.8.1, 3.7.2, 3.8.2, 3.9.1
> Reporter: Li Wang
> Assignee: Li Wang
> Priority: Major
>
> We recently had a txn loss incident in production. After investigation, we
> found it was caused by a race condition in the Learner.syncWithLeader()
> method: the follower writes the current epoch and sends the ACK_LD before it
> has successfully persisted all the txns received during DIFF sync.
> case Leader.NEWLEADER:
>     ...
>     self.setCurrentEpoch(newEpoch); // epoch persisted here, before the txns below reach disk
>     writeToTxnLog = true;
>     // Anything after this needs to go to the transaction log, not applied directly in memory
>     isPreZAB1_0 = false;
>     // ZOOKEEPER-3911: make sure sync the uncommitted logs before commit them (ACK NEWLEADER).
>     sock.setSoTimeout(self.tickTime * self.syncLimit);
>     self.setSyncMode(QuorumPeer.SyncMode.NONE);
>     zk.startupWithoutServing();
>     if (zk instanceof FollowerZooKeeperServer) {
>         FollowerZooKeeperServer fzk = (FollowerZooKeeperServer) zk;
>         for (PacketInFlight p : packetsNotCommitted) {
>             fzk.logRequest(p.hdr, p.rec, p.digest);
>         }
>         packetsNotCommitted.clear();
>     }
>     writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true);
>     break;
> }
> In this method, when the follower receives the NEWLEADER message, it updates
> the current epoch before writing the uncommitted txns to disk, and the txns
> themselves are written asynchronously by the SyncThread. If the follower
> crashes after setting the current epoch and sending ACK_LD, but before all
> transactions have been written to disk, transactions can be lost.
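The crash window can be illustrated with a small deterministic model. This is a hypothetical sketch, not ZooKeeper code: the `Disk` class stands in for the currentEpoch file and the txn log, the asynchronous SyncThread is modeled by a crash point after `flushedBeforeCrash` txns, and the zxids are made-up small numbers. The `epochFirst` flag compares the ordering in the excerpt with one that persists the epoch only after every pending txn is on disk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a follower crashing while handling NEWLEADER; not ZooKeeper code.
public class NewLeaderOrdering {

    // What survives a crash: stands in for the currentEpoch file and the txn log.
    static final class Disk {
        long currentEpoch;
        final List<Long> txnLog = new ArrayList<>();

        Disk(long currentEpoch) {
            this.currentEpoch = currentEpoch;
        }
    }

    // Processes NEWLEADER, then "crashes" once the modeled async writer has flushed
    // only flushedBeforeCrash of the pending txns. Returns the on-disk state.
    static Disk crashDuringNewLeader(long oldEpoch, long newEpoch, List<Long> pendingZxids,
                                     int flushedBeforeCrash, boolean epochFirst) {
        Disk disk = new Disk(oldEpoch);
        if (epochFirst) {
            disk.currentEpoch = newEpoch; // ordering in the excerpt: epoch persisted first
        }
        int flushed = Math.min(flushedBeforeCrash, pendingZxids.size());
        disk.txnLog.addAll(pendingZxids.subList(0, flushed)); // only part of the DIFF is durable
        if (!epochFirst && flushed == pendingZxids.size()) {
            disk.currentEpoch = newEpoch; // epoch written only once every txn is durable
        }
        return disk;
    }

    public static void main(String[] args) {
        List<Long> pending = List.of(101L, 102L, 103L); // hypothetical zxids from the DIFF
        Disk epochFirst = crashDuringNewLeader(1, 2, pending, 1, true);
        Disk flushFirst = crashDuringNewLeader(1, 2, pending, 1, false);
        // prints "epoch-first: epoch=2 log=[101]"
        System.out.println("epoch-first: epoch=" + epochFirst.currentEpoch + " log=" + epochFirst.txnLog);
        // prints "flush-first: epoch=1 log=[101]"
        System.out.println("flush-first: epoch=" + flushFirst.currentEpoch + " log=" + flushFirst.txnLog);
    }
}
```

With the epoch-first ordering, the bounced follower restarts with the new epoch but only part of the DIFF on disk; with the flush-first ordering its shorter log is never paired with a newer epoch.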
> This is because leader election compares the epoch first and only then the
> transaction id (zxid). When such a follower becomes the leader because it
> has the highest epoch, it asks the other followers to truncate txns even
> though they have already been written to disk, causing data loss.
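The ranking that lets the bounced follower win can be sketched as a comparison over (epoch, zxid, server id). This is a simplified illustration, assuming votes carry only these three fields; `VoteOrder`, `Vote`, and `beats` are hypothetical names rather than the FastLeaderElection API, and the values are made up.

```java
// Simplified sketch of election vote ranking: epoch, then zxid, then server id.
// Hypothetical names; not the FastLeaderElection API.
public class VoteOrder {

    static final class Vote {
        final long sid;
        final long epoch; // the candidate's persisted currentEpoch
        final long zxid;  // last zxid in the candidate's txn log

        Vote(long sid, long epoch, long zxid) {
            this.sid = sid;
            this.epoch = epoch;
            this.zxid = zxid;
        }
    }

    // Returns true if candidate a should be preferred over candidate b.
    static boolean beats(Vote a, Vote b) {
        if (a.epoch != b.epoch) {
            return a.epoch > b.epoch; // epoch dominates
        }
        if (a.zxid != b.zxid) {
            return a.zxid > b.zxid;   // then the last logged zxid
        }
        return a.sid > b.sid;         // server id breaks ties
    }

    public static void main(String[] args) {
        // Made-up values: server 1 persisted the new epoch but only part of its log;
        // server 2 is still on the old epoch but has more txns on disk.
        Vote bouncedFollower = new Vote(1, 2, 101);
        Vote peer = new Vote(2, 1, 103);
        System.out.println(beats(bouncedFollower, peer)); // prints true
    }
}
```

Because the epoch is compared before the zxid, a larger log on a peer never outweighs a newer epoch on the bounced follower.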
> The scenario is as follows:
> 1. A leader election happened.
> 2. A follower synced with the leader via DIFF, received committed proposals
> from the leader, and kept them in memory.
> 3. The follower received the NEWLEADER message.
> 4. The follower updated its current epoch to newEpoch.
> 5. The follower was bounced before all the uncommitted txns were written to disk.
> 6. The leader shut down and a new election was triggered.
> 7. The follower became the new leader because it had the largest currentEpoch.
> 8. The new leader asked the other followers to truncate their committed txns,
> and those transactions were lost.
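Step 8 can be sketched as the new leader comparing its last zxid with each follower's. This is a simplified, hypothetical model: real learner sync also considers SNAP and the range of committed proposals the leader keeps in memory, and `TruncateSketch`, `chooseAction`, and `lostOnTrunc` are illustrative names. A follower that is ahead of the leader is told to truncate, and anything past the leader's last zxid is discarded even though it was durable.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of how a new leader syncs a follower by last zxid.
// Real learner sync also uses SNAP and the leader's in-memory committed log range.
public class TruncateSketch {

    enum SyncAction { TRUNC, DIFF, NONE }

    // Decides how to bring a follower in line with the leader's history.
    static SyncAction chooseAction(long leaderLastZxid, long followerLastZxid) {
        if (followerLastZxid > leaderLastZxid) {
            return SyncAction.TRUNC; // follower is ahead: cut its log back
        }
        if (followerLastZxid < leaderLastZxid) {
            return SyncAction.DIFF;  // follower is behind: send missing txns
        }
        return SyncAction.NONE;
    }

    // Txns a follower discards when truncated to the leader's last zxid.
    static List<Long> lostOnTrunc(long leaderLastZxid, List<Long> followerLog) {
        List<Long> lost = new ArrayList<>();
        for (long zxid : followerLog) {
            if (zxid > leaderLastZxid) {
                lost.add(zxid);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        long newLeaderLast = 101;                       // bounced follower, now leader
        List<Long> peerLog = List.of(101L, 102L, 103L); // durable on a healthy peer
        long peerLast = peerLog.get(peerLog.size() - 1);
        System.out.println(chooseAction(newLeaderLast, peerLast)); // prints TRUNC
        System.out.println(lostOnTrunc(newLeaderLast, peerLog));   // prints [102, 103]
    }
}
```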
--
This message was sent by Atlassian Jira
(v8.20.10#820010)