jonmv commented on code in PR #1925: URL: https://github.com/apache/zookeeper/pull/1925#discussion_r998888168
########## zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Learner.java: ##########
@@ -756,13 +760,21 @@ protected void syncWithLeader(long newLeaderZxid) throws Exception {
             zk.startupWithoutServing();
             if (zk instanceof FollowerZooKeeperServer) {
                 FollowerZooKeeperServer fzk = (FollowerZooKeeperServer) zk;
-                for (PacketInFlight p : packetsNotCommitted) {
+                fzk.syncProcessor.setDelayForwarding(true);
+                for (PacketInFlight p : packetsNotLogged) {
                     fzk.logRequest(p.hdr, p.rec, p.digest);
                 }
-                packetsNotCommitted.clear();
+                packetsNotLogged.clear();

Review Comment:
This was the bug that made the learner crash during sync. On NEWLEADER, the learner cleared packetsNotCommitted after logging its contents, so it "forgot" any earlier PROPOSAL. If the leader then sent a COMMIT for that PROPOSAL during the sync, the learner could not match the two up. The crash forced the learner to re-sync, which could trigger the same crash again under heavy concurrent write traffic. Re-syncing also left duplicate series of transactions in that server's transaction logs, which caused a transaction digest mismatch on that server (although its data view otherwise remained consistent). We therefore need to track separately what has not yet been written to the log and what has not yet been matched with a COMMIT. That is what these two queues are for.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscr...@zookeeper.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
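The review comment's point about keeping "not yet logged" and "not yet committed" apart can be illustrated with a minimal Java sketch. It assumes a heavily simplified learner that tracks proposals only by zxid. The class `LearnerSyncSketch`, its `Proposal` type, its methods, and the `clearCommitQueueOnNewLeader` flag are all hypothetical and are not ZooKeeper APIs. The sketch also assumes that an unmatched COMMIT surfaces as an exception, a simplification of whatever failure the real learner hit. Only the queue names `packetsNotLogged` and `packetsNotCommitted` come from the PR.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of a learner's sync-time bookkeeping.
// None of these names (other than the two queue names) exist in ZooKeeper.
class LearnerSyncSketch {

    // Minimal stand-in for a proposed transaction, identified by its zxid.
    static final class Proposal {
        final long zxid;
        Proposal(long zxid) { this.zxid = zxid; }
    }

    // Proposals received but not yet written to the transaction log.
    private final Deque<Proposal> packetsNotLogged = new ArrayDeque<>();
    // Proposals received but not yet matched with a COMMIT.
    private final Deque<Proposal> packetsNotCommitted = new ArrayDeque<>();
    // Stand-in for the transaction log: zxids in the order they were logged.
    final List<Long> txnLog = new ArrayList<>();
    // zxids applied after their COMMIT was matched.
    final List<Long> applied = new ArrayList<>();

    // true models the pre-fix behaviour, where one queue served both purposes.
    private final boolean clearCommitQueueOnNewLeader;

    LearnerSyncSketch(boolean clearCommitQueueOnNewLeader) {
        this.clearCommitQueueOnNewLeader = clearCommitQueueOnNewLeader;
    }

    // A PROPOSAL is both unlogged and uncommitted, so it enters both queues.
    void onProposal(long zxid) {
        Proposal p = new Proposal(zxid);
        packetsNotLogged.add(p);
        packetsNotCommitted.add(p);
    }

    // On NEWLEADER, write pending proposals to the log and forget only that
    // they are unlogged; they stay in packetsNotCommitted for a later COMMIT.
    void onNewLeader() {
        for (Proposal p : packetsNotLogged) {
            txnLog.add(p.zxid);
        }
        packetsNotLogged.clear();
        if (clearCommitQueueOnNewLeader) {
            // Pre-fix model: clearing the shared queue after logging also
            // discards the record needed to match a subsequent COMMIT.
            packetsNotCommitted.clear();
        }
    }

    // Assumption: a COMMIT must match the oldest outstanding proposal.
    void onCommit(long zxid) {
        Proposal head = packetsNotCommitted.peekFirst();
        if (head == null || head.zxid != zxid) {
            throw new IllegalStateException(
                "COMMIT for unmatched zxid 0x" + Long.toHexString(zxid));
        }
        packetsNotCommitted.removeFirst();
        applied.add(zxid);
    }
}
```

Calling `onProposal`, then `onNewLeader`, then `onCommit` for the same zxid succeeds when the flag is `false`. With the flag set to `true`, modelling the old single-queue behaviour, the same sequence throws because the COMMIT has nothing left to match.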