Dimas Shidqi Parikesit created ZOOKEEPER-4837:
-------------------------------------------------
Summary: Network issue causes ephemeral node unremoved after the
session expiration
Key: ZOOKEEPER-4837
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4837
Project: ZooKeeper
Issue Type: Bug
Components: quorum, server
Reporter: Dimas Shidqi Parikesit
In our test cluster running the latest ZooKeeper version (commit 66202cb), we
observed that an ephemeral node sometimes never gets deleted, even after its
session expires, if a network issue occurs during a PROPOSAL request. This bug
is closely related to ZOOKEEPER-2355, but the previous patch did not fully fix
it. We also tested some related open PRs (e.g.,
[https://github.com/apache/zookeeper/pull/2152] and
[https://github.com/apache/zookeeper/pull/1925]) and confirmed that the issue
persists with the proposed fixes applied.
Steps to reproduce this bug:
# Start a cluster with 3 servers: follower A, leader B, and follower C.
# Open a zk client connected to server A.
# Create an ephemeral node from the client.
# Inject a network fault on all servers that causes a SocketTimeoutException in readPacket whenever the packet is a PROPOSAL.
# Close the client.
# Wait until the cluster is stable (leadership will move between B and C several times).
# Remove the network fault from all servers.
# Check every server for the ephemeral node. The ephemeral node still exists on server A, but servers B and C no longer have it.
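The fault injected in step 4 can be pictured with a minimal sketch like the following. It is a simulation, not ZooKeeper code: `Packet`, `ProposalTimeoutInjector`, and the string packet types are hypothetical, and Python's `TimeoutError` stands in for Java's `SocketTimeoutException`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Packet:
    # Hypothetical stand-in for ZooKeeper's QuorumPacket; the real class uses
    # an int type code, a string label is used here for readability.
    type: str
    zxid: int


class ProposalTimeoutInjector:
    """Hypothetical fault injector: wraps a packet-reading callable and raises
    a timeout for every PROPOSAL packet while the fault is active."""

    def __init__(self, read_packet: Callable[[], Packet]) -> None:
        self._read_packet = read_packet
        self.active = False  # set in step 4, cleared in step 7

    def read_packet(self) -> Packet:
        pkt = self._read_packet()
        if self.active and pkt.type == "PROPOSAL":
            # The PROPOSAL is consumed and lost; in the Java server this
            # would surface as a SocketTimeoutException.
            raise TimeoutError(f"injected timeout on PROPOSAL 0x{pkt.zxid:x}")
        return pkt


if __name__ == "__main__":
    packets = iter([Packet("PROPOSAL", 0x100000002), Packet("COMMIT", 0x100000002)])
    injector = ProposalTimeoutInjector(lambda: next(packets))
    injector.active = True
    try:
        injector.read_packet()
    except TimeoutError as exc:
        print(exc)  # injected timeout on PROPOSAL 0x100000002
    print(injector.read_packet().type)  # COMMIT
```

Only PROPOSAL packets are affected, so other traffic (pings, commits, sync packets) keeps flowing, which is what lets the cluster keep electing leaders while proposals are lost.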
The root cause is that loadDatabase loads a snapshot whose zxid is higher than
the last zxid of the truncated log, so fastForwardFromEdits cannot bring the
data tree back to the truncation point. For example, suppose a follower has a
lastProcessedZxid of 0x200000001 and its latest snapshot is snapshot.200000001,
and the leader sends a TRUNC with zxid 0x100000002. truncateLog truncates the
follower's log to 0x100000002, but loadDatabase then loads snapshot.200000001.
As a result, fastForwardFromEdits sets the data tree to 0x200000001 instead of
0x100000002.
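The example above can be reproduced with a simplified in-memory model. This is a sketch, not ZooKeeper's restore code: the helper functions are hypothetical, and `snapshot.100000000` is an assumed older snapshot (the report only mentions `snapshot.200000001`). The `epoch`/`counter` helpers show why 0x200000001 sorts after 0x100000002: the high 32 bits of a zxid hold the epoch and the low 32 bits hold the per-epoch counter.

```python
def epoch(zxid: int) -> int:
    # High 32 bits of a zxid hold the leader epoch.
    return zxid >> 32


def counter(zxid: int) -> int:
    # Low 32 bits hold the per-epoch transaction counter.
    return zxid & 0xFFFFFFFF


def truncate_log(log: list[int], zxid: int) -> list[int]:
    # Keep only transactions up to and including the TRUNC zxid.
    return [z for z in log if z <= zxid]


def load_database(snapshots: list[int], log: list[int]) -> int:
    # Models the reported behavior: start from the newest snapshot regardless
    # of the truncation point, then fast-forward over newer log entries.
    snap = max(snapshots)
    newer = [z for z in log if z > snap]
    return newer[-1] if newer else snap


snapshots = [0x100000000, 0x200000001]  # snapshot.100000000 is hypothetical
txn_log = [0x100000001, 0x100000002, 0x200000001]

truncated = truncate_log(txn_log, 0x100000002)
print(hex(load_database(snapshots, truncated)))  # 0x200000001, not 0x100000002
```

Because the chosen snapshot is already past every remaining log entry, there is nothing to replay and the data tree stays at epoch 2.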
We have also attached a test case that reproduces this issue. Note that this
test case is still somewhat flaky.
We propose fixing this by having truncateLog load the database from the latest
snapshot taken before the last entry of the truncated log; see the attached PR.
This may not be the ideal solution, and we welcome suggestions. Other potential
solutions include:
(1) Disabling fastForwardDatabase during shutdown
(2) Calling setLastProcessedZxid at the end of Learner's syncWithLeader function
when the packet is Leader.DIFF
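The snapshot-selection idea in the proposed fix can be sketched against the same simplified model. This is a hypothetical illustration, not the code in the attached PR: `load_database_after_trunc` is an invented helper, `snapshot.100000000` is an assumed older snapshot, and the sketch assumes a snapshot whose zxid equals the truncation point is also safe to use.

```python
def load_database_after_trunc(snapshots: list[int], log: list[int], trunc_zxid: int) -> int:
    # Only snapshots that do not go past the TRUNC zxid are usable.
    usable = [s for s in snapshots if s <= trunc_zxid]
    if not usable:
        raise ValueError(f"no snapshot at or before 0x{trunc_zxid:x}")
    snap = max(usable)
    # Replay log entries after the chosen snapshot, up to the TRUNC zxid.
    replay = [z for z in log if snap < z <= trunc_zxid]
    return replay[-1] if replay else snap


snapshots = [0x100000000, 0x200000001]  # snapshot.100000000 is hypothetical
truncated_log = [0x100000001, 0x100000002]
print(hex(load_database_after_trunc(snapshots, truncated_log, 0x100000002)))  # 0x100000002
```

In this model the stale snapshot.200000001 is simply ignored; the sketch raises when no usable snapshot exists rather than guessing how a real follower should recover.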
Your insights are very much appreciated. We will continue to follow up on this
issue until it is resolved.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)