[ https://issues.apache.org/jira/browse/ZOOKEEPER-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chang Lou updated ZOOKEEPER-4837:
---------------------------------
    Priority: Critical  (was: Major)

> Network issue causes ephemeral node unremoved after the session expiration
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4837
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4837
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum, server
>            Reporter: Dimas Shidqi Parikesit
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In our testing cluster with the latest ZooKeeper version (66202cb), we
> observed that an ephemeral node is sometimes never deleted, even after its
> session expires, if there is a network issue during the PROPOSAL request.
> This bug is closely related to ZOOKEEPER-2355, whose previous patch did not
> fully fix the issue. We also tested some related open PRs (e.g.,
> [https://github.com/apache/zookeeper/pull/2152] and
> [https://github.com/apache/zookeeper/pull/1925]) and confirmed that the
> issue still exists with the proposed fixes applied.
>
> Steps to reproduce this bug:
> # Start a cluster with 3 servers: follower A, leader B, follower C
> # Open a zk client connected to server A
> # Create an ephemeral node in the client
> # Inject a network issue on all servers that causes a SocketTimeoutException
> during readPacket if the packet is a PROPOSAL
> # Close the client
> # Wait until the cluster is stable (the leader will change between B and C
> several times)
> # Remove the network issue from all servers
> # Check every server for the ephemeral node. The ephemeral node still exists
> on server A, but servers B and C no longer have it.
>
> Essentially, the bug is caused by loadDatabase loading a snapshot with a
> higher zxid than the truncated log, which causes fastForwardFromEdits to fail
> when it tries to replay the transactions.
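Why a snapshot can have a "higher zxid" than the truncated log: a zxid is a 64-bit value with the leader epoch in the high 32 bits and a per-epoch counter in the low 32 bits, so any zxid from a later epoch compares as greater than every zxid from an earlier epoch. The Java sketch below illustrates this using the zxids from the example in this report. `makeZxid`, `epochOf` and `counterOf` are illustrative names chosen here; ZooKeeper's `ZxidUtils` class offers similar helpers.

```java
public class ZxidDemo {
    // Packs a leader epoch and a per-epoch counter into one 64-bit zxid.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    // High 32 bits: the epoch of the leader that proposed the transaction.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Low 32 bits: the position of the transaction within that epoch.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static void main(String[] args) {
        long snapshotZxid = makeZxid(2, 1); // snapshot.200000001
        long truncZxid = makeZxid(1, 2);    // zxid carried by the TRUNC

        System.out.printf("0x%x: epoch %d, counter %d%n",
                snapshotZxid, epochOf(snapshotZxid), counterOf(snapshotZxid));
        System.out.printf("0x%x: epoch %d, counter %d%n",
                truncZxid, epochOf(truncZxid), counterOf(truncZxid));
        // The epoch-2 snapshot sorts after the epoch-1 truncation point,
        // even though its counter is smaller.
        System.out.println("snapshot newer than truncation point: "
                + (snapshotZxid > truncZxid));
    }
}
```

Running `main` prints `0x200000001: epoch 2, counter 1`, `0x100000002: epoch 1, counter 2`, and `snapshot newer than truncation point: true`.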
> For example, if one of the followers has a lastProcessedZxid of 0x200000001
> and its last snapshot is snapshot.200000001, and the leader sends a TRUNC
> with a zxid of 0x100000002, truncateLog will truncate the follower's log to
> 0x100000002. However, loadDatabase will then load snapshot.200000001, so when
> fastForwardFromEdits runs, it sets the data tree to 0x200000001 instead of
> 0x100000002.
>
> We have also attached a test case that reproduces this issue. Note that the
> test case is still quite flaky at the moment.
>
> We propose to fix this by having truncateLog load the database from the last
> snapshot taken before the last entry of the truncated log. See our attached
> PR. This may not be the ideal solution, and we welcome suggestions. Other
> potential solutions include:
> (1) Disable fastForwardDatabase in shutdown
> (2) Run setLastProcessedZxid at the end of the Learner's syncWithLeader
> function if the packet is Leader.DIFF
>
> Your insights are very much appreciated. We will continue following up on
> this issue until it is resolved.
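The truncation scenario and the proposed fix can be modeled in a few lines. This is a simplified, hypothetical Java sketch, not ZooKeeper's snapshot-loading code: snapshots and log entries are reduced to their zxids, the method names are invented, the older snapshot (0x100000000) and the contents of the truncated log are assumed for illustration, and "before the last truncated-log entry" is read as "at or before" (also an assumption). It contrasts loading the newest snapshot on disk with loading the newest snapshot that is not past the TRUNC zxid, then replays the truncated log on top of each.

```java
import java.util.List;
import java.util.Optional;

public class TruncSnapshotDemo {
    // Reported behavior (simplified): the newest snapshot on disk is loaded,
    // regardless of where the log was truncated.
    static Optional<Long> latestSnapshot(List<Long> snapshotZxids) {
        return snapshotZxids.stream().max(Long::compare);
    }

    // Proposed fix (simplified): only snapshots at or before the truncation
    // point are candidates.
    static Optional<Long> latestSnapshotAtOrBefore(List<Long> snapshotZxids, long truncZxid) {
        return snapshotZxids.stream().filter(z -> z <= truncZxid).max(Long::compare);
    }

    // Simplified replay: apply every logged txn newer than the snapshot and
    // return the zxid the data tree ends at. Assumes logZxids is in ascending
    // order, as entries in a transaction log are.
    static long replay(long snapshotZxid, List<Long> logZxids) {
        long last = snapshotZxid;
        for (long zxid : logZxids) {
            if (zxid > last) {
                last = zxid;
            }
        }
        return last;
    }

    public static void main(String[] args) {
        long truncZxid = 0x100000002L;                                 // zxid carried by the TRUNC
        List<Long> snapshots = List.of(0x100000000L, 0x200000001L);    // hypothetical snapshots on disk
        List<Long> truncatedLog = List.of(0x100000001L, 0x100000002L); // hypothetical log after truncation

        long current = latestSnapshot(snapshots).orElseThrow();
        System.out.printf("current:  snapshot 0x%x, data tree at 0x%x%n",
                current, replay(current, truncatedLog));

        long proposed = latestSnapshotAtOrBefore(snapshots, truncZxid).orElseThrow();
        System.out.printf("proposed: snapshot 0x%x, data tree at 0x%x%n",
                proposed, replay(proposed, truncatedLog));
    }
}
```

Running `main` prints `current:  snapshot 0x200000001, data tree at 0x200000001` and `proposed: snapshot 0x100000000, data tree at 0x100000002`. Only the second run ends at the truncation point, because nothing in the truncated log is newer than snapshot.200000001, so replay cannot move the data tree back.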