wenbingshen opened a new pull request, #3418: URL: https://github.com/apache/bookkeeper/pull/3418
### Motivation fix Flaky-test: `BookieZKExpireTest.testBookieServerZKSessionExpireBehaviour` by disable retry session expired. issue #3206 was fixed by PR #3415, but it still introduced another flaky test: `BookieZKExpireTest.testBookieServerZKSessionExpireBehaviour` According to the investigation, the reason is: `ZookeeperClient` still creates a new `zookeeper` instance when the old zookeeper client session time out. Due to the asynchronous execution of two threads executing bookie temporary node re-registration and zk instance re-creation, the test program sometimes succeeds and sometimes fails. 1. When the temporary node re-registration is performed before the zk re-instantiation, the temporary node creation will use the old zk instance, which will cause a session timeout error, the bookie service will be shutdown, and the test will be successful; 2. When the zk re-instantiation precedes the re-registration of the temporary node, the temporary node creation will use the new re-instantiated zk instance, then the temporary node will be successfully created, the bookie service is running normally, and the test fails. ```java try { connectExecutor.submit(clientCreator); } catch (RejectedExecutionException ree) { if (!closed.get()) { logger.error("ZooKeeper reconnect task is rejected : ", ree); } } catch (Exception t) { logger.error("Failed to submit zookeeper reconnect task due to runtime exception : ", t); } ``` ### Changes Add a `retryExpired` flag to indicate whether to run the zk instance and retry to create a new instance after the session times out. Set this flag to false for `ZKMetadataBookieDriver`; Other ZookeeperClient and normal ZookeeperClient applications will generate the default value true or set to true, which is consistent with the original behavior. ### Test the behavior of this PR: **Before this PR:** Executed the test 10 times, all failed. <img width="1111" alt="image" src="https://user-images.githubusercontent.com/35599757/180596967-9f3f4300-6ba6-4989-b35e-a275a955d139.png"> **After this PR:** Executed the test 10 times, all successful. <img width="894" alt="image" src="https://user-images.githubusercontent.com/35599757/180597032-01318ed6-e498-4ba4-8bd7-fb081ed8245a.png"> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@bookkeeper.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org