wenbingshen opened a new pull request, #3418:
URL: https://github.com/apache/bookkeeper/pull/3418

   ### Motivation
   
   fix Flaky-test: 
`BookieZKExpireTest.testBookieServerZKSessionExpireBehaviour` by disable retry 
session expired.
   
   issue #3206  was fixed by PR #3415, but it still introduced another flaky 
test: `BookieZKExpireTest.testBookieServerZKSessionExpireBehaviour`
   
   According to the investigation, the reason is: 
   `ZookeeperClient` still creates a new `zookeeper` instance when the old 
zookeeper client session time out.
   
   Due to the asynchronous execution of two threads executing bookie temporary 
node re-registration and zk instance re-creation, the test program sometimes 
succeeds and sometimes fails.
   
   1. When the temporary node re-registration is performed before the zk 
re-instantiation, the temporary node creation will use the old zk instance, 
which will cause a session timeout error, the bookie service will be shutdown, 
and the test will be successful;
    
   2. When the zk re-instantiation precedes the re-registration of the 
temporary node, the temporary node creation will use the new re-instantiated zk 
instance, then the temporary node will be successfully created, the bookie 
service is running normally, and the test fails.
   
   ```java
           try {
               connectExecutor.submit(clientCreator);
           } catch (RejectedExecutionException ree) {
               if (!closed.get()) {
                   logger.error("ZooKeeper reconnect task is rejected : ", ree);
               }
           } catch (Exception t) {
               logger.error("Failed to submit zookeeper reconnect task due to 
runtime exception : ", t);
           }
   ```
   
   ### Changes
   Add a `retryExpired` flag to indicate whether to run the zk instance and 
retry to create a new instance after the session times out.
   Set this flag to false for `ZKMetadataBookieDriver`;
   Other ZookeeperClient and normal ZookeeperClient applications will generate 
the default value true or set to true, which is consistent with the original 
behavior.
   
   ### Test the behavior of this PR:
   **Before this PR:**
   Executed the test 10 times, all failed.
   <img width="1111" alt="image" 
src="https://user-images.githubusercontent.com/35599757/180596967-9f3f4300-6ba6-4989-b35e-a275a955d139.png";>
   
   **After this PR:**
   Executed the test 10 times, all successful.
   <img width="894" alt="image" 
src="https://user-images.githubusercontent.com/35599757/180597032-01318ed6-e498-4ba4-8bd7-fb081ed8245a.png";>
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to