ritegarg opened a new pull request, #2402:
URL: https://github.com/apache/phoenix/pull/2402

   # Fix flaky test `HAGroupStoreClientIT.testHAGroupStoreClientWithMultiThreadedUpdates`
   
   ## Problem
   
   `testHAGroupStoreClientWithMultiThreadedUpdates` fails intermittently (21 
failures in 100 runs) with:
   
   ```
   java.lang.AssertionError:
       at HAGroupStoreClientIT.testHAGroupStoreClientWithMultiThreadedUpdates(HAGroupStoreClientIT.java:450)
   ```
   
   The test writes 5 versioned updates to the same ZK node from multiple 
threads and expects exactly 5 `PathChildrenCacheListener` events. However, all 
5 writes complete within ~14ms, and Curator's `PathChildrenCache` coalesces 
rapid updates: when a one-time ZK watch fires, `getData()` reads the latest 
value (skipping intermediate versions) before setting a new watch. The listener 
therefore receives fewer events than there were writes, and the fixed-count 
`eventsLatch` never reaches zero and times out.
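
The coalescing described above can be illustrated with a toy model. This is a minimal single-threaded sketch, not Curator or ZooKeeper code: `CoalescingSketch`, `observedVersions`, and the `writesPerWatcherTurn` parameter are hypothetical names introduced only for illustration. It assumes the watcher reads the current version and re-arms the watch in one step, and that several writes can land before the watcher gets its turn.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a one-shot watch on a single versioned node (hypothetical, not Curator code). */
public class CoalescingSketch {

    /**
     * Applies {@code writes} versioned updates to one simulated node. The watcher only
     * runs after every {@code writesPerWatcherTurn} writes (and after the last write),
     * which models writes arriving faster than the read-and-re-arm round trip.
     * Returns the versions the watcher actually observed.
     */
    public static List<Integer> observedVersions(int writes, int writesPerWatcherTurn) {
        List<Integer> observed = new ArrayList<>();
        int currentVersion = 0;
        boolean watchArmed = true;           // a one-shot watch is registered
        boolean notificationPending = false; // the watch fired but was not handled yet

        for (int i = 1; i <= writes; i++) {
            currentVersion = i;              // the write itself
            if (watchArmed) {                // only an armed watch can fire, and only once
                watchArmed = false;
                notificationPending = true;
            }
            boolean watcherTurn = (i % writesPerWatcherTurn == 0) || i == writes;
            if (watcherTurn && notificationPending) {
                observed.add(currentVersion); // reads the latest value, not the one that fired
                notificationPending = false;
                watchArmed = true;            // re-arm for the next change
            }
        }
        return observed;
    }

    public static void main(String[] args) {
        System.out.println(observedVersions(5, 1)); // watcher keeps up with every write
        System.out.println(observedVersions(5, 3)); // writes outpace the watcher
    }
}
```

In this model, a watcher that keeps up sees all five versions, while a watcher that runs once per three writes sees only versions 3 and 5. The final version is always observed, which is the property the fix relies on.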
   
   ## Fix
   
   - **Replaced `eventsLatch(threadCount)` with `finalEventLatch(1)`**: Instead 
of requiring exactly N events, wait for the event carrying the final version. 
This accommodates event coalescing while still ensuring all updates were 
processed.
   - **Added inline ordering validation in the listener**: Each received event 
version is checked against the previous using `AtomicInteger`. Any out-of-order 
delivery is recorded and asserted after the test completes.
   - **Moved `updateLatch.countDown()` to a `finally` block**: Previously, if 
`createOrUpdateDataOnZookeeper` threw an exception, `countDown()` was skipped 
and the exception was silently swallowed by the executor. Now the latch always 
decrements, and exceptions are captured and asserted separately.
   - **Made shared collections thread-safe**: `crrEventVersions` and 
`orderingErrors` use `Collections.synchronizedList`.
   - **Added resource cleanup**: The test now calls `executor.shutdown()` and 
`storeClient.close()` to prevent leaks.
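
The pattern behind these changes can be sketched in isolation. The following is a minimal illustration, not the code from this PR: `FinalEventLatchSketch`, `VersionListener`, and `runWriters` are hypothetical names, the listener takes a bare `int` version instead of a Curator event, and the write is a plain `Runnable` standing in for `createOrUpdateDataOnZookeeper`. The latch and list names mirror those mentioned above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

public class FinalEventLatchSketch {

    /** Hypothetical listener: records versions, flags regressions, trips the latch on the final version. */
    static final class VersionListener implements IntConsumer {
        final List<Integer> seenVersions = Collections.synchronizedList(new ArrayList<>());
        final List<String> orderingErrors = Collections.synchronizedList(new ArrayList<>());
        final CountDownLatch finalEventLatch = new CountDownLatch(1);
        private final AtomicInteger lastVersion = new AtomicInteger(-1);
        private final int finalVersion;

        VersionListener(int finalVersion) {
            this.finalVersion = finalVersion;
        }

        @Override
        public void accept(int version) {
            seenVersions.add(version);
            int previous = lastVersion.getAndSet(version);
            if (version <= previous) {
                // Record instead of throwing, so the test thread can assert on it later.
                orderingErrors.add("version " + version + " arrived after " + previous);
            }
            if (version == finalVersion) {
                finalEventLatch.countDown(); // gaps are tolerated, only the final version matters
            }
        }
    }

    /** Runs {@code threadCount} writer tasks and returns whatever they threw. */
    static List<Throwable> runWriters(int threadCount, Runnable write) {
        List<Throwable> failures = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch updateLatch = new CountDownLatch(threadCount);
        ExecutorService executor = Executors.newFixedThreadPool(threadCount);
        try {
            for (int i = 0; i < threadCount; i++) {
                executor.execute(() -> {
                    try {
                        write.run();
                    } catch (Throwable t) {
                        failures.add(t);          // captured, not swallowed by the executor
                    } finally {
                        updateLatch.countDown();  // always decrements, even on failure
                    }
                });
            }
            if (!updateLatch.await(10, TimeUnit.SECONDS)) {
                failures.add(new IllegalStateException("writers did not finish in time"));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            failures.add(e);
        } finally {
            executor.shutdown();                  // release the pool's threads
        }
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        VersionListener listener = new VersionListener(5);
        listener.accept(3); // simulated coalesced delivery: versions 1, 2 and 4 never arrive
        listener.accept(5);
        boolean sawFinal = listener.finalEventLatch.await(1, TimeUnit.SECONDS);
        System.out.println(sawFinal + " " + listener.seenVersions + " " + listener.orderingErrors);

        List<Throwable> failures = runWriters(5, () -> {
            throw new IllegalStateException("simulated write failure");
        });
        System.out.println(failures.size() + " writer failures captured");
    }
}
```

Waiting on a single final-version latch makes the test independent of how many intermediate events are delivered, while the ordering list still catches a regression where an older version arrives after a newer one.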
   
   ## Test plan
   - [x] Ran the test 100 times in a loop in IntelliJ: 0 failures (previously 
21 of 100 runs failed)
   


