ShivrajJ commented on issue #24327:
URL: https://github.com/apache/pulsar/issues/24327#issuecomment-2898094762

   ### Description:
   In the tests module, the integration test 
PulsarWorkerRebalanceDrainTest#testRebalanceWorkers is flaky. Sometimes, after 
adding more function workers as part of the test, it fails because the function 
worker leader changes unexpectedly.
   
   **Expected Behaviour:**
   The original function worker leader should remain the same after adding more 
workers.
   
   **Actual Behaviour:**
   The leader changes unexpectedly in some cases even though the connectedSince 
field in the topic stats for the coordination topic doesn't change, and there 
are no disconnection messages in the logs before the leadership changes.
   
   **Steps to Reproduce:**
   
   - Build the docker images (pulsar, pulsar-all, pulsar-test-latest-version)
   - Run the PulsarWorkerRebalanceDrainTest#testRebalanceWorkers test 
(reproducing the failure sometimes requires multiple runs).
   - Observe the test logs (Cluster leader before..., Cluster leader after...)
   
   ### Notes:
   I added a debug-level log of the topic stats, so line numbers in the stack 
trace might be offset slightly. See https://github.com/cognitree/pulsar/pull/22 
for the changes.
   
   From my analysis, the failure is more frequent with the Thread runtime, but 
it occurs intermittently with both the Process and Thread runtimes.
   
   The function workers elect their leader through a failover subscription on 
the public/functions/coordinate topic (non-partitioned), so there is no 
apparent reason for the leader to change, since the 'connectedSince' field for 
the original leader doesn't change in the topic stats.
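   
   To make that check concrete, here is a minimal sketch of comparing two 
topic-stats snapshots taken before and after the workers are added. It assumes 
the stats JSON exposes `activeConsumerName` on the subscription and 
`consumerName` / `connectedSince` on each consumer; the subscription name 
`participants` and all sample values are assumptions for illustration, not 
taken from the test logs.

```python
# Sketch only: the subscription name and the sample data are assumptions.
SUBSCRIPTION = "participants"


def leader_snapshot(stats, subscription=SUBSCRIPTION):
    """Return (active consumer name, {consumer name: connectedSince})."""
    sub = stats["subscriptions"][subscription]
    since = {
        c["consumerName"]: c.get("connectedSince")
        for c in sub.get("consumers", [])
    }
    return sub.get("activeConsumerName"), since


def describe_change(before, after, subscription=SUBSCRIPTION):
    """Classify what happened to the leader between two stats snapshots."""
    old_leader, since_before = leader_snapshot(before, subscription)
    new_leader, since_after = leader_snapshot(after, subscription)
    if old_leader == new_leader:
        return "leader unchanged"
    if old_leader not in since_after:
        return "leader changed: old leader is no longer connected"
    if since_before.get(old_leader) == since_after[old_leader]:
        # The case described in this issue: same connection, new leader.
        return "leader changed although old leader never reconnected"
    return "leader changed after old leader reconnected"


if __name__ == "__main__":
    # Made-up snapshots shaped like topic stats for the coordination topic.
    before = {"subscriptions": {SUBSCRIPTION: {
        "activeConsumerName": "worker-0",
        "consumers": [
            {"consumerName": "worker-0", "connectedSince": "t0"},
        ],
    }}}
    after = {"subscriptions": {SUBSCRIPTION: {
        "activeConsumerName": "worker-1",
        "consumers": [
            {"consumerName": "worker-0", "connectedSince": "t0"},
            {"consumerName": "worker-1", "connectedSince": "t1"},
        ],
    }}}
    print(describe_change(before, after))
```

   The snapshots could come from the debug log mentioned above or from 
`pulsar-admin topics stats`; the third branch is the one that matches the 
behaviour reported here.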
   
   There are also no disconnection messages from the original leader in the 
logs until after the leadership changes. The original leader's producers on 
the assignment and metadata topics disconnect after it loses leadership, which 
seems to be the expected behaviour.
   
   I can add more logs from the containers to the gist if needed.
   
   https://gist.github.com/ShivrajJ/134d23d79a6e122677fd5b300c4de3fa


