Bradley Peterson created KAFKA-10633:
----------------------------------------

             Summary: Constant probing rebalances in Streams 2.6
                 Key: KAFKA-10633
                 URL: https://issues.apache.org/jira/browse/KAFKA-10633
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 2.6.0
            Reporter: Bradley Peterson
         Attachments: Discover 2020-10-21T23 34 03.867Z - 2020-10-21T23 44 46.409Z.csv

We are seeing a few issues with the new rebalancing behavior in Streams 2.6. 
This ticket is for constant probing rebalances on one StreamThread, but I'll 
mention the other issues, as they may be related.

First, when we redeploy the application we see tasks being moved, even though 
the task assignment was stable before redeploying. We would expect to see tasks 
assigned back to the same instances and no movement. The application is in EC2, 
with persistent EBS volumes, and we use static group membership to avoid 
rebalancing. To redeploy the app we terminate all EC2 instances. The new 
instances reattach the EBS volumes and reuse the same group instance id 
(group.instance.id).
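
For reference, a setup like the one described above might be configured roughly 
as follows. This is a minimal sketch, not our actual configuration: the 
application id, broker address, instance id, and state directory path are 
hypothetical placeholders, and the config keys are written as plain strings so 
the snippet compiles without the Kafka client libraries on the classpath.

```java
import java.util.Properties;

class StaticMembershipConfig {

    // Builds Streams properties for one instance. All argument values are
    // hypothetical; only the config key names come from Kafka itself.
    static Properties build(String applicationId, String instanceId, String stateDir) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", "broker-1:9092"); // placeholder address

        // Static group membership: a replacement instance that reuses this id
        // rejoins as the same member instead of as a new one.
        props.put("group.instance.id", instanceId);

        // Local state lives on the persistent volume, so a replacement
        // instance finds the existing state stores after reattaching it.
        props.put("state.dir", stateDir);

        // Interval between probing rebalances while warmup tasks catch up.
        // 600000 ms (10 minutes) is the default; shown only for clarity.
        props.put("probing.rebalance.interval.ms", "600000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("example-app", "instance-1", "/mnt/ebs/kafka-streams");
        System.out.println(props.getProperty("group.instance.id"));
    }
}
```

With this in place, each replacement instance would be started with the same 
instanceId and stateDir as the instance it replaces.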

After redeploying, we sometimes see the group leader go into a tight probing 
rebalance loop. This doesn't happen immediately; it can start several hours 
later. Because the redeploy caused task movement, we see the expected probing 
rebalances every 10 minutes. But then one thread will go into a tight loop 
logging messages like "Triggering the followup rebalance scheduled for 
1603323868771 ms.", handling the partition assignment (which doesn't change), 
then "Requested to schedule probing rebalance for 1603323868771 ms." This 
repeats several times a second until the app is restarted again. I'll attach a 
log export from one such incident.
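
To help spot the loop in a log export, the pattern above can be checked 
mechanically: in the healthy case each scheduled time should be triggered once, 
while in the loop the same timestamp is triggered over and over. The following 
is a rough sketch that assumes the log lines contain the message text exactly 
as quoted above; the class and method names are made up for illustration and 
are not part of Kafka.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ProbingLoopDetector {

    // Matches the "Triggering the followup rebalance" message quoted in this
    // ticket and captures the scheduled time in milliseconds.
    private static final Pattern TRIGGER = Pattern.compile(
            "Triggering the followup rebalance scheduled for (\\d+) ms\\.");

    // Counts how many times each scheduled time was triggered.
    static Map<Long, Integer> countTriggers(List<String> lines) {
        Map<Long, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = TRIGGER.matcher(line);
            if (m.find()) {
                counts.merge(Long.parseLong(m.group(1)), 1, Integer::sum);
            }
        }
        return counts;
    }

    // True if any single scheduled time was triggered at least `threshold`
    // times, which is the symptom described in this ticket.
    static boolean looksLikeTightLoop(List<String> lines, int threshold) {
        for (int count : countTriggers(lines).values()) {
            if (count >= threshold) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "Triggering the followup rebalance scheduled for 1603323868771 ms.",
                "Requested to schedule probing rebalance for 1603323868771 ms.",
                "Triggering the followup rebalance scheduled for 1603323868771 ms.");
        System.out.println(looksLikeTightLoop(sample, 2));
    }
}
```

The threshold is a judgment call; any value above 1 would flag the incident 
described here, since the same timestamp repeats several times a second.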





--
This message was sent by Atlassian Jira
(v8.3.4#803005)
