Konstantine Karantasis created KAFKA-9848:
---------------------------------------------

             Summary: Avoid triggering scheduled rebalance delay when task 
assignment fails but Connect workers remain in the group
                 Key: KAFKA-9848
                 URL: https://issues.apache.org/jira/browse/KAFKA-9848
             Project: Kafka
          Issue Type: Bug
          Components: KafkaConnect
    Affects Versions: 2.4.1, 2.3.1, 2.5.0
            Reporter: Konstantine Karantasis
            Assignee: Konstantine Karantasis


There are cases where a Connect worker does not receive its task assignments 
successfully after a rebalance but still remains in the group. For example, 
when a SyncGroup response is lost, the worker does not get its expected 
assignments but rejoins the group immediately and triggers another 
rebalance. 

With incremental cooperative rebalancing, task assignments that are computed 
and sent by the leader but are not received by one or more of the members are 
marked as lost assignments in the subsequent rebalance. The presence of lost 
assignments activates the scheduled rebalance delay (the 
scheduled.rebalance.max.delay.ms property) and the missing tasks are not 
assigned until this delay expires.
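
As a rough illustration of the current behavior described above, the sketch 
below computes lost tasks as the difference between what the leader assigned 
in the previous round and what the members report in the current round, and 
holds them back for the configured delay. The class, method, and parameter 
names are hypothetical and are not taken from the Connect code base.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not the actual Connect assignor.
final class LostAssignmentSketch {

    // Tasks the leader assigned in the previous round that no member
    // reports as owned in the current round.
    static Set<String> lostTasks(Set<String> previouslyAssigned,
                                 Set<String> reportedByMembers) {
        Set<String> lost = new HashSet<>(previouslyAssigned);
        lost.removeAll(reportedByMembers);
        return lost;
    }

    // Delay in milliseconds before lost tasks are reassigned.
    // Any lost task activates the full delay; otherwise there is no delay.
    static long delayFor(Set<String> lost, long maxDelayMs) {
        return lost.isEmpty() ? 0L : maxDelayMs;
    }
}
```

With this logic, a single member that missed its SyncGroup response is enough 
to make its tasks appear lost and to start the delay, even though the member 
never left the group.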


This situation can be improved in two cases: 
a) When it is the leader that failed to receive the new assignments from the 
broker coordinator (for example, if the SyncGroup request or response was 
lost). If this worker remains the leader of the group in the subsequent 
rebalance round, it can detect that the previous assignment was not 
successfully applied by checking the generation against the one it expects.

b) If one or more regular members did not receive their assignments 
successfully but have joined the latest round of rebalancing, they can be 
assigned the tasks that remain unassigned from the previous assignment 
immediately, without these tasks being marked as lost. The leader can detect 
this case when some tasks appear lost since the previous assignment but the 
number of workers is unchanged between the two rounds of rebalancing. The 
leader can then go ahead and assign the missing tasks as new tasks 
immediately.
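
The two cases above could be combined into a single decision on the leader, 
roughly as in the following sketch. It only illustrates the checks proposed 
in this issue; the class, enum, and parameter names are hypothetical, and 
clearing the remembered assignment in case (a) is an assumed reaction rather 
than something stated above.

```java
import java.util.Set;

// Hypothetical sketch of the proposed checks, not the actual fix.
final class RebalanceDelayDecisionSketch {

    enum Action {
        RESET_PREVIOUS_ASSIGNMENT, // case (a): leader never applied its own assignment
        NO_LOST_TASKS,             // nothing is missing, proceed as usual
        ASSIGN_IMMEDIATELY,        // case (b): same number of workers, skip the delay
        START_DELAY                // a worker seems to have left, keep existing behavior
    }

    static Action decide(int expectedGeneration,
                         int lastAppliedGeneration,
                         int previousWorkerCount,
                         int currentWorkerCount,
                         Set<String> lostTasks) {
        // Case (a): the generation the leader expected was never applied,
        // so its view of the previous assignment cannot be trusted
        // (assumed reaction: discard that view).
        if (expectedGeneration != lastAppliedGeneration) {
            return Action.RESET_PREVIOUS_ASSIGNMENT;
        }
        if (lostTasks.isEmpty()) {
            return Action.NO_LOST_TASKS;
        }
        // Case (b): tasks look lost but the number of workers is unchanged,
        // so the tasks can be handed out as new tasks right away.
        if (previousWorkerCount == currentWorkerCount) {
            return Action.ASSIGN_IMMEDIATELY;
        }
        return Action.START_DELAY;
    }
}
```

For example, `decide(5, 5, 3, 3, Set.of("connector-0"))` takes the case (b) 
branch, while the same call with a current worker count of 2 falls through to 
the delay.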



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
