[jira] [Updated] (KAFKA-15676) Scheduled rebalance delay for Connect is unnecessarily triggered when group coordinator bounces

Chris Egerton (Jira) Tue, 24 Oct 2023 11:19:06 -0700


     [ https://issues.apache.org/jira/browse/KAFKA-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Chris Egerton updated KAFKA-15676:
----------------------------------
    Description: 
When a Connect worker loses contact with the group coordinator, it voluntarily 
gives up (i.e., stops) its assignment of connectors and tasks (for more 
context, see KAFKA-9184). However, this change in state is not relayed to the 
worker's instance of the [IncrementalCooperativeAssignor 
class|https://github.com/apache/kafka/blob/d144b7ee387308a59e52cbdabc7b66dd3b2926cc/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/IncrementalCooperativeAssignor.java].

If the group coordinator for a Connect cluster is unavailable for long enough, 
all of the workers in the cluster will revoke their assigned connectors and 
tasks and, upon rejoining the group, report that they have been assigned no 
connectors and tasks.

If a worker's member ID is reset before rejoining the group (which can happen 
if, for example, the [maximum poll 
interval|https://kafka.apache.org/documentation.html#consumerconfigs_max.poll.interval.ms]
 for the worker is exceeded), the leader of the cluster will not act as if the 
worker had rejoined the group; instead, it will act as if the worker had left 
the group and a new, unrelated worker had joined during the same rebalance. 
This will trigger the scheduled rebalance delay, and the connectors and 
tasks previously assigned to that worker will remain unassigned until the 
delay expires.

  was:
When a Connect worker loses contact with the group coordinator, it voluntarily 
gives up (i.e., stops) its assignment of connectors and tasks (for more 
context, see KAFKA-9184). However, this change in state is not relayed to the 
worker's instance of the [IncrementalCooperativeAssignor 
class|https://github.com/apache/kafka/blob/d144b7ee387308a59e52cbdabc7b66dd3b2926cc/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/IncrementalCooperativeAssignor.java].

If the group coordinator for a Connect cluster is unavailable for long enough, 
all of the workers in cluster will revoke their assigned connectors and tasks 
and, upon rejoining the group, report that they have been assigned no 
connectors and tasks.

If a worker's member ID is reset before rejoining the group (which can happen 
if, for example, the [maximum poll 
interval|https://kafka.apache.org/documentation.html#consumerconfigs_max.poll.interval.ms]
 for the worker is exceeded), the leader of the cluster will not act as if the 
worker had rejoined the group; instead, it will act as if the worker had left 
the group and a new, unrelated worker had joined during the same rebalance. 
This will cause the scheduled rebalance delay to be triggered, and for the 
connectors and tasks previously-assigned to that worker to remain unassigned 
until the delay expires.


> Scheduled rebalance delay for Connect is unnecessarily triggered when group 
> coordinator bounces
> -----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-15676
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15676
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>            Reporter: Chris Egerton
>            Assignee: Chris Egerton
>            Priority: Major
>
> When a Connect worker loses contact with the group coordinator, it 
> voluntarily gives up (i.e., stops) its assignment of connectors and tasks 
> (for more context, see KAFKA-9184). However, this change in state is not 
> relayed to the worker's instance of the [IncrementalCooperativeAssignor 
> class|https://github.com/apache/kafka/blob/d144b7ee387308a59e52cbdabc7b66dd3b2926cc/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/IncrementalCooperativeAssignor.java].
>
> If the group coordinator for a Connect cluster is unavailable for long 
> enough, all of the workers in the cluster will revoke their assigned 
> connectors and tasks and, upon rejoining the group, report that they have 
> been assigned no connectors and tasks.
>
> If a worker's member ID is reset before rejoining the group (which can happen 
> if, for example, the [maximum poll 
> interval|https://kafka.apache.org/documentation.html#consumerconfigs_max.poll.interval.ms]
>  for the worker is exceeded), the leader of the cluster will not act as if 
> the worker had rejoined the group; instead, it will act as if the worker had 
> left the group and a new, unrelated worker had joined during the same 
> rebalance. This will trigger the scheduled rebalance delay, and the 
> connectors and tasks previously assigned to that worker will remain 
> unassigned until the delay expires.
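
To illustrate the last paragraph of the description, here is a minimal sketch of why a reset member ID looks like a departure plus an unrelated join. It is a deliberate simplification and not the logic in IncrementalCooperativeAssignor: the class name, the helper methods and the member IDs are all hypothetical, and it assumes the leader only compares the set of member IDs seen in the previous rebalance with the set seen in the current one. In a real cluster the length of the delay is governed by the scheduled.rebalance.max.delay.ms worker property.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the leader's view of group membership.
// This is NOT the actual IncrementalCooperativeAssignor implementation.
public class MemberIdResetSketch {

    // Member IDs known from the previous rebalance that are absent now.
    static Set<String> departed(Set<String> previous, Set<String> current) {
        Set<String> result = new HashSet<>(previous);
        result.removeAll(current);
        return result;
    }

    // Member IDs present now that were unknown in the previous rebalance.
    static Set<String> joined(Set<String> previous, Set<String> current) {
        Set<String> result = new HashSet<>(current);
        result.removeAll(previous);
        return result;
    }

    // Assumption for this sketch: any departed member triggers the delay.
    static boolean delayTriggered(Set<String> previous, Set<String> current) {
        return !departed(previous, current).isEmpty();
    }

    public static void main(String[] args) {
        Set<String> previous = Set.of("worker-a-1", "worker-b-1");

        // Worker A rejoins with the member ID it had before.
        Set<String> sameId = Set.of("worker-a-1", "worker-b-1");
        System.out.println("same ID, delay triggered: "
                + delayTriggered(previous, sameId));

        // Worker A's member ID was reset before it rejoined, so the leader
        // sees one departure and one unrelated new member.
        Set<String> resetId = Set.of("worker-a-2", "worker-b-1");
        System.out.println("reset ID, departed: " + departed(previous, resetId));
        System.out.println("reset ID, joined: " + joined(previous, resetId));
        System.out.println("reset ID, delay triggered: "
                + delayTriggered(previous, resetId));
    }
}
```

Under this model nothing links worker-a-2 back to worker-a-1, so the work previously assigned to worker-a-1 is held back for the full delay even though the same process has already rejoined.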



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
