[ https://issues.apache.org/jira/browse/KAFKA-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arjun Satish updated KAFKA-7909:
--------------------------------
    Description: 
We recently introduced integration tests in Connect. These tests spin up one 
or more Connect workers along with a Kafka broker and ZooKeeper in a single 
process and attempt to move records using a Connector. In the [Example Integration 
Test|https://github.com/apache/kafka/blob/3c73633/connect/runtime/src/test/java/org/apache/kafka/connect/integration/ExampleConnectIntegrationTest.java#L105],
 we spin up three workers, each hosting a Connector task that consumes records 
from a Kafka topic. When the connector starts up, it may go through multiple 
rounds of rebalancing. We have noticed the following two problems over the last 
few days:
 # After members join a group, there are no pendingMembers remaining, but the 
join group method does not complete and does not signal these members that 
they are now ready to start consuming from their respective partitions.
 # Because of quick rebalances, a consumer might have started a group, but 
Connect then starts a rebalance, after which we create three new instances of 
the consumer (one from each worker/task). The group coordinator, however, seems 
to have 4 members in the group. This causes the JoinGroup to stall indefinitely.

Even though this ticket is described in the context of Connect, it may be 
applicable to general consumers.

  was:
We recently introduced integration tests in Connect. These tests spin up one 
or more Connect workers along with a Kafka broker and ZooKeeper in a single 
process and attempt to move records using a Connector. In the Example 
Integration Test, we spin up three workers, each hosting a Connector task that 
consumes records from a Kafka topic. When the connector starts up, it may go 
through multiple rounds of rebalancing. We have noticed the following two 
problems over the last few days:
 # After members join a group, there are no pendingMembers remaining, but the 
join group method does not complete and does not signal these members that 
they are now ready to start consuming from their respective partitions.
 # Because of quick rebalances, a consumer might have started a group, but 
Connect then starts a rebalance, after which we create three new instances of 
the consumer (one from each worker/task). The group coordinator, however, seems 
to have 4 members in the group. This causes the JoinGroup to stall indefinitely.

Even though this ticket is described in the context of Connect, it may be 
applicable to general consumers.


> Coordinator changes cause Connect integration test to fail
> ----------------------------------------------------------
>
>                 Key: KAFKA-7909
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7909
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer, core
>    Affects Versions: 2.2.0
>            Reporter: Arjun Satish
>            Priority: Blocker
>             Fix For: 2.2.0
>
>
> We recently introduced integration tests in Connect. These tests spin up one 
> or more Connect workers along with a Kafka broker and ZooKeeper in a single 
> process and attempt to move records using a Connector. In the [Example Integration 
> Test|https://github.com/apache/kafka/blob/3c73633/connect/runtime/src/test/java/org/apache/kafka/connect/integration/ExampleConnectIntegrationTest.java#L105],
>  we spin up three workers, each hosting a Connector task that consumes records 
> from a Kafka topic. When the connector starts up, it may go through multiple 
> rounds of rebalancing. We have noticed the following two problems over the 
> last few days:
>  # After members join a group, there are no pendingMembers remaining, but the 
> join group method does not complete and does not signal these members that 
> they are now ready to start consuming from their respective partitions.
>  # Because of quick rebalances, a consumer might have started a group, but 
> Connect then starts a rebalance, after which we create three new instances of 
> the consumer (one from each worker/task). The group coordinator, however, 
> seems to have 4 members in the group. This causes the JoinGroup to stall 
> indefinitely.
> Even though this ticket is described in the context of Connect, it may be 
> applicable to general consumers.
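
The second problem in the description (three live consumers, four members 
known to the coordinator) can be pictured with a toy model. The sketch below is 
only an illustration under stated assumptions: it assumes that a join round 
completes once every member the coordinator has registered has rejoined. The 
class ToyCoordinator, its methods, and the member ids are hypothetical and are 
not Kafka's GroupCoordinator code; the ticket does not establish the actual 
root cause.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class JoinBarrierSketch {

    /** Hypothetical stand-in for a group coordinator; not Kafka's implementation. */
    static final class ToyCoordinator {
        // Members the coordinator believes belong to the group.
        private final Set<String> registered = new LinkedHashSet<>();
        // Members that have sent a join request in the current round.
        private final Set<String> rejoined = new LinkedHashSet<>();

        /** Record a member from an earlier generation that has not rejoined yet. */
        void register(String memberId) {
            registered.add(memberId);
        }

        /** A live member joins (or rejoins) during the current round. */
        void rejoin(String memberId) {
            registered.add(memberId);
            rejoined.add(memberId);
        }

        /** Drop a member, e.g. after its session times out (assumed behaviour). */
        void expire(String memberId) {
            registered.remove(memberId);
            rejoined.remove(memberId);
        }

        /** Assumed completion rule: every registered member has rejoined. */
        boolean joinComplete() {
            return !registered.isEmpty() && rejoined.containsAll(registered);
        }

        /** Registered members the round is still waiting for. */
        Set<String> missing() {
            Set<String> waitingFor = new LinkedHashSet<>(registered);
            waitingFor.removeAll(rejoined);
            return waitingFor;
        }
    }

    public static void main(String[] args) {
        ToyCoordinator coordinator = new ToyCoordinator();

        // A consumer from before the quick rebalance is still registered.
        coordinator.register("stale-consumer");

        // The three new consumer instances (one per worker/task) join.
        coordinator.rejoin("worker-1-task");
        coordinator.rejoin("worker-2-task");
        coordinator.rejoin("worker-3-task");

        System.out.println("complete: " + coordinator.joinComplete());
        System.out.println("waiting for: " + coordinator.missing());

        // Once the stale member is removed, the round can finish.
        coordinator.expire("stale-consumer");
        System.out.println("complete after expiry: " + coordinator.joinComplete());
    }
}
```

In this model the round stays incomplete while the stale fourth member is 
registered, and completes as soon as that member is expired. Whether the real 
coordinator behaves this way in the failing test is exactly what the ticket 
asks to be investigated.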



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
