[ https://issues.apache.org/jira/browse/KAFKA-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arjun Satish updated KAFKA-7909:
--------------------------------
Description:
We recently introduced integration tests in Connect. This test spins up one or more Connect workers along with a Kafka broker and ZooKeeper in a single process and attempts to move records using a Connector. In the [Example Integration Test|https://github.com/apache/kafka/blob/3c73633/connect/runtime/src/test/java/org/apache/kafka/connect/integration/ExampleConnectIntegrationTest.java#L105], we spin up three workers, each hosting a Connector task that consumes records from a Kafka topic. When the connector starts up, it may go through multiple rounds of rebalancing. We noticed the following two problems over the last few days:
# After members join a group, there are no pendingMembers remaining, but the join group method does not complete, so these members are never signaled that they are ready to start consuming from their respective partitions.
# Because of quick rebalances, a consumer might have started a group, but Connect then starts a rebalance, after which we create three new instances of the consumer (one from each worker/task). However, the group coordinator appears to have four members in the group. This causes the JoinGroup to stall indefinitely.

Even though this ticket is described in the context of Connect, it may be applicable to general consumers.

was:
We recently introduced integration tests in Connect. This test spins up one or more Connect workers along with a Kafka broker and ZooKeeper in a single process and attempts to move records using a Connector. In the Example Integration Test, we spin up three workers, each hosting a Connector task that consumes records from a Kafka topic. When the connector starts up, it may go through multiple rounds of rebalancing.
We noticed the following two problems over the last few days:
# After members join a group, there are no pendingMembers remaining, but the join group method does not complete, so these members are never signaled that they are ready to start consuming from their respective partitions.
# Because of quick rebalances, a consumer might have started a group, but Connect then starts a rebalance, after which we create three new instances of the consumer (one from each worker/task). However, the group coordinator appears to have four members in the group. This causes the JoinGroup to stall indefinitely.

Even though this ticket is described in the context of Connect, it may be applicable to general consumers.


> Coordinator changes cause Connect integration test to fail
> ----------------------------------------------------------
>
>                 Key: KAFKA-7909
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7909
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer, core
>    Affects Versions: 2.2.0
>            Reporter: Arjun Satish
>            Priority: Blocker
>             Fix For: 2.2.0
>


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)