[jira] [Created] (KAFKA-7912) In-memory key-value store does not support concurrent access
Sophie Blee-Goldman created KAFKA-7912:
---------------------------------------

Summary: In-memory key-value store does not support concurrent access
Key: KAFKA-7912
URL: https://issues.apache.org/jira/browse/KAFKA-7912
Project: Kafka
Issue Type: Bug
Reporter: Sophie Blee-Goldman
Assignee: Sophie Blee-Goldman

Currently, the in-memory key-value store uses a Map to store key-value pairs and fetches them by calling subMap and returning an iterator over this submap. This is unsafe because the submap is only a view of the original map, so there is a risk of concurrent modification while the iterator is in use.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
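The hazard described in the ticket can be sketched outside Kafka with a plain TreeMap. This is an illustrative example, not Kafka's actual store code: it shows why handing out the live subMap view is unsafe and how snapshotting the range before returning an iterator avoids the problem. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RangeFetchSketch {
    // Unsafe pattern: returns a live view; a put/remove on the backing map
    // while a caller iterates can throw ConcurrentModificationException.
    static Iterable<Map.Entry<String, String>> fetchView(TreeMap<String, String> store,
                                                         String from, String to) {
        return store.subMap(from, true, to, true).entrySet();
    }

    // Safer pattern: copy the submap into a list, so later mutations of the
    // backing map cannot affect the returned iterator.
    static List<Map.Entry<String, String>> fetchCopy(TreeMap<String, String> store,
                                                     String from, String to) {
        return new ArrayList<>(store.subMap(from, true, to, true).entrySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> store = new TreeMap<>();
        store.put("a", "1");
        store.put("b", "2");
        store.put("c", "3");
        List<Map.Entry<String, String>> snapshot = fetchCopy(store, "a", "b");
        store.put("ab", "1.5"); // mutation after the fetch; snapshot is unaffected
        System.out.println(snapshot.size()); // 2
    }
}
```

The copy trades memory for safety; a concurrent map such as ConcurrentSkipListMap is another option, at the cost of weaker iterator semantics.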
[jira] [Commented] (KAFKA-7466) Implement KIP-339: Create a new IncrementalAlterConfigs API
[ https://issues.apache.org/jira/browse/KAFKA-7466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16764001#comment-16764001 ]

ASF GitHub Bot commented on KAFKA-7466:
---------------------------------------

omkreddy commented on pull request #6247: KAFKA-7466: Add IncrementalAlterConfigs API (KIP-339)
URL: https://github.com/apache/kafka/pull/6247

https://cwiki.apache.org/confluence/display/KAFKA/KIP-339%3A+Create+a+new+IncrementalAlterConfigs+API

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Implement KIP-339: Create a new IncrementalAlterConfigs API
>
> Key: KAFKA-7466
> URL: https://issues.apache.org/jira/browse/KAFKA-7466
> Project: Kafka
> Issue Type: Improvement
> Reporter: Colin P. McCabe
> Assignee: Manikumar
> Priority: Major
>
> Implement KIP-339: Create a new IncrementalAlterConfigs API
[jira] [Created] (KAFKA-7911) Add Timezone Support for Windowed Aggregations
Matthias J. Sax created KAFKA-7911:
-----------------------------------

Summary: Add Timezone Support for Windowed Aggregations
Key: KAFKA-7911
URL: https://issues.apache.org/jira/browse/KAFKA-7911
Project: Kafka
Issue Type: New Feature
Components: streams
Reporter: Matthias J. Sax

Currently, Kafka Streams only supports UTC timestamps, so `TimeWindows` are based on UTC time only. This is problematic for 24h windows, because the windows are built aligned to UTC days, not to the local time zone. While it is possible to "shift" timestamps as a workaround, it would be better to support time zones natively.
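The "shift" workaround mentioned in the ticket can be sketched with plain java.time, independent of the Streams API. This is a hedged illustration, not Kafka code: it shifts a record timestamp by the zone offset before truncating to a 24h boundary, so the window starts at local midnight instead of UTC midnight. The class and method names are invented for the example, and DST transitions would need extra care in a real implementation.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class TzWindowSketch {
    static final long DAY_MS = 24L * 60 * 60 * 1000;

    // Default behavior: daily window boundaries aligned to UTC midnight.
    static long utcWindowStart(long timestampMs) {
        return timestampMs - (timestampMs % DAY_MS);
    }

    // Workaround: shift into "local epoch time", truncate, shift back.
    // The resulting window start falls on local midnight in the given zone.
    static long zoneWindowStart(long timestampMs, ZoneId zone) {
        long offsetMs = zone.getRules()
                            .getOffset(Instant.ofEpochMilli(timestampMs))
                            .getTotalSeconds() * 1000L;
        long shifted = timestampMs + offsetMs;
        return shifted - (shifted % DAY_MS) - offsetMs;
    }

    public static void main(String[] args) {
        long t = ZonedDateTime.parse("2019-02-08T01:30:00+09:00[Asia/Tokyo]")
                              .toInstant().toEpochMilli();
        // The two alignments disagree: UTC midnight vs. Tokyo midnight.
        System.out.println(Instant.ofEpochMilli(utcWindowStart(t)));
        System.out.println(Instant.ofEpochMilli(zoneWindowStart(t, ZoneId.of("Asia/Tokyo"))));
    }
}
```

In a Streams topology, the same shift could be applied in a custom TimestampExtractor, but the records' stored timestamps would then differ from their event times, which is exactly why native support would be preferable.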
[jira] [Commented] (KAFKA-7864) AdminZkClient.validateTopicCreate() should validate that partitions are 0-based
[ https://issues.apache.org/jira/browse/KAFKA-7864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763858#comment-16763858 ]

ASF GitHub Bot commented on KAFKA-7864:
---------------------------------------

ctchenn commented on pull request #6246: KAFKA-7864; validate partitions are 0-based
URL: https://github.com/apache/kafka/pull/6246

In AdminZkClient.validateTopicCreate(), check that the partition ids are consecutive and 0-based.

> AdminZkClient.validateTopicCreate() should validate that partitions are 0-based
>
> Key: KAFKA-7864
> URL: https://issues.apache.org/jira/browse/KAFKA-7864
> Project: Kafka
> Issue Type: Improvement
> Reporter: Jun Rao
> Assignee: Ryan
> Priority: Major
> Labels: newbie
>
> AdminZkClient.validateTopicCreate() currently doesn't validate that partition ids in a topic are consecutive, starting from 0. The client code depends on that, so it would be useful to tighten up the check.
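The check requested above has a compact formulation: a set of partition ids is consecutive and 0-based exactly when it equals {0, 1, ..., n-1}. The sketch below is illustrative, not the actual AdminZkClient code; the class and method names are invented.

```java
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PartitionIdCheck {
    // Valid iff the ids are exactly {0, 1, ..., n-1}: equality against the
    // expected set simultaneously rules out gaps, duplicates (a Set cannot
    // hold them anyway), negative ids, and a non-zero starting id.
    static boolean consecutiveFromZero(Set<Integer> partitionIds) {
        Set<Integer> expected = IntStream.range(0, partitionIds.size())
                                         .boxed()
                                         .collect(Collectors.toSet());
        return partitionIds.equals(expected);
    }

    public static void main(String[] args) {
        System.out.println(consecutiveFromZero(Set.of(0, 1, 2))); // true
        System.out.println(consecutiveFromZero(Set.of(1, 2, 3))); // false: not 0-based
        System.out.println(consecutiveFromZero(Set.of(0, 2)));    // false: gap at 1
    }
}
```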
[jira] [Assigned] (KAFKA-7909) Coordinator changes cause Connect integration test to fail
[ https://issues.apache.org/jira/browse/KAFKA-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arjun Satish reassigned KAFKA-7909:
-----------------------------------

Assignee: Arjun Satish

> Coordinator changes cause Connect integration test to fail
>
> Key: KAFKA-7909
> URL: https://issues.apache.org/jira/browse/KAFKA-7909
> Project: Kafka
> Issue Type: Bug
> Components: consumer, core
> Affects Versions: 2.2.0
> Reporter: Arjun Satish
> Assignee: Arjun Satish
> Priority: Blocker
> Fix For: 2.2.0
>
> We recently introduced integration tests in Connect. This test spins up one or more Connect workers along with a Kafka broker and Zk in a single process and attempts to move records using a Connector. In the [Example Integration Test|https://github.com/apache/kafka/blob/3c73633/connect/runtime/src/test/java/org/apache/kafka/connect/integration/ExampleConnectIntegrationTest.java#L105], we spin up three workers, each hosting a Connector task that consumes records from a Kafka topic. When the connector starts up, it may go through multiple rounds of rebalancing. We have noticed the following two problems in the last few days:
> # After members join a group, there are no pendingMembers remaining, but the join group method does not complete and does not send these members a signal, so they are not ready to start consuming from their respective partitions.
> # Because of quick rebalances, a consumer might have started a group, but Connect starts a rebalance, after which we create three new instances of the consumer (one from each worker/task). The group coordinator then seems to have 4 members in the group, which causes the JoinGroup to stall indefinitely.
> Although this ticket is described in the context of Connect, it may be applicable to consumers in general.
[jira] [Commented] (KAFKA-7909) Coordinator changes cause Connect integration test to fail
[ https://issues.apache.org/jira/browse/KAFKA-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763812#comment-16763812 ]

Arjun Satish commented on KAFKA-7909:
-------------------------------------

[~ijuma] I believe [this commit|https://github.com/apache/kafka/commit/9a9310d074ead70ebf3e93d29d880e094b9080f6] caused the regression.

> Coordinator changes cause Connect integration test to fail
>
> Key: KAFKA-7909
> URL: https://issues.apache.org/jira/browse/KAFKA-7909
> Project: Kafka
> Issue Type: Bug
> Components: consumer, core
> Affects Versions: 2.2.0
> Reporter: Arjun Satish
> Priority: Blocker
> Fix For: 2.2.0
[jira] [Updated] (KAFKA-7884) Docs for message.format.version and log.message.format.version show invalid (corrupt?) "valid values"
[ https://issues.apache.org/jira/browse/KAFKA-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin P. McCabe updated KAFKA-7884:
-----------------------------------

Fix Version/s: (was: 2.1.1)

> Docs for message.format.version and log.message.format.version show invalid (corrupt?) "valid values"
>
> Key: KAFKA-7884
> URL: https://issues.apache.org/jira/browse/KAFKA-7884
> Project: Kafka
> Issue Type: Bug
> Components: documentation
> Reporter: James Cheng
> Assignee: Lee Dongjin
> Priority: Major
> Fix For: 2.2.0
>
> In the docs for message.format.version and log.message.format.version, the list of valid values is
>
> {code:java}
> kafka.api.ApiVersionValidator$@56aac163
> {code}
>
> It appears it is simply doing a .toString on the class/instance. At a minimum, we should remove this java-y-ness. Even better, it should show all the valid values.
[jira] [Updated] (KAFKA-7876) Broker suddenly got disconnected
[ https://issues.apache.org/jira/browse/KAFKA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin P. McCabe updated KAFKA-7876:
-----------------------------------

Fix Version/s: (was: 2.1.1)

> Broker suddenly got disconnected
>
> Key: KAFKA-7876
> URL: https://issues.apache.org/jira/browse/KAFKA-7876
> Project: Kafka
> Issue Type: Bug
> Components: controller, network
> Affects Versions: 2.1.0
> Reporter: Anthony Lazam
> Priority: Critical
> Fix For: 2.2.0
>
> Attachments: kafka-issue.png
>
> We have a 3-node cluster setup. In some scenarios, one of the brokers suddenly gets disconnected from the cluster, but no underlying system issue is found. The node that got disconnected was unable to release the partitions it holds as leader, so clients (spring-boot) were unable to send/receive data from the affected broker. We noticed that it always happens to the active controller.
> Environment details:
> Provider: AWS
> Kernel: 3.10.0-693.21.1.el7.x86_64
> OS: CentOS Linux release 7.5.1804 (Core)
> Scala version: 2.11
> Kafka version: 2.1.0
> Kafka config:
> {code:java}
> # Socket Server Settings #
> num.network.threads=3
> num.io.threads=8
> socket.send.buffer.bytes=102400
> socket.receive.buffer.bytes=102400
> socket.request.max.bytes=104857600
> # Log Basics #
> num.partitions=1
> num.recovery.threads.per.data.dir=1
> # Internal Topic Settings #
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=3
> transaction.state.log.min.isr=2
> # Log Retention Policy #
> log.retention.hours=168
> log.segment.bytes=1073741824
> log.retention.check.interval.ms=30
> # Group Coordinator Settings #
> group.initial.rebalance.delay.ms=0
> # Zookeeper #
> zookeeper.connection.timeout.ms=6000
> broker.id=1
> zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
> log.dirs=/data/kafka-node
> advertised.listeners=PLAINTEXT://node1:9092
> {code}
> Controller log on the disconnected broker:
> {code:java}
> [2019-01-26 05:03:52,512] TRACE [Controller id=2] Checking need to trigger auto leader balancing (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Preferred replicas by broker Map(TOPICS->MAP) (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred replica for broker 2 Map() (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for broker 2 is 0.0 (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred replica for broker 1 Map() (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for broker 1 is 0.0 (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred replica for broker 3 Map() (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for broker 3 is 0.0 (kafka.controller.KafkaController)
> [2019-01-26 05:08:52,513] TRACE [Controller id=2] Checking need to trigger auto leader balancing (kafka.controller.KafkaController)
> {code}
> server.log on a working broker:
> {code:java}
> [2019-01-26 05:02:05,564] INFO [ReplicaFetcher replicaId=3, leaderId=2, fetcherId=0] Error sending fetch request (sessionId=1637095899, epoch=21379644) to node 2: java.io.IOException: Connection to 2 was disconnected before the response was read. (org.apache.kafka.clients.FetchSessionHandler)
> [2019-01-26 05:02:05,573] WARN [ReplicaFetcher replicaId=3, leaderId=2, fetcherId=0] Error in response for fetch request (type=FetchRequest, replicaId=3, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={PlayerGameRounds-8=(offset=2171960, logStartOffset=1483356, maxBytes=1048576, currentLeaderEpoch=Optional[2])}, isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessionId=1637095899, epoch=21379644)) (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 2 was disconnected before the response was read
> at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97)
> at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:97)
> at kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:190)
> at ...
> {code}
[jira] [Updated] (KAFKA-7880) KafkaConnect should standardize worker thread name
[ https://issues.apache.org/jira/browse/KAFKA-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin P. McCabe updated KAFKA-7880:
-----------------------------------

Fix Version/s: (was: 2.1.1)

> KafkaConnect should standardize worker thread name
>
> Key: KAFKA-7880
> URL: https://issues.apache.org/jira/browse/KAFKA-7880
> Project: Kafka
> Issue Type: Improvement
> Components: KafkaConnect
> Affects Versions: 2.1.0
> Reporter: YeLiang
> Assignee: YeLiang
> Priority: Minor
>
> KafkaConnect creates a WorkerTask for each task assigned to it and then submits the tasks to a thread pool. However, the [Worker|https://github.com/apache/kafka/blob/2.1.0/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java] class initializes its thread pool with a default ThreadFactory, so the thread names follow the pattern pool-[0-9]-thread-[0-9].
> When KafkaConnect is running and one of the task threads is under high CPU usage, it is difficult to find out which task is under high load, because a stack dump of KafkaConnect only shows a list of threads named pool-[0-9]-thread-[0-9], even if we know the exact pid of the high-CPU thread.
> Naming worker threads like connectorName-taskId would be very helpful.
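The fix the ticket asks for is a one-liner in standard Java: supply a custom ThreadFactory to the executor instead of the default one. The sketch below is illustrative (the naming scheme and class names are invented for the example, not taken from the Worker class):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class NamedWorkerThreads {
    // A ThreadFactory that names threads after the work they run,
    // e.g. "my-connector-task-0", instead of the default "pool-N-thread-M",
    // so a stack dump immediately identifies the busy task.
    static ThreadFactory named(String connectorName, int taskId) {
        return runnable -> {
            Thread t = new Thread(runnable, connectorName + "-task-" + taskId);
            t.setDaemon(true);
            return t;
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor(named("my-connector", 0));
        // Prints the descriptive thread name rather than pool-1-thread-1.
        pool.submit(() -> System.out.println(Thread.currentThread().getName())).get();
        pool.shutdown();
    }
}
```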
[jira] [Updated] (KAFKA-7897) Invalid use of epoch cache with old message format versions
[ https://issues.apache.org/jira/browse/KAFKA-7897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gustafson updated KAFKA-7897:
-----------------------------------

Summary: Invalid use of epoch cache with old message format versions (was: Invalid use of epoch cache following message format downgrade)

> Invalid use of epoch cache with old message format versions
>
> Key: KAFKA-7897
> URL: https://issues.apache.org/jira/browse/KAFKA-7897
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Assignee: Jason Gustafson
> Priority: Major
>
> Message format downgrades are not supported, but they generally work as long as brokers/clients can continue to parse both message formats. After a downgrade, the truncation logic should revert to using the high watermark, but currently we use the existence of any cached epoch as the sole prerequisite for leveraging OffsetsForLeaderEpoch. This has the effect of causing a massive truncation after startup, which causes re-replication.
> I think our options to fix this are to either 1) clear the cache when we notice a downgrade, or 2) forbid downgrades and raise an error.
[jira] [Commented] (KAFKA-7897) Invalid use of epoch cache following message format downgrade
[ https://issues.apache.org/jira/browse/KAFKA-7897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763766#comment-16763766 ]

ASF GitHub Bot commented on KAFKA-7897:
---------------------------------------

hachikuji commented on pull request #6232: KAFKA-7897; Disable leader epoch cache when older message formats are used
URL: https://github.com/apache/kafka/pull/6232

> Invalid use of epoch cache following message format downgrade
>
> Key: KAFKA-7897
> URL: https://issues.apache.org/jira/browse/KAFKA-7897
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Assignee: Jason Gustafson
> Priority: Major
[jira] [Commented] (KAFKA-7909) Coordinator changes cause Connect integration test to fail
[ https://issues.apache.org/jira/browse/KAFKA-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763622#comment-16763622 ]

Ismael Juma commented on KAFKA-7909:
------------------------------------

Thanks for the report. Do you know which change caused this regression?

> Coordinator changes cause Connect integration test to fail
>
> Key: KAFKA-7909
> URL: https://issues.apache.org/jira/browse/KAFKA-7909
> Project: Kafka
> Issue Type: Bug
> Components: consumer, core
> Affects Versions: 2.2.0
> Reporter: Arjun Satish
> Priority: Blocker
> Fix For: 2.2.0
[jira] [Created] (KAFKA-7910) Document retention.ms behavior with record timestamp
Stanislav Kozlovski created KAFKA-7910:
---------------------------------------

Summary: Document retention.ms behavior with record timestamp
Key: KAFKA-7910
URL: https://issues.apache.org/jira/browse/KAFKA-7910
Project: Kafka
Issue Type: Improvement
Reporter: Stanislav Kozlovski

It is intuitive to believe that `log.retention.ms` starts applying once a log file is closed. The documentation says:

> This configuration controls the maximum time we will retain a log before we will discard old log segments to free up space if we are using the "delete" retention policy.

Yet the actual behavior is that we take the largest timestamp in the segment file ([https://github.com/apache/kafka/blob/4cdbb3e5c19142d118f0f3999dd3e21deccb3643/core/src/main/scala/kafka/log/Log.scala#L1246]) and apply `retention.ms` on top of that. This means that if Kafka is configured with `log.message.timestamp.type=CreateTime` (as it is by default), any records with a future timestamp set by the producer will not be deleted as the initial intuition (and documentation) around `log.retention.ms` suggests.

We should document the behavior of `retention.ms` with respect to the record timestamp.
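The retention rule the ticket describes can be written down as one comparison. The sketch below is a simplified model of the behavior (the real deletion path in Log.scala has more conditions); the names are illustrative, not Kafka's:

```java
public class RetentionSketch {
    // A segment becomes eligible for time-based deletion only when
    //   now - largestTimestampInSegment > retentionMs
    // so a single record with a far-future CreateTime timestamp keeps
    // its entire segment alive indefinitely.
    static boolean eligibleForDeletion(long largestTimestampMs, long nowMs, long retentionMs) {
        return nowMs - largestTimestampMs > retentionMs;
    }

    public static void main(String[] args) {
        long retentionMs = 7L * 24 * 60 * 60 * 1000; // 168 hours, the default
        long now = System.currentTimeMillis();
        long oldSegment = now - 8L * 24 * 60 * 60 * 1000;      // newest record 8 days old
        long futureStamped = now + 365L * 24 * 60 * 60 * 1000; // newest record a year ahead
        System.out.println(eligibleForDeletion(oldSegment, now, retentionMs));    // true
        System.out.println(eligibleForDeletion(futureStamped, now, retentionMs)); // false
    }
}
```

The second case is the surprise the ticket wants documented: the future-stamped segment is never deleted, no matter how long ago it was closed.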
[jira] [Updated] (KAFKA-7909) Coordinator changes cause Connect integration test to fail
[ https://issues.apache.org/jira/browse/KAFKA-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arjun Satish updated KAFKA-7909:
--------------------------------

Component/s: core

> Coordinator changes cause Connect integration test to fail
>
> Key: KAFKA-7909
> URL: https://issues.apache.org/jira/browse/KAFKA-7909
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 2.2.0
> Reporter: Arjun Satish
> Priority: Blocker
> Fix For: 2.2.0
[jira] [Updated] (KAFKA-7909) Coordinator changes cause Connect integration test to fail
[ https://issues.apache.org/jira/browse/KAFKA-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arjun Satish updated KAFKA-7909:
--------------------------------

Description:
We recently introduced integration tests in Connect. This test spins up one or more Connect workers along with a Kafka broker and Zk in a single process and attempts to move records using a Connector. In the [Example Integration Test|https://github.com/apache/kafka/blob/3c73633/connect/runtime/src/test/java/org/apache/kafka/connect/integration/ExampleConnectIntegrationTest.java#L105], we spin up three workers, each hosting a Connector task that consumes records from a Kafka topic. When the connector starts up, it may go through multiple rounds of rebalancing. We have noticed the following two problems in the last few days:
# After members join a group, there are no pendingMembers remaining, but the join group method does not complete and does not send these members a signal, so they are not ready to start consuming from their respective partitions.
# Because of quick rebalances, a consumer might have started a group, but Connect starts a rebalance, after which we create three new instances of the consumer (one from each worker/task). The group coordinator then seems to have 4 members in the group, which causes the JoinGroup to stall indefinitely.
Although this ticket is described in the context of Connect, it may be applicable to consumers in general.

was: the same description without the link to the Example Integration Test.

> Coordinator changes cause Connect integration test to fail
>
> Key: KAFKA-7909
> URL: https://issues.apache.org/jira/browse/KAFKA-7909
> Project: Kafka
> Issue Type: Bug
> Components: consumer, core
> Affects Versions: 2.2.0
> Reporter: Arjun Satish
> Priority: Blocker
> Fix For: 2.2.0
[jira] [Updated] (KAFKA-7909) Coordinator changes cause Connect integration test to fail
[ https://issues.apache.org/jira/browse/KAFKA-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arjun Satish updated KAFKA-7909:
--------------------------------

Component/s: consumer

> Coordinator changes cause Connect integration test to fail
>
> Key: KAFKA-7909
> URL: https://issues.apache.org/jira/browse/KAFKA-7909
> Project: Kafka
> Issue Type: Bug
> Components: consumer, core
> Affects Versions: 2.2.0
> Reporter: Arjun Satish
> Priority: Blocker
> Fix For: 2.2.0
[jira] [Updated] (KAFKA-7909) Coordinator changes cause Connect integration test to fail
[ https://issues.apache.org/jira/browse/KAFKA-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arjun Satish updated KAFKA-7909:
--------------------------------

Description:
We recently introduced integration tests in Connect. This test spins up one or more Connect workers along with a Kafka broker and Zk in a single process and attempts to move records using a Connector. In the Example Integration Test, we spin up three workers, each hosting a Connector task that consumes records from a Kafka topic. When the connector starts up, it may go through multiple rounds of rebalancing. We have noticed the following two problems in the last few days:
# After members join a group, there are no pendingMembers remaining, but the join group method does not complete and does not send these members a signal, so they are not ready to start consuming from their respective partitions.
# Because of quick rebalances, a consumer might have started a group, but Connect starts a rebalance, after which we create three new instances of the consumer (one from each worker/task). The group coordinator then seems to have 4 members in the group, which causes the JoinGroup to stall indefinitely.
Although this ticket is described in the context of Connect, it may be applicable to consumers in general.

was: the same description without the final sentence about applicability to consumers in general.

> Coordinator changes cause Connect integration test to fail
>
> Key: KAFKA-7909
> URL: https://issues.apache.org/jira/browse/KAFKA-7909
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 2.2.0
> Reporter: Arjun Satish
> Priority: Blocker
> Fix For: 2.2.0
[jira] [Created] (KAFKA-7909) Coordinator changes cause Connect integration test to fail
Arjun Satish created KAFKA-7909:
---------------------------------

Summary: Coordinator changes cause Connect integration test to fail
Key: KAFKA-7909
URL: https://issues.apache.org/jira/browse/KAFKA-7909
Project: Kafka
Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Arjun Satish
Fix For: 2.2.0

We recently introduced integration tests in Connect. This test spins up one or more Connect workers along with a Kafka broker and Zk in a single process and attempts to move records using a Connector. In the Example Integration Test, we spin up three workers, each hosting a Connector task that consumes records from a Kafka topic. When the connector starts up, it may go through multiple rounds of rebalancing. We have noticed the following two problems in the last few days:
# After members join a group, there are no pendingMembers remaining, but the join group method does not complete and does not send these members a signal, so they are not ready to start consuming from their respective partitions.
# Because of quick rebalances, a consumer might have started a group, but Connect starts a rebalance, after which we create three new instances of the consumer (one from each worker/task). The group coordinator then seems to have 4 members in the group, which causes the JoinGroup to stall indefinitely.