[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing
[ https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953423#comment-16953423 ] ASF GitHub Bot commented on KAFKA-8104: --- guozhangwang commented on pull request #7460: KAFKA-8104: Consumer cannot rejoin to the group after rebalancing URL: https://github.com/apache/kafka/pull/7460 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Consumer cannot rejoin to the group after rebalancing > - > > Key: KAFKA-8104 > URL: https://issues.apache.org/jira/browse/KAFKA-8104 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0 >Reporter: Gregory Koshelev >Assignee: Nikolay Izhikov >Priority: Critical > Attachments: consumer-rejoin-fail.log > > > TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent > {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and > {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). > See explanation below. > There are 16 consumers in single process (threads from pool-4-thread-1 to > pool-4-thread-16). All of them belong to single consumer group > {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired > and consumers have got {{CommitFailedException}} as expected: > {noformat} > 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN > r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing > org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be > completed since the group has already rebalanced and assigned the partitions > to another member. This means that the time between subsequent calls to > poll() was longer than the configured max.poll.interval.ms, which typically > implies that the poll loop is spending too much time message processing. You > can address this either by increasing the session timeout or by reducing the > maximum size of batches returned in poll() with max.poll.records. > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298) > at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156) > at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} > After that, most of them successfully rejoined to the group with generation > 10699: > {noformat} > 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-18] > ... > 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11] > ... > 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-24] > 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed > since group is rebalancing > 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | >
[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing
[ https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946208#comment-16946208 ] Guozhang Wang commented on KAFKA-8104: -- [~nizhikov] Thanks for picking this up, I think this issue is the same as reported in https://issues.apache.org/jira/browse/KAFKA-8891 and https://issues.apache.org/jira/browse/KAFKA-7263 as well, which is a long-lurking bug. So I'm linking them together now. I will take a look into your PR as well. > Consumer cannot rejoin to the group after rebalancing > - > > Key: KAFKA-8104 > URL: https://issues.apache.org/jira/browse/KAFKA-8104 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0 >Reporter: Gregory Koshelev >Assignee: Nikolay Izhikov >Priority: Critical > Attachments: consumer-rejoin-fail.log > > > TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent > {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and > {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). > See explanation below. > There are 16 consumers in single process (threads from pool-4-thread-1 to > pool-4-thread-16). All of them belong to single consumer group > {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired > and consumers have got {{CommitFailedException}} as expected: > {noformat} > 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN > r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing > org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be > completed since the group has already rebalanced and assigned the partitions > to another member. This means that the time between subsequent calls to > poll() was longer than the configured max.poll.interval.ms, which typically > implies that the poll loop is spending too much time message processing. You > can address this either by increasing the session timeout or by reducing the > maximum size of batches returned in poll() with max.poll.records. > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298) > at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156) > at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} > After that, most of them successfully rejoined to the group with generation > 10699: > {noformat} > 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-18] > ... > 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11] > ... > 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-24] > 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed > since group is rebalancing > 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed > since group is
[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing
[ https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946114#comment-16946114 ] ASF GitHub Bot commented on KAFKA-8104: --- nizhikov commented on pull request #7460: KAFKA-8104: Consumer cannot rejoin to the group after rebalancing URL: https://github.com/apache/kafka/pull/7460 This PR contains the fix of race condition bug between "consumer thread" and "consumer coordinator heartbeat thread". It reproduces in many production environments. Condition for reproducing: 1. Consumer thread initiates rejoin to the group because of commit timeout. Call of `AbstractCoordinator#joinGroupIfNeeded` which leads to `sendJoinGroupRequest`. 2. `JoinGroupResponseHandler` writes to the `AbstractCoordinator.this.generation` new generation data and leaves the` synchronized` section. 3. Heartbeat thread executes `mabeLeaveGroup` and clears generation data via `resetGenerationOnLeaveGroup`. 4. Consumer thread executes `onJoinComplete(generation.generationId, generation.memberId, generation.protocol, memberAssignment);` with the cleared generation data. This leads to the corresponding exception. The race fixed with the condition in `maybeLeaveGroup`: if we have ongoing rejoin process in consumer thread there is no reason to reset generation data and send `LeaveGroupRequest` in heartbeat thread. This PR contains unfair "reproducer". It implemented with the `CountDownLatch` that imitates described race in `AbstractCoordinator` code. I need assistance on how should be fair reproducer implemented. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Consumer cannot rejoin to the group after rebalancing > - > > Key: KAFKA-8104 > URL: https://issues.apache.org/jira/browse/KAFKA-8104 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0 >Reporter: Gregory Koshelev >Assignee: Nikolay Izhikov >Priority: Critical > Attachments: consumer-rejoin-fail.log > > > TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent > {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and > {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). > See explanation below. > There are 16 consumers in single process (threads from pool-4-thread-1 to > pool-4-thread-16). All of them belong to single consumer group > {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired > and consumers have got {{CommitFailedException}} as expected: > {noformat} > 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN > r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing > org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be > completed since the group has already rebalanced and assigned the partitions > to another member. This means that the time between subsequent calls to > poll() was longer than the configured max.poll.interval.ms, which typically > implies that the poll loop is spending too much time message processing. You > can address this either by increasing the session timeout or by reducing the > maximum size of batches returned in poll() with max.poll.records. > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298) > at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156) > at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} > After that, most of them successfully rejoined to the group with generation > 10699: > {noformat} > 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO > o.a.k.c.c.i.AbstractCoordinator -
[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing
[ https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946061#comment-16946061 ] Nikolay Izhikov commented on KAFKA-8104: Reproducer simplified and updated. > Consumer cannot rejoin to the group after rebalancing > - > > Key: KAFKA-8104 > URL: https://issues.apache.org/jira/browse/KAFKA-8104 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0 >Reporter: Gregory Koshelev >Assignee: Nikolay Izhikov >Priority: Critical > Attachments: consumer-rejoin-fail.log > > > TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent > {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and > {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). > See explanation below. > There are 16 consumers in single process (threads from pool-4-thread-1 to > pool-4-thread-16). All of them belong to single consumer group > {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired > and consumers have got {{CommitFailedException}} as expected: > {noformat} > 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN > r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing > org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be > completed since the group has already rebalanced and assigned the partitions > to another member. This means that the time between subsequent calls to > poll() was longer than the configured max.poll.interval.ms, which typically > implies that the poll loop is spending too much time message processing. You > can address this either by increasing the session timeout or by reducing the > maximum size of batches returned in poll() with max.poll.records. > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298) > at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156) > at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} > After that, most of them successfully rejoined to the group with generation > 10699: > {noformat} > 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-18] > ... > 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11] > ... > 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-24] > 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed > since group is rebalancing > 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed > since group is rebalancing > 2019-03-10T03:16:42.323Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-7, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to
[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing
[ https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944730#comment-16944730 ] Nikolay Izhikov commented on KAFKA-8104: Hello. I wrote a reproducer [1] for this issue. It generates the output [2] exact to the issue description. It requires to emulate specific consumer threads execution path. It emulated via CyclicBarrier. Seems, we have a synchronization issue. [~guozhang] can you take a look? Is reproducer correct? Can you, please, assist me on how it should be fixed? [1] https://github.com/nizhikov/kafka/pull/1/files#diff-d5e6de9941a0199b89a70d312962d546 [2] https://gist.github.com/nizhikov/8c4d5e0c78fd634a97557fdc0ece0097#file-kafka-8104-reproducer-output-L1578 > Consumer cannot rejoin to the group after rebalancing > - > > Key: KAFKA-8104 > URL: https://issues.apache.org/jira/browse/KAFKA-8104 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0 >Reporter: Gregory Koshelev >Assignee: Nikolay Izhikov >Priority: Critical > Attachments: consumer-rejoin-fail.log > > > TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent > {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and > {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). > See explanation below. > There are 16 consumers in single process (threads from pool-4-thread-1 to > pool-4-thread-16). All of them belong to single consumer group > {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired > and consumers have got {{CommitFailedException}} as expected: > {noformat} > 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN > r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing > org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be > completed since the group has already rebalanced and assigned the partitions > to another member. This means that the time between subsequent calls to > poll() was longer than the configured max.poll.interval.ms, which typically > implies that the poll loop is spending too much time message processing. You > can address this either by increasing the session timeout or by reducing the > maximum size of batches returned in poll() with max.poll.records. > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298) > at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156) > at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} > After that, most of them successfully rejoined to the group with generation > 10699: > {noformat} > 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-18] > ... > 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11] > ... > 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-24] > 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed > since group is rebalancing > 2019-03-10T03:16:42.320Z
[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing
[ https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942526#comment-16942526 ] Nikita Koryabkin commented on KAFKA-8104: - [~kgn] , [~nizhikov] *Version 1.1.0 is also affect*, thanks. [2019-10-01 17:40:33,995] INFO [GroupCoordinator 1001]: Member -2af431fd-60e4-4dd7-a4fd-8dd85d4a5620 in group main has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator) [2019-10-01 17:40:33,995] INFO [GroupCoordinator 1001]: Preparing to rebalance group main with old generation 15 (__consumer_offsets-1) (kafka.coordinator.group.GroupCoordinator) [2019-10-01 17:40:33,995] INFO [GroupCoordinator 1001]: Group main with generation 16 is now empty (__consumer_offsets-1) (kafka.coordinator.group.GroupCoordinator) > Consumer cannot rejoin to the group after rebalancing > - > > Key: KAFKA-8104 > URL: https://issues.apache.org/jira/browse/KAFKA-8104 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0 >Reporter: Gregory Koshelev >Assignee: Nikolay Izhikov >Priority: Critical > Attachments: consumer-rejoin-fail.log > > > TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent > {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and > {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). > See explanation below. > There are 16 consumers in single process (threads from pool-4-thread-1 to > pool-4-thread-16). All of them belong to single consumer group > {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired > and consumers have got {{CommitFailedException}} as expected: > {noformat} > 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN > r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing > org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be > completed since the group has already rebalanced and assigned the partitions > to another member. This means that the time between subsequent calls to > poll() was longer than the configured max.poll.interval.ms, which typically > implies that the poll loop is spending too much time message processing. You > can address this either by increasing the session timeout or by reducing the > maximum size of batches returned in poll() with max.poll.records. > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298) > at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156) > at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} > After that, most of them successfully rejoined to the group with generation > 10699: > {noformat} > 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-18] > ... > 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11] > ... > 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-24] > 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed
[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing
[ https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939971#comment-16939971 ] Vladimir Teplygin commented on KAFKA-8104: -- same issue. kafka-client 1.1.1-cp1 100 partitions. 5 consumers with the same consumer group with 20 threads each > Consumer cannot rejoin to the group after rebalancing > - > > Key: KAFKA-8104 > URL: https://issues.apache.org/jira/browse/KAFKA-8104 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0 >Reporter: Gregory Koshelev >Priority: Critical > Attachments: consumer-rejoin-fail.log > > > TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent > {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and > {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). > See explanation below. > There are 16 consumers in single process (threads from pool-4-thread-1 to > pool-4-thread-16). All of them belong to single consumer group > {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired > and consumers have got {{CommitFailedException}} as expected: > {noformat} > 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN > r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing > org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be > completed since the group has already rebalanced and assigned the partitions > to another member. This means that the time between subsequent calls to > poll() was longer than the configured max.poll.interval.ms, which typically > implies that the poll loop is spending too much time message processing. You > can address this either by increasing the session timeout or by reducing the > maximum size of batches returned in poll() with max.poll.records. > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298) > at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156) > at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} > After that, most of them successfully rejoined to the group with generation > 10699: > {noformat} > 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-18] > ... > 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11] > ... > 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-24] > 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed > since group is rebalancing > 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed > since group is rebalancing > 2019-03-10T03:16:42.323Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-7, >
[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing
[ https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931990#comment-16931990 ] Charles Qian commented on KAFKA-8104: - same issue. kafka-client 2.2.0 it happens in large topology. like each topic has > 16 partitions. so that for each topic, client needs > 16 consumer threads. > Consumer cannot rejoin to the group after rebalancing > - > > Key: KAFKA-8104 > URL: https://issues.apache.org/jira/browse/KAFKA-8104 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0 >Reporter: Gregory Koshelev >Priority: Critical > Attachments: consumer-rejoin-fail.log > > > TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent > {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and > {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). > See explanation below. > There are 16 consumers in single process (threads from pool-4-thread-1 to > pool-4-thread-16). All of them belong to single consumer group > {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired > and consumers have got {{CommitFailedException}} as expected: > {noformat} > 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN > r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing > org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be > completed since the group has already rebalanced and assigned the partitions > to another member. This means that the time between subsequent calls to > poll() was longer than the configured max.poll.interval.ms, which typically > implies that the poll loop is spending too much time message processing. You > can address this either by increasing the session timeout or by reducing the > maximum size of batches returned in poll() with max.poll.records. > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298) > at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156) > at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} > After that, most of them successfully rejoined to the group with generation > 10699: > {noformat} > 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-18] > ... > 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11] > ... > 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-24] > 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed > since group is rebalancing > 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed > since group is rebalancing > 2019-03-10T03:16:42.323Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer
[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing
[ https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912116#comment-16912116 ] Adam Kotwasinski commented on KAFKA-8104: - FWIW, I'm seeing the same {code:java} java.lang.IllegalStateException: Coordinator selected invalid assignment protocol: null {code} while using kafka-clients:1.1.1 > Consumer cannot rejoin to the group after rebalancing > - > > Key: KAFKA-8104 > URL: https://issues.apache.org/jira/browse/KAFKA-8104 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0 >Reporter: Gregory Koshelev >Priority: Critical > Attachments: consumer-rejoin-fail.log > > > TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent > {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and > {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). > See explanation below. > There are 16 consumers in single process (threads from pool-4-thread-1 to > pool-4-thread-16). All of them belong to single consumer group > {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired > and consumers have got {{CommitFailedException}} as expected: > {noformat} > 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN > r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing > org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be > completed since the group has already rebalanced and assigned the partitions > to another member. This means that the time between subsequent calls to > poll() was longer than the configured max.poll.interval.ms, which typically > implies that the poll loop is spending too much time message processing. You > can address this either by increasing the session timeout or by reducing the > maximum size of batches returned in poll() with max.poll.records. > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298) > at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156) > at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} > After that, most of them successfully rejoined to the group with generation > 10699: > {noformat} > 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-18] > ... > 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11] > ... > 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-24] > 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed > since group is rebalancing > 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed > since group is rebalancing > 2019-03-10T03:16:42.323Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator -
[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing
[ https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911182#comment-16911182 ] Gregory Koshelev commented on KAFKA-8104: - [~guozhang], I found server.log (there are 3 brokers, but only one of them has relevant log entries in log): {code:text} [2019-03-10 02:46:43,627] INFO [GroupCoordinator 2]: Stabilized group hercules.sink.elastic.legacy_logs_elk_c2 generation 10693 (__consumer_offsets-25) (kafka.coordinator.group.GroupCoordinator) [2019-03-10 02:46:43,630] INFO [GroupCoordinator 2]: Assignment received from leader for group hercules.sink.elastic.legacy_logs_elk_c2 for generation 10693 (kafka.coordinator.group.GroupCoordinator) [2019-03-10 02:46:45,397] INFO [GroupCoordinator 2]: Preparing to rebalance group hercules.sink.elastic.legacy_logs_elk_c2 with old generation 10693 (__consumer_offsets-25) (kafka.coordinator.group.GroupCoordinator) [2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member consumer-7-cc600ccb-72b6-4582-b288-3ea966f33c7c in group hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator) [2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member consumer-1-f284c5c8-22fc-40d5-aa30-46c2875463bd in group hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator) [2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member consumer-5-e608adfc-e9ad-427e-99b5-99ff1538611d in group hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator) [2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member consumer-6-daa568cc-a5ee-49a9-aa7c-bd34fbb1da52 in group hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator) [2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member consumer-2-da70d65b-53e6-4467-b06d-63f02ff89eec in group hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator) [2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member consumer-12-a0cb16ff-46eb-4945-a87a-244a20261af9 in group hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator) [2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member consumer-10-c10cb15d-edc3-4c6a-b21a-68e2b8298284 in group hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator) [2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member consumer-16-d80c8a16-be83-48f3-ac9d-08431b87c198 in group hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator) [2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member consumer-5-75899722-91a9-439a-a0c5-f23eb7c6c93f in group hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator) [2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member consumer-11-14fe64e9-1553-49e1-abf9-5ea623847c84 in group hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator) [2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member consumer-3-9b62bde0-ff71-46f7-b6a4-7be14e6055b5 in group hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator) [2019-03-10 02:46:53,629] INFO [GroupCoordinator 2]: Member consumer-14-42336dad-be7c-49da-a76b-72d13a856943 in group hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator) [2019-03-10 02:46:53,629] INFO [GroupCoordinator 2]: Member consumer-6-1d57e91e-a6cd-4917-9560-19fa6579d02e in group hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator) [2019-03-10 02:46:53,629] INFO [GroupCoordinator 2]: Member consumer-13-a310873e-5c66-402c-a031-0b2fce8eb4cc in group hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator) [2019-03-10 02:46:53,629] INFO [GroupCoordinator 2]: Member consumer-1-7c591ed8-01d8-4f2c-b756-a5c3d3f63eea in group hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator) [2019-03-10 02:46:53,629] INFO [GroupCoordinator 2]: Member consumer-7-70c381bc-5d32-4e2c-923b-2c15c0f15aa2 in group hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator) [2019-03-10 02:46:53,629] INFO [GroupCoordinator 2]: Member consumer-15-5de2a785-0cd8-43f8-b3e2-6b1da2983910 in group
[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing
[ https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895482#comment-16895482 ] Guozhang Wang commented on KAFKA-8104: -- [~mbarbon] [~kgn] do you have any broker-side logs around the time this happens that can be shared on this ticket as well? > Consumer cannot rejoin to the group after rebalancing > - > > Key: KAFKA-8104 > URL: https://issues.apache.org/jira/browse/KAFKA-8104 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 2.0.0 >Reporter: Gregory Koshelev >Priority: Critical > Attachments: consumer-rejoin-fail.log > > > TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent > {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and > {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). > See explanation below. > There are 16 consumers in single process (threads from pool-4-thread-1 to > pool-4-thread-16). All of them belong to single consumer group > {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired > and consumers have got {{CommitFailedException}} as expected: > {noformat} > 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN > r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing > org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be > completed since the group has already rebalanced and assigned the partitions > to another member. This means that the time between subsequent calls to > poll() was longer than the configured max.poll.interval.ms, which typically > implies that the poll loop is spending too much time message processing. You > can address this either by increasing the session timeout or by reducing the > maximum size of batches returned in poll() with max.poll.records. > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298) > at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156) > at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} > After that, most of them successfully rejoined to the group with generation > 10699: > {noformat} > 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-18] > ... > 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11] > ... > 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-24] > 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed > since group is rebalancing > 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed > since group is rebalancing > 2019-03-10T03:16:42.323Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-7, >
[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing
[ https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891092#comment-16891092 ] Mattia Barbon commented on KAFKA-8104: -- It also happens with version 2.3 of the client > Consumer cannot rejoin to the group after rebalancing > - > > Key: KAFKA-8104 > URL: https://issues.apache.org/jira/browse/KAFKA-8104 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 2.0.0 >Reporter: Gregory Koshelev >Priority: Critical > Attachments: consumer-rejoin-fail.log > > > TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent > {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and > {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). > See explanation below. > There are 16 consumers in single process (threads from pool-4-thread-1 to > pool-4-thread-16). All of them belong to single consumer group > {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired > and consumers have got {{CommitFailedException}} as expected: > {noformat} > 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN > r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing > org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be > completed since the group has already rebalanced and assigned the partitions > to another member. This means that the time between subsequent calls to > poll() was longer than the configured max.poll.interval.ms, which typically > implies that the poll loop is spending too much time message processing. You > can address this either by increasing the session timeout or by reducing the > maximum size of batches returned in poll() with max.poll.records. > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334) > at > org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298) > at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156) > at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} > After that, most of them successfully rejoined to the group with generation > 10699: > {noformat} > 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-18] > ... > 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group > with generation 10699 > 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11] > ... > 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO > o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned > partitions [legacy_logs_elk_c2-24] > 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed > since group is rebalancing > 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed > since group is rebalancing > 2019-03-10T03:16:42.323Z [kafka-coordinator-heartbeat-thread | > hercules.sink.elastic.legacy_logs_elk_c2] INFO > o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-7, > groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed > since group is rebalancing >