[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953423#comment-16953423
 ] 

ASF GitHub Bot commented on KAFKA-8104:
---

guozhangwang commented on pull request #7460: KAFKA-8104: Consumer cannot 
rejoin to the group after rebalancing
URL: https://github.com/apache/kafka/pull/7460
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Consumer cannot rejoin to the group after rebalancing
> -
>
> Key: KAFKA-8104
> URL: https://issues.apache.org/jira/browse/KAFKA-8104
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0
>Reporter: Gregory Koshelev
>Assignee: Nikolay Izhikov
>Priority: Critical
> Attachments: consumer-rejoin-fail.log
>
>
> TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent 
> {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and 
> {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). 
> See explanation below.
> There are 16 consumers in single process (threads from pool-4-thread-1 to 
> pool-4-thread-16). All of them belong to single consumer group 
> {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired 
> and consumers have got {{CommitFailedException}} as expected:
> {noformat}
> 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN  
> r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298)
>   at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156)
>   at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> After that, most of them successfully rejoined to the group with generation 
> 10699:
> {noformat}
> 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-18]
> ...
> 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11]
> ...
> 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-24]
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> 

[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing

2019-10-07 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946208#comment-16946208
 ] 

Guozhang Wang commented on KAFKA-8104:
--

[~nizhikov] Thanks for picking this up, I think this issue is the same as 
reported in https://issues.apache.org/jira/browse/KAFKA-8891 and 
https://issues.apache.org/jira/browse/KAFKA-7263 as well, which is a 
long-lurking bug. So I'm linking them together now.

I will take a look into your PR as well.

> Consumer cannot rejoin to the group after rebalancing
> -
>
> Key: KAFKA-8104
> URL: https://issues.apache.org/jira/browse/KAFKA-8104
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0
>Reporter: Gregory Koshelev
>Assignee: Nikolay Izhikov
>Priority: Critical
> Attachments: consumer-rejoin-fail.log
>
>
> TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent 
> {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and 
> {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). 
> See explanation below.
> There are 16 consumers in single process (threads from pool-4-thread-1 to 
> pool-4-thread-16). All of them belong to single consumer group 
> {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired 
> and consumers have got {{CommitFailedException}} as expected:
> {noformat}
> 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN  
> r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298)
>   at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156)
>   at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> After that, most of them successfully rejoined to the group with generation 
> 10699:
> {noformat}
> 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-18]
> ...
> 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11]
> ...
> 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-24]
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is 

[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing

2019-10-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946114#comment-16946114
 ] 

ASF GitHub Bot commented on KAFKA-8104:
---

nizhikov commented on pull request #7460: KAFKA-8104: Consumer cannot rejoin to 
the group after rebalancing
URL: https://github.com/apache/kafka/pull/7460
 
 
   This PR contains the fix of race condition bug between "consumer thread" and 
"consumer coordinator heartbeat thread". It reproduces in many production 
environments.
   
   Condition for reproducing:
   
   1. Consumer thread initiates rejoin to the group because of commit timeout. 
Call of `AbstractCoordinator#joinGroupIfNeeded` which leads to 
`sendJoinGroupRequest`.
   2. `JoinGroupResponseHandler` writes to the 
`AbstractCoordinator.this.generation` new generation data and leaves the` 
synchronized` section.
   3. Heartbeat thread executes `mabeLeaveGroup` and clears generation data via 
`resetGenerationOnLeaveGroup`.
   4. Consumer thread executes `onJoinComplete(generation.generationId, 
generation.memberId, generation.protocol, memberAssignment);` with the cleared 
generation data. This leads to the corresponding exception.
   
   The race fixed with the condition in `maybeLeaveGroup`: if we have ongoing 
rejoin process in consumer thread there is no reason to reset generation data 
and send `LeaveGroupRequest` in heartbeat thread.
   
   This PR contains unfair "reproducer".
   It implemented with the `CountDownLatch` that imitates described race in 
`AbstractCoordinator` code.
   
   I need assistance on how should be fair reproducer implemented.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Consumer cannot rejoin to the group after rebalancing
> -
>
> Key: KAFKA-8104
> URL: https://issues.apache.org/jira/browse/KAFKA-8104
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0
>Reporter: Gregory Koshelev
>Assignee: Nikolay Izhikov
>Priority: Critical
> Attachments: consumer-rejoin-fail.log
>
>
> TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent 
> {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and 
> {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). 
> See explanation below.
> There are 16 consumers in single process (threads from pool-4-thread-1 to 
> pool-4-thread-16). All of them belong to single consumer group 
> {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired 
> and consumers have got {{CommitFailedException}} as expected:
> {noformat}
> 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN  
> r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298)
>   at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156)
>   at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> After that, most of them successfully rejoined to the group with generation 
> 10699:
> {noformat}
> 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.AbstractCoordinator - 

[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing

2019-10-07 Thread Nikolay Izhikov (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946061#comment-16946061
 ] 

Nikolay Izhikov commented on KAFKA-8104:


Reproducer simplified and updated.

> Consumer cannot rejoin to the group after rebalancing
> -
>
> Key: KAFKA-8104
> URL: https://issues.apache.org/jira/browse/KAFKA-8104
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0
>Reporter: Gregory Koshelev
>Assignee: Nikolay Izhikov
>Priority: Critical
> Attachments: consumer-rejoin-fail.log
>
>
> TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent 
> {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and 
> {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). 
> See explanation below.
> There are 16 consumers in single process (threads from pool-4-thread-1 to 
> pool-4-thread-16). All of them belong to single consumer group 
> {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired 
> and consumers have got {{CommitFailedException}} as expected:
> {noformat}
> 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN  
> r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298)
>   at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156)
>   at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> After that, most of them successfully rejoined to the group with generation 
> 10699:
> {noformat}
> 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-18]
> ...
> 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11]
> ...
> 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-24]
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.323Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-7, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to 

[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing

2019-10-04 Thread Nikolay Izhikov (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944730#comment-16944730
 ] 

Nikolay Izhikov commented on KAFKA-8104:


Hello.

I wrote a reproducer [1] for this issue.
It generates the output [2] exact to the issue description.
It requires to emulate specific consumer threads execution path.
It emulated via CyclicBarrier.

Seems, we have a synchronization issue.

[~guozhang] can you take a look?
Is reproducer correct?
Can you, please, assist me on how it should be fixed?

[1] 
https://github.com/nizhikov/kafka/pull/1/files#diff-d5e6de9941a0199b89a70d312962d546
[2] 
https://gist.github.com/nizhikov/8c4d5e0c78fd634a97557fdc0ece0097#file-kafka-8104-reproducer-output-L1578

> Consumer cannot rejoin to the group after rebalancing
> -
>
> Key: KAFKA-8104
> URL: https://issues.apache.org/jira/browse/KAFKA-8104
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0
>Reporter: Gregory Koshelev
>Assignee: Nikolay Izhikov
>Priority: Critical
> Attachments: consumer-rejoin-fail.log
>
>
> TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent 
> {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and 
> {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). 
> See explanation below.
> There are 16 consumers in single process (threads from pool-4-thread-1 to 
> pool-4-thread-16). All of them belong to single consumer group 
> {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired 
> and consumers have got {{CommitFailedException}} as expected:
> {noformat}
> 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN  
> r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298)
>   at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156)
>   at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> After that, most of them successfully rejoined to the group with generation 
> 10699:
> {noformat}
> 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-18]
> ...
> 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11]
> ...
> 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-24]
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.320Z 

[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing

2019-10-02 Thread Nikita Koryabkin (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942526#comment-16942526
 ] 

Nikita Koryabkin commented on KAFKA-8104:
-

[~kgn] , [~nizhikov]

*Version 1.1.0 is also affect*, thanks.

 

[2019-10-01 17:40:33,995] INFO [GroupCoordinator 1001]: Member 
-2af431fd-60e4-4dd7-a4fd-8dd85d4a5620 in group main has failed, 
removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-10-01 17:40:33,995] INFO [GroupCoordinator 1001]: Preparing to rebalance 
group main with old generation 15 (__consumer_offsets-1) 
(kafka.coordinator.group.GroupCoordinator)
[2019-10-01 17:40:33,995] INFO [GroupCoordinator 1001]: Group main with 
generation 16 is now empty (__consumer_offsets-1) 
(kafka.coordinator.group.GroupCoordinator)

> Consumer cannot rejoin to the group after rebalancing
> -
>
> Key: KAFKA-8104
> URL: https://issues.apache.org/jira/browse/KAFKA-8104
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0
>Reporter: Gregory Koshelev
>Assignee: Nikolay Izhikov
>Priority: Critical
> Attachments: consumer-rejoin-fail.log
>
>
> TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent 
> {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and 
> {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). 
> See explanation below.
> There are 16 consumers in single process (threads from pool-4-thread-1 to 
> pool-4-thread-16). All of them belong to single consumer group 
> {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired 
> and consumers have got {{CommitFailedException}} as expected:
> {noformat}
> 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN  
> r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298)
>   at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156)
>   at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> After that, most of them successfully rejoined to the group with generation 
> 10699:
> {noformat}
> 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-18]
> ...
> 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11]
> ...
> 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-24]
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 

[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing

2019-09-28 Thread Vladimir Teplygin (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939971#comment-16939971
 ] 

Vladimir Teplygin commented on KAFKA-8104:
--

same issue.   kafka-client 1.1.1-cp1

100 partitions. 5 consumers with the same consumer group with 20 threads each

> Consumer cannot rejoin to the group after rebalancing
> -
>
> Key: KAFKA-8104
> URL: https://issues.apache.org/jira/browse/KAFKA-8104
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0
>Reporter: Gregory Koshelev
>Priority: Critical
> Attachments: consumer-rejoin-fail.log
>
>
> TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent 
> {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and 
> {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). 
> See explanation below.
> There are 16 consumers in single process (threads from pool-4-thread-1 to 
> pool-4-thread-16). All of them belong to single consumer group 
> {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired 
> and consumers have got {{CommitFailedException}} as expected:
> {noformat}
> 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN  
> r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298)
>   at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156)
>   at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> After that, most of them successfully rejoined to the group with generation 
> 10699:
> {noformat}
> 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-18]
> ...
> 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11]
> ...
> 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-24]
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.323Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-7, 
> 

[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing

2019-09-17 Thread Charles Qian (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931990#comment-16931990
 ] 

Charles Qian commented on KAFKA-8104:
-

same issue.   kafka-client 2.2.0

it happens in large topology. like each topic has > 16 partitions. so that for 
each topic, client needs > 16 consumer threads.

 

 

 

> Consumer cannot rejoin to the group after rebalancing
> -
>
> Key: KAFKA-8104
> URL: https://issues.apache.org/jira/browse/KAFKA-8104
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0
>Reporter: Gregory Koshelev
>Priority: Critical
> Attachments: consumer-rejoin-fail.log
>
>
> TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent 
> {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and 
> {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). 
> See explanation below.
> There are 16 consumers in single process (threads from pool-4-thread-1 to 
> pool-4-thread-16). All of them belong to single consumer group 
> {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired 
> and consumers have got {{CommitFailedException}} as expected:
> {noformat}
> 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN  
> r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298)
>   at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156)
>   at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> After that, most of them successfully rejoined to the group with generation 
> 10699:
> {noformat}
> 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-18]
> ...
> 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11]
> ...
> 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-24]
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.323Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer 

[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing

2019-08-21 Thread Adam Kotwasinski (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912116#comment-16912116
 ] 

Adam Kotwasinski commented on KAFKA-8104:
-

FWIW, I'm seeing the same
{code:java}
java.lang.IllegalStateException: Coordinator selected invalid assignment 
protocol: null
{code}
while using kafka-clients:1.1.1

> Consumer cannot rejoin to the group after rebalancing
> -
>
> Key: KAFKA-8104
> URL: https://issues.apache.org/jira/browse/KAFKA-8104
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0
>Reporter: Gregory Koshelev
>Priority: Critical
> Attachments: consumer-rejoin-fail.log
>
>
> TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent 
> {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and 
> {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). 
> See explanation below.
> There are 16 consumers in single process (threads from pool-4-thread-1 to 
> pool-4-thread-16). All of them belong to single consumer group 
> {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired 
> and consumers have got {{CommitFailedException}} as expected:
> {noformat}
> 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN  
> r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298)
>   at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156)
>   at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> After that, most of them successfully rejoined to the group with generation 
> 10699:
> {noformat}
> 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-18]
> ...
> 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11]
> ...
> 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-24]
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.323Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - 

[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing

2019-08-20 Thread Gregory Koshelev (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911182#comment-16911182
 ] 

Gregory Koshelev commented on KAFKA-8104:
-

[~guozhang], I found server.log (there are 3 brokers, but only one of them has 
relevant log entries in log):

{code:text}
[2019-03-10 02:46:43,627] INFO [GroupCoordinator 2]: Stabilized group 
hercules.sink.elastic.legacy_logs_elk_c2 generation 10693 
(__consumer_offsets-25) (kafka.coordinator.group.GroupCoordinator)
[2019-03-10 02:46:43,630] INFO [GroupCoordinator 2]: Assignment received from 
leader for group hercules.sink.elastic.legacy_logs_elk_c2 for generation 10693 
(kafka.coordinator.group.GroupCoordinator)
[2019-03-10 02:46:45,397] INFO [GroupCoordinator 2]: Preparing to rebalance 
group hercules.sink.elastic.legacy_logs_elk_c2 with old generation 10693 
(__consumer_offsets-25) (kafka.coordinator.group.GroupCoordinator)
[2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member 
consumer-7-cc600ccb-72b6-4582-b288-3ea966f33c7c in group 
hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
[2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member 
consumer-1-f284c5c8-22fc-40d5-aa30-46c2875463bd in group 
hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
[2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member 
consumer-5-e608adfc-e9ad-427e-99b5-99ff1538611d in group 
hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
[2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member 
consumer-6-daa568cc-a5ee-49a9-aa7c-bd34fbb1da52 in group 
hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
[2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member 
consumer-2-da70d65b-53e6-4467-b06d-63f02ff89eec in group 
hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
[2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member 
consumer-12-a0cb16ff-46eb-4945-a87a-244a20261af9 in group 
hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
[2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member 
consumer-10-c10cb15d-edc3-4c6a-b21a-68e2b8298284 in group 
hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
[2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member 
consumer-16-d80c8a16-be83-48f3-ac9d-08431b87c198 in group 
hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
[2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member 
consumer-5-75899722-91a9-439a-a0c5-f23eb7c6c93f in group 
hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
[2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member 
consumer-11-14fe64e9-1553-49e1-abf9-5ea623847c84 in group 
hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
[2019-03-10 02:46:53,628] INFO [GroupCoordinator 2]: Member 
consumer-3-9b62bde0-ff71-46f7-b6a4-7be14e6055b5 in group 
hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
[2019-03-10 02:46:53,629] INFO [GroupCoordinator 2]: Member 
consumer-14-42336dad-be7c-49da-a76b-72d13a856943 in group 
hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
[2019-03-10 02:46:53,629] INFO [GroupCoordinator 2]: Member 
consumer-6-1d57e91e-a6cd-4917-9560-19fa6579d02e in group 
hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
[2019-03-10 02:46:53,629] INFO [GroupCoordinator 2]: Member 
consumer-13-a310873e-5c66-402c-a031-0b2fce8eb4cc in group 
hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
[2019-03-10 02:46:53,629] INFO [GroupCoordinator 2]: Member 
consumer-1-7c591ed8-01d8-4f2c-b756-a5c3d3f63eea in group 
hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
[2019-03-10 02:46:53,629] INFO [GroupCoordinator 2]: Member 
consumer-7-70c381bc-5d32-4e2c-923b-2c15c0f15aa2 in group 
hercules.sink.elastic.legacy_logs_elk_c2 has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
[2019-03-10 02:46:53,629] INFO [GroupCoordinator 2]: Member 
consumer-15-5de2a785-0cd8-43f8-b3e2-6b1da2983910 in group 

[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing

2019-07-29 Thread Guozhang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895482#comment-16895482
 ] 

Guozhang Wang commented on KAFKA-8104:
--

[~mbarbon] [~kgn] do you have any broker-side logs around the time this happens 
that can be shared on this ticket as well?

> Consumer cannot rejoin to the group after rebalancing
> -
>
> Key: KAFKA-8104
> URL: https://issues.apache.org/jira/browse/KAFKA-8104
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.0.0
>Reporter: Gregory Koshelev
>Priority: Critical
> Attachments: consumer-rejoin-fail.log
>
>
> TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent 
> {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and 
> {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). 
> See explanation below.
> There are 16 consumers in single process (threads from pool-4-thread-1 to 
> pool-4-thread-16). All of them belong to single consumer group 
> {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired 
> and consumers have got {{CommitFailedException}} as expected:
> {noformat}
> 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN  
> r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298)
>   at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156)
>   at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> After that, most of them successfully rejoined to the group with generation 
> 10699:
> {noformat}
> 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-18]
> ...
> 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11]
> ...
> 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-24]
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.323Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-7, 
> 

[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing

2019-07-23 Thread Mattia Barbon (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891092#comment-16891092
 ] 

Mattia Barbon commented on KAFKA-8104:
--

It also happens with version 2.3 of the client

> Consumer cannot rejoin to the group after rebalancing
> -
>
> Key: KAFKA-8104
> URL: https://issues.apache.org/jira/browse/KAFKA-8104
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.0.0
>Reporter: Gregory Koshelev
>Priority: Critical
> Attachments: consumer-rejoin-fail.log
>
>
> TL;DR; {{KafkaConsumer}} cannot rejoin to the group due to inconsistent 
> {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}} and 
> {{AbstractCoordinator.joinFuture}} (which is succeeded {{RequestFuture}}). 
> See explanation below.
> There are 16 consumers in single process (threads from pool-4-thread-1 to 
> pool-4-thread-16). All of them belong to single consumer group 
> {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been acquired 
> and consumers have got {{CommitFailedException}} as expected:
> {noformat}
> 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN  
> r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298)
>   at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156)
>   at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> After that, most of them successfully rejoined to the group with generation 
> 10699:
> {noformat}
> 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-18]
> ...
> 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group 
> with generation 10699
> 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11]
> ...
> 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO  
> o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned 
> partitions [legacy_logs_elk_c2-24]
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
> 2019-03-10T03:16:42.323Z [kafka-coordinator-heartbeat-thread | 
> hercules.sink.elastic.legacy_logs_elk_c2] INFO  
> o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-7, 
> groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed 
> since group is rebalancing
>