[jira] [Commented] (KAFKA-8104) Consumer cannot rejoin to the group after rebalancing
[ https://issues.apache.org/jira/browse/KAFKA-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912116#comment-16912116 ]

Adam Kotwasinski commented on KAFKA-8104:
-----------------------------------------

FWIW, I'm seeing the same
{code:java}
java.lang.IllegalStateException: Coordinator selected invalid assignment protocol: null
{code}
while using kafka-clients:1.1.1

> Consumer cannot rejoin to the group after rebalancing
> -----------------------------------------------------
>
>                 Key: KAFKA-8104
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8104
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0
>            Reporter: Gregory Koshelev
>            Priority: Critical
>         Attachments: consumer-rejoin-fail.log
>
>
> TL;DR: {{KafkaConsumer}} cannot rejoin the group due to an inconsistent {{AbstractCoordinator.generation}} (which is {{NO_GENERATION}}) and {{AbstractCoordinator.joinFuture}} (which is a succeeded {{RequestFuture}}). See the explanation below.
> There are 16 consumers in a single process (threads pool-4-thread-1 to pool-4-thread-16). All of them belong to the single consumer group {{hercules.sink.elastic.legacy_logs_elk_c2}}. Rebalancing has been triggered and the consumers have got {{CommitFailedException}} as expected:
> {noformat}
> 2019-03-10T03:16:37.023Z [pool-4-thread-10] WARN  r.k.vostok.hercules.sink.SimpleSink - Commit failed due to rebalancing
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
> 	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:798)
> 	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:681)
> 	at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1334)
> 	at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1298)
> 	at ru.kontur.vostok.hercules.sink.Sink.commit(Sink.java:156)
> 	at ru.kontur.vostok.hercules.sink.SimpleSink.run(SimpleSink.java:104)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {noformat}
> After that, most of them successfully rejoined the group with generation 10699:
> {noformat}
> 2019-03-10T03:16:39.208Z [pool-4-thread-13] INFO  o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-13, groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group with generation 10699
> 2019-03-10T03:16:39.209Z [pool-4-thread-13] INFO  o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-13, groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned partitions [legacy_logs_elk_c2-18]
> ...
> 2019-03-10T03:16:39.216Z [pool-4-thread-11] INFO  o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-11, groupId=hercules.sink.elastic.legacy_logs_elk_c2] Successfully joined group with generation 10699
> 2019-03-10T03:16:39.217Z [pool-4-thread-11] INFO  o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-11, groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned partitions [legacy_logs_elk_c2-10, legacy_logs_elk_c2-11]
> ...
> 2019-03-10T03:16:39.218Z [pool-4-thread-15] INFO  o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-15, groupId=hercules.sink.elastic.legacy_logs_elk_c2] Setting newly assigned partitions [legacy_logs_elk_c2-24]
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | hercules.sink.elastic.legacy_logs_elk_c2] INFO  o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-6, groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed since group is rebalancing
> 2019-03-10T03:16:42.320Z [kafka-coordinator-heartbeat-thread | hercules.sink.elastic.legacy_logs_elk_c2] INFO  o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, groupId=hercules.sink.elastic.legacy_logs_elk_c2] Attempt to heartbeat failed since group is rebalancing
> 2019-03-10T03:16:42.323Z [kafka-coordinator-heartbeat-thread | hercules.sink.elastic.legacy_logs_elk_c2] INFO  o.a.k.c.c.i.AbstractCoordi
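The remedy suggested by the {{CommitFailedException}} message above (more time per poll, smaller batches) is a client-side configuration change. A minimal sketch, using the standard consumer property names; the values are illustrative only, not a recommendation, and must be tuned against actual per-record processing time:

```java
import java.util.Properties;

public class RebalanceTuning {
    // Illustrative values only: the point is the relationship between the
    // time budget per poll() and the batch size handed to each poll().
    public static Properties tunedConsumerConfig() {
        Properties props = new Properties();
        // Upper bound on the time between poll() calls before the group
        // coordinator evicts the member and rebalances (default 300000).
        props.setProperty("max.poll.interval.ms", "600000");
        // Fewer records per poll() means each loop iteration finishes
        // sooner, keeping the consumer inside the interval (default 500).
        props.setProperty("max.poll.records", "100");
        // Heartbeat-based liveness; independent of poll() timing on
        // 0.10.1+ clients.
        props.setProperty("session.timeout.ms", "30000");
        return props;
    }

    public static void main(String[] args) {
        tunedConsumerConfig().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

These properties would be passed to the {{KafkaConsumer}} constructor; they only address the commit failure itself, not the stuck {{joinFuture}} state described in the ticket.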
[jira] [Updated] (KAFKA-6830) Add new metrics for consumer/replication fetch requests
[ https://issues.apache.org/jira/browse/KAFKA-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kotwasinski updated KAFKA-6830:
------------------------------------
    Description:
Currently, we have only one fetch-request-related metric per topic. As fetch requests are issued by both client consumers and replicating brokers, it is impossible to tell whether a particular partition (with replication factor > 1) is being actively read by client consumers.

Rationale for this improvement: as the owner of a Kafka installation, but not the owner of its clients, I want to know which topics still have active (real) consumers.

PR linked.

KIP: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=80452537

  was:
Currently, we have only one fetch-request-related metric per topic. As fetch requests are issued by both client consumers and replicating brokers, it is impossible to tell whether a particular partition (with replication factor > 1) is being actively read by client consumers.

Rationale for this improvement: as the owner of a Kafka installation, but not the owner of its clients, I want to know which topics still have active (real) consumers.

PR linked.

> Add new metrics for consumer/replication fetch requests
> -------------------------------------------------------
>
>                 Key: KAFKA-6830
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6830
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>            Reporter: Adam Kotwasinski
>            Priority: Major
>
> Currently, we have only one fetch-request-related metric per topic.
> As fetch requests are issued by both client consumers and replicating brokers, it is impossible to tell whether a particular partition (with replication factor > 1) is being actively read by client consumers.
> Rationale for this improvement: as the owner of a Kafka installation, but not the owner of its clients, I want to know which topics still have active (real) consumers.
> PR linked.
> KIP: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=80452537

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (KAFKA-6830) Add new metrics for consumer/replication fetch requests
[ https://issues.apache.org/jira/browse/KAFKA-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471673#comment-16471673 ]

Adam Kotwasinski commented on KAFKA-6830:
-----------------------------------------

KIP - https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=80452537
[jira] [Updated] (KAFKA-6830) Add new metrics for consumer/replication fetch requests
[ https://issues.apache.org/jira/browse/KAFKA-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kotwasinski updated KAFKA-6830:
------------------------------------
    Description:
Currently, we have only one fetch-request-related metric per topic. As fetch requests are issued by both client consumers and replicating brokers, it is impossible to tell whether a particular partition (with replication factor > 1) is being actively read by client consumers.

Rationale for this improvement: as the owner of a Kafka installation, but not the owner of its clients, I want to know which topics still have active (real) consumers.

Patch attached.

  was:
Currently, we have only one fetch-request-related metric per topic. As fetch requests are issued by both client consumers and replicating brokers, it is impossible to tell whether a particular partition (with replication factor > 1) is being actively read by client consumers.

Rationale for this improvement: as the owner of a Kafka installation, but not the owner of its clients, I want to know which topics still have active (real) consumers.
[jira] [Updated] (KAFKA-6830) Add new metrics for consumer/replication fetch requests
[ https://issues.apache.org/jira/browse/KAFKA-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kotwasinski updated KAFKA-6830:
------------------------------------
    Description:
Currently, we have only one fetch-request-related metric per topic. As fetch requests are issued by both client consumers and replicating brokers, it is impossible to tell whether a particular partition (with replication factor > 1) is being actively read by client consumers.

Rationale for this improvement: as the owner of a Kafka installation, but not the owner of its clients, I want to know which topics still have active (real) consumers.

PR linked.

  was:
Currently, we have only one fetch-request-related metric per topic. As fetch requests are issued by both client consumers and replicating brokers, it is impossible to tell whether a particular partition (with replication factor > 1) is being actively read by client consumers.

Rationale for this improvement: as the owner of a Kafka installation, but not the owner of its clients, I want to know which topics still have active (real) consumers.

Patch attached.
[jira] [Created] (KAFKA-6830) Add new metrics for consumer/replication fetch requests
Adam Kotwasinski created KAFKA-6830:
---------------------------------------

             Summary: Add new metrics for consumer/replication fetch requests
                 Key: KAFKA-6830
                 URL: https://issues.apache.org/jira/browse/KAFKA-6830
             Project: Kafka
          Issue Type: Improvement
          Components: core
            Reporter: Adam Kotwasinski

Currently, we have only one fetch-request-related metric per topic. As fetch requests are issued by both client consumers and replicating brokers, it is impossible to tell whether a particular partition (with replication factor > 1) is being actively read by client consumers.

Rationale for this improvement: as the owner of a Kafka installation, but not the owner of its clients, I want to know which topics still have active (real) consumers.
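The kind of split this ticket asks for can be sketched in a few lines: count fetch requests per topic separately by the origin of the fetch. All names below are hypothetical, for illustration only; the actual broker-side change lives in the linked PR and KIP.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-topic fetch counters, split by who issued the fetch.
public class FetchRequestMetrics {
    public enum FetcherType { CONSUMER, FOLLOWER }

    private final Map<String, Map<FetcherType, LongAdder>> counters = new ConcurrentHashMap<>();

    // On the broker, a follower fetch carries a valid replica id while a
    // client consumer fetch does not, so the two are distinguishable.
    public void record(String topic, boolean isFollowerFetch) {
        FetcherType type = isFollowerFetch ? FetcherType.FOLLOWER : FetcherType.CONSUMER;
        counters.computeIfAbsent(topic, t -> new ConcurrentHashMap<>())
                .computeIfAbsent(type, t -> new LongAdder())
                .increment();
    }

    public long count(String topic, FetcherType type) {
        Map<FetcherType, LongAdder> byType = counters.get(topic);
        LongAdder adder = (byType == null) ? null : byType.get(type);
        return (adder == null) ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        FetchRequestMetrics metrics = new FetchRequestMetrics();
        metrics.record("mytopic", true);   // replication fetch
        metrics.record("mytopic", true);
        metrics.record("mytopic", false);  // real consumer fetch
        // With a single combined counter these three would be indistinguishable,
        // which is exactly the gap described above.
        System.out.println("consumer=" + metrics.count("mytopic", FetcherType.CONSUMER)
                + " follower=" + metrics.count("mytopic", FetcherType.FOLLOWER));
    }
}
```

With only the existing combined metric, a replicated topic with zero real consumers still shows steady fetch traffic from its follower replicas; the split makes the "no active consumers" case directly observable.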
[jira] [Updated] (KAFKA-6438) NSEE while concurrently creating and deleting a topic
[ https://issues.apache.org/jira/browse/KAFKA-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kotwasinski updated KAFKA-6438:
------------------------------------
    Description:
It appears that deleting a topic and creating it at the same time can cause an NSEE, which later results in a forced controller shutdown. Most probably the topics are being re-created because consumers/producers are still active (yes, this means the deletion is happening blindly).

The main problem here (for me) is the controller switch; the data loss and subsequent unclean election are acceptable (as we admit to deleting blindly).

Environment description:
20 Kafka brokers
80k partitions (20k topics, 4 partitions each)
3-node ZK ensemble

Incident:
{code:java}
[2018-01-09 11:19:05,912] INFO [Topic Deletion Manager 6], Partition deletion callback for mytopic-2,mytopic-0,mytopic-1,mytopic-3 (kafka.controller.TopicDeletionManager)
[2018-01-09 11:19:06,237] INFO [Controller id=6] New leader and ISR for partition mytopic-0 is {"leader":-1,"leader_epoch":1,"isr":[]} (kafka.controller.KafkaController)
[2018-01-09 11:19:06,412] INFO [Topic Deletion Manager 6], Deletion for replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
[2018-01-09 11:19:07,218] INFO [Topic Deletion Manager 6], Deletion for replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
[2018-01-09 11:19:07,304] INFO [Topic Deletion Manager 6], Deletion for replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
[2018-01-09 11:19:07,383] INFO [Topic Deletion Manager 6], Deletion for replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
[2018-01-09 11:19:07,510] INFO [Topic Deletion Manager 6], Deletion for replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
[2018-01-09 11:19:07,661] INFO [Topic Deletion Manager 6], Deletion for replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
[2018-01-09 11:19:07,728] INFO [Topic Deletion Manager 6], Deletion for replicas 9,10,11 for partition mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
[2018-01-09 11:19:07,924] INFO [PartitionStateMachine controllerId=6] Invoking state change to OfflinePartition for partitions mytopic-2,mytopic-0,mytopic-1,mytopic-3 (kafka.controller.PartitionStateMachine)
[2018-01-09 11:19:07,924] INFO [PartitionStateMachine controllerId=6] Invoking state change to NonExistentPartition for partitions mytopic-2,mytopic-0,mytopic-1,mytopic-3 (kafka.controller.PartitionStateMachine)
[2018-01-09 11:19:08,592] INFO [Controller id=6] New topics: [Set(mytopic, other, other2)], deleted topics: [Set()], new partition replica assignment [Map(other-0 -> Vector(8), mytopic-2 -> Vector(6), mytopic-0 -> Vector(4), other-2 -> Vector(10), mytopic-1 -> Vector(5), mytopic-3 -> Vector(7), other-1 -> Vector(9), other-3 -> Vector(11))] (kafka.controller.KafkaController)
[2018-01-09 11:19:08,593] INFO [Controller id=6] New topic creation callback for other-0,mytopic-2,mytopic-0,other-2,mytopic-1,mytopic-3,other-1,other-3 (kafka.controller.KafkaController)
[2018-01-09 11:19:08,596] INFO [Controller id=6] New partition creation callback for other-0,mytopic-2,mytopic-0,other-2,mytopic-1,mytopic-3,other-1,other-3 (kafka.controller.KafkaController)
[2018-01-09 11:19:08,596] INFO [PartitionStateMachine controllerId=6] Invoking state change to NewPartition for partitions other-0,mytopic-2,mytopic-0,other-2,mytopic-1,mytopic-3,other-1,other-3 (kafka.controller.PartitionStateMachine)
[2018-01-09 11:19:08,642] INFO [PartitionStateMachine controllerId=6] Invoking state change to OnlinePartition for partitions other-0,mytopic-2,mytopic-0,other-2,mytopic-1,mytopic-3,other-1,other-3 (kafka.controller.PartitionStateMachine)
[2018-01-09 11:19:08,828] INFO [Topic Deletion Manager 6], Partition deletion callback for mytopic-2,mytopic-0,mytopic-1,mytopic-3 (kafka.controller.TopicDeletionManager)
[2018-01-09 11:19:09,127] INFO [Controller id=6] New leader and ISR for partition mytopic-0 is {"leader":-1,"leader_epoch":1,"isr":[]} (kafka.controller.KafkaController)
[2018-01-09 11:19:09,607] ERROR [controller-event-thread]: Error processing event TopicDeletion(Set(mytopic, other)) (kafka.controller.ControllerEventManager$ControllerEventThread)
java.util.NoSuchElementException: key not found: mytopic-0
	at scala.collection.MapLike$class.default(MapLike.scala:228)
	at scala.collection.AbstractMap.default(Map.s
[jira] [Created] (KAFKA-6438) NSEE while concurrently creating and deleting a topic
Adam Kotwasinski created KAFKA-6438:
---------------------------------------

             Summary: NSEE while concurrently creating and deleting a topic
                 Key: KAFKA-6438
                 URL: https://issues.apache.org/jira/browse/KAFKA-6438
             Project: Kafka
          Issue Type: Bug
          Components: controller
    Affects Versions: 1.0.0
         Environment: kafka_2.11-1.0.0.jar
OpenJDK Runtime Environment (build 1.8.0_102-b14), OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)
CentOS Linux release 7.3.1611 (Core)
            Reporter: Adam Kotwasinski

It appears that deleting a topic and creating it at the same time can cause an NSEE, which later results in a forced controller shutdown. Most probably the topics are being re-created because consumers/producers are still active (yes, this means the deletion is happening blindly).

The main problem here (for me) is the controller switch; the data loss and subsequent unclean election are acceptable (as we admit to deleting blindly).

Environment description:
20 Kafka brokers
80k partitions (20k topics, 4 partitions each)
3-node ZK ensemble

Incident:
{code:java}
[2018-01-09 11:19:05,912] INFO [Topic Deletion Manager 6], Partition deletion callback for mytopic-2,mytopic-0,mytopic-1,mytopic-3 (kafka.controller.TopicDeletionManager)
[2018-01-09 11:19:06,237] INFO [Controller id=6] New leader and ISR for partition mytopic-0 is {"leader":-1,"leader_epoch":1,"isr":[]} (kafka.controller.KafkaController)
[2018-01-09 11:19:06,412] INFO [Topic Deletion Manager 6], Deletion for replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
[2018-01-09 11:19:07,218] INFO [Topic Deletion Manager 6], Deletion for replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
[2018-01-09 11:19:07,304] INFO [Topic Deletion Manager 6], Deletion for replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
[2018-01-09 11:19:07,383] INFO [Topic Deletion Manager 6], Deletion for replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
[2018-01-09 11:19:07,510] INFO [Topic Deletion Manager 6], Deletion for replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
[2018-01-09 11:19:07,661] INFO [Topic Deletion Manager 6], Deletion for replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
[2018-01-09 11:19:07,728] INFO [Topic Deletion Manager 6], Deletion for replicas 9,10,11 for partition mytopic-0,mytopic-1,mytopic-2 of topic mytopic in progress (kafka.controller.TopicDeletionManager)
[2018-01-09 11:19:07,924] INFO [PartitionStateMachine controllerId=6] Invoking state change to OfflinePartition for partitions mytopic-2,mytopic-0,mytopic-1,mytopic-3 (kafka.controller.PartitionStateMachine)
[2018-01-09 11:19:07,924] INFO [PartitionStateMachine controllerId=6] Invoking state change to NonExistentPartition for partitions mytopic-2,mytopic-0,mytopic-1,mytopic-3 (kafka.controller.PartitionStateMachine)
[2018-01-09 11:19:08,592] INFO [Controller id=6] New topics: [Set(mytopic, other, querytickle_WD2-SALES1_espgms0202v29)], deleted topics: [Set()], new partition replica assignment [Map(other-0 -> Vector(8), mytopic-2 -> Vector(6), mytopic-0 -> Vector(4), other-2 -> Vector(10), mytopic-1 -> Vector(5), mytopic-3 -> Vector(7), other-1 -> Vector(9), other-3 -> Vector(11))] (kafka.controller.KafkaController)
[2018-01-09 11:19:08,593] INFO [Controller id=6] New topic creation callback for other-0,mytopic-2,mytopic-0,other-2,mytopic-1,mytopic-3,other-1,other-3 (kafka.controller.KafkaController)
[2018-01-09 11:19:08,596] INFO [Controller id=6] New partition creation callback for other-0,mytopic-2,mytopic-0,other-2,mytopic-1,mytopic-3,other-1,other-3 (kafka.controller.KafkaController)
[2018-01-09 11:19:08,596] INFO [PartitionStateMachine controllerId=6] Invoking state change to NewPartition for partitions other-0,mytopic-2,mytopic-0,other-2,mytopic-1,mytopic-3,other-1,other-3 (kafka.controller.PartitionStateMachine)
[2018-01-09 11:19:08,642] INFO [PartitionStateMachine controllerId=6] Invoking state change to OnlinePartition for partitions other-0,mytopic-2,mytopic-0,other-2,mytopic-1,mytopic-3,other-1,other-3 (kafka.controller.PartitionStateMachine)
[2018-01-09 11:19:08,828] INFO [Topic Deletion Manager 6], Partition deletion callback for mytopic-2,mytopic-0,mytopic-1,mytopic-3 (kafka.controller.TopicDeletionManager)
[2018-01-09 11:19:09,127] INFO [Controller id=6] New leader and ISR for partition mytopic-0 is {"leader":-1,"leader_epoch"
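The interleaving in the log above (a deletion still in flight when the topic is re-created, after which the deletion callback looks up partition state that no longer exists) can be re-enacted deterministically. This is a simplified, hypothetical model of the controller's partition-state cache, not Kafka's actual code; it only mirrors the {{key not found}} failure mode of the Scala map lookup in the stack trace:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Single-threaded re-enactment of the reported interleaving against a
// toy "controller cache". All names here are illustrative.
public class DeleteCreateRace {
    final Map<String, String> leaderAndIsr = new HashMap<>();

    // Mirrors Scala's Map#apply, which throws when the key is absent
    // ("java.util.NoSuchElementException: key not found: mytopic-0").
    String lookup(String partition) {
        String state = leaderAndIsr.get(partition);
        if (state == null) {
            throw new NoSuchElementException("key not found: " + partition);
        }
        return state;
    }

    public static void main(String[] args) {
        DeleteCreateRace controller = new DeleteCreateRace();
        controller.leaderAndIsr.put("mytopic-0", "{leader:4, isr:[4]}");

        // 1. Deletion runs: the partition is moved to NonExistentPartition
        //    and its entry is dropped from the controller's cache.
        controller.leaderAndIsr.remove("mytopic-0");

        // 2. An active producer/consumer auto-creates "mytopic" before the
        //    deletion fully completes (the "New topic creation callback").

        // 3. The still-pending deletion callback re-reads the old entry and
        //    fails, which is what forces the controller shutdown above.
        try {
            controller.lookup("mytopic-0");
        } catch (NoSuchElementException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The real fix has to make the deletion and creation paths agree on partition state; this sketch only shows why the blind delete-while-clients-are-active pattern is enough to hit the exception.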