[jira] [Commented] (KAFKA-6178) Broker is listed as only ISR for all partitions it is leader of
[ https://issues.apache.org/jira/browse/KAFKA-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793461#comment-16793461 ]

Narayan Periwal commented on KAFKA-6178:

We are also seeing the same issue in our Kafka cluster. We are using version 0.10.2.1.

> Broker is listed as only ISR for all partitions it is leader of
> ---
>
> Key: KAFKA-6178
> URL: https://issues.apache.org/jira/browse/KAFKA-6178
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.10.1.0
> Environment: Windows
> Reporter: AS
> Priority: Major
> Labels: windows
> Attachments: KafkaServiceOutput.txt, log-cleaner.log, server.log
>
> We're running a 15 broker cluster on Windows machines, and one of the
> brokers, 10, is the only ISR on all partitions that it is the leader of. On
> partitions where it isn't the leader, it seems to follow the leader fine.
> This is an excerpt from 'describe':
>
> Topic: ClientQosCombined Partition: 458 Leader: 10 Replicas: 10,6,7,8,9,0,1 Isr: 10
> Topic: ClientQosCombined Partition: 459 Leader: 11 Replicas: 11,7,8,9,0,1,10 Isr: 0,10,1,9,7,11,8
>
> The server.log files all seem to be pretty standard, and the only indication
> of this issue is the following pattern, which repeats often:
>
> 2017-11-06 20:28:25,207 [INFO] kafka.cluster.Partition [kafka-request-handler-8:] - Partition [ClientQosCombined,398] on broker 10: Expanding ISR for partition [ClientQosCombined,398] from 10 to 5,10
> 2017-11-06 20:28:39,382 [INFO] kafka.cluster.Partition [kafka-scheduler-1:] - Partition [ClientQosCombined,398] on broker 10: Shrinking ISR for partition [ClientQosCombined,398] from 5,10 to 10
>
> This pattern appears for each of the partitions that broker 10 leads. This is
> the only topic that we currently have in our cluster. The __consumer_offsets
> topic seems completely normal in terms of ISR counts. The controller is
> broker 5, which is cycling through attempting and failing to trigger leader
> elections on the partitions led by broker 10.
> From the controller log in broker 5:
>
> 2017-11-06 20:45:04,857 [INFO] kafka.controller.KafkaController [kafka-scheduler-0:] - [Controller 5]: Starting preferred replica leader election for partitions [ClientQosCombined,375]
> 2017-11-06 20:45:04,857 [INFO] kafka.controller.PartitionStateMachine [kafka-scheduler-0:] - [Partition state machine on Controller 5]: Invoking state change to OnlinePartition for partitions [ClientQosCombined,375]
> 2017-11-06 20:45:04,857 [INFO] kafka.controller.PreferredReplicaPartitionLeaderSelector [kafka-scheduler-0:] - [PreferredReplicaPartitionLeaderSelector]: Current leader 10 for partition [ClientQosCombined,375] is not the preferred replica. Trigerring preferred replica leader election
> 2017-11-06 20:45:04,857 [WARN] kafka.controller.KafkaController [kafka-scheduler-0:] - [Controller 5]: Partition [ClientQosCombined,375] failed to complete preferred replica leader election. Leader is 10
>
> I've also attached the logs and output from broker 10. Any idea what's wrong
> here?

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
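For anyone triaging a similar cluster, the symptom in the 'describe' excerpt above (a partition with several replicas whose ISR contains only its leader) can be checked mechanically. The following is a minimal sketch, assuming the one-line-per-partition layout quoted in the report; the regular expression, the function name `leader_only_isr` and the `SAMPLE` text are illustrative and not part of any Kafka tool.

```python
import re

# Matches one partition line of 'describe' output in the layout quoted in the report.
# The exact layout is an assumption; \s+ tolerates tabs as well as spaces.
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def leader_only_isr(describe_text):
    """Return (topic, partition, leader) for every partition that has more
    than one assigned replica but whose ISR is just the leader."""
    hits = []
    for m in LINE_RE.finditer(describe_text):
        replicas = [r for r in m.group("replicas").split(",") if r]
        isr = [r for r in m.group("isr").split(",") if r]
        if len(replicas) > 1 and isr == [m.group("leader")]:
            hits.append(
                (m.group("topic"), int(m.group("partition")), int(m.group("leader")))
            )
    return hits


# The two partition lines from the report, unwrapped.
SAMPLE = """\
Topic: ClientQosCombined Partition: 458 Leader: 10 Replicas: 10,6,7,8,9,0,1 Isr: 10
Topic: ClientQosCombined Partition: 459 Leader: 11 Replicas: 11,7,8,9,0,1,10 Isr: 0,10,1,9,7,11,8
"""

print(leader_only_isr(SAMPLE))
```

Run against the full 'describe' output, a check like this would list every partition showing the reported pattern, which makes it easier to confirm whether a single broker leads all of them.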
[jira] [Commented] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group
[ https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16702913#comment-16702913 ]

Narayan Periwal commented on KAFKA-6681:

Thanks for letting us know [~lyn610]

> Two instances of kafka consumer reading the same partition within a consumer group
> ---
>
> Key: KAFKA-6681
> URL: https://issues.apache.org/jira/browse/KAFKA-6681
> Project: Kafka
> Issue Type: Bug
> Components: clients, consumer
> Affects Versions: 0.10.2.1
> Reporter: Narayan Periwal
> Priority: Critical
> Attachments: server-1.log, server-2.log
>
> We have seen this issue with the Kafka consumer, the new client library that
> was introduced in 0.9.
> With this new client, group management is done by the Kafka coordinator,
> which is one of the Kafka brokers.
> We are using Kafka broker 0.10.2.1, and the consumer client version is also
> 0.10.2.1.
> The issue we have faced is that, after rebalancing, some of the partitions
> get consumed by 2 instances within a consumer group, leading to duplication
> of the entire partition data. Both instances continue to read until the next
> rebalance or a restart of those clients.
> It looks like a particular consumer goes on fetching data from a partition,
> but the broker is not able to identify this "stale" consumer instance.
> We have hit this twice in production. Please look at it at the earliest.
[jira] [Commented] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers (KIP-341)
[ https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639684#comment-16639684 ]

Narayan Periwal commented on KAFKA-7026:

Sure [~steven.aerts]. I have already created a ticket related to it: https://issues.apache.org/jira/browse/KAFKA-6681

> Sticky assignor could assign a partition to multiple consumers (KIP-341)
> ---
>
> Key: KAFKA-7026
> URL: https://issues.apache.org/jira/browse/KAFKA-7026
> Project: Kafka
> Issue Type: Bug
> Components: clients
> Reporter: Vahid Hashemian
> Assignee: Vahid Hashemian
> Priority: Major
> Labels: kip
> Fix For: 2.2.0
>
> In the following scenario sticky assignor assigns a topic partition to two
> consumers in the group:
> # Create a topic {{test}} with a single partition
> # Start consumer {{c1}} in group {{sticky-group}} ({{c1}} becomes group leader and gets {{test-0}})
> # Start consumer {{c2}} in group {{sticky-group}} ({{c1}} holds onto {{test-0}}, {{c2}} does not get any partition)
> # Pause {{c1}} (e.g. using Java debugger) ({{c2}} becomes leader and takes over {{test-0}}, {{c1}} leaves the group)
> # Resume {{c1}}
> At this point both {{c1}} and {{c2}} will have {{test-0}} assigned to them.
>
> The reason is {{c1}} still has kept its previous assignment ({{test-0}}) from
> the last assignment it received from the leader (itself) and did not get the
> next round of assignments (when {{c2}} became leader) because it was paused.
> Both {{c1}} and {{c2}} enter the rebalance supplying {{test-0}} as their
> existing assignment. The sticky assignor code does not currently check and
> avoid this duplication.
>
> Note: This issue was originally reported on
> [StackOverflow|https://stackoverflow.com/questions/50761842/kafka-stickyassignor-breaking-delivery-to-single-consumer-in-the-group].
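The scenario in the ticket description can be illustrated with a toy model. This is a sketch only, not Kafka's StickyAssignor code: the function names, the dictionary layout and the use of a per-member generation number to break ties are all assumptions made for illustration (the ticket title references KIP-341, but the details of that proposal are not reproduced here).

```python
def naive_sticky(claimed):
    """Toy 'sticky' step: every member keeps whatever it reports as its
    previous assignment, with no check for conflicting claims."""
    return {member: list(parts) for member, parts in claimed.items()}


def deduplicated_sticky(claimed, generations):
    """Same step, but when several members claim one partition only the claim
    with the highest generation survives (hypothetical tie-break rule)."""
    owner = {}
    for member, parts in claimed.items():
        for p in parts:
            if p not in owner or generations[member] > generations[owner[p]]:
                owner[p] = member
    result = {member: [] for member in claimed}
    for p, member in owner.items():
        result[member].append(p)
    return result


# After c1 resumes, both members report test-0 as their existing assignment.
claimed = {"c1": ["test-0"], "c2": ["test-0"]}
# Assumed generations: c1's claim dates from before c2 took over.
generations = {"c1": 1, "c2": 2}

print(naive_sticky(claimed))
print(deduplicated_sticky(claimed, generations))
```

In the naive version both members end up holding `test-0`, which is the duplication the ticket describes; the second version shows one way a conflict check could keep only the more recent claim.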
[jira] [Commented] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers (KIP-341)
[ https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637717#comment-16637717 ]

Narayan Periwal commented on KAFKA-7026:

[~steven.aerts] [~vahid] As I mentioned in KAFKA-6681, we are seeing this issue with RangeAssignor, so I do not think this fix is going to solve our issue. We are using a somewhat old version of the Kafka brokers (0.10.2.1). I don't know whether upgrading will fix this.
[jira] [Comment Edited] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers (KIP-341)
[ https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636871#comment-16636871 ]

Narayan Periwal edited comment on KAFKA-7026 at 10/3/18 12:05 PM:

[~steven.aerts], can the issue of "consumer of the process not losing its subscription" come with RangeAssignor?

was (Author: nperiwal):
[~steven.aerts], can the issue of "consumer of the process not losing its subscription" come with RangeAssignor? The reason I ask is that KAFKA-6717, raised by [~Yuancheng], is related to _StickyAssignor_.
[jira] [Comment Edited] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers (KIP-341)
[ https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636871#comment-16636871 ]

Narayan Periwal edited comment on KAFKA-7026 at 10/3/18 12:04 PM:

[~steven.aerts], can the issue of "consumer of the process not losing its subscription" come with RangeAssignor? The reason I ask is that KAFKA-6717, raised by [~Yuancheng], is related to _StickyAssignor_.

was (Author: nperiwal):
[~steven.aerts], can the issue of "consumer of the process not losing its subscription" come with RangeAssignor?
[jira] [Commented] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers (KIP-341)
[ https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636871#comment-16636871 ]

Narayan Periwal commented on KAFKA-7026:

[~steven.aerts], can the issue of "consumer of the process not losing its subscription" come with RangeAssignor?
[jira] [Comment Edited] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers
[ https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524543#comment-16524543 ]

Narayan Periwal edited comment on KAFKA-7026 at 6/27/18 4:11 AM:

[~vahid], unfortunately we are not able to reproduce this in our QA setup. The only correlation we have seen is that this seems to happen when there is a spike in the number of under-replicated partitions in the Kafka cluster.

One more thing: when this issue happens, we have seen our consumers not processing data for more than "max.poll.interval.ms". The consumer.poll() call is therefore not invoked within "max.poll.interval.ms", which means the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member. It looks like the first consumer, after recovery (able to process now), is still getting data from the earlier assigned partition, leading to this issue.

was (Author: nperiwal):
[~vahid], unfortunately we are not able to reproduce this in our QA setup. The only correlation we have seen is that this seems to happen when there is a spike in the number of under-replicated partitions in the Kafka cluster.

One more thing: when this issue happens, we have seen our consumers not processing data for more than "max.poll.interval.ms". The consumer.poll() call is therefore not invoked within "max.poll.interval.ms", which means the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member. It looks like the old consumer, after recovery, is still getting data from the earlier assigned partition, leading to this issue.

> Sticky assignor could assign a partition to multiple consumers
> ---
>
> Key: KAFKA-7026
> URL: https://issues.apache.org/jira/browse/KAFKA-7026
> Project: Kafka
> Issue Type: Bug
> Components: clients
> Reporter: Vahid Hashemian
> Assignee: Vahid Hashemian
> Priority: Major
> Fix For: 2.1.0
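The condition described in the comment above (no consumer.poll() call within "max.poll.interval.ms") can be checked against application logs. The sketch below assumes the application records a timestamp in milliseconds for each poll() call; the function name and the sample timestamps are hypothetical, and the default of 300000 ms is the commonly documented default for this setting, which should be verified against the actual consumer configuration.

```python
def poll_gaps_exceeding(poll_times_ms, max_poll_interval_ms=300_000):
    """Return (previous_poll, next_poll) timestamp pairs whose gap is larger
    than max_poll_interval_ms. Each such gap is a window in which the group
    could have treated the consumer as failed and rebalanced."""
    return [
        (prev, cur)
        for prev, cur in zip(poll_times_ms, poll_times_ms[1:])
        if cur - prev > max_poll_interval_ms
    ]


# Hypothetical poll timestamps: the consumer stalls for about 398 seconds.
polls = [0, 1_000, 2_000, 400_000, 401_000]
print(poll_gaps_exceeding(polls))
```

Correlating such gaps with the times at which duplicate consumption was observed would help confirm or rule out the stalled-consumer explanation given in the comment.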
[jira] [Comment Edited] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers
[ https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524543#comment-16524543 ]

Narayan Periwal edited comment on KAFKA-7026 at 6/27/18 4:10 AM:

[~vahid], unfortunately we are not able to reproduce this in our QA setup. The only correlation we have seen is that this seems to happen when there is a spike in the number of under-replicated partitions in the Kafka cluster.

One more thing: when this issue happens, we have seen our consumers not processing data for more than "max.poll.interval.ms". The consumer.poll() call is therefore not invoked within "max.poll.interval.ms", which means the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member. It looks like the old consumer, after recovery, is still getting data from the earlier assigned partition, leading to this issue.

was (Author: nperiwal):
[~vahid], unfortunately we are not able to reproduce this in our QA setup. The only correlation we have seen is that this seems to happen when there is a spike in the number of under-replicated partitions in the Kafka cluster.
[jira] [Commented] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers
[ https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524543#comment-16524543 ]

Narayan Periwal commented on KAFKA-7026:

[~vahid], unfortunately we are not able to reproduce this in our QA setup. The only correlation we have seen is that this seems to happen when there is a spike in the number of under-replicated partitions in the Kafka cluster.
[jira] [Commented] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers
[ https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524517#comment-16524517 ]

Narayan Periwal commented on KAFKA-7026:

[~vahid], agreed that this may not be the actual cause of the issue. But we have seen this occurring multiple times in our production setup, with consumers continuing to consume the same partition unless a manual restart is triggered. So, it could be due to some other issue.

[~steven.aerts], we are using custom checkpointing in ZooKeeper, so the kafka-consumer-groups.sh script to describe the consumer group does not work for us. However, we have a mechanism to detect multiple consumers consuming from the same partition. I am sharing the distribution of one such case.

Topic: test, consumer group: group1, consumers: c1, c2, c3, c4, c5. Partitions 3, 4 and 5 of this topic were being consumed by multiple consumer instances.
{noformat}
group: group1, topic: test, partition: 0, consumer: c2
group: group1, topic: test, partition: 1, consumer: c4
group: group1, topic: test, partition: 2, consumer: c4
group: group1, topic: test, partition: 3, consumer: c3,c4
group: group1, topic: test, partition: 4, consumer: c3,c5
group: group1, topic: test, partition: 5, consumer: c3,c5
group: group1, topic: test, partition: 6, consumer: c5
group: group1, topic: test, partition: 7, consumer: c1
group: group1, topic: test, partition: 8, consumer: c1
group: group1, topic: test, partition: 9, consumer: c1
{noformat}
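The comment does not say how the in-house detection mechanism works. As a minimal sketch of the same idea, the distribution listed above can be scanned for partitions with more than one owner; the row format is taken from the comment, while the function name and regular expression are assumptions.

```python
import re

# One row looks like: "group: group1, topic: test, partition: 3, consumer: c3,c4"
ROW_RE = re.compile(r"partition:\s*(\d+),\s*consumer:\s*([\w,]+)")


def shared_partitions(report):
    """Map partition number -> list of consumers, for partitions that are
    reported as owned by more than one consumer."""
    shared = {}
    for partition, consumers in ROW_RE.findall(report):
        owners = consumers.split(",")
        if len(owners) > 1:
            shared[int(partition)] = owners
    return shared


# The distribution quoted in the comment.
REPORT = """\
group: group1, topic: test, partition: 0, consumer: c2
group: group1, topic: test, partition: 1, consumer: c4
group: group1, topic: test, partition: 2, consumer: c4
group: group1, topic: test, partition: 3, consumer: c3,c4
group: group1, topic: test, partition: 4, consumer: c3,c5
group: group1, topic: test, partition: 5, consumer: c3,c5
group: group1, topic: test, partition: 6, consumer: c5
group: group1, topic: test, partition: 7, consumer: c1
group: group1, topic: test, partition: 8, consumer: c1
group: group1, topic: test, partition: 9, consumer: c1
"""

print(shared_partitions(REPORT))
```

For this data the check flags partitions 3, 4 and 5, matching the statement in the comment.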
[jira] [Comment Edited] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group
[ https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523095#comment-16523095 ]

Narayan Periwal edited comment on KAFKA-6681 at 6/26/18 4:02 AM:

[~steven.aerts], we are using RangeAssignor (which is the default), and not the sticky assignor that KAFKA-7026 mentions.

One observation is that there is a spike in the number of UnderReplicated partitions, after which multiple consumer instances start consuming the same topic partition.

Our Kafka brokers and consumers are both at version 0.10.2.1.

was (Author: nperiwal):
[~steven.aerts], we are using RangeAssignor (which is the default), and not the sticky assignor that KAFKA-7026 mentions.
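For context on why duplicate ownership is unexpected with RangeAssignor: a range-style assignment splits each topic's partitions into contiguous, non-overlapping blocks. The following is a simplified single-topic sketch of that scheme, not the Kafka implementation; the function name is hypothetical.

```python
def range_assign(num_partitions, consumers):
    """Simplified range-style assignment for one topic: sort the members,
    give each a contiguous block, and hand one extra partition to the first
    (num_partitions % member_count) members."""
    members = sorted(consumers)
    per_member, extra = divmod(num_partitions, len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


assignment = range_assign(10, ["c1", "c2", "c3", "c4", "c5"])
print(assignment)
```

Because each computed assignment is disjoint by construction, a partition read by two group members suggests that one member is still acting on an assignment from an earlier rebalance, which is consistent with the "stale" consumer hypothesis in the ticket.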
[jira] [Comment Edited] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers
[ https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523146#comment-16523146 ]

Narayan Periwal edited comment on KAFKA-7026 at 6/26/18 4:01 AM:

[~vahid], can this issue occur with RangeAssignor as well? We have seen it occurring multiple times in our Kafka consumer (0.10.2.1) with RangeAssignor (Jira: KAFKA-6681).

One observation is that there is a spike in the number of UnderReplicated partitions in our Kafka cluster, after which multiple consumer instances start consuming the same topic partition. The Kafka broker is also at version 0.10.2.1.

was (Author: nperiwal):
[~vahid], can this issue occur with RangeAssignor as well? We have seen it occurring multiple times in our Kafka consumer (0.10.2.1) with RangeAssignor (Jira: KAFKA-6681).

One observation is that there is a spike in the number of UnderReplicated partitions, after which multiple consumer instances start consuming the same topic partition.

> Sticky assignor could assign a partition to multiple consumers
> ---
>
> Key: KAFKA-7026
> URL: https://issues.apache.org/jira/browse/KAFKA-7026
> Project: Kafka
> Issue Type: Bug
> Components: clients
> Reporter: Vahid Hashemian
> Assignee: Vahid Hashemian
> Priority: Major
[jira] [Comment Edited] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers
[ https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523146#comment-16523146 ]

Narayan Periwal edited comment on KAFKA-7026 at 6/26/18 3:59 AM:
-----------------------------------------------------------------

[~vahid], can this issue occur with RangeAssignor as well? We have seen it occur multiple times in our Kafka consumer (0.10.2.1) with RangeAssignor. Jira: KAFKA-6681.
One observation is that there is a spike in the number of under-replicated partitions, after which multiple consumer instances start consuming the same topic partition.

was (Author: nperiwal):
[~vahid], can this issue occur with RangeAssignor as well? We have seen it occur multiple times in our Kafka consumer (0.10.2.1) with RangeAssignor. Jira: KAFKA-6681.

> Sticky assignor could assign a partition to multiple consumers
> --------------------------------------------------------------
>
> Key: KAFKA-7026
> URL: https://issues.apache.org/jira/browse/KAFKA-7026
> Project: Kafka
> Issue Type: Bug
> Components: clients
> Reporter: Vahid Hashemian
> Assignee: Vahid Hashemian
> Priority: Major
>
> In the following scenario sticky assignor assigns a topic partition to two consumers in the group:
> # Create a topic {{test}} with a single partition
> # Start consumer {{c1}} in group {{sticky-group}} ({{c1}} becomes group leader and gets {{test-0}})
> # Start consumer {{c2}} in group {{sticky-group}} ({{c1}} holds onto {{test-0}}, {{c2}} does not get any partition)
> # Pause {{c1}} (e.g. using Java debugger) ({{c2}} becomes leader and takes over {{test-0}}, {{c1}} leaves the group)
> # Resume {{c1}}
> At this point both {{c1}} and {{c2}} will have {{test-0}} assigned to them.
>
> The reason is that {{c1}} has kept its previous assignment ({{test-0}}) from the last assignment it received from the leader (itself) and did not get the next round of assignments (when {{c2}} became leader) because it was paused.
> Both {{c1}} and {{c2}} enter the rebalance supplying {{test-0}} as their existing assignment. The sticky assignor code does not currently check for and avoid this duplication.
>
> Note: This issue was originally reported on [StackOverflow|https://stackoverflow.com/questions/50761842/kafka-stickyassignor-breaking-delivery-to-single-consumer-in-the-group].

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
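The duplication described in the issue above can be illustrated with a small check. This is only a sketch in Python, not the Java code of the Kafka client: `find_duplicate_claims` and the dictionary layout are hypothetical names chosen for illustration, and the actual fix in the sticky assignor may look quite different.

```python
from collections import defaultdict


def find_duplicate_claims(claimed):
    """Return {partition: [consumers]} for partitions claimed by more than one consumer.

    `claimed` maps a consumer id to the partitions it reports as its
    existing assignment when it joins the rebalance (hypothetical layout).
    """
    owners = defaultdict(list)
    for consumer, partitions in claimed.items():
        for partition in partitions:
            owners[partition].append(consumer)
    # Keep only partitions that more than one consumer believes it owns.
    return {p: sorted(c) for p, c in owners.items() if len(c) > 1}


# Scenario from the issue: after c1 resumes, both members supply test-0.
claimed = {"c1": ["test-0"], "c2": ["test-0"]}
duplicates = find_duplicate_claims(claimed)
print(duplicates)
```

An assignor that wants to stay sticky could run a check of this kind on the supplied assignments and keep each contested partition with only one of the claimants before computing the new assignment.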
[jira] [Commented] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group
[ https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523095#comment-16523095 ]

Narayan Periwal commented on KAFKA-6681:
----------------------------------------

[~steven.aerts], we are using RangeAssignor (which is the default), not the sticky assignor that KAFKA-7026 refers to.

> Two instances of kafka consumer reading the same partition within a consumer group
> ----------------------------------------------------------------------------------
>
> Key: KAFKA-6681
> URL: https://issues.apache.org/jira/browse/KAFKA-6681
> Project: Kafka
> Issue Type: Bug
> Components: clients, consumer
> Affects Versions: 0.10.2.1
> Reporter: Narayan Periwal
> Priority: Critical
> Attachments: server-1.log, server-2.log
>
> We have seen this issue with the Kafka consumer, the new library that was introduced in 0.9.
> With this new client, group management is done by the Kafka coordinator, which is one of the Kafka brokers.
> We are using Kafka broker 0.10.2.1, and the consumer client version is also 0.10.2.1.
> The issue we have faced is that, after rebalancing, some of the partitions get consumed by 2 instances within a consumer group, leading to duplication of the entire partition data. Both instances continue to read until the next rebalance or the restart of those clients.
> It looks like a particular consumer goes on fetching data from a partition, but the broker is not able to identify this "stale" consumer instance.
> We have hit this twice in production. Please look into it at the earliest.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group
[ https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423530#comment-16423530 ]

Narayan Periwal commented on KAFKA-6681:
----------------------------------------

[~yuzhih...@gmail.com], I understand your concern; 0.10.2.1 is a bit old. The problem is that we have not been able to reproduce this in our dev environment, even with 0.10.2.1. We have hit this issue only in production, 3 times.
If we find a way to reproduce this on 0.10.2.1, then we can definitely try it on 1.1.0.
What do you suggest?
[jira] [Commented] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group
[ https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422654#comment-16422654 ]

Narayan Periwal commented on KAFKA-6681:
----------------------------------------

[~yuzhih...@gmail.com], Thanks for looking into this problem. As you suggested, we will try out RoundRobinAssignor and let you know.
[jira] [Comment Edited] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group
[ https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422150#comment-16422150 ]

Narayan Periwal edited comment on KAFKA-6681 at 4/2/18 11:23 AM:
-----------------------------------------------------------------

[~yuzhih...@gmail.com] I assume you are asking about the config partition.assignment.strategy.
For this, we are using the default class from the consumer configs: org.apache.kafka.clients.consumer.RangeAssignor
Can this assignor cause the issue?

was (Author: nperiwal):
[~yuzhih...@gmail.com] I assume you are asking about the config partition.assignment.strategy.
For this, we are using the default class org.apache.kafka.clients.consumer.RangeAssignor
Can this assignor cause the issue?
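For readers unfamiliar with the range strategy named in the comment above, the following is a simplified single-topic sketch of how it divides partitions, written in Python for illustration. `range_assign` is a hypothetical helper, not the Kafka client's Java implementation, and it ignores subscriptions to multiple topics.

```python
def range_assign(consumers, num_partitions):
    """Simplified range assignment for one topic.

    Members are sorted, each gets num_partitions // len(members)
    consecutive partitions, and the first (num_partitions % len(members))
    members get one extra partition.
    """
    members = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(members))
    assignment = {}
    start = 0
    for index, member in enumerate(members):
        count = per_consumer + (1 if index < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


# One partition and four consumers: only the first member gets partition 0.
single = range_assign(["client-1", "client-2", "client-3", "client-4"], 1)
print(single)

# Three partitions and two consumers: the first member gets the extra one.
uneven = range_assign(["c1", "c2"], 3)
print(uneven)
```

In this sketch every partition lands on exactly one member, so a duplicate reader would have to come from somewhere other than the assignment computation itself, for example a member acting on a stale assignment. That is an inference from the sketch, not a confirmed diagnosis of this issue.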
[jira] [Updated] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group
[ https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Narayan Periwal updated KAFKA-6681:
-----------------------------------
Description:
We have seen this issue with the Kafka consumer, the new library that was introduced in 0.9.
With this new client, group management is done by the Kafka coordinator, which is one of the Kafka brokers.
We are using Kafka broker 0.10.2.1, and the consumer client version is also 0.10.2.1.
The issue we have faced is that, after rebalancing, some of the partitions get consumed by 2 instances within a consumer group, leading to duplication of the entire partition data. Both instances continue to read until the next rebalance or the restart of those clients.
It looks like a particular consumer goes on fetching data from a partition, but the broker is not able to identify this "stale" consumer instance.
We have hit this twice in production. Please look into it at the earliest.

was:
We have seen this issue with the Kafka consumer, the new library that was introduced in 0.9.
With this new client, group management is done by the Kafka coordinator, which is one of the Kafka brokers.
We are using Kafka broker 0.10.2.1, and the consumer client version is also 0.10.2.1.
The issue we have faced is that, after rebalancing, some of the partitions get consumed by 2 instances within a consumer group, leading to duplication of the entire partition data. Both instances continue to read until the next rebalance or the restart of those clients.
It looks like a particular consumer goes on fetching data from a partition, but the broker is not able to identify this "stale" consumer instance.
During this time, we also see the under-replicated partition metrics spiking.
We have hit this twice in production. Please look into it at the earliest.
[jira] [Updated] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group
[ https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Narayan Periwal updated KAFKA-6681:
-----------------------------------
Component/s: clients
[jira] [Comment Edited] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group
[ https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419345#comment-16419345 ]

Narayan Periwal edited comment on KAFKA-6681 at 3/29/18 4:53 PM:
-----------------------------------------------------------------

[~yuzhih...@gmail.com], We had one more occurrence of the above issue.
The topic had 1 partition and there were 4 consumers for it (all with the same consumer group name).
Initially, as expected, only one of the consumers was reading from that partition and the others were idle.
We then had an issue with our Kafka cluster that brought the entire cluster down. After the cluster came back up, all 4 consumers were reading that single partition of the topic, which was strange.

These are the logs from the coordinator on the server side for that consumer group:
{noformat}
[2018-03-27 23:06:49,113] INFO [GroupCoordinator 8]: Loading group metadata for testgroup with generation 63 (kafka.coordinator.GroupCoordinator)
[2018-03-27 23:06:52,687] INFO [GroupCoordinator 8]: Preparing to restabilize group testgroup with old generation 63 (kafka.coordinator.GroupCoordinator)
[2018-03-27 23:06:52,688] INFO [GroupCoordinator 8]: Stabilized group testgroup generation 64 (kafka.coordinator.GroupCoordinator)
[2018-03-27 23:06:52,916] INFO [GroupCoordinator 8]: Assignment received from leader for group testgroup for generation 64 (kafka.coordinator.GroupCoordinator)
{noformat}

On the consumer side, client-1, which was already reading that partition, saw the rebalance being triggered, and both the onPartitionsRevoked and onPartitionsAssigned callbacks were invoked. On client-2, neither callback was invoked, yet it started consuming data from the partition from then on.

We saw the following exception in the client-2 logs, occurring 4 times with a gap of 1 to 2 seconds:
{noformat}
27 Mar 2018 23:06:42.307 ERROR [testgroup:testopic] [o.a.f.s.k.KafkaConsumerWorker.run:329] - testgroup:testopic:: exception occurred in kafka source worker. backing off for 1000 millis
org.apache.kafka.clients.consumer.NoOffsetForPartitionException: Undefined offset with no reset policy for partition: testopic-0
at org.apache.kafka.clients.consumer.internals.Fetcher.resetOffset(Fetcher.java:375) ~[kafka-clients-0.10.2.1.jar:na]
at org.apache.kafka.clients.consumer.internals.Fetcher.updateFetchPositions(Fetcher.java:248) ~[kafka-clients-0.10.2.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.updateFetchPositions(KafkaConsumer.java:1601) ~[kafka-clients-0.10.2.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1034) ~[kafka-clients-0.10.2.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995) ~[kafka-clients-0.10.2.1.jar:na]
at org.apache.flume.source.kafka.KafkaConsumerWorker.fetchNextBatch(KafkaConsumerWorker.java:350) ~[flume-kafka-source-1.6.0.47.jar:1.6.0.47]
at org.apache.flume.source.kafka.KafkaConsumerWorker.run(KafkaConsumerWorker.java:291) ~[flume-kafka-source-1.6.0.47.jar:1.6.0.47]
27 Mar 2018 23:06:43.743 ERROR [testgroup:testopic] [o.a.f.s.k.KafkaConsumerWorker.run:329] - testgroup:testopic:: exception occurred in kafka source worker. backing off for 1000 millis
org.apache.kafka.clients.consumer.NoOffsetForPartitionException: Undefined offset with no reset policy for partition: testopic-0
at org.apache.kafka.clients.consumer.internals.Fetcher.resetOffset(Fetcher.java:375) ~[kafka-clients-0.10.2.1.jar:na]
at org.apache.kafka.clients.consumer.internals.Fetcher.resetOffsetsIfNeeded(Fetcher.java:228) ~[kafka-clients-0.10.2.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.updateFetchPositions(KafkaConsumer.java:1591) ~[kafka-clients-0.10.2.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1034) ~[kafka-clients-0.10.2.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995) ~[kafka-clients-0.10.2.1.jar:na]
at org.apache.flume.source.kafka.KafkaConsumerWorker.fetchNextBatch(KafkaConsumerWorker.java:350) ~[flume-kafka-source-1.6.0.47.jar:1.6.0.47]
at org.apache.flume.source.kafka.KafkaConsumerWorker.run(KafkaConsumerWorker.java:291) ~[flume-kafka-source-1.6.0.47.jar:1.6.0.47]
27 Mar 2018 23:06:44.979 ERROR [testgroup:testopic] [o.a.f.s.k.KafkaConsumerWorker.run:329] - testgroup:testopic:: exception occurred in kafka source worker. backing off for 1000 millis
org.apache.kafka.clients.consumer.NoOffsetForPartitionException: Undefined offset with no reset policy for partition: testopic-0
at org.apache.kafka.clients.consumer.internals.Fetcher.resetOffset(Fetcher.java:375) ~[kafka-clients-0.10.2.1.jar:na]
at org.apache.kafka.cli
[jira] [Commented] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group
[ https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413789#comment-16413789 ]

Narayan Periwal commented on KAFKA-6681:
----------------------------------------

[~yuzhih...@gmail.com], We faced yet another occurrence of this issue. On the server side we found these logs in this case:
{noformat}
[2018-03-23 18:59:16,560] INFO [GroupCoordinator 6]: Stabilized group prod-m10n-event-batcher-billablebeaconams1 generation 6 (kafka.coordinator.GroupCoordinator)
[2018-03-23 18:59:46,561] INFO [GroupCoordinator 6]: Preparing to restabilize group prod-m10n-event-batcher-billablebeaconams1 with old generation 6 (kafka.coordinator.GroupCoordinator)
[2018-03-23 18:59:46,833] INFO [GroupCoordinator 6]: Stabilized group prod-m10n-event-batcher-billablebeaconams1 generation 7 (kafka.coordinator.GroupCoordinator)
{noformat}
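The timing of the coordinator events quoted above is easier to see when the gaps are computed. The sketch below is a hypothetical helper in Python; the regular expression assumes only the log layout shown in the comment, and `parse_events` is not part of any Kafka tooling.

```python
import re
from datetime import datetime

# The three coordinator log lines quoted in the comment above.
LOG = """\
[2018-03-23 18:59:16,560] INFO [GroupCoordinator 6]: Stabilized group prod-m10n-event-batcher-billablebeaconams1 generation 6 (kafka.coordinator.GroupCoordinator)
[2018-03-23 18:59:46,561] INFO [GroupCoordinator 6]: Preparing to restabilize group prod-m10n-event-batcher-billablebeaconams1 with old generation 6 (kafka.coordinator.GroupCoordinator)
[2018-03-23 18:59:46,833] INFO [GroupCoordinator 6]: Stabilized group prod-m10n-event-batcher-billablebeaconams1 generation 7 (kafka.coordinator.GroupCoordinator)
"""

LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] INFO \[GroupCoordinator \d+\]: "
    r"(?P<msg>.*) \(kafka\.coordinator\.GroupCoordinator\)$"
)


def parse_events(text):
    """Return (timestamp, message) pairs for each coordinator log line."""
    events = []
    for line in text.strip().splitlines():
        match = LINE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, match.group("msg")))
    return events


events = parse_events(LOG)
# Seconds elapsed between consecutive events.
gaps = [
    round((later[0] - earlier[0]).total_seconds(), 3)
    for earlier, later in zip(events, events[1:])
]
for (ts, msg), gap in zip(events[1:], gaps):
    print(f"+{gap:.3f}s  {msg}")
```

For these lines the rebalance starts about 30 seconds after generation 6 stabilized and completes in under a second. The logs alone do not say what triggered it, so reading the 30 second gap as a timeout would be an assumption.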
[jira] [Commented] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group
[ https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413500#comment-16413500 ]

Narayan Periwal commented on KAFKA-6681:
----------------------------------------

[~yuzhih...@gmail.com], any update on this?
[jira] [Commented] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group
[ https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407378#comment-16407378 ]

Narayan Periwal commented on KAFKA-6681:
----------------------------------------

[~yuzhih...@gmail.com] The partition read by the two consumers did not appear in any of the Kafka broker logs.
Partition number 1 of the topic renderCpmAms1 was being consumed by two consumer instances within a consumer group.
The following are the log lines in the server logs related to this topic:
{noformat}
[2018-03-14 05:01:53,456] INFO Partition [renderCpmAms1,10] on broker 1: Shrinking ISR for partition [renderCpmAms1,10] from 1,2,3 to 1,3 (kafka.cluster.Partition)
[2018-03-14 05:02:14,122] INFO Partition [renderCpmAms1,10] on broker 1: Expanding ISR for partition renderCpmAms1-10 from 1,3 to 1,3,2 (kafka.cluster.Partition)
[2018-03-14 05:01:52,376] INFO Partition [renderCpmAms1,9] on broker 15: Shrinking ISR for partition [renderCpmAms1,9] from 2,15,1 to 15,1 (kafka.cluster.Partition)
[2018-03-14 05:02:14,193] INFO Partition [renderCpmAms1,9] on broker 15: Expanding ISR for partition renderCpmAms1-9 from 15,1 to 15,1,2 (kafka.cluster.Partition)
[2018-03-14 05:02:17,510] INFO Partition [renderCpmAms1,11] on broker 2: Shrinking ISR for partition [renderCpmAms1,11] from 2,4,3 to 2,4 (kafka.cluster.Partition)
[2018-03-14 05:02:17,530] INFO Partition [renderCpmAms1,11] on broker 2: Cached zkVersion [171] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
{noformat}

I am wondering whether this log line has any correlation with the issue:
{noformat}
[2018-03-14 05:02:17,530] INFO Partition [renderCpmAms1,11] on broker 2: Cached zkVersion [171] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
{noformat}
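To see which replicas moved in and out of the ISR in the broker log lines quoted above, the lines can be summarized with a short script. This is a sketch under the assumption that the log layout is exactly as quoted; `isr_changes` and the pattern are hypothetical and not taken from Kafka.

```python
import re

# ISR-related broker log lines quoted in the comment above.
LOG = """\
[2018-03-14 05:01:53,456] INFO Partition [renderCpmAms1,10] on broker 1: Shrinking ISR for partition [renderCpmAms1,10] from 1,2,3 to 1,3 (kafka.cluster.Partition)
[2018-03-14 05:02:14,122] INFO Partition [renderCpmAms1,10] on broker 1: Expanding ISR for partition renderCpmAms1-10 from 1,3 to 1,3,2 (kafka.cluster.Partition)
[2018-03-14 05:01:52,376] INFO Partition [renderCpmAms1,9] on broker 15: Shrinking ISR for partition [renderCpmAms1,9] from 2,15,1 to 15,1 (kafka.cluster.Partition)
[2018-03-14 05:02:14,193] INFO Partition [renderCpmAms1,9] on broker 15: Expanding ISR for partition renderCpmAms1-9 from 15,1 to 15,1,2 (kafka.cluster.Partition)
[2018-03-14 05:02:17,510] INFO Partition [renderCpmAms1,11] on broker 2: Shrinking ISR for partition [renderCpmAms1,11] from 2,4,3 to 2,4 (kafka.cluster.Partition)
[2018-03-14 05:02:17,530] INFO Partition [renderCpmAms1,11] on broker 2: Cached zkVersion [171] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
"""

ISR_CHANGE = re.compile(
    r"Partition \[(?P<topic>[^,\]]+),(?P<partition>\d+)\] on broker (?P<leader>\d+): "
    r"(?P<action>Shrinking|Expanding) ISR for partition \S+ "
    r"from (?P<old>[\d,]+) to (?P<new>[\d,]+)"
)


def isr_changes(text):
    """Return (partition, action, changed_brokers) for each ISR shrink or expand line."""
    changes = []
    for line in text.splitlines():
        match = ISR_CHANGE.search(line)
        if not match:
            continue  # e.g. the "Cached zkVersion" line carries no ISR lists
        old = set(match.group("old").split(","))
        new = set(match.group("new").split(","))
        # Brokers present in exactly one of the two ISR lists.
        changed = sorted(int(broker) for broker in old ^ new)
        changes.append((int(match.group("partition")), match.group("action"), changed))
    return changes


changes = isr_changes(LOG)
for partition, action, brokers in changes:
    print(f"partition {partition}: {action} ISR, brokers changed: {brokers}")
```

On these lines, broker 2 drops out of and rejoins the ISR of partitions 9 and 10, and broker 3 drops out of the ISR of partition 11. None of the quoted lines mention partition 1, the one that was consumed twice, so any link to the duplicate consumption remains unconfirmed.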
[jira] [Commented] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group
[ https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406681#comment-16406681 ]

Narayan Periwal commented on KAFKA-6681:

[~tedyu], I have attached the server-side logs. I could not retrieve the consumer-side logs because they have passed their retention period; I am trying again to reproduce this in our QA setup. Please see whether the server-side logs are of any help.

The server-side logs correspond to the nodes on which the under-replicated metrics spiked during this time. There are no entries in the controller.log file during this time.
[jira] [Updated] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group
[ https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Narayan Periwal updated KAFKA-6681:
---
Attachment: server-2.log
            server-1.log
[jira] [Updated] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group
[ https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Narayan Periwal updated KAFKA-6681:
---
Description:
We have seen this issue with the Kafka consumer, the new library that was introduced in 0.9.
With this new client, group management is done by the Kafka coordinator, which is one of the Kafka brokers.
We are using Kafka broker 0.10.2.1, and the consumer client version is also 0.10.2.1.
The issue we have faced is that, after rebalancing, some of the partitions get consumed by 2 instances within a consumer group, leading to duplication of the entire partition data. Both instances continue to read until the next rebalance or until those clients are restarted.
It looks like a particular consumer goes on fetching data from a partition, but the broker is not able to identify this "stale" consumer instance.
During this time, we also see the under-replicated partition metrics spiking.
We have hit this twice in production. Please look at it at the earliest.

was:
We have seen this issue with the Kafka consumer, the new library that was introduced in 0.9.
With this new client, group management is done by the Kafka coordinator, which is one of the Kafka brokers.
We are using Kafka broker 0.10.2.1, and the consumer client version is also 0.10.2.1.
The issue we have faced is that, after rebalancing, some of the partitions get consumed by 2 instances within a consumer group, leading to duplication of the entire partition data. They continue to read until the next rebalance or until those clients are restarted.
It looks like a particular consumer goes on fetching data from a partition, but the broker is not able to identify this "stale" consumer instance.
During this time, we also see the under-replicated partition metrics spiking.
We have hit this twice in production. Please look at it at the earliest.
[jira] [Created] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group
Narayan Periwal created KAFKA-6681:
--

Summary: Two instances of kafka consumer reading the same partition within a consumer group
Key: KAFKA-6681
URL: https://issues.apache.org/jira/browse/KAFKA-6681
Project: Kafka
Issue Type: Bug
Components: consumer
Affects Versions: 0.10.2.1
Reporter: Narayan Periwal

We have seen this issue with the Kafka consumer, the new library that was introduced in 0.9.
With this new client, group management is done by the Kafka coordinator, which is one of the Kafka brokers.
We are using Kafka broker 0.10.2.1, and the consumer client version is also 0.10.2.1.
The issue we have faced is that, after rebalancing, some of the partitions get consumed by 2 instances within a consumer group, leading to duplication of the entire partition data. They continue to read until the next rebalance or until those clients are restarted.
It looks like a particular consumer goes on fetching data from a partition, but the broker is not able to identify this "stale" consumer instance.
During this time, we also see the under-replicated partition metrics spiking.
We have hit this twice in production. Please look at it at the earliest.
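The condition described in this report (one partition read by two instances of the same group) can be checked from the client side if each consumer instance records which partitions it believes it owns, for example from its rebalance callback. The sketch below is not from the original report and makes several assumptions: the function name `find_duplicate_owners`, the member identifiers, and the way the snapshots are collected are all hypothetical, and it only compares snapshots rather than talking to a broker.

```python
from collections import defaultdict


def find_duplicate_owners(assignments):
    """Return partitions claimed by more than one group member.

    `assignments` maps a member id (hypothetical naming) to an iterable of
    (topic, partition) tuples that the member reports as assigned to it.
    The result maps each doubly claimed (topic, partition) to the sorted
    list of members claiming it.
    """
    owners = defaultdict(list)
    for member, partitions in assignments.items():
        for topic_partition in partitions:
            owners[topic_partition].append(member)
    # Within one consumer group, a partition should have exactly one owner.
    return {
        tp: sorted(members)
        for tp, members in owners.items()
        if len(members) > 1
    }
```

An empty result means the snapshots are consistent with the expected one-owner-per-partition behaviour; any entry returned points at the instances worth restarting or inspecting first.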