[jira] [Commented] (KAFKA-6178) Broker is listed as only ISR for all partitions it is leader of

2019-03-15 Thread Narayan Periwal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793461#comment-16793461
 ] 

Narayan Periwal commented on KAFKA-6178:


We are also seeing the same issue in our Kafka cluster. We are using 
version 0.10.2.1.

 

> Broker is listed as only ISR for all partitions it is leader of
> ---
>
> Key: KAFKA-6178
> URL: https://issues.apache.org/jira/browse/KAFKA-6178
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.0
> Environment: Windows
>Reporter: AS
>Priority: Major
>  Labels: windows
> Attachments: KafkaServiceOutput.txt, log-cleaner.log, server.log
>
>
> We're running a 15-broker cluster on Windows machines, and one of the 
> brokers, 10, is the only ISR on all partitions that it is the leader of. On 
> partitions where it isn't the leader, it seems to follow the leader fine. 
> This is an excerpt from 'describe':
> Topic: ClientQosCombined  Partition: 458  Leader: 10  Replicas: 
> 10,6,7,8,9,0,1   Isr: 10
> Topic: ClientQosCombined  Partition: 459  Leader: 11  Replicas: 
> 11,7,8,9,0,1,10 Isr: 0,10,1,9,7,11,8
> The server.log files all seem to be pretty standard, and the only indication 
> of this issue is the following pattern that often repeats:
> 2017-11-06 20:28:25,207 [INFO] kafka.cluster.Partition 
> [kafka-request-handler-8:] - Partition [ClientQosCombined,398] on broker 10: 
> Expanding ISR for partition [ClientQosCombined,398] from 10 to 5,10
> 2017-11-06 20:28:39,382 [INFO] kafka.cluster.Partition [kafka-scheduler-1:] - 
> Partition [ClientQosCombined,398] on broker 10: Shrinking ISR for partition 
> [ClientQosCombined,398] from 5,10 to 10
> This pattern repeats for each of the partitions that 10 leads. This is the 
> only topic that we currently have in our cluster. The __consumer_offsets 
> topic seems completely normal in terms of ISR counts. The controller is 
> broker 5, which is cycling through attempting and failing to trigger leader 
> elections on the partitions led by broker 10. From the controller log in broker 5:
> 2017-11-06 20:45:04,857 [INFO] kafka.controller.KafkaController 
> [kafka-scheduler-0:] - [Controller 5]: Starting preferred replica leader 
> election for partitions [ClientQosCombined,375]
> 2017-11-06 20:45:04,857 [INFO] kafka.controller.PartitionStateMachine 
> [kafka-scheduler-0:] - [Partition state machine on Controller 5]: Invoking 
> state change to OnlinePartition for partitions [ClientQosCombined,375]
> 2017-11-06 20:45:04,857 [INFO] 
> kafka.controller.PreferredReplicaPartitionLeaderSelector [kafka-scheduler-0:] 
> - [PreferredReplicaPartitionLeaderSelector]: Current leader 10 for partition 
> [ClientQosCombined,375] is not the preferred replica. Trigerring preferred 
> replica leader election
> 2017-11-06 20:45:04,857 [WARN] kafka.controller.KafkaController 
> [kafka-scheduler-0:] - [Controller 5]: Partition [ClientQosCombined,375] 
> failed to complete preferred replica leader election. Leader is 10
> I've also attached the logs and output from broker 10. Any idea what's wrong 
> here? 
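
The symptom described above (a partition whose ISR contains only its leader even though it has several replicas) can be spotted mechanically in the 'describe' output. The following is a minimal sketch in plain Python, assuming the output has the "Topic / Partition / Leader / Replicas / Isr" layout shown in the excerpt; the function name leader_only_isr is hypothetical and not part of any Kafka tool.

```python
import re

# Matches one partition entry of the 'describe' output quoted in the report.
# Whitespace (including line wraps) between fields is tolerated.
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def leader_only_isr(describe_output):
    """Return (topic, partition, leader) for every partition that has more
    than one replica but whose ISR contains only the leader."""
    hits = []
    for match in LINE_RE.finditer(describe_output):
        replicas = match["replicas"].split(",")
        isr = [broker for broker in match["isr"].split(",") if broker]
        if len(replicas) > 1 and isr == [match["leader"]]:
            hits.append(
                (match["topic"], int(match["partition"]), int(match["leader"]))
            )
    return hits


# The two entries from the excerpt above, unwrapped.
sample = (
    "Topic: ClientQosCombined  Partition: 458  Leader: 10  "
    "Replicas: 10,6,7,8,9,0,1   Isr: 10\n"
    "Topic: ClientQosCombined  Partition: 459  Leader: 11  "
    "Replicas: 11,7,8,9,0,1,10 Isr: 0,10,1,9,7,11,8\n"
)
print(leader_only_isr(sample))
```

Run against the two entries quoted in the report, only partition 458 (led by broker 10) is flagged.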



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group

2018-11-29 Thread Narayan Periwal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16702913#comment-16702913
 ] 

Narayan Periwal commented on KAFKA-6681:


Thanks for letting us know [~lyn610]

> Two instances of kafka consumer reading the same partition within a consumer 
> group
> --
>
> Key: KAFKA-6681
> URL: https://issues.apache.org/jira/browse/KAFKA-6681
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 0.10.2.1
>Reporter: Narayan Periwal
>Priority: Critical
> Attachments: server-1.log, server-2.log
>
>
> We have seen this issue with the new Kafka consumer, the client library 
> introduced in 0.9.
> With this new client, group management is done by the Kafka coordinator, 
> which is one of the Kafka brokers.
> We are using Kafka broker 0.10.2.1, and the consumer client version is also 
> 0.10.2.1. 
> The issue we have faced is that, after rebalancing, some of the 
> partitions get consumed by 2 instances within a consumer group, leading to 
> duplication of the entire partition data. Both instances continue to read 
> until the next rebalance or a restart of those clients. 
> It looks like a particular consumer goes on fetching data from a 
> partition, but the broker is not able to identify this "stale" consumer 
> instance. 
> We have hit this twice in production. Please look at it at the earliest. 





[jira] [Commented] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers (KIP-341)

2018-10-05 Thread Narayan Periwal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639684#comment-16639684
 ] 

Narayan Periwal commented on KAFKA-7026:


Sure [~steven.aerts]. I have already created a ticket related to it - 
https://issues.apache.org/jira/browse/KAFKA-6681

> Sticky assignor could assign a partition to multiple consumers (KIP-341)
> 
>
> Key: KAFKA-7026
> URL: https://issues.apache.org/jira/browse/KAFKA-7026
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>Priority: Major
>  Labels: kip
> Fix For: 2.2.0
>
>
> In the following scenario sticky assignor assigns a topic partition to two 
> consumers in the group:
>  # Create a topic {{test}} with a single partition
>  # Start consumer {{c1}} in group {{sticky-group}} ({{c1}} becomes group 
> leader and gets {{test-0}})
>  # Start consumer {{c2}}  in group {{sticky-group}} ({{c1}} holds onto 
> {{test-0}}, {{c2}} does not get any partition) 
>  # Pause {{c1}} (e.g. using Java debugger) ({{c2}} becomes leader and takes 
> over {{test-0}}, {{c1}} leaves the group)
>  # Resume {{c1}}
> At this point both {{c1}} and {{c2}} will have {{test-0}} assigned to them.
>  
> The reason is that {{c1}} has kept its previous assignment ({{test-0}}) from 
> the last assignment it received from the leader (itself) and did not get the 
> next round of assignments (when {{c2}} became leader) because it was paused. 
> Both {{c1}} and {{c2}} enter the rebalance supplying {{test-0}} as their 
> existing assignment. The sticky assignor code does not currently check for and 
> avoid this duplication.
>  
> Note: This issue was originally reported on 
> [StackOverflow|https://stackoverflow.com/questions/50761842/kafka-stickyassignor-breaking-delivery-to-single-consumer-in-the-group].
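
The missing check described above can be illustrated with a small sketch. It assumes, purely for illustration, that every member reports its previous assignment together with the generation in which it received it, and that the claim from the most recent generation wins. This is plain Python and not the Kafka assignor code; the function name resolve_claims and the data layout are hypothetical.

```python
def resolve_claims(claims):
    """Pick a single owner per partition from possibly conflicting claims.

    claims maps member id -> (generation, partitions the member believes it
    owns). When two members claim the same partition, the claim with the
    higher generation wins; on a tie the member that sorts first keeps it.
    """
    owner = {}
    for member, (generation, partitions) in sorted(claims.items()):
        for partition in partitions:
            current = owner.get(partition)
            if current is None or generation > current[0]:
                owner[partition] = (generation, member)
    return {partition: member for partition, (_, member) in owner.items()}


# c1 was paused and still reports test-0 from generation 1;
# c2 took over test-0 in generation 2.
claims = {"c1": (1, ["test-0"]), "c2": (2, ["test-0"])}
print(resolve_claims(claims))
```

In the scenario from the description, the stale claim from {{c1}} loses to the newer claim from {{c2}}, so {{test-0}} ends up with exactly one owner.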





[jira] [Commented] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers (KIP-341)

2018-10-03 Thread Narayan Periwal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637717#comment-16637717
 ] 

Narayan Periwal commented on KAFKA-7026:


[~steven.aerts] [~vahid] As I mentioned in KAFKA-6681, we are seeing this issue 
with RangeAssignor. So I do not think this fix is going to solve our issue. 

We are using a fairly old version of the Kafka brokers (0.10.2.1). I don't know 
if upgrading will fix this. 

> Sticky assignor could assign a partition to multiple consumers (KIP-341)
> 
>
> Key: KAFKA-7026
> URL: https://issues.apache.org/jira/browse/KAFKA-7026
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>Priority: Major
>  Labels: kip
> Fix For: 2.2.0
>
>


[jira] [Comment Edited] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers (KIP-341)

2018-10-03 Thread Narayan Periwal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636871#comment-16636871
 ] 

Narayan Periwal edited comment on KAFKA-7026 at 10/3/18 12:05 PM:
--

[~steven.aerts], can the issue of "consumer of the process not losing it's 
subscription" occur with RangeAssignor?


was (Author: nperiwal):
[~steven.aerts], can the issue of "consumer of the process not losing it's 
subscription" occur with RangeAssignor? The reason I ask is that 
KAFKA-6717, raised by [~Yuancheng], is related to _StickyAssignor_.

> Sticky assignor could assign a partition to multiple consumers (KIP-341)
> 
>
> Key: KAFKA-7026
> URL: https://issues.apache.org/jira/browse/KAFKA-7026
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>Priority: Major
>  Labels: kip
> Fix For: 2.2.0
>
>


[jira] [Comment Edited] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers (KIP-341)

2018-10-03 Thread Narayan Periwal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636871#comment-16636871
 ] 

Narayan Periwal edited comment on KAFKA-7026 at 10/3/18 12:04 PM:
--

[~steven.aerts], can the issue of "consumer of the process not losing it's 
subscription" occur with RangeAssignor? The reason I ask is that 
KAFKA-6717, raised by [~Yuancheng], is related to _StickyAssignor_.


was (Author: nperiwal):
[~steven.aerts], can the issue of "consumer of the process not losing it's 
subscription" occur with RangeAssignor?

> Sticky assignor could assign a partition to multiple consumers (KIP-341)
> 
>
> Key: KAFKA-7026
> URL: https://issues.apache.org/jira/browse/KAFKA-7026
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>Priority: Major
>  Labels: kip
> Fix For: 2.2.0
>
>


[jira] [Commented] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers (KIP-341)

2018-10-03 Thread Narayan Periwal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636871#comment-16636871
 ] 

Narayan Periwal commented on KAFKA-7026:


[~steven.aerts], can the issue of "consumer of the process not losing it's 
subscription" occur with RangeAssignor?

> Sticky assignor could assign a partition to multiple consumers (KIP-341)
> 
>
> Key: KAFKA-7026
> URL: https://issues.apache.org/jira/browse/KAFKA-7026
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>Priority: Major
>  Labels: kip
> Fix For: 2.2.0
>
>


[jira] [Comment Edited] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers

2018-06-26 Thread Narayan Periwal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524543#comment-16524543
 ] 

Narayan Periwal edited comment on KAFKA-7026 at 6/27/18 4:11 AM:
-

[~vahid], unfortunately we are not able to reproduce this in our QA setup. The 
only correlation we have seen is that this seems to happen when there is a 
spike in the number of under-replicated partitions in the Kafka cluster. 

One more thing: when this issue happens, we have seen our consumers not 
processing data for more than "max.poll.interval.ms". The consumer.poll() 
call is therefore not invoked within "max.poll.interval.ms", which means the 
consumer is considered failed and the group will rebalance in order to reassign 
the partitions to another member. It looks like the first consumer, after 
recovery (able to process again), is still getting data from the earlier 
assigned partition, leading to this issue.
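
To check whether a consumer actually exceeded "max.poll.interval.ms", the timestamps of its poll() calls can be compared against the configured limit. The following is a minimal sketch in plain Python, assuming the application records a millisecond timestamp at each poll() call; the function name poll_gaps is hypothetical, and 300000 is only the value used in this example.

```python
def poll_gaps(poll_times_ms, max_poll_interval_ms):
    """Return (previous, current) timestamp pairs of consecutive poll() calls
    that are further apart than max_poll_interval_ms."""
    return [
        (prev, cur)
        for prev, cur in zip(poll_times_ms, poll_times_ms[1:])
        if cur - prev > max_poll_interval_ms
    ]


# Timestamps (ms) recorded at each poll(); the consumer stalls after 2000.
times = [0, 1_000, 2_000, 400_000, 401_000]
print(poll_gaps(times, 300_000))
```

Each reported pair marks a window in which the group may have rebalanced and handed the partitions to another member.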


was (Author: nperiwal):
[~vahid], unfortunately we are not able to reproduce this in our QA setup. The 
only correlation we have seen is that this seems to happen when there is a 
spike in the number of under-replicated partitions in the Kafka cluster. 

One more thing: when this issue happens, we have seen our consumers not 
processing data for more than "max.poll.interval.ms". The consumer.poll() 
call is therefore not invoked within "max.poll.interval.ms", which means the 
consumer is considered failed and the group will rebalance in order to reassign 
the partitions to another member. It looks like the old consumer, after 
recovery, is still getting data from the earlier assigned partition, leading to 
this issue.

> Sticky assignor could assign a partition to multiple consumers
> --
>
> Key: KAFKA-7026
> URL: https://issues.apache.org/jira/browse/KAFKA-7026
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>Priority: Major
> Fix For: 2.1.0
>
>


[jira] [Comment Edited] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers

2018-06-26 Thread Narayan Periwal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524543#comment-16524543
 ] 

Narayan Periwal edited comment on KAFKA-7026 at 6/27/18 4:10 AM:
-

[~vahid], unfortunately we are not able to reproduce this in our QA setup. The 
only correlation we have seen is that this seems to happen when there is a 
spike in the number of under-replicated partitions in the Kafka cluster. 

One more thing: when this issue happens, we have seen our consumers not 
processing data for more than "max.poll.interval.ms". The consumer.poll() 
call is therefore not invoked within "max.poll.interval.ms", which means the 
consumer is considered failed and the group will rebalance in order to reassign 
the partitions to another member. It looks like the old consumer, after 
recovery, is still getting data from the earlier assigned partition, leading to 
this issue.


was (Author: nperiwal):
[~vahid], unfortunately we are not able to reproduce this in our QA setup. The 
only correlation we have seen is that this seems to happen when there is a 
spike in the number of under-replicated partitions in the Kafka cluster. 

> Sticky assignor could assign a partition to multiple consumers
> --
>
> Key: KAFKA-7026
> URL: https://issues.apache.org/jira/browse/KAFKA-7026
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>Priority: Major
> Fix For: 2.1.0
>
>


[jira] [Commented] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers

2018-06-26 Thread Narayan Periwal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524543#comment-16524543
 ] 

Narayan Periwal commented on KAFKA-7026:


[~vahid], unfortunately we are not able to reproduce this in our QA setup. The 
only correlation we have seen is that this seems to happen when there is a 
spike in the number of under-replicated partitions in the Kafka cluster. 

> Sticky assignor could assign a partition to multiple consumers
> --
>
> Key: KAFKA-7026
> URL: https://issues.apache.org/jira/browse/KAFKA-7026
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>Priority: Major
> Fix For: 2.1.0
>
>


[jira] [Commented] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers

2018-06-26 Thread Narayan Periwal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524517#comment-16524517
 ] 

Narayan Periwal commented on KAFKA-7026:


[~vahid], agreed that this may not be the actual cause of the issue. But we have 
seen this occurring multiple times in our production setup, with consumers 
continuing to consume the same partition unless a manual restart is triggered. 
So, it could be due to some other issue.

[~steven.aerts], we are using custom checkpointing in ZooKeeper, so the 
kafka-consumer-groups.sh script to describe the consumer group does not work 
for us. 
However, we have a mechanism to detect when multiple consumers are consuming 
from the same partition. I am sharing the distribution from one such case: 
topic test, consumer group group1, consumers c1,c2,c3,c4,c5.
Partitions 3, 4 and 5 of this topic were being consumed by multiple consumer 
instances.
{noformat}
group: group1, topic: test, partition: 0, consumer: c2
group: group1, topic: test, partition: 1, consumer: c4
group: group1, topic: test, partition: 2, consumer: c4
group: group1, topic: test, partition: 3, consumer: c3,c4
group: group1, topic: test, partition: 4, consumer: c3,c5
group: group1, topic: test, partition: 5, consumer: c3,c5
group: group1, topic: test, partition: 6, consumer: c5
group: group1, topic: test, partition: 7, consumer: c1
group: group1, topic: test, partition: 8, consumer: c1
group: group1, topic: test, partition: 9, consumer: c1
{noformat} 
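
A report in this format can be scanned for partitions with more than one consumer. The following is a minimal sketch in plain Python, assuming each line follows the "group, topic, partition, consumer" layout shown above; the function name shared_partitions is hypothetical and is not the detection mechanism mentioned in the comment.

```python
def shared_partitions(report):
    """Return {(topic, partition): [consumers]} for every partition that is
    listed with more than one consumer."""
    shared = {}
    for line in report.strip().splitlines():
        # Split into the four "key: value" fields; the consumer list itself
        # is comma-separated without spaces, so it stays in one field.
        fields = dict(part.split(": ", 1) for part in line.split(", ", 3))
        consumers = fields["consumer"].split(",")
        if len(consumers) > 1:
            shared[(fields["topic"], int(fields["partition"]))] = consumers
    return shared


report = """
group: group1, topic: test, partition: 2, consumer: c4
group: group1, topic: test, partition: 3, consumer: c3,c4
group: group1, topic: test, partition: 4, consumer: c3,c5
group: group1, topic: test, partition: 5, consumer: c3,c5
group: group1, topic: test, partition: 6, consumer: c5
"""
print(shared_partitions(report))
```

For the lines taken from the distribution above, this reports partitions 3, 4 and 5 of topic test.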

> Sticky assignor could assign a partition to multiple consumers
> --
>
> Key: KAFKA-7026
> URL: https://issues.apache.org/jira/browse/KAFKA-7026
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>Priority: Major
> Fix For: 2.1.0
>
>


[jira] [Comment Edited] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group

2018-06-25 Thread Narayan Periwal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523095#comment-16523095
 ] 

Narayan Periwal edited comment on KAFKA-6681 at 6/26/18 4:02 AM:
-

[~steven.aerts], we are using RangeAssignor (which is the default), not the 
sticky assignor that KAFKA-7026 is about.

One observation is that there is a spike in the number of under-replicated 
partitions, after which multiple consumer instances start consuming the same 
topic partition.

Our Kafka brokers and consumers are both on version 0.10.2.1.
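
For reference, a range-style assignment gives each consumer one contiguous block of a topic's partitions, so a single assignment round never hands the same partition to two members. The following is a minimal sketch of that arithmetic for one topic in plain Python. It is an illustration, not the Kafka client's implementation, and the function name range_assign is hypothetical.

```python
def range_assign(consumers, num_partitions):
    """Split partitions 0..num_partitions-1 into contiguous ranges, one per
    consumer (sorted by id); the first consumers get one extra partition
    when the count does not divide evenly."""
    members = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(members))
    assignment = {}
    start = 0
    for index, member in enumerate(members):
        count = per_consumer + (1 if index < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


print(range_assign(["c1", "c2", "c3", "c4", "c5"], 10))
```

Because the ranges are disjoint by construction, two instances reading the same partition would have to come from something other than one clean assignment, for example a member still acting on an assignment from an earlier generation.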


was (Author: nperiwal):
[~steven.aerts], we are using RangeAssignor (which is the default), not the 
sticky assignor that KAFKA-7026 is about.

> Two instances of kafka consumer reading the same partition within a consumer 
> group
> --
>
> Key: KAFKA-6681
> URL: https://issues.apache.org/jira/browse/KAFKA-6681
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 0.10.2.1
>Reporter: Narayan Periwal
>Priority: Critical
> Attachments: server-1.log, server-2.log
>
>


[jira] [Comment Edited] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers

2018-06-25 Thread Narayan Periwal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523146#comment-16523146
 ] 

Narayan Periwal edited comment on KAFKA-7026 at 6/26/18 4:01 AM:
-

[~vahid], can this issue occur with RangeAssignor as well? We have 
seen it occurring multiple times in our Kafka consumer (0.10.2.1) with 
RangeAssignor. Jira - KAFKA-6681.

One observation is that there is a spike in the number of under-replicated 
partitions in our Kafka cluster, after which multiple consumer instances start 
consuming the same topic partition.

The Kafka broker is also at version 0.10.2.1.


was (Author: nperiwal):
[~vahid], can this issue occur with RangeAssignor as well? We have 
seen it occurring multiple times in our Kafka consumer (0.10.2.1) with 
RangeAssignor. Jira - KAFKA-6681.

One observation is that there is a spike in the number of under-replicated 
partitions, after which multiple consumer instances start consuming the same 
topic partition.

> Sticky assignor could assign a partition to multiple consumers
> --
>
> Key: KAFKA-7026
> URL: https://issues.apache.org/jira/browse/KAFKA-7026
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>Priority: Major
>
> In the following scenario, the sticky assignor assigns a topic partition to two 
> consumers in the group:
>  # Create a topic {{test}} with a single partition
>  # Start consumer {{c1}} in group {{sticky-group}} ({{c1}} becomes group 
> leader and gets {{test-0}})
>  # Start consumer {{c2}} in group {{sticky-group}} ({{c1}} holds onto 
> {{test-0}}, {{c2}} does not get any partition)
>  # Pause {{c1}} (e.g. using a Java debugger) ({{c2}} becomes leader and takes 
> over {{test-0}}, {{c1}} leaves the group)
>  # Resume {{c1}}
> At this point both {{c1}} and {{c2}} will have {{test-0}} assigned to them.
>  
> The reason is that {{c1}} has kept its previous assignment ({{test-0}}) from 
> the last assignment it received from the leader (itself) and did not get the 
> next round of assignments (when {{c2}} became leader) because it was paused. 
> Both {{c1}} and {{c2}} enter the rebalance supplying {{test-0}} as their 
> existing assignment. The sticky assignor code does not currently check for and 
> avoid this duplication.
>  
> Note: This issue was originally reported on 
> [StackOverflow|https://stackoverflow.com/questions/50761842/kafka-stickyassignor-breaking-delivery-to-single-consumer-in-the-group].
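
The missing check described in the issue can be illustrated with a small Python sketch. This is not the StickyAssignor code: `find_duplicate_claims` and the dictionary layout are hypothetical names chosen for this example, and the sketch only models the step of spotting a partition that more than one group member reports as its existing assignment.

```python
from collections import defaultdict


def find_duplicate_claims(previous_assignments):
    """Return {partition: [consumers]} for partitions claimed by several consumers.

    previous_assignments maps a consumer id to the list of partitions it
    reports as its existing assignment when it joins the rebalance.
    (Hypothetical data layout, used only for this illustration.)
    """
    claimants = defaultdict(list)
    # Sort by consumer id so the result is deterministic.
    for consumer, partitions in sorted(previous_assignments.items()):
        for partition in partitions:
            claimants[partition].append(consumer)
    return {p: owners for p, owners in claimants.items() if len(owners) > 1}


# Scenario from the description: c1 was paused and still reports test-0,
# while c2 has taken over test-0 in the meantime.
claims = {"c1": ["test-0"], "c2": ["test-0"]}
print(find_duplicate_claims(claims))
```

An assignor that ran a check of this kind before reusing the reported assignments could drop all but one claimant for each duplicated partition; which claimant should win is a design decision this sketch does not address.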



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers

2018-06-25 Thread Narayan Periwal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523146#comment-16523146
 ] 

Narayan Periwal edited comment on KAFKA-7026 at 6/26/18 3:59 AM:
-

[~vahid], can this issue also occur with RangeAssignor? We have seen it 
several times in our Kafka consumer (0.10.2.1) with RangeAssignor; see 
KAFKA-6681.

One observation is that there is a spike in the number of UnderReplicated 
partitions, after which multiple consumer instances start consuming the same 
topic partition.


was (Author: nperiwal):
[~vahid], can this issue also occur with RangeAssignor? We have seen it 
several times in our Kafka consumer (0.10.2.1) with RangeAssignor; see 
KAFKA-6681.



[jira] [Commented] (KAFKA-7026) Sticky assignor could assign a partition to multiple consumers

2018-06-25 Thread Narayan Periwal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523146#comment-16523146
 ] 

Narayan Periwal commented on KAFKA-7026:


[~vahid], can this issue also occur with RangeAssignor? We have seen it 
several times in our Kafka consumer (0.10.2.1) with RangeAssignor; see 
KAFKA-6681.



[jira] [Commented] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group

2018-06-25 Thread Narayan Periwal (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523095#comment-16523095
 ] 

Narayan Periwal commented on KAFKA-6681:


[~steven.aerts], we are using RangeAssignor (which is the default), not the 
sticky assignor that KAFKA-7026 refers to.

> Two instances of kafka consumer reading the same partition within a consumer 
> group
> --
>
> Key: KAFKA-6681
> URL: https://issues.apache.org/jira/browse/KAFKA-6681
> Project: Kafka
> Issue Type: Bug
> Components: clients, consumer
> Affects Versions: 0.10.2.1
> Reporter: Narayan Periwal
> Priority: Critical
> Attachments: server-1.log, server-2.log
>
>
> We have seen this issue with the Kafka consumer, the new client library that 
> was introduced in 0.9.
> With this new client, group management is done by the Kafka coordinator, 
> which is one of the Kafka brokers.
> We are using Kafka broker 0.10.2.1, and the consumer client version is also 
> 0.10.2.1.
> The issue we have faced is that, after rebalancing, some of the 
> partitions get consumed by 2 instances within a consumer group, leading to 
> duplication of the entire partition data. Both instances continue to read 
> until the next rebalance or until those clients are restarted.
> It looks like a particular consumer goes on fetching data from a 
> partition, but the broker is not able to identify this "stale" consumer 
> instance.
> We have hit this twice in production. Please look at it at the earliest.





[jira] [Commented] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group

2018-04-02 Thread Narayan Periwal (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423530#comment-16423530
 ] 

Narayan Periwal commented on KAFKA-6681:


[~yuzhih...@gmail.com], I understand your concern; 0.10.2.1 is a bit old.
The thing is, we have not been able to reproduce this in our dev environment, 
even with 0.10.2.1. We have hit this issue only in production, 3 times.
If we find a way to reproduce this in 0.10.2.1, then we can definitely 
give 1.1.0 a try. What do you suggest?



[jira] [Commented] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group

2018-04-02 Thread Narayan Periwal (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422654#comment-16422654
 ] 

Narayan Periwal commented on KAFKA-6681:


[~yuzhih...@gmail.com], thanks for looking into this problem. As you suggested, 
we will try out RoundRobinAssignor and let you know.



[jira] [Comment Edited] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group

2018-04-02 Thread Narayan Periwal (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422150#comment-16422150
 ] 

Narayan Periwal edited comment on KAFKA-6681 at 4/2/18 11:23 AM:
-

[~yuzhih...@gmail.com]
I assume you are asking about the partition.assignment.strategy config. For 
this, we are using the default class from the consumer configs: 
org.apache.kafka.clients.consumer.RangeAssignor 
Will this have the issue?



was (Author: nperiwal):
[~yuzhih...@gmail.com]
I assume you are asking about the partition.assignment.strategy config. For 
this, we are using the default class 
org.apache.kafka.clients.consumer.RangeAssignor 
Will this have the issue?




[jira] [Commented] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group

2018-04-02 Thread Narayan Periwal (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422150#comment-16422150
 ] 

Narayan Periwal commented on KAFKA-6681:


[~yuzhih...@gmail.com]
I assume you are asking about the partition.assignment.strategy config. For 
this, we are using the default class 
org.apache.kafka.clients.consumer.RangeAssignor 
Will this have the issue?
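
For reference, a simplified single-topic sketch of range-style assignment is given below in Python. It assumes the commonly documented behaviour of the range strategy (members sorted by id, contiguous partition ranges, the first few members taking one extra partition when the split is uneven). `range_assign` is a hypothetical helper written for this illustration, not the client's RangeAssignor class.

```python
def range_assign(consumers, num_partitions):
    """Split partitions 0..num_partitions-1 into contiguous ranges per consumer.

    Simplified model for one topic; the function name and signature are
    assumptions made for this example.
    """
    members = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(members))
    assignment = {}
    for i, member in enumerate(members):
        # The first `extra` members each take one additional partition.
        start = per_consumer * i + min(i, extra)
        length = per_consumer + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + length))
    return assignment


# One partition and four consumers, as in the occurrence reported on this ticket.
print(range_assign(["c1", "c2", "c3", "c4"], 1))
```

In this model every partition ends up with exactly one member and the remaining members stay idle. That is a property of the sketch only; it says nothing about where a duplicate reader in a real group comes from.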




[jira] [Updated] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group

2018-03-29 Thread Narayan Periwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Narayan Periwal updated KAFKA-6681:
---
Description: 
We have seen this issue with the Kafka consumer, the new client library that was 
introduced in 0.9.

With this new client, group management is done by the Kafka coordinator, which 
is one of the Kafka brokers.

We are using Kafka broker 0.10.2.1, and the consumer client version is also 
0.10.2.1.

The issue we have faced is that, after rebalancing, some of the partitions 
get consumed by 2 instances within a consumer group, leading to duplication of 
the entire partition data. Both instances continue to read until the next 
rebalance or until those clients are restarted.

It looks like a particular consumer goes on fetching data from a 
partition, but the broker is not able to identify this "stale" consumer 
instance.

We have hit this twice in production. Please look at it at the earliest.

  was:
We have seen this issue with the Kafka consumer, the new client library that was 
introduced in 0.9.

With this new client, group management is done by the Kafka coordinator, which 
is one of the Kafka brokers.

We are using Kafka broker 0.10.2.1, and the consumer client version is also 
0.10.2.1.

The issue we have faced is that, after rebalancing, some of the partitions 
get consumed by 2 instances within a consumer group, leading to duplication of 
the entire partition data. Both instances continue to read until the next 
rebalance or until those clients are restarted.

It looks like a particular consumer goes on fetching data from a 
partition, but the broker is not able to identify this "stale" consumer 
instance.

During this time, we also see the under-replicated partition metrics spiking.

We have hit this twice in production. Please look at it at the earliest.




[jira] [Updated] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group

2018-03-29 Thread Narayan Periwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Narayan Periwal updated KAFKA-6681:
---
Component/s: clients

> Two instances of kafka consumer reading the same partition within a consumer 
> group
> --
>
> Key: KAFKA-6681
> URL: https://issues.apache.org/jira/browse/KAFKA-6681
> Project: Kafka
> Issue Type: Bug
> Components: clients, consumer
> Affects Versions: 0.10.2.1
> Reporter: Narayan Periwal
> Priority: Critical
> Attachments: server-1.log, server-2.log
>
>
> We have seen this issue with the Kafka consumer, the new client library that 
> was introduced in 0.9.
> With this new client, group management is done by the Kafka coordinator, 
> which is one of the Kafka brokers.
> We are using Kafka broker 0.10.2.1, and the consumer client version is also 
> 0.10.2.1.
> The issue we have faced is that, after rebalancing, some of the 
> partitions get consumed by 2 instances within a consumer group, leading to 
> duplication of the entire partition data. Both instances continue to read 
> until the next rebalance or until those clients are restarted.
> It looks like a particular consumer goes on fetching data from a 
> partition, but the broker is not able to identify this "stale" consumer 
> instance.
> During this time, we also see the under-replicated partition metrics spiking.
> We have hit this twice in production. Please look at it at the earliest.





[jira] [Comment Edited] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group

2018-03-29 Thread Narayan Periwal (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419345#comment-16419345
 ] 

Narayan Periwal edited comment on KAFKA-6681 at 3/29/18 4:53 PM:
-

[~yuzhih...@gmail.com],
We had one more occurrence of the above issue. The topic had 1 partition and 
there were 4 consumers for it (all with the same consumer group name). 
Initially, as expected, only one of the consumers was reading from that 
partition and the others were simply doing nothing. We had an issue with our 
Kafka cluster, due to which the entire cluster went down. When the cluster came 
back up, I saw all 4 consumers reading that single partition of the 
topic, which was strange.

For that topic, these are the logs from the coordinator on the server side for 
that consumer group:

{noformat}
[2018-03-27 23:06:49,113] INFO [GroupCoordinator 8]: Loading group metadata for testgroup with generation 63 (kafka.coordinator.GroupCoordinator)
[2018-03-27 23:06:52,687] INFO [GroupCoordinator 8]: Preparing to restabilize group testgroup with old generation 63 (kafka.coordinator.GroupCoordinator)
[2018-03-27 23:06:52,688] INFO [GroupCoordinator 8]: Stabilized group testgroup generation 64 (kafka.coordinator.GroupCoordinator)
[2018-03-27 23:06:52,916] INFO [GroupCoordinator 8]: Assignment received from leader for group testgroup for generation 64 (kafka.coordinator.GroupCoordinator)
{noformat}
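
A quick way to pull the generation change out of coordinator logs like the excerpt above is sketched below in Python. The regular expression is written against the message wording shown in this excerpt only, and `generation_changes` is a hypothetical helper; other broker versions may word these lines differently.

```python
import re

# Matches either of the two coordinator messages seen in the excerpt.
EVENT = re.compile(
    r"Preparing to restabilize group (?P<g1>\S+) with old generation (?P<old>\d+)"
    r"|Stabilized group (?P<g2>\S+) generation (?P<new>\d+)"
)


def generation_changes(log_text):
    """Return [(group, old_generation, new_generation)] in log order."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    pending = {}
    changes = []
    for match in EVENT.finditer(flat):
        if match.group("old") is not None:
            pending[match.group("g1")] = int(match.group("old"))
        else:
            group = match.group("g2")
            if group in pending:
                changes.append((group, pending.pop(group), int(match.group("new"))))
    return changes


sample = """
[2018-03-27 23:06:52,687] INFO [GroupCoordinator 8]: Preparing to restabilize
group testgroup with old generation 63 (kafka.coordinator.GroupCoordinator)
[2018-03-27 23:06:52,688] INFO [GroupCoordinator 8]: Stabilized group testgroup
generation 64 (kafka.coordinator.GroupCoordinator)
"""
print(generation_changes(sample))
```

Comparing the generation a client believes it is in against the latest value found this way is one possible method of spotting a member that missed a rebalance.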

On the consumer side, on client-1, which was already reading that partition, 
we saw the rebalance getting triggered: both the callbacks 
onPartitionsRevoked and onPartitionsAssigned were invoked. On client-2, 
neither of these callbacks was invoked, yet it still started consuming 
data from the partition from then on. We saw the following exception in the 
client-2 logs, occurring 4 times with a gap of 1 to 2 seconds:
{noformat}
27 Mar 2018 23:06:42.307 ERROR [testgroup:testopic] [o.a.f.s.k.KafkaConsumerWorker.run:329] - testgroup:testopic:: exception occurred in kafka source worker. backing off for 1000 millis
org.apache.kafka.clients.consumer.NoOffsetForPartitionException: Undefined offset with no reset policy for partition: testopic-0
    at org.apache.kafka.clients.consumer.internals.Fetcher.resetOffset(Fetcher.java:375) ~[kafka-clients-0.10.2.1.jar:na]
    at org.apache.kafka.clients.consumer.internals.Fetcher.updateFetchPositions(Fetcher.java:248) ~[kafka-clients-0.10.2.1.jar:na]
    at org.apache.kafka.clients.consumer.KafkaConsumer.updateFetchPositions(KafkaConsumer.java:1601) ~[kafka-clients-0.10.2.1.jar:na]
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1034) ~[kafka-clients-0.10.2.1.jar:na]
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995) ~[kafka-clients-0.10.2.1.jar:na]
    at org.apache.flume.source.kafka.KafkaConsumerWorker.fetchNextBatch(KafkaConsumerWorker.java:350) ~[flume-kafka-source-1.6.0.47.jar:1.6.0.47]
    at org.apache.flume.source.kafka.KafkaConsumerWorker.run(KafkaConsumerWorker.java:291) ~[flume-kafka-source-1.6.0.47.jar:1.6.0.47]
27 Mar 2018 23:06:43.743 ERROR [testgroup:testopic] [o.a.f.s.k.KafkaConsumerWorker.run:329] - testgroup:testopic:: exception occurred in kafka source worker. backing off for 1000 millis
org.apache.kafka.clients.consumer.NoOffsetForPartitionException: Undefined offset with no reset policy for partition: testopic-0
    at org.apache.kafka.clients.consumer.internals.Fetcher.resetOffset(Fetcher.java:375) ~[kafka-clients-0.10.2.1.jar:na]
    at org.apache.kafka.clients.consumer.internals.Fetcher.resetOffsetsIfNeeded(Fetcher.java:228) ~[kafka-clients-0.10.2.1.jar:na]
    at org.apache.kafka.clients.consumer.KafkaConsumer.updateFetchPositions(KafkaConsumer.java:1591) ~[kafka-clients-0.10.2.1.jar:na]
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1034) ~[kafka-clients-0.10.2.1.jar:na]
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995) ~[kafka-clients-0.10.2.1.jar:na]
    at org.apache.flume.source.kafka.KafkaConsumerWorker.fetchNextBatch(KafkaConsumerWorker.java:350) ~[flume-kafka-source-1.6.0.47.jar:1.6.0.47]
    at org.apache.flume.source.kafka.KafkaConsumerWorker.run(KafkaConsumerWorker.java:291) ~[flume-kafka-source-1.6.0.47.jar:1.6.0.47]
27 Mar 2018 23:06:44.979 ERROR [testgroup:testopic] [o.a.f.s.k.KafkaConsumerWorker.run:329] - testgroup:testopic:: exception occurred in kafka source worker. backing off for 1000 millis
org.apache.kafka.clients.consumer.NoOffsetForPartitionException: Undefined offset with no reset policy for partition: testopic-0
    at org.apache.kafka.clients.consumer.internals.Fetcher.resetOffset(Fetcher.java:375) ~[kafka-clients-0.10.2.1.jar:na]
    at org.apache.kafka.cli

[jira] [Commented] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group

2018-03-26 Thread Narayan Periwal (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413789#comment-16413789
 ] 

Narayan Periwal commented on KAFKA-6681:


[~yuzhih...@gmail.com], 
We faced yet another occurrence of this issue. On the server side we found 
these logs in this case:

{noformat}
[2018-03-23 18:59:16,560] INFO [GroupCoordinator 6]: Stabilized group prod-m10n-event-batcher-billablebeaconams1 generation 6 (kafka.coordinator.GroupCoordinator)
[2018-03-23 18:59:46,561] INFO [GroupCoordinator 6]: Preparing to restabilize group prod-m10n-event-batcher-billablebeaconams1 with old generation 6 (kafka.coordinator.GroupCoordinator)
[2018-03-23 18:59:46,833] INFO [GroupCoordinator 6]: Stabilized group prod-m10n-event-batcher-billablebeaconams1 generation 7 (kafka.coordinator.GroupCoordinator)
{noformat}
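
To see how long the group spent rebalancing, the timestamps in the excerpt above can be diffed. The Python sketch below assumes the timestamp and message layout shown here; `rebalance_durations_ms` is a hypothetical helper name, and the sample is limited to the last two lines of the excerpt.

```python
import re
from datetime import datetime, timedelta

# Timestamp, coordinator id, event kind and group name, as laid out in the excerpt.
LINE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] INFO "
    r"\[GroupCoordinator \d+\]: "
    r"(?P<kind>Preparing to restabilize|Stabilized) group (?P<group>\S+)"
)


def rebalance_durations_ms(log_text):
    """Return [(group, milliseconds)] between each restabilize start and the next stabilize."""
    # Undo mail line wrapping before matching.
    flat = " ".join(log_text.split())
    started = {}
    durations = []
    for match in LINE.finditer(flat):
        when = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        group = match.group("group")
        if match.group("kind") == "Preparing to restabilize":
            started[group] = when
        elif group in started:
            elapsed = when - started.pop(group)
            durations.append((group, elapsed // timedelta(milliseconds=1)))
    return durations


sample = """
[2018-03-23 18:59:46,561] INFO [GroupCoordinator 6]: Preparing to restabilize
group prod-m10n-event-batcher-billablebeaconams1 with old generation 6
(kafka.coordinator.GroupCoordinator)
[2018-03-23 18:59:46,833] INFO [GroupCoordinator 6]: Stabilized group
prod-m10n-event-batcher-billablebeaconams1 generation 7
(kafka.coordinator.GroupCoordinator)
"""
print(rebalance_durations_ms(sample))  # [('prod-m10n-event-batcher-billablebeaconams1', 272)]
```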



[jira] [Commented] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group

2018-03-26 Thread Narayan Periwal (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413500#comment-16413500
 ] 

Narayan Periwal commented on KAFKA-6681:


[~yuzhih...@gmail.com], any update on this?



[jira] [Commented] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group

2018-03-20 Thread Narayan Periwal (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407378#comment-16407378
 ] 

Narayan Periwal commented on KAFKA-6681:


[~yuzhih...@gmail.com]

The partition read by the two consumers did not appear in any of the Kafka 
broker logs. Partition 1 of the topic renderCpmAms1 was being consumed by two 
consumer instances within a consumer group.

The following are the log lines in the server logs related to this topic:
{noformat}
[2018-03-14 05:01:53,456] INFO Partition [renderCpmAms1,10] on broker 1: 
Shrinking ISR for partition [renderCpmAms1,10] from 1,2,3 to 1,3 
(kafka.cluster.Partition)
[2018-03-14 05:02:14,122] INFO Partition [renderCpmAms1,10] on broker 1: 
Expanding ISR for partition renderCpmAms1-10 from 1,3 to 1,3,2 
(kafka.cluster.Partition)
[2018-03-14 05:01:52,376] INFO Partition [renderCpmAms1,9] on broker 15: 
Shrinking ISR for partition [renderCpmAms1,9] from 2,15,1 to 15,1 
(kafka.cluster.Partition)
[2018-03-14 05:02:14,193] INFO Partition [renderCpmAms1,9] on broker 15: 
Expanding ISR for partition renderCpmAms1-9 from 15,1 to 15,1,2 
(kafka.cluster.Partition)

[2018-03-14 05:02:17,510] INFO Partition [renderCpmAms1,11] on broker 2: 
Shrinking ISR for partition [renderCpmAms1,11] from 2,4,3 to 2,4 
(kafka.cluster.Partition)
[2018-03-14 05:02:17,530] INFO Partition [renderCpmAms1,11] on broker 2: Cached 
zkVersion [171] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
{noformat}

I am wondering whether this log line has any correlation with the issue:
{noformat}
[2018-03-14 05:02:17,530] INFO Partition [renderCpmAms1,11] on broker 2: Cached 
zkVersion [171] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
{noformat}
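
As a rough aid for reading the ISR lines above, the following Python sketch pairs each "Shrinking ISR" event with the later "Expanding ISR" event for the same partition and reports how long the dropped replica stayed out of the ISR. The function name and the regular expression are illustrative assumptions derived only from the log lines quoted in this comment; the sketch ignores the "Cached zkVersion" message and makes no claim about its cause.

```python
import re
from datetime import datetime

# Pattern for the shrink/expand messages quoted above.
# Assumption: the log layout matches the lines in this comment.
ISR_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] INFO "
    r"Partition \[(?P<topic>[^,\]]+),(?P<part>\d+)\] on broker (?P<broker>\d+): "
    r"(?P<action>Shrinking|Expanding) ISR for partition \S+ "
    r"from (?P<old>[\d,]+) to (?P<new>[\d,]+)"
)


def isr_windows(log_text):
    """Return (windows, still_out).

    windows: (topic, partition, replica, seconds out of the ISR) tuples.
    still_out: sorted (topic, partition, replica) keys that were removed
    from the ISR and never re-added within the given text.
    """
    # The mail archive wraps long log lines, so collapse whitespace first.
    flat = " ".join(log_text.split())
    dropped = {}  # (topic, partition, replica) -> time it left the ISR
    windows = []
    for m in ISR_RE.finditer(flat):
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        key = (m.group("topic"), int(m.group("part")))
        old = set(m.group("old").split(","))
        new = set(m.group("new").split(","))
        if m.group("action") == "Shrinking":
            for replica in old - new:
                dropped[key + (int(replica),)] = ts
        else:
            for replica in new - old:
                left_at = dropped.pop(key + (int(replica),), None)
                if left_at is not None:
                    seconds = (ts - left_at).total_seconds()
                    windows.append((key[0], key[1], int(replica), seconds))
    return windows, sorted(dropped)
```

On the excerpt above, replica 2 is out of the ISR for roughly 21 seconds on partitions 10 and 9, while replica 3 of partition 11 is removed and no matching expansion appears in the quoted lines.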




[jira] [Commented] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group

2018-03-20 Thread Narayan Periwal (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406681#comment-16406681
 ] 

Narayan Periwal commented on KAFKA-6681:


[~tedyu], I have attached the server-side logs. I could not retrieve the 
consumer-side logs because they have passed the retention period; I am trying 
to reproduce this again in our QA setup. Please see whether the server-side 
logs are of any help.

The server-side logs are from the nodes on which the under-replicated 
partition metrics spiked during this time.

There are no entries in the controller.log file for this time.



[jira] [Updated] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group

2018-03-20 Thread Narayan Periwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Narayan Periwal updated KAFKA-6681:
---
Attachment: server-2.log, server-1.log



[jira] [Updated] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group

2018-03-19 Thread Narayan Periwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Narayan Periwal updated KAFKA-6681:
---
Description: 
We have seen this issue with the new Kafka consumer, the client library 
introduced in 0.9.

With this new client, group management is done by the Kafka coordinator, which 
is one of the Kafka brokers.

We are using Kafka broker 0.10.2.1, and the consumer client version is also 
0.10.2.1.

The issue we have faced is that, after rebalancing, some of the partitions get 
consumed by 2 instances within a consumer group, leading to duplication of the 
entire partition data. Both instances continue to read until the next 
rebalancing or the restart of those clients.

It looks like a particular consumer goes on fetching data from a partition, 
but the broker is not able to identify this "stale" consumer instance.

During this time, we also see the under-replicated partition metrics spiking.

We have hit this twice in production. Please look into it at the earliest.

  was:
We have seen this issue with the new Kafka consumer, the client library 
introduced in 0.9.

With this new client, group management is done by the Kafka coordinator, which 
is one of the Kafka brokers.

We are using Kafka broker 0.10.2.1, and the consumer client version is also 
0.10.2.1.

The issue we have faced is that, after rebalancing, some of the partitions get 
consumed by 2 instances within a consumer group, leading to duplication of the 
entire partition data. They continue to read until the next rebalancing or the 
restart of those clients.

It looks like a particular consumer goes on fetching data from a partition, 
but the broker is not able to identify this "stale" consumer instance.

During this time, we also see the under-replicated partition metrics spiking.

We have hit this twice in production. Please look into it at the earliest.




[jira] [Created] (KAFKA-6681) Two instances of kafka consumer reading the same partition within a consumer group

2018-03-19 Thread Narayan Periwal (JIRA)
Narayan Periwal created KAFKA-6681:
--

Summary: Two instances of kafka consumer reading the same partition within a consumer group
Key: KAFKA-6681
URL: https://issues.apache.org/jira/browse/KAFKA-6681
Project: Kafka
Issue Type: Bug
Components: consumer
Affects Versions: 0.10.2.1
Reporter: Narayan Periwal


We have seen this issue with the new Kafka consumer, the client library 
introduced in 0.9.

With this new client, group management is done by the Kafka coordinator, which 
is one of the Kafka brokers.

We are using Kafka broker 0.10.2.1, and the consumer client version is also 
0.10.2.1.

The issue we have faced is that, after rebalancing, some of the partitions get 
consumed by 2 instances within a consumer group, leading to duplication of the 
entire partition data. They continue to read until the next rebalancing or the 
restart of those clients.

It looks like a particular consumer goes on fetching data from a partition, 
but the broker is not able to identify this "stale" consumer instance.

During this time, we also see the under-replicated partition metrics spiking.

We have hit this twice in production. Please look into it at the earliest.
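
The symptom described above (one partition owned by two instances of the same group) can be checked mechanically once each instance's current assignment is known. The Python sketch below assumes the assignments have already been collected by some means, for example from each instance's own logs; the function name, the consumer identifiers, and the input shape are hypothetical and are not taken from the Kafka client API.

```python
from collections import defaultdict


def find_duplicate_owners(assignments):
    """Return {(topic, partition): [consumer ids]} for every partition that
    is claimed by more than one consumer instance.

    assignments maps a consumer instance id to the list of
    (topic, partition) tuples that instance believes it owns.
    """
    owners = defaultdict(list)
    for consumer_id, partitions in assignments.items():
        for topic_partition in partitions:
            owners[topic_partition].append(consumer_id)
    return {
        tp: sorted(ids) for tp, ids in owners.items() if len(ids) > 1
    }


if __name__ == "__main__":
    # Hypothetical snapshot: both instances claim partition 1.
    snapshot = {
        "consumer-a": [("renderCpmAms1", 0), ("renderCpmAms1", 1)],
        "consumer-b": [("renderCpmAms1", 1), ("renderCpmAms1", 2)],
    }
    for tp, ids in find_duplicate_owners(snapshot).items():
        print(tp, "is claimed by", ids)
```

Running such a check after every rebalance would at least flag the duplicate ownership early, instead of it being discovered later through duplicated data.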


