[jira] [Commented] (KAFKA-6791) Add a CoordinatorNodeProvider in KafkaAdminClient

2020-09-17 Thread Jeff Widman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197411#comment-17197411
 ] 

Jeff Widman commented on KAFKA-6791:


Should this be closed as "wontfix"? 

The associated PR was closed without merging, with the note 
https://github.com/apache/kafka/pull/4902#issuecomment-390830255:

> Closed this PR since it was already fixed by KAFKA-6299.



> Add a CoordinatorNodeProvider in KafkaAdminClient
> -
>
> Key: KAFKA-6791
> URL: https://issues.apache.org/jira/browse/KAFKA-6791
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin
>Reporter: Guozhang Wang
>Assignee: huxihx
>Priority: Major
>
> As we add more and more coordinator-related requests to the admin client, we 
> can consider adding a CoordinatorNodeProvider to consolidate the common logic 
> pattern of finding the coordinator first, then send the corresponding request.
> Note that 1) with this provider interface it is almost not possible to batch 
> multiple groupIds per coordinator; there has to be a little more refactoring 
> to make it work. 2) for some requests like list consumers, group ids are not 
> known beforehand and hence we cannot use this provider as well.
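
For readers skimming this ticket, here is a rough sketch of the pattern being
described: resolve the group coordinator first, then route the group request to
that node. The interface and helper names below are illustrative only and are
not taken from the closed PR.

{code}
import org.apache.kafka.common.Node;

// Illustrative only -- mirrors the internal "NodeProvider" idea in
// KafkaAdminClient: each call names the node a request should be routed to,
// here the group coordinator for a given group id.
interface CoordinatorNodeProvider {
    /** Resolve the coordinator for groupId, issuing FindCoordinator if needed. */
    Node provide(String groupId);
}

final class CoordinatorRequestExample {
    static void describeGroup(CoordinatorNodeProvider provider, String groupId) {
        Node coordinator = provider.provide(groupId);   // step 1: find coordinator
        // step 2: send the coordinator-bound request (DescribeGroups, etc.)
        System.out.println("Would send DescribeGroups for " + groupId
                + " to " + coordinator);
    }
}
{code}

As the description notes, a per-group provider like this makes it hard to batch
several group ids that share a coordinator into one request.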



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-4668) Mirrormaker should default to auto.offset.reset=earliest

2020-04-10 Thread Jeff Widman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081137#comment-17081137
 ] 

Jeff Widman edited comment on KAFKA-4668 at 4/11/20, 4:44 AM:
--

Going to try to get this across the line, three years later. :D

I filed 
[KIP-592|https://cwiki.apache.org/confluence/display/KAFKA/KIP-592%3A++Replicate+MirrorMaker+topics+from+earliest]
 for this.


was (Author: jeffwidman):
Going to try to get this across the line, three years later. :D

I filed 
[KIP-592|[https://cwiki.apache.org/confluence/display/KAFKA/KIP-592%3A++Replicate+MirrorMaker+topics+from+earliest]]
 for this.

> Mirrormaker should default to auto.offset.reset=earliest
> 
>
> Key: KAFKA-4668
> URL: https://issues.apache.org/jira/browse/KAFKA-4668
> Project: Kafka
>  Issue Type: Improvement
>  Components: mirrormaker
>Reporter: Jeff Widman
>Assignee: Jeff Widman
>Priority: Major
>
> Mirrormaker currently inherits the default value for auto.offset.reset, which 
> is latest (new consumer) / largest (old consumer). 
> While for most consumers this is a sensible default, mirrormakers are 
> specifically designed for replication, so they should default to replicating 
> topics from the beginning.
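
Until such a default change lands, the behaviour described above can be
overridden explicitly in the consumer config that MirrorMaker is started with.
A minimal illustrative consumer.properties (broker address and group id are
placeholders):

{code}
# Passed to kafka-mirror-maker.sh via --consumer.config consumer.properties
bootstrap.servers=source-cluster:9092
group.id=mirror-maker-group
# Replicate topics from the beginning instead of the consumer default of latest
auto.offset.reset=earliest
{code}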



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-4668) Mirrormaker should default to auto.offset.reset=earliest

2020-04-10 Thread Jeff Widman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081137#comment-17081137
 ] 

Jeff Widman edited comment on KAFKA-4668 at 4/11/20, 4:44 AM:
--

Going to try to get this across the line, three years later. :D

I filed 
[KIP-592|[https://cwiki.apache.org/confluence/display/KAFKA/KIP-592%3A++Replicate+MirrorMaker+topics+from+earliest]]
 for this.


was (Author: jeffwidman):
Going to try to get this across the line, three years later. :D

I filed 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-592%3A++Replicate+MirrorMaker+topics+from+earliest
 for this.

> Mirrormaker should default to auto.offset.reset=earliest
> 
>
> Key: KAFKA-4668
> URL: https://issues.apache.org/jira/browse/KAFKA-4668
> Project: Kafka
>  Issue Type: Improvement
>  Components: mirrormaker
>Reporter: Jeff Widman
>Assignee: Jeff Widman
>Priority: Major
>
> Mirrormaker currently inherits the default value for auto.offset.reset, which 
> is latest (new consumer) / largest (old consumer). 
> While for most consumers this is a sensible default, mirrormakers are 
> specifically designed for replication, so they should default to replicating 
> topics from the beginning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-4668) Mirrormaker should default to auto.offset.reset=earliest

2020-04-10 Thread Jeff Widman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081137#comment-17081137
 ] 

Jeff Widman commented on KAFKA-4668:


Going to try to get this across the line, three years later. :D

I filed 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-592%3A++Replicate+MirrorMaker+topics+from+earliest
 for this.

> Mirrormaker should default to auto.offset.reset=earliest
> 
>
> Key: KAFKA-4668
> URL: https://issues.apache.org/jira/browse/KAFKA-4668
> Project: Kafka
>  Issue Type: Improvement
>  Components: mirrormaker
>Reporter: Jeff Widman
>Assignee: Jeff Widman
>Priority: Major
>
> Mirrormaker currently inherits the default value for auto.offset.reset, which 
> is latest (new consumer) / largest (old consumer). 
> While for most consumers this is a sensible default, mirrormakers are 
> specifically designed for replication, so they should default to replicating 
> topics from the beginning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-4668) Mirrormaker should default to auto.offset.reset=earliest

2020-04-10 Thread Jeff Widman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman reassigned KAFKA-4668:
--

Assignee: Jeff Widman

> Mirrormaker should default to auto.offset.reset=earliest
> 
>
> Key: KAFKA-4668
> URL: https://issues.apache.org/jira/browse/KAFKA-4668
> Project: Kafka
>  Issue Type: Improvement
>  Components: mirrormaker
>Reporter: Jeff Widman
>Assignee: Jeff Widman
>Priority: Major
>
> Mirrormaker currently inherits the default value for auto.offset.reset, which 
> is latest (new consumer) / largest (old consumer). 
> While for most consumers this is a sensible default, mirrormakers are 
> specifically designed for replication, so they should default to replicating 
> topics from the beginning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-7278) replaceSegments() should not call asyncDeleteSegment() for segments which have been removed from segments list

2018-10-18 Thread Jeff Widman (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-7278:
---
Description: 
Currently Log.replaceSegments() will call `asyncDeleteSegment(...)` for every 
segment listed in the `oldSegments`. oldSegments should be constructed from 
Log.segments and only contain segments listed in Log.segments.

However, Log.segments may be modified between the time oldSegments is 
determined to the time Log.replaceSegments() is called. If there are concurrent 
async deletion of the same log segment file, Log.replaceSegments() will call 
asyncDeleteSegment() for a segment that does not exist and Kafka server may 
shutdown the log directory due to NoSuchFileException.

This is likely the root cause of KAFKA-6188.

Given the understanding of the problem, we should be able to fix the issue by 
only deleting segment if the segment can be found in Log.segments.

 

 

  was:
Currently Log.replaceSegments() will call `asyncDeleteSegment(...)` for every 
segment listed in the `oldSegments`. oldSegments should be constructed from 
Log.segments and only contain segments listed in Log.segments.

However, Log.segments may be modified between the time oldSegments is 
determined to the time Log.replaceSegments() is called. If there are concurrent 
async deletion of the same log segment file, Log.replaceSegments() will call 
asyncDeleteSegment() for a segment that does not exist and Kafka server may 
shutdown the log directory due to NoSuchFileException.

This is likely the root cause of 
https://issues.apache.org/jira/browse/KAFKA-6188.

Given the understanding of the problem, we should be able to fix the issue by 
only deleting segment if the segment can be found in Log.segments.

 

 


> replaceSegments() should not call asyncDeleteSegment() for segments which 
> have been removed from segments list
> --
>
> Key: KAFKA-7278
> URL: https://issues.apache.org/jira/browse/KAFKA-7278
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Dong Lin
>Assignee: Dong Lin
>Priority: Major
> Fix For: 1.1.2, 2.0.1, 2.1.0
>
>
> Currently Log.replaceSegments() will call `asyncDeleteSegment(...)` for every 
> segment listed in the `oldSegments`. oldSegments should be constructed from 
> Log.segments and only contain segments listed in Log.segments.
> However, Log.segments may be modified between the time oldSegments is 
> determined to the time Log.replaceSegments() is called. If there are 
> concurrent async deletion of the same log segment file, Log.replaceSegments() 
> will call asyncDeleteSegment() for a segment that does not exist and Kafka 
> server may shutdown the log directory due to NoSuchFileException.
> This is likely the root cause of KAFKA-6188.
> Given the understanding of the problem, we should be able to fix the issue by 
> only deleting segment if the segment can be found in Log.segments.
>  
>  
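
The actual change lives in the Scala Log class; purely as an illustration of
the guard described above (delete a segment only if it is still present in
Log.segments), here is a small self-contained Java sketch. All names are
hypothetical and this is not Kafka's code.

{code}
import java.util.concurrent.ConcurrentSkipListMap;

// Stand-in for Log.segments: base offset -> segment handle.
final class SegmentDeleteGuardExample {
    private final ConcurrentSkipListMap<Long, String> segments =
            new ConcurrentSkipListMap<>();

    void replaceSegments(Iterable<Long> oldSegmentBaseOffsets) {
        for (long baseOffset : oldSegmentBaseOffsets) {
            // Another thread may already have removed (and deleted) this
            // segment, so only schedule a delete if it is still in the map.
            String segment = segments.remove(baseOffset);
            if (segment != null) {
                asyncDeleteSegment(segment);
            }
        }
    }

    private void asyncDeleteSegment(String segment) {
        System.out.println("scheduling async delete of " + segment);
    }
}
{code}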



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6266) Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of __consumer_offsets-xx to log start offset 203569 since the checkpointed offset 120955 is invalid

2018-04-16 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-6266:
---
Affects Version/s: 1.0.1

> Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of 
> __consumer_offsets-xx to log start offset 203569 since the checkpointed 
> offset 120955 is invalid. (kafka.log.LogCleanerManager$)
> --
>
> Key: KAFKA-6266
> URL: https://issues.apache.org/jira/browse/KAFKA-6266
> Project: Kafka
>  Issue Type: Bug
>  Components: offset manager
>Affects Versions: 1.0.0, 1.0.1
> Environment: CentOS 7, Apache kafka_2.12-1.0.0
>Reporter: VinayKumar
>Priority: Major
>
> I upgraded Kafka from 0.10.2.1 to 1.0.0 version. From then, I see the below 
> warnings in the log.
>  I'm seeing these continuously in the log, and want these to be fixed- so 
> that they wont repeat. Can someone please help me in fixing the below 
> warnings.
> {code}
> WARN Resetting first dirty offset of __consumer_offsets-17 to log start 
> offset 3346 since the checkpointed offset 3332 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-23 to log start 
> offset 4 since the checkpointed offset 1 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-19 to log start 
> offset 203569 since the checkpointed offset 120955 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-35 to log start 
> offset 16957 since the checkpointed offset 7 is invalid. 
> (kafka.log.LogCleanerManager$)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-6266) Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of __consumer_offsets-xx to log start offset 203569 since the checkpointed offset 120955 is

2018-04-16 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440212#comment-16440212
 ] 

Jeff Widman edited comment on KAFKA-6266 at 4/17/18 1:07 AM:
-

I am hitting this after upgrading a 0.10.0.1 broker to 1.0.1. 

The __consumer_offsets topic has the following config:
{code}
Topic:__consumer_offsets  PartitionCount:157  ReplicationFactor:2  
Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
{code}

Ignore the non-standard partition count; that is a byproduct of a bug from 
years ago. In this case, I think the only effect is that it makes it less 
likely that a partition within the __consumer_offsets topic gets produced to, 
which it sounds like would clear this error.

As described above, the external symptom was a zero-byte log file with a name 
like 00012526.log. 

Since this particular cluster has significantly more partitions in 
__consumer_offsets than it does consumer groups, it will not clear the error 
anytime soon because no consumer group offsets are being hashed onto the 
problem partitions.

So to get out of the situation, I shut down all brokers that have replicas of 
the partition, then deleted the log files for that partition, then restarted 
the brokers. This cleared the filename so that it matched the zero-byte contents.

Note that doing this in production will require downtime as you are taking a 
partition in the __consumer_offsets topic completely offline. On the flip side, 
you are only likely to hit this on somewhat underloaded clusters that can 
likely afford downtime... typically busy production clusters will clear 
themselves automatically through consumer groups producing to this partition.
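
For reference, a rough sketch of the manual remediation described above for a
single affected partition. The ZooKeeper address, service name, log.dirs path,
and partition number are placeholders, and this takes the partition fully
offline, so only do it when that downtime is acceptable.

{code}
# 1. Find which brokers host replicas of the affected partition
kafka-topics.sh --zookeeper zk1:2181 --describe --topic __consumer_offsets

# 2. On each broker holding a replica: stop the broker, remove that
#    partition's log directory, then start the broker again
service kafka stop
rm -rf /var/lib/kafka/data/__consumer_offsets-19
service kafka start
{code}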


was (Author: jeffwidman):
I am hitting this after upgrading a 0.10.0.1 broker to 1.0.1. 

the __consumer_offsets topic has the following config:
{code}
Topic:__consumer_offsets  PartitionCount:157  ReplicationFactor:2  
Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
{code}

Ignore the non-standard partition count, that is a byproduct of a bug from 
years ago. In this case, I think the only effect is that it makes it less 
likely that a partition within the __consumer_offsets topic gets produced to, 
which it sounds like would clear this error.

> Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of 
> __consumer_offsets-xx to log start offset 203569 since the checkpointed 
> offset 120955 is invalid. (kafka.log.LogCleanerManager$)
> --
>
> Key: KAFKA-6266
> URL: https://issues.apache.org/jira/browse/KAFKA-6266
> Project: Kafka
>  Issue Type: Bug
>  Components: offset manager
>Affects Versions: 1.0.0
> Environment: CentOS 7, Apache kafka_2.12-1.0.0
>Reporter: VinayKumar
>Priority: Major
>
> I upgraded Kafka from 0.10.2.1 to 1.0.0 version. From then, I see the below 
> warnings in the log.
>  I'm seeing these continuously in the log, and want these to be fixed- so 
> that they wont repeat. Can someone please help me in fixing the below 
> warnings.
> {code}
> WARN Resetting first dirty offset of __consumer_offsets-17 to log start 
> offset 3346 since the checkpointed offset 3332 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-23 to log start 
> offset 4 since the checkpointed offset 1 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-19 to log start 
> offset 203569 since the checkpointed offset 120955 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-35 to log start 
> offset 16957 since the checkpointed offset 7 is invalid. 
> (kafka.log.LogCleanerManager$)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6266) Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of __consumer_offsets-xx to log start offset 203569 since the checkpointed offset 120955 is inval

2018-04-16 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440212#comment-16440212
 ] 

Jeff Widman commented on KAFKA-6266:


I am hitting this after upgrading a 0.10.0.1 broker to 1.0.1. 

the __consumer_offsets topic has the following config:
{code}
Topic:__consumer_offsets  PartitionCount:157  ReplicationFactor:2  
Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
{code}

Ignore the non-standard partition count, that is a byproduct of a bug from 
years ago. In this case, I think the only effect is that it makes it less 
likely that a partition within the __consumer_offsets topic gets produced to, 
which it sounds like would clear this error.

> Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of 
> __consumer_offsets-xx to log start offset 203569 since the checkpointed 
> offset 120955 is invalid. (kafka.log.LogCleanerManager$)
> --
>
> Key: KAFKA-6266
> URL: https://issues.apache.org/jira/browse/KAFKA-6266
> Project: Kafka
>  Issue Type: Bug
>  Components: offset manager
>Affects Versions: 1.0.0
> Environment: CentOS 7, Apache kafka_2.12-1.0.0
>Reporter: VinayKumar
>Priority: Major
>
> I upgraded Kafka from 0.10.2.1 to 1.0.0 version. From then, I see the below 
> warnings in the log.
>  I'm seeing these continuously in the log, and want these to be fixed- so 
> that they wont repeat. Can someone please help me in fixing the below 
> warnings.
> {code}
> WARN Resetting first dirty offset of __consumer_offsets-17 to log start 
> offset 3346 since the checkpointed offset 3332 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-23 to log start 
> offset 4 since the checkpointed offset 1 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-19 to log start 
> offset 203569 since the checkpointed offset 120955 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-35 to log start 
> offset 16957 since the checkpointed offset 7 is invalid. 
> (kafka.log.LogCleanerManager$)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6266) Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of __consumer_offsets-xx to log start offset 203569 since the checkpointed offset 120955 is invalid

2018-04-16 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-6266:
---
Description: 
I upgraded Kafka from 0.10.2.1 to 1.0.0 version. From then, I see the below 
warnings in the log.
 I'm seeing these continuously in the log, and want these to be fixed- so that 
they wont repeat. Can someone please help me in fixing the below warnings.
{code}
WARN Resetting first dirty offset of __consumer_offsets-17 to log start offset 
3346 since the checkpointed offset 3332 is invalid. 
(kafka.log.LogCleanerManager$)
 WARN Resetting first dirty offset of __consumer_offsets-23 to log start offset 
4 since the checkpointed offset 1 is invalid. (kafka.log.LogCleanerManager$)
 WARN Resetting first dirty offset of __consumer_offsets-19 to log start offset 
203569 since the checkpointed offset 120955 is invalid. 
(kafka.log.LogCleanerManager$)
 WARN Resetting first dirty offset of __consumer_offsets-35 to log start offset 
16957 since the checkpointed offset 7 is invalid. (kafka.log.LogCleanerManager$)
{code}

  was:
I upgraded Kafka from 0.10.2.1 to 1.0.0 version. From then, I see the below 
warnings in the log.
I'm seeing these continuously in the log, and want these to be fixed- so that 
they wont repeat. Can someone please help me in fixing the below warnings.

WARN Resetting first dirty offset of __consumer_offsets-17 to log start offset 
3346 since the checkpointed offset 3332 is invalid. 
(kafka.log.LogCleanerManager$)
WARN Resetting first dirty offset of __consumer_offsets-23 to log start offset 
4 since the checkpointed offset 1 is invalid. (kafka.log.LogCleanerManager$)
WARN Resetting first dirty offset of __consumer_offsets-19 to log start offset 
203569 since the checkpointed offset 120955 is invalid. 
(kafka.log.LogCleanerManager$)
WARN Resetting first dirty offset of __consumer_offsets-35 to log start offset 
16957 since the checkpointed offset 7 is invalid. (kafka.log.LogCleanerManager$)


> Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of 
> __consumer_offsets-xx to log start offset 203569 since the checkpointed 
> offset 120955 is invalid. (kafka.log.LogCleanerManager$)
> --
>
> Key: KAFKA-6266
> URL: https://issues.apache.org/jira/browse/KAFKA-6266
> Project: Kafka
>  Issue Type: Bug
>  Components: offset manager
>Affects Versions: 1.0.0
> Environment: CentOS 7, Apache kafka_2.12-1.0.0
>Reporter: VinayKumar
>Priority: Major
>
> I upgraded Kafka from 0.10.2.1 to 1.0.0 version. From then, I see the below 
> warnings in the log.
>  I'm seeing these continuously in the log, and want these to be fixed- so 
> that they wont repeat. Can someone please help me in fixing the below 
> warnings.
> {code}
> WARN Resetting first dirty offset of __consumer_offsets-17 to log start 
> offset 3346 since the checkpointed offset 3332 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-23 to log start 
> offset 4 since the checkpointed offset 1 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-19 to log start 
> offset 203569 since the checkpointed offset 120955 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-35 to log start 
> offset 16957 since the checkpointed offset 7 is invalid. 
> (kafka.log.LogCleanerManager$)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6529) Broker leaks memory and file descriptors after sudden client disconnects

2018-02-09 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-6529:
---
Fix Version/s: (was: 1.0.2)
   1.0.1

> Broker leaks memory and file descriptors after sudden client disconnects
> 
>
> Key: KAFKA-6529
> URL: https://issues.apache.org/jira/browse/KAFKA-6529
> Project: Kafka
>  Issue Type: Bug
>  Components: network
>Affects Versions: 1.0.0, 0.11.0.2
>Reporter: Graham Campbell
>Priority: Major
> Fix For: 1.1.0, 1.0.1, 0.11.0.3
>
>
> If a producer forcefully disconnects from a broker while it has staged 
> receives, that connection enters a limbo state where it is no longer 
> processed by the SocketServer.Processor, leaking the file descriptor for the 
> socket and the memory used for the staged recieve queue for that connection.
> We noticed this during an upgrade from 0.9.0.2 to 0.11.0.2. Immediately after 
> the rolling restart to upgrade, open file descriptors on the brokers started 
> climbing uncontrollably. In a few cases brokers reached our configured max 
> open files limit of 100k and crashed before we rolled back.
> We tracked this down to a buildup of muted connections in the 
> Selector.closingChannels list. If a client disconnects from the broker with 
> multiple pending produce requests, when the broker attempts to send an ack to 
> the client it recieves an IOException because the TCP socket has been closed. 
> This triggers the Selector to close the channel, but because it still has 
> pending requests, it adds it to Selector.closingChannels to process those 
> requests. However, because that exception was triggered by trying to send a 
> response, the SocketServer.Processor has marked the channel as muted and will 
> no longer process it at all.
> *Reproduced by:*
> Starting a Kafka broker/cluster
> Client produces several messages and then disconnects abruptly (eg. 
> _./rdkafka_performance -P -x 100 -b broker:9092 -t test_topic_)
> Broker then leaks file descriptor previously used for TCP socket and memory 
> for unprocessed messages
> *Proposed solution (which we've implemented internally)*
> Whenever an exception is encountered when writing to a socket in 
> Selector.pollSelectionKeys(...) record that that connection failed a send by 
> adding the KafkaChannel ID to Selector.failedSends. Then re-raise the 
> exception to still trigger the socket disconnection logic. Since every 
> exception raised in this function triggers a disconnect, we also treat any 
> exception while writing to the socket as a failed send.
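
Purely to illustrate the proposed "record the failed send, then re-raise"
pattern described above -- this is not the actual
org.apache.kafka.common.network.Selector code, and all names here are
hypothetical:

{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.util.HashSet;
import java.util.Set;

// Sketch: remember which connection failed a send before re-raising, so the
// disconnect path can close it outright instead of parking it with its
// staged receives.
final class FailedSendTrackerExample {
    private final Set<String> failedSends = new HashSet<>();

    void write(String connectionId, SocketChannel channel, ByteBuffer payload)
            throws IOException {
        try {
            channel.write(payload);
        } catch (IOException e) {
            failedSends.add(connectionId);  // mark the connection as failed
            throw e;                        // re-raise to trigger disconnect logic
        }
    }

    boolean hasFailedSend(String connectionId) {
        return failedSends.contains(connectionId);
    }
}
{code}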



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6547) group offset reset and begin_offset ignored/no effect

2018-02-08 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357807#comment-16357807
 ] 

Jeff Widman commented on KAFKA-6547:


What is the specific command you are running?

How do you know there is no effect? Simply because the consumer is later unable 
to poll?

How do you know it's not a broken consumer?
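
For reference, an offset reset of the kind being discussed usually looks like
the following (the tool is kafka-consumer-groups.sh; group, topic, and broker
address are placeholders, and the group must have no active members for the
reset to apply; without --execute the tool only prints the plan):

{code}
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group my-group --topic my-topic \
  --reset-offsets --to-earliest --execute
{code}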

> group offset reset and begin_offset ignored/no effect
> -
>
> Key: KAFKA-6547
> URL: https://issues.apache.org/jira/browse/KAFKA-6547
> Project: Kafka
>  Issue Type: Bug
>  Components: offset manager
>Affects Versions: 1.0.0
> Environment: ubuntu 16, java 1.8
>Reporter: Dan
>Priority: Major
> Fix For: 0.11.0.2
>
>
> Use of kafka-consumer-group.sh with --reset-offsets --execute  <--to-earliest 
> or anything> has no effect in 1.0. When my group client connects and requests 
> a specific offset or an earliest there's no effect and the consumer is unable 
> to poll, so no messages, even new ones are ignored.
> I installed 0.11 and these problems are not manifest.
> I'm unfamiliar with the internals and put the offset manager as the possible 
> component, but that's a guess.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-2435) More optimally balanced partition assignment strategy

2018-02-08 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357203#comment-16357203
 ] 

Jeff Widman edited comment on KAFKA-2435 at 2/8/18 4:54 PM:


Can this be closed as wontfix? 

Two reasons:
 # it targets the deprecated (removed?) high-level consumer
 # KIP-54 addressed some of the concerns here (although possibly not all, as 
mentioned here: 
https://issues.apache.org/jira/browse/KAFKA-3297?focusedCommentId=15788229=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15788229)


was (Author: jeffwidman):
Can this be closed as wontfix? 

Two reasons:
 # it targets the deprecated (removed?) high-level consumer
 # KIP-54 addressed some of the concerns here, and IMHO is a better solution 
because it addresses both fairness and affinity/stickyness

> More optimally balanced partition assignment strategy
> -
>
> Key: KAFKA-2435
> URL: https://issues.apache.org/jira/browse/KAFKA-2435
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Andrew Olson
>Assignee: Andrew Olson
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: KAFKA-2435.patch
>
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.
> This JIRA addresses the original high-level consumer. For the new consumer, 
> see KAFKA-3297.
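
As a rough illustration of the greedy idea described above (assign each
partition to the subscribed consumer that currently holds the fewest
partitions, ordering topics by subscriber count and then partition count),
here is a small self-contained sketch. It is not Kafka's assignor code.

{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class GreedyBalancedAssignmentExample {

    // subscriptions: consumer -> topics it subscribes to
    // partitionCounts: topic -> number of partitions
    // returns: consumer -> assigned "topic-partition" names
    static Map<String, List<String>> assign(Map<String, Set<String>> subscriptions,
                                            Map<String, Integer> partitionCounts) {
        Map<String, List<String>> assignment = new HashMap<>();
        subscriptions.keySet().forEach(c -> assignment.put(c, new ArrayList<>()));

        // Topics with fewer subscribers first; ties broken by more partitions.
        List<String> topics = new ArrayList<>(partitionCounts.keySet());
        topics.sort(Comparator
                .comparingInt((String t) -> subscriberCount(subscriptions, t))
                .thenComparing((String t) -> -partitionCounts.get(t)));

        for (String topic : topics) {
            for (int p = 0; p < partitionCounts.get(topic); p++) {
                // Pick the subscribed consumer with the fewest partitions so far.
                String target = null;
                for (Map.Entry<String, Set<String>> e : subscriptions.entrySet()) {
                    if (!e.getValue().contains(topic)) continue;
                    if (target == null
                            || assignment.get(e.getKey()).size()
                               < assignment.get(target).size()) {
                        target = e.getKey();
                    }
                }
                if (target != null) assignment.get(target).add(topic + "-" + p);
            }
        }
        return assignment;
    }

    private static int subscriberCount(Map<String, Set<String>> subscriptions,
                                       String topic) {
        int count = 0;
        for (Set<String> subs : subscriptions.values()) {
            if (subs.contains(topic)) count++;
        }
        return count;
    }
}
{code}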



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-2435) More optimally balanced partition assignment strategy

2018-02-08 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357203#comment-16357203
 ] 

Jeff Widman edited comment on KAFKA-2435 at 2/8/18 4:51 PM:


Can this be closed as wontfix? 

Two reasons:

 # it targets the deprecated (removed?) high-level consumer

# KIP-54 addressed some of the concerns here, and IMHO is a better solution 
because it addresses both fairness and affinity/stickyness


was (Author: jeffwidman):
Can this be closed as wontfix? 

Two reasons:

 

1) it targets the deprecated (removed?) high-level consumer

2) KIP-54 addressed some of the concerns here, and IMHO is a better solution 
because it addresses both fairness and affinity/stickyness

> More optimally balanced partition assignment strategy
> -
>
> Key: KAFKA-2435
> URL: https://issues.apache.org/jira/browse/KAFKA-2435
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Andrew Olson
>Assignee: Andrew Olson
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: KAFKA-2435.patch
>
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.
> This JIRA addresses the original high-level consumer. For the new consumer, 
> see KAFKA-3297.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-2435) More optimally balanced partition assignment strategy

2018-02-08 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357203#comment-16357203
 ] 

Jeff Widman commented on KAFKA-2435:


Can this be closed as wontfix? 

Two reasons:

 

1) it targets the deprecated (removed?) high-level consumer

2) KIP-54 addressed some of the concerns here, and IMHO is a better solution 
because it addresses both fairness and affinity/stickyness

> More optimally balanced partition assignment strategy
> -
>
> Key: KAFKA-2435
> URL: https://issues.apache.org/jira/browse/KAFKA-2435
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Andrew Olson
>Assignee: Andrew Olson
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: KAFKA-2435.patch
>
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.
> This JIRA addresses the original high-level consumer. For the new consumer, 
> see KAFKA-3297.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6058) KIP-222: Add "describe consumer groups" and "list consumer groups" to KafkaAdminClient

2018-02-01 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-6058:
---
Labels: kip-222  (was: needs-kip)

> KIP-222: Add "describe consumer groups" and "list consumer groups" to 
> KafkaAdminClient
> --
>
> Key: KAFKA-6058
> URL: https://issues.apache.org/jira/browse/KAFKA-6058
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Matthias J. Sax
>Assignee: Jorge Quilcate
>Priority: Major
>  Labels: kip-222
>
> {{KafkaAdminClient}} does not allow to get information about consumer groups. 
> This feature is supported by old {{kafka.admin.AdminClient}} though.
> We should add {{KafkaAdminClient#describeConsumerGroups()}} and 
> {{KafkaAdminClient#listConsumerGroup()}}.
> Associated KIP: KIP-222
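
For context, a short sketch of how the requested calls are used from the Java
AdminClient once KIP-222 lands (method names per the KIP and later releases;
check the AdminClient javadoc for the exact result types; bootstrap address
and group id are placeholders):

{code}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public final class ConsumerGroupAdminExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // List all consumer groups known to the cluster.
            admin.listConsumerGroups().all().get()
                 .forEach(listing -> System.out.println(listing.groupId()));

            // Describe one group: members, assignments, state.
            System.out.println(
                admin.describeConsumerGroups(Collections.singletonList("my-group"))
                     .all().get());

            // Committed offsets for that group (also covered by KIP-222).
            System.out.println(
                admin.listConsumerGroupOffsets("my-group")
                     .partitionsToOffsetAndMetadata().get());
        }
    }
}
{code}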



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6058) KIP-222: Add "describe consumer groups" and "list consumer groups" to KafkaAdminClient

2018-01-23 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-6058:
---
Summary: KIP-222: Add "describe consumer groups" and "list consumer groups" 
to KafkaAdminClient  (was: Add "describe consumer groups" and "list consumer 
groups" to KafkaAdminClient)

> KIP-222: Add "describe consumer groups" and "list consumer groups" to 
> KafkaAdminClient
> --
>
> Key: KAFKA-6058
> URL: https://issues.apache.org/jira/browse/KAFKA-6058
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Matthias J. Sax
>Assignee: Jorge Quilcate
>Priority: Major
>  Labels: needs-kip
>
> {{KafkaAdminClient}} does not allow to get information about consumer groups. 
> This feature is supported by old {{kafka.admin.AdminClient}} though.
> We should add {{KafkaAdminClient#describeConsumerGroups()}} and 
> {{KafkaAdminClient#listConsumerGroup()}}.
> Associated KIP: KIP-222



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6058) KIP-222: Add "describe consumer groups" and "list consumer groups" to KafkaAdminClient

2018-01-23 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336639#comment-16336639
 ] 

Jeff Widman commented on KAFKA-6058:


Note that listing group offsets has also been added to KIP-222

> KIP-222: Add "describe consumer groups" and "list consumer groups" to 
> KafkaAdminClient
> --
>
> Key: KAFKA-6058
> URL: https://issues.apache.org/jira/browse/KAFKA-6058
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Matthias J. Sax
>Assignee: Jorge Quilcate
>Priority: Major
>  Labels: needs-kip
>
> {{KafkaAdminClient}} does not allow to get information about consumer groups. 
> This feature is supported by old {{kafka.admin.AdminClient}} though.
> We should add {{KafkaAdminClient#describeConsumerGroups()}} and 
> {{KafkaAdminClient#listConsumerGroup()}}.
> Associated KIP: KIP-222



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6058) Add "describe consumer groups" and "list consumer groups" to KafkaAdminClient

2018-01-23 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-6058:
---
Description: 
{{KafkaAdminClient}} does not allow to get information about consumer groups. 
This feature is supported by old {{kafka.admin.AdminClient}} though.

We should add {{KafkaAdminClient#describeConsumerGroups()}} and 
{{KafkaAdminClient#listConsumerGroup()}}.

Associated KIP: KIP-222

  was:
{{KafkaAdminClient}} does not allow to get information about consumer groups. 
This feature is supported by old {{kafka.admin.AdminClient}} though.

We should add {{KafkaAdminClient#describeConsumerGroups()}} and 
{{KafkaAdminClient#listConsumerGroup()}}.


> Add "describe consumer groups" and "list consumer groups" to KafkaAdminClient
> -
>
> Key: KAFKA-6058
> URL: https://issues.apache.org/jira/browse/KAFKA-6058
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Matthias J. Sax
>Assignee: Jorge Quilcate
>Priority: Major
>  Labels: needs-kip
>
> {{KafkaAdminClient}} does not allow to get information about consumer groups. 
> This feature is supported by old {{kafka.admin.AdminClient}} though.
> We should add {{KafkaAdminClient#describeConsumerGroups()}} and 
> {{KafkaAdminClient#listConsumerGroup()}}.
> Associated KIP: KIP-222



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6469) ISR change notification queue can prevent controller from making progress

2018-01-23 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336114#comment-16336114
 ] 

Jeff Widman commented on KAFKA-6469:


We hit a similar issue when doing partition re-assignments across the cluster 
where the total payload was greater than 1MB... We solved it by raising the 
jute.maxbuffer limit to several MB.
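
For anyone hitting the same thing, the limit in question is ZooKeeper's
jute.maxbuffer system property. A sketch of raising it (the 4 MB value is only
an example; it generally needs to be set on both the ZooKeeper servers and on
the brokers' ZooKeeper client side):

{code}
# ZooKeeper server side, e.g. via zkEnv.sh / JVMFLAGS
export JVMFLAGS="-Djute.maxbuffer=4194304"

# Kafka broker (ZooKeeper client) side, e.g. via KAFKA_OPTS
export KAFKA_OPTS="-Djute.maxbuffer=4194304"
{code}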

> ISR change notification queue can prevent controller from making progress
> -
>
> Key: KAFKA-6469
> URL: https://issues.apache.org/jira/browse/KAFKA-6469
> Project: Kafka
>  Issue Type: Bug
>Reporter: Kyle Ambroff-Kao
>Assignee: Kyle Ambroff-Kao
>Priority: Major
>
> When the writes /isr_change_notification in ZooKeeper (which is effectively a 
> queue of ISR change events for the controller) happen at a rate high enough 
> that the node with a watch can't dequeue them, the trouble starts.
> The watcher kafka.controller.IsrChangeNotificationListener is fired in the 
> controller when a new entry is written to /isr_change_notification, and the 
> zkclient library sends a GetChildrenRequest to zookeeper to fetch all child 
> znodes.
> We've failures in one of our test clusters as the partition count started to 
> climb north of 60k per broker. We had brokers writing child nodes under 
> /isr_change_notification that were larger than the jute.maxbuffer size in 
> ZooKeeper (1MB), causing the ZooKeeper server to drop the controller's 
> session, effectively bricking the cluster.
> This can be partially mitigated by chunking ISR notifications to increase the 
> maximum number of partitions a broker can host.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6185) Selector memory leak with high likelihood of OOM in case of down conversion

2018-01-18 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-6185:
---
Component/s: core

> Selector memory leak with high likelihood of OOM in case of down conversion
> ---
>
> Key: KAFKA-6185
> URL: https://issues.apache.org/jira/browse/KAFKA-6185
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.0.0
> Environment: Ubuntu 14.04.5 LTS
> 5 brokers: 1&2 on 1.0.0 3,4,5 on 0.11.0.1
> inter.broker.protocol.version=0.11.0.1
> log.message.format.version=0.11.0.1
> clients a mix of 0.9, 0.10, 0.11
>Reporter: Brett Rann
>Assignee: Rajini Sivaram
>Priority: Blocker
>  Labels: regression
> Fix For: 1.1.0, 1.0.1
>
> Attachments: Kafka_Internals___Datadog.png, 
> Kafka_Internals___Datadog.png
>
>
> We are testing 1.0.0 in a couple of environments.
> Both have about 5 brokers, with two 1.0.0 brokers and the rest 0.11.0.1 
> brokers.
> One is using on disk message format 0.9.0.1, the other 0.11.0.1
> we have 0.9, 0.10, and 0.11 clients connecting.
> The cluster on the 0.9.0.1 format is running fine for a week.
> But the cluster on the 0.11.0.1 format is consistently having memory issues, 
> only on the two upgraded brokers running 1.0.0.
> The first occurrence of the error comes along with this stack trace
> {noformat}
> {"timestamp":"2017-11-06 
> 14:22:32,402","level":"ERROR","logger":"kafka.server.KafkaApis","thread":"kafka-request-handler-7","message":"[KafkaApi-1]
>  Error when handling request 
> {replica_id=-1,max_wait_time=500,min_bytes=1,topics=[{topic=maxwell.users,partitions=[{partition=0,fetch_offset=227537,max_bytes=1100},{partition=4,fetch_offset=354468,max_bytes=1100},{partition=5,fetch_offset=266524,max_bytes=1100},{partition=8,fetch_offset=324562,max_bytes=1100},{partition=10,fetch_offset=292931,max_bytes=1100},{partition=12,fetch_offset=325718,max_bytes=1100},{partition=15,fetch_offset=229036,max_bytes=1100}]}]}"}
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
> at 
> org.apache.kafka.common.record.AbstractRecords.downConvert(AbstractRecords.java:101)
> at 
> org.apache.kafka.common.record.FileRecords.downConvert(FileRecords.java:253)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1$$anonfun$apply$4.apply(KafkaApis.scala:520)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1$$anonfun$apply$4.apply(KafkaApis.scala:518)
> at scala.Option.map(Option.scala:146)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1.apply(KafkaApis.scala:518)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1.apply(KafkaApis.scala:508)
> at scala.Option.flatMap(Option.scala:171)
> at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$convertedPartitionData$1(KafkaApis.scala:508)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$createResponse$2$1.apply(KafkaApis.scala:556)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$createResponse$2$1.apply(KafkaApis.scala:555)
> at scala.collection.Iterator$class.foreach(Iterator.scala:891)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$createResponse$2(KafkaApis.scala:555)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$fetchResponseCallback$1$1.apply(KafkaApis.scala:569)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$fetchResponseCallback$1$1.apply(KafkaApis.scala:569)
> at 
> kafka.server.KafkaApis$$anonfun$sendResponseMaybeThrottle$1.apply$mcVI$sp(KafkaApis.scala:2034)
> at 
> kafka.server.ClientRequestQuotaManager.maybeRecordAndThrottle(ClientRequestQuotaManager.scala:52)
> at 
> kafka.server.KafkaApis.sendResponseMaybeThrottle(KafkaApis.scala:2033)
> at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$fetchResponseCallback$1(KafkaApis.scala:569)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$processResponseCallback$1$1.apply$mcVI$sp(KafkaApis.scala:588)
> at 
> kafka.server.ClientQuotaManager.maybeRecordAndThrottle(ClientQuotaManager.scala:175)
> at 
> 

[jira] [Updated] (KAFKA-5083) always leave the last surviving member of the ISR in ZK

2018-01-18 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-5083:
---
Fix Version/s: 1.1.0

> always leave the last surviving member of the ISR in ZK
> ---
>
> Key: KAFKA-5083
> URL: https://issues.apache.org/jira/browse/KAFKA-5083
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Onur Karaman
>Assignee: Onur Karaman
>Priority: Major
> Fix For: 1.1.0
>
>
> Currently we erase ISR membership if the replica to be removed from the ISR 
> is the last surviving member of the ISR and unclean leader election is 
> enabled for the corresponding topic.
> We should investigate leaving the last replica in ISR in ZK, independent of 
> whether unclean leader election is enabled or not. That way, if people 
> re-disabled unclean leader election, we can still try to elect the leader 
> from the last in-sync replica.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-1120) Controller could miss a broker state change

2018-01-18 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331093#comment-16331093
 ] 

Jeff Widman edited comment on KAFKA-1120 at 1/18/18 7:58 PM:
-

The issue description says "the broker will be in this weird state until it is 
restarted."

Could this also be fixed by simply forcing a controller re-election through 
removing the /controller znode, since the new controller will re-identify the 
leaders? In some scenarios, it seems that might be a lighter-weight solution. I 
understand this does not fix the root cause, but I just want to be sure I 
understand what options I have if we hit this in an emergency situation.
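
For reference, forcing a controller re-election amounts to deleting the
ephemeral /controller znode; one way to do that (ZooKeeper address is a
placeholder) is shown below. A surviving broker immediately wins the election
and becomes the new controller.

{code}
echo "delete /controller" | zookeeper-shell.sh zk1:2181
{code}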


was (Author: jeffwidman):
The issue description says "the broker will be in this weird state until it is 
restarted."

Couldn't this also be fixed by simply forcing a controller re-election? Since 
it will re-identiy the leaders?

> Controller could miss a broker state change 
> 
>
> Key: KAFKA-1120
> URL: https://issues.apache.org/jira/browse/KAFKA-1120
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core
>Affects Versions: 0.8.1
>Reporter: Jun Rao
>Assignee: Mickael Maison
>Priority: Major
>  Labels: reliability
> Fix For: 1.1.0
>
>
> When the controller is in the middle of processing a task (e.g., preferred 
> leader election, broker change), it holds a controller lock. During this 
> time, a broker could have de-registered and re-registered itself in ZK. After 
> the controller finishes processing the current task, it will start processing 
> the logic in the broker change listener. However, it will see no broker 
> change and therefore won't do anything to the restarted broker. This broker 
> will be in a weird state since the controller doesn't inform it to become the 
> leader of any partition. Yet, the cached metadata in other brokers could 
> still list that broker as the leader for some partitions. Client requests 
> routed to that broker will then get a TopicOrPartitionNotExistException. This 
> broker will continue to be in this bad state until it's restarted again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2018-01-18 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331093#comment-16331093
 ] 

Jeff Widman commented on KAFKA-1120:


The issue description says "the broker will be in this weird state until it is 
restarted."

Couldn't this also be fixed by simply forcing a controller re-election? Since 
it will re-identiy the leaders?

> Controller could miss a broker state change 
> 
>
> Key: KAFKA-1120
> URL: https://issues.apache.org/jira/browse/KAFKA-1120
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core
>Affects Versions: 0.8.1
>Reporter: Jun Rao
>Assignee: Mickael Maison
>Priority: Major
>  Labels: reliability
> Fix For: 1.1.0
>
>
> When the controller is in the middle of processing a task (e.g., preferred 
> leader election, broker change), it holds a controller lock. During this 
> time, a broker could have de-registered and re-registered itself in ZK. After 
> the controller finishes processing the current task, it will start processing 
> the logic in the broker change listener. However, it will see no broker 
> change and therefore won't do anything to the restarted broker. This broker 
> will be in a weird state since the controller doesn't inform it to become the 
> leader of any partition. Yet, the cached metadata in other brokers could 
> still list that broker as the leader for some partitions. Client requests 
> routed to that broker will then get a TopicOrPartitionNotExistException. This 
> broker will continue to be in this bad state until it's restarted again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-2758) Improve Offset Commit Behavior

2017-11-06 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240796#comment-16240796
 ] 

Jeff Widman edited comment on KAFKA-2758 at 11/6/17 7:59 PM:
-

item 1 would be significantly more useful if KIP-211 
([https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets])
 gets accepted. That would remove the risk of accidentally expiring a 
consumer's offsets.


was (Author: jeffwidman):
item 1 would be significantly more useful if KIP-211 
([https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets])
 gets accepted. That would remove the risk of accidentally expiring a consumers 
offsets.

> Improve Offset Commit Behavior
> --
>
> Key: KAFKA-2758
> URL: https://issues.apache.org/jira/browse/KAFKA-2758
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Guozhang Wang
>  Labels: newbiee, reliability
>
> There are two scenarios of offset committing that we can improve:
> 1) we can filter the partitions whose committed offset is equal to the 
> consumed offset, meaning there is no new consumed messages from this 
> partition and hence we do not need to include this partition in the commit 
> request.
> 2) we can make a commit request right after resetting to a fetch / consume 
> position either according to the reset policy (e.g. on consumer starting up, 
> or handling of out of range offset, etc), or through the {code} seek {code} 
> so that if the consumer fails right after these event, upon recovery it can 
> restarts from the reset position instead of resetting again: this can lead 
> to, for example, data loss if we use "largest" as reset policy while there 
> are new messages coming to the fetching partitions.
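
To make item 2 concrete, a minimal consumer sketch that commits immediately
after resetting its position, so a crash right afterwards does not trigger a
second reset. Topic, group, and broker address are placeholders.

{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public final class CommitAfterResetExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-group");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);
            consumer.assign(Collections.singletonList(tp));
            consumer.seekToBeginning(Collections.singletonList(tp));
            // Commit the reset position right away (item 2 above).
            Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
            offsets.put(tp, new OffsetAndMetadata(consumer.position(tp)));
            consumer.commitSync(offsets);
        }
    }
}
{code}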



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (KAFKA-2758) Improve Offset Commit Behavior

2017-11-06 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240796#comment-16240796
 ] 

Jeff Widman edited comment on KAFKA-2758 at 11/6/17 7:59 PM:
-

item 1 would be significantly more useful if KIP-211 
([https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets])
 gets accepted. That would remove the risk off accidentally expiring a 
consumers offsets.


was (Author: jeffwidman):
item 1 would be significantly more useful if 
[KIP-211](https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets)
 gets accepted. That would remove the risk off accidentally expiring a 
consumers offsets.

> Improve Offset Commit Behavior
> --
>
> Key: KAFKA-2758
> URL: https://issues.apache.org/jira/browse/KAFKA-2758
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Guozhang Wang
>  Labels: newbiee, reliability
>
> There are two scenarios of offset committing that we can improve:
> 1) we can filter the partitions whose committed offset is equal to the 
> consumed offset, meaning there is no new consumed messages from this 
> partition and hence we do not need to include this partition in the commit 
> request.
> 2) we can make a commit request right after resetting to a fetch / consume 
> position either according to the reset policy (e.g. on consumer starting up, 
> or handling of out of range offset, etc), or through the {code} seek {code} 
> so that if the consumer fails right after these event, upon recovery it can 
> restarts from the reset position instead of resetting again: this can lead 
> to, for example, data loss if we use "largest" as reset policy while there 
> are new messages coming to the fetching partitions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-2758) Improve Offset Commit Behavior

2017-11-06 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240796#comment-16240796
 ] 

Jeff Widman commented on KAFKA-2758:


item 1 would be significantly more useful if 
[KIP-211](https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets)
 gets accepted. That would remove the risk off accidentally expiring a 
consumers offsets.

> Improve Offset Commit Behavior
> --
>
> Key: KAFKA-2758
> URL: https://issues.apache.org/jira/browse/KAFKA-2758
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Guozhang Wang
>  Labels: newbiee, reliability
>
> There are two scenarios of offset committing that we can improve:
> 1) we can filter the partitions whose committed offset is equal to the 
> consumed offset, meaning there is no new consumed messages from this 
> partition and hence we do not need to include this partition in the commit 
> request.
> 2) we can make a commit request right after resetting to a fetch / consume 
> position either according to the reset policy (e.g. on consumer starting up, 
> or handling of out of range offset, etc), or through the {code} seek {code} 
> so that if the consumer fails right after these event, upon recovery it can 
> restarts from the reset position instead of resetting again: this can lead 
> to, for example, data loss if we use "largest" as reset policy while there 
> are new messages coming to the fetching partitions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-4084) automated leader rebalance causes replication downtime for clusters with too many partitions

2017-10-30 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225870#comment-16225870
 ] 

Jeff Widman commented on KAFKA-4084:


[~junrao] any ballpark quantification of "much faster"? 

Are we talking 2x, 10x, or 100x faster?

When you say "batches the requests", I'm not sure what the batch size is... if 
it does all changes as a single batch or if there are multiple batches... so 
it's hard to guesstimate the expected performance impact.

> automated leader rebalance causes replication downtime for clusters with too 
> many partitions
> 
>
> Key: KAFKA-4084
> URL: https://issues.apache.org/jira/browse/KAFKA-4084
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>  Labels: reliability
> Fix For: 1.1.0
>
>
> If you enable {{auto.leader.rebalance.enable}} (which is on by default), and 
> you have a cluster with many partitions, there is a severe amount of 
> replication downtime following a restart. This causes 
> `UnderReplicatedPartitions` to fire, and replication is paused.
> This is because the current automated leader rebalance mechanism changes 
> leaders for *all* imbalanced partitions at once, instead of doing it 
> gradually. This effectively stops all replica fetchers in the cluster 
> (assuming there are enough imbalanced partitions), and restarts them. This 
> can take minutes on busy clusters, during which no replication is happening 
> and user data is at risk. Clients with {{acks=-1}} also see issues at this 
> time, because replication is effectively stalled.
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election 
> manually. There is also a broker configuration “auto.leader.rebalance.enable” 
> which you can set to have the broker automatically perform the PLE when 
> needed. DO NOT USE THIS OPTION. There are serious performance issues when 
> doing so, especially on larger clusters. It needs some development work that 
> has not been fully identified yet.
> This setting is extremely useful for smaller clusters, but with high 
> partition counts causes the huge issues stated above.
> One potential fix could be adding a new configuration for the number of 
> partitions to rebalance at once, and *stopping* once that many leader 
> rebalances are in flight, until they're done. There may be better 
> mechanisms, and I'd love to hear if anybody has any ideas.
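
An editor's sketch of the throttling idea from the last paragraph (not
controller code; the batch-size knob is hypothetical): move only a bounded
number of leaders at a time and wait for each batch to settle before starting
the next, so replica fetchers are never all restarted in one shot.

{code:java}
import java.util.List;

public class ThrottledPreferredLeaderElection {
    // Hypothetical knob: how many leader moves may be in flight at once.
    private final int maxLeaderMovesInFlight;

    ThrottledPreferredLeaderElection(int maxLeaderMovesInFlight) {
        this.maxLeaderMovesInFlight = maxLeaderMovesInFlight;
    }

    // Process imbalanced partitions in bounded batches instead of all at once.
    void rebalance(List<String> imbalancedPartitions) throws InterruptedException {
        for (int i = 0; i < imbalancedPartitions.size(); i += maxLeaderMovesInFlight) {
            int end = Math.min(i + maxLeaderMovesInFlight, imbalancedPartitions.size());
            List<String> batch = imbalancedPartitions.subList(i, end);
            electPreferredLeaders(batch);  // issue the leader changes for this batch
            awaitBatchCompletion(batch);   // block until these moves have completed
        }
    }

    // Placeholders: a real implementation would go through the controller's
    // state machine and metadata/ISR updates.
    void electPreferredLeaders(List<String> batch) { /* ... */ }
    void awaitBatchCompletion(List<String> batch) throws InterruptedException { /* ... */ }
}
{code}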



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-3356) Remove ConsumerOffsetChecker, deprecated in 0.9, in 0.11

2017-08-30 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148417#comment-16148417
 ] 

Jeff Widman commented on KAFKA-3356:


Is this waiting on anything to be merged?

> Remove ConsumerOffsetChecker, deprecated in 0.9, in 0.11
> 
>
> Key: KAFKA-3356
> URL: https://issues.apache.org/jira/browse/KAFKA-3356
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.10.2.0
>Reporter: Ashish Singh
>Assignee: Mickael Maison
>Priority: Blocker
> Fix For: 1.0.0
>
>
> ConsumerOffsetChecker is marked deprecated as of 0.9, should be removed in 
> 0.11.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5584) Incorrect log size for topics larger than 2 GB

2017-07-12 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-5584:
---
Description: 
The {{size}} of a {{Log}} is calculated incorrectly due to an Integer overflow. 
For large topics (larger than 2 GB) this value overflows.

This is easily observable in the reported metrics values of the path 
{{log.Log.partition.*.topic..Size}} (see attached screenshot).

Moreover I think this breaks the size-based retention (via 
{{log.retention.bytes}} and {{retention.bytes}}) of large topics as well.

I am not sure on the recommended workflow, should I open a pull request on 
github with a fix?

  was:
The {{size}} of a {{Log}} is calculated incorrectly due to an Integer overflow. 
For large topics (larger than 2 GB) this value overflows.

This is easily observable in the reported metrics values of the path 
{{log.Log.partiion.*.topic..Size}} (see attached screenshot).

Moreover I think this breaks the size-based retention (via 
{{log.retention.bytes}} and {{retention.bytes}}) of large topics as well.

I am not sure on the recommended workflow, should I open a pull request on 
github with a fix?


> Incorrect log size for topics larger than 2 GB
> --
>
> Key: KAFKA-5584
> URL: https://issues.apache.org/jira/browse/KAFKA-5584
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: Gregor Uhlenheuer
> Attachments: Screen Shot 2017-07-12 at 09.10.53.png
>
>
> The {{size}} of a {{Log}} is calculated incorrectly due to an Integer 
> overflow. For large topics (larger than 2 GB) this value overflows.
> This is easily observable in the reported metrics values of the path 
> {{log.Log.partition.*.topic..Size}} (see attached screenshot).
> Moreover I think this breaks the size-based retention (via 
> {{log.retention.bytes}} and {{retention.bytes}}) of large topics as well.
> I am not sure on the recommended workflow, should I open a pull request on 
> github with a fix?
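
For illustration, the arithmetic behind the bogus metric (editor's sketch,
not the actual {{Log}} code): accumulating segment sizes into a 32-bit
{{int}} wraps around once the total passes Integer.MAX_VALUE (roughly 2 GB),
while a {{long}} accumulator stays correct.

{code:java}
public class LogSizeOverflow {
    public static void main(String[] args) {
        // Three segments totalling 2.5 GB -- larger than Integer.MAX_VALUE (~2.147 GB).
        long[] segmentSizes = {1_000_000_000L, 1_000_000_000L, 500_000_000L};

        int sizeAsInt = 0;
        long sizeAsLong = 0L;
        for (long segment : segmentSizes) {
            sizeAsInt += (int) segment; // wraps past 2^31 - 1
            sizeAsLong += segment;      // correct
        }

        System.out.println(sizeAsInt);  // -1794967296 (overflowed, like the reported metric)
        System.out.println(sizeAsLong); // 2500000000
    }
}
{code}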



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5584) Incorrect log size for topics larger than 2 GB

2017-07-12 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083580#comment-16083580
 ] 

Jeff Widman commented on KAFKA-5584:


What version of Kafka did you observe this on?

> Incorrect log size for topics larger than 2 GB
> --
>
> Key: KAFKA-5584
> URL: https://issues.apache.org/jira/browse/KAFKA-5584
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: Gregor Uhlenheuer
> Attachments: Screen Shot 2017-07-12 at 09.10.53.png
>
>
> The {{size}} of a {{Log}} is calculated incorrectly due to an Integer 
> overflow. For large topics (larger than 2 GB) this value overflows.
> This is easily observable in the reported metrics values of the path 
> {{log.Log.partition.*.topic..Size}} (see attached screenshot).
> Moreover I think this breaks the size-based retention (via 
> {{log.retention.bytes}} and {{retention.bytes}}) of large topics as well.
> I am not sure on the recommended workflow, should I open a pull request on 
> github with a fix?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5299) MirrorMaker with New.consumer doesn't consume message from multiple topics whitelisted

2017-06-29 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069006#comment-16069006
 ] 

Jeff Widman commented on KAFKA-5299:


Closing due to lack of activity and insufficient detail to reproduce.

Happy to re-open if you post the config that you're using.

> MirrorMaker with New.consumer doesn't consume message from multiple topics 
> whitelisted 
> ---
>
> Key: KAFKA-5299
> URL: https://issues.apache.org/jira/browse/KAFKA-5299
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Jyoti
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-5299) MirrorMaker with New.consumer doesn't consume message from multiple topics whitelisted

2017-06-29 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman resolved KAFKA-5299.

Resolution: Cannot Reproduce

> MirrorMaker with New.consumer doesn't consume message from multiple topics 
> whitelisted 
> ---
>
> Key: KAFKA-5299
> URL: https://issues.apache.org/jira/browse/KAFKA-5299
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Jyoti
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5027) Kafka Controller Redesign

2017-06-23 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061806#comment-16061806
 ] 

Jeff Widman commented on KAFKA-5027:


Just wanted to say thanks [~onurkaraman] for all your hard work on this, it is 
much appreciated!

> Kafka Controller Redesign
> -
>
> Key: KAFKA-5027
> URL: https://issues.apache.org/jira/browse/KAFKA-5027
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Onur Karaman
>Assignee: Onur Karaman
>
> The goal of this redesign is to improve controller performance, controller 
> maintainability, and cluster reliability.
> Documentation regarding what's being considered can be found 
> [here|https://docs.google.com/document/d/1rLDmzDOGQQeSiMANP0rC2RYp_L7nUGHzFD9MQISgXYM].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5465) FetchResponse v0 does not return any messages when max_bytes smaller than v2 message set

2017-06-19 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054956#comment-16054956
 ] 

Jeff Widman commented on KAFKA-5465:


I agree with the "WontFix" resolution; however, I'm scared that I'll forget 
about this when upgrading our cluster. Can a link to this issue be added to the 
Upgrade instructions for 0.11?

> FetchResponse v0 does not return any messages when max_bytes smaller than v2 
> message set 
> -
>
> Key: KAFKA-5465
> URL: https://issues.apache.org/jira/browse/KAFKA-5465
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
>Reporter: Dana Powers
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 0.11.0.1
>
>
> In prior releases, when consuming uncompressed messages, FetchResponse v0 
> will return a message if it is smaller than the max_bytes sent in the 
> FetchRequest. In 0.11.0.0 RC0, when messages are stored as v2 internally, the 
> response will be empty unless the full MessageSet is smaller than max_bytes. 
> In some configurations, this may cause old consumers to get stuck on large 
> messages where previously they were able to make progress one message at a 
> time.
> For example, when I produce 10 5KB messages using ProduceRequest v0 and then 
> attempt FetchRequest v0 with partition max bytes = 6KB (larger than a single 
> message but smaller than all 10 messages together), I get an empty message 
> set from 0.11.0.0. Previous brokers would have returned a single message.
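
For reference, a sketch of the old-consumer configuration the report
describes (editor's illustration; the ZooKeeper address, topic, and group id
are made up): a per-partition fetch size between one 5 KB message and the
full 10-message set previously made progress one message at a time, but
against an 0.11 broker storing v2 message sets the fetch comes back empty.

{code:java}
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class OldConsumerFetchSizeRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");  // assumed address
        props.put("group.id", "fetch-size-repro");         // hypothetical group
        props.put("auto.offset.reset", "smallest");
        // Larger than a single 5 KB message, smaller than the 10 messages together.
        props.put("fetch.message.max.bytes", "6144");

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(Collections.singletonMap("repro-topic", 1));
        // Against pre-0.11 brokers this stream yields one message at a time;
        // against 0.11 with v2 message sets the fetch returns empty and the
        // consumer makes no progress.
    }
}
{code}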



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)