[jira] [Comment Edited] (KAFKA-5040) Increase number of Streams producer retries from the default of 0

2017-05-05 Thread Peter Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999309#comment-15999309
 ] 

Peter Davis edited comment on KAFKA-5040 at 5/6/17 5:58 AM:


Just confirming that per [this comment on 
KAFKA-3197|https://issues.apache.org/jira/browse/KAFKA-3197?focusedCommentId=15130761=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15130761],
 this change does not break any ordering guarantees?


was (Author: davispw):
Just confirming that per [this comment on 
KAFKA-3197](https://issues.apache.org/jira/browse/KAFKA-3197?focusedCommentId=15130761=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15130761),
 this change does not break any ordering guarantees?

> Increase number of Streams producer retries from the default of 0
> -
>
> Key: KAFKA-5040
> URL: https://issues.apache.org/jira/browse/KAFKA-5040
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Eno Thereska
>Assignee: Eno Thereska
>Priority: Blocker
> Fix For: 0.11.0.0, 0.10.2.1
>
>
> In Kafka Streams, the producer retries configuration is left at the producer's 
> default of 0. That leads to situations where a Streams instance fails when a 
> broker is temporarily unavailable. Increasing this number to > 0 would help.
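
For reference, a minimal sketch of how an application can raise the producer retries itself under the current behavior, and the in-flight setting relevant to the ordering question above (retries > 0 can reorder writes unless max.in.flight.requests.per.connection is 1). Names and values below are illustrative, and plain producer configs are assumed to be passed through to the internal producer:

{code}
// Minimal sketch (assuming the 0.10.x Streams API; app id, topics, and values are
// illustrative) of raising the producer retries from the application. Plain producer
// configs are assumed to be forwarded to the internal producer. With retries > 0,
// keeping max.in.flight.requests.per.connection at 1 prevents reordering on retry.
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class RetriesExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "retries-example");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.RETRIES_CONFIG, 10);                        // raise from the default of 0
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);  // avoid reordering on retry

        KStreamBuilder builder = new KStreamBuilder();
        builder.stream("input-topic").to("output-topic");

        new KafkaStreams(builder, props).start();
    }
}
{code}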





[jira] [Commented] (KAFKA-5040) Increase number of Streams producer retries from the default of 0

2017-05-05 Thread Peter Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999309#comment-15999309
 ] 

Peter Davis commented on KAFKA-5040:


Just confirming that per [this comment on 
KAFKA-3197](https://issues.apache.org/jira/browse/KAFKA-3197?focusedCommentId=15130761=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15130761),
 this change does not break any ordering guarantees?

> Increase number of Streams producer retries from the default of 0
> -
>
> Key: KAFKA-5040
> URL: https://issues.apache.org/jira/browse/KAFKA-5040
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Eno Thereska
>Assignee: Eno Thereska
>Priority: Blocker
> Fix For: 0.11.0.0, 0.10.2.1
>
>
> In Kafka Streams, the producer retries configuration is left at the producer's 
> default of 0. That leads to situations where a Streams instance fails when a 
> broker is temporarily unavailable. Increasing this number to > 0 would help.





[jira] [Commented] (KAFKA-1641) Log cleaner exits if last cleaned offset is lower than earliest offset

2017-01-20 Thread Peter Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832266#comment-15832266
 ] 

Peter Davis commented on KAFKA-1641:


"Me too" on 0.10.0.1 - does this issue need to be reopened?

{code}
java.lang.IllegalArgumentException: requirement failed: Last clean offset is 43056300 but segment base offset is 42738384 for log -redacted- -0.
        at scala.Predef$.require(Predef.scala:224)
        at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:604)
        at kafka.log.Cleaner.clean(LogCleaner.scala:329)
        at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:237)
        at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:215)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
{code}


> Log cleaner exits if last cleaned offset is lower than earliest offset
> --
>
> Key: KAFKA-1641
> URL: https://issues.apache.org/jira/browse/KAFKA-1641
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
>Reporter: Joel Koshy
>Assignee: Guozhang Wang
> Fix For: 0.9.0.0
>
> Attachments: KAFKA-1641_2014-10-09_13:04:15.patch, KAFKA-1641.patch
>
>
> Encountered this recently: the log cleaner exited a while ago (I think 
> because the topic had compressed messages). That issue was subsequently 
> addressed by having the producer only send uncompressed. However, on a 
> subsequent restart of the broker we see this:
> In this scenario I think it is reasonable to just emit a warning and have the 
> cleaner round up its first dirty offset to the base offset of the first 
> segment.
> {code}
> [kafka-server] [] [kafka-log-cleaner-thread-0], Error due to 
> java.lang.IllegalArgumentException: requirement failed: Last clean offset is 
> 54770438 but segment base offset is 382844024 for log testtopic-0.
> at scala.Predef$.require(Predef.scala:145)
> at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:491)
> at kafka.log.Cleaner.clean(LogCleaner.scala:288)
> at 
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:202)
> at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:187)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
> {code}
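
The warn-and-round-up behavior suggested in the description could look roughly like this (illustrative names only, not the actual LogCleaner code):

{code}
// Illustrative sketch (made-up names, not the real LogCleaner code) of the behavior
// suggested above: warn and round the first dirty offset up to the first segment's
// base offset instead of failing the requirement.
public class DirtyOffsetRounding {
    static long firstDirtyOffset(long checkpointedOffset, long firstSegmentBaseOffset) {
        if (checkpointedOffset < firstSegmentBaseOffset) {
            System.err.printf("WARN: last clean offset %d is before segment base offset %d; rounding up%n",
                    checkpointedOffset, firstSegmentBaseOffset);
            return firstSegmentBaseOffset;
        }
        return checkpointedOffset;
    }

    public static void main(String[] args) {
        System.out.println(firstDirtyOffset(54770438L, 382844024L));  // values from the trace above
    }
}
{code}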





[jira] [Commented] (KAFKA-3894) Log Cleaner thread crashes and never restarts

2016-07-18 Thread Peter Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382511#comment-15382511
 ] 

Peter Davis commented on KAFKA-3894:


Re: "the broker seems to be working"

You may regret not taking action now.  As Tim mentioned from the talk at the 
Kafka Summit 
(http://www.slideshare.net/jjkoshy/kafkaesque-days-at-linked-in-in-2015/49), if 
__consumer_offsets is not compacted and has accumulated millions (or billions!) 
of messages, it can take many minutes for the broker to elect a new coordinator 
after any kind of hiccup.  *Your new consumers may be hung during this time!*

However, even shutting down brokers to change the configuration will cause 
coordinator elections, which will cause an outage.  It seems like not having a 
"hot spare" for Offset Managers is a liability here…

We were bitten by this bug and it caused all kinds of headaches until we managed 
to get __consumer_offsets cleaned up again.

> Log Cleaner thread crashes and never restarts
> -
>
> Key: KAFKA-3894
> URL: https://issues.apache.org/jira/browse/KAFKA-3894
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.2, 0.9.0.1
> Environment: Oracle JDK 8
> Ubuntu Precise
>Reporter: Tim Carey-Smith
>  Labels: compaction
>
> The log-cleaner thread can crash if the number of keys in a topic grows to be 
> too large to fit into the dedupe buffer. 
> The result of this is a log line: 
> {quote}
> broker=0 pri=ERROR t=kafka-log-cleaner-thread-0 at=LogCleaner 
> \[kafka-log-cleaner-thread-0\], Error due to  
> java.lang.IllegalArgumentException: requirement failed: 9750860 messages in 
> segment MY_FAVORITE_TOPIC-2/47580165.log but offset map can fit 
> only 5033164. You can increase log.cleaner.dedupe.buffer.size or decrease 
> log.cleaner.threads
> {quote}
> As a result, the broker is left in a potentially dangerous situation where 
> cleaning of compacted topics is not running. 
> It is unclear if the broader strategy for the {{LogCleaner}} is the reason 
> for this upper bound, or if this is a value which must be tuned for each 
> specific use-case. 
> Of more immediate concern is the fact that the thread crash is not visible 
> via JMX or exposed as some form of service degradation. 
> Some short-term remediations we have made are:
> * increasing the size of the dedupe buffer
> * monitoring the log-cleaner threads inside the JVM
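
As a rough illustration of the second remediation listed above, an in-process liveness check might look like the following, assuming the cleaner threads keep the "kafka-log-cleaner-thread-" name prefix seen in the error message:

{code}
// Rough in-process liveness check for the remediation above, assuming the cleaner
// threads keep the "kafka-log-cleaner-thread-" name prefix seen in the error message;
// this is an illustration, not broker code.
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class LogCleanerThreadCheck {
    public static boolean logCleanerAlive() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadName().startsWith("kafka-log-cleaner-thread-")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("log cleaner alive: " + logCleanerAlive());
    }
}
{code}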





[jira] [Comment Edited] (KAFKA-3893) Kafka Borker ID disappears from /borkers/ids

2016-07-02 Thread Peter Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360306#comment-15360306
 ] 

Peter Davis edited comment on KAFKA-3893 at 7/2/16 7:36 PM:


Sriharsha, I have witnessed this too and it very much seems like a bug in Kafka 
-- when a zookeeper connection is lost, it seems any other changes in the 
cluster during the loss (which would be expected if an outage affects multiple 
brokers) are not recognized when it reconnects.  We see the same loop of 
"Shrinking ISR" and "Cached zkVerskom [###] not equal to that in zookeeper", 
and the broker never recovers until manually restarted. 

For us this happened almost daily when running on a cluster of virtual machines 
that would get paused for a few seconds every night for a snapshot backup.  We 
disabled the backup but it's very concerning that Kafka won't recover after a 
pause!

Seen with 0.9 and 0.10. 


was (Author: davispw):
Sriharsha, I have witnessed this too and it very much seems like a bug in Kafka 
-- when a zookeeper connection is lost, any other changes in the cluster during 
the loss are not recognized when it reconnects.  We see the same loop of 
"Shrinking ISR" and "Cached zkVerskom [###] not equal to that in zookeeper", 
and the broker never recovers. 

For us this happened almost daily when running on a cluster of virtual machines 
that would get paused for a few seconds every night for a snapshot backup.  We 
disabled the backup but it's very concerning that Kafka won't recover after a 
pause!

> Kafka Borker ID disappears from /borkers/ids
> 
>
> Key: KAFKA-3893
> URL: https://issues.apache.org/jira/browse/KAFKA-3893
> Project: Kafka
>  Issue Type: Bug
>Reporter: chaitra
>Priority: Critical
>
> Kafka version used : 0.8.2.1 
> Zookeeper version: 3.4.6
> We have a scenario where the Kafka broker's entry in the ZooKeeper path 
> /brokers/ids just disappears.
> We see the zookeeper connection active and no network issue.
> The zookeeper connection timeout is set to 6000ms in server.properties.
> Hence Kafka does not participate in the cluster.





[jira] [Commented] (KAFKA-3893) Kafka Borker ID disappears from /borkers/ids

2016-07-02 Thread Peter Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360306#comment-15360306
 ] 

Peter Davis commented on KAFKA-3893:


Sriharsha, I have witnessed this too and it very much seems like a bug in Kafka 
-- when a zookeeper connection is lost, any other changes in the cluster during 
the loss are not recognized when it reconnects.  We see the same loop of 
"Shrinking ISR" and "Cached zkVerskom [###] not equal to that in zookeeper", 
and the broker never recovers. 

For us this happened almost daily when running on a cluster of virtual machines 
that would get paused for a few seconds every night for a snapshot backup.  We 
disabled the backup but it's very concerning that Kafka won't recover after a 
pause!

> Kafka Borker ID disappears from /borkers/ids
> 
>
> Key: KAFKA-3893
> URL: https://issues.apache.org/jira/browse/KAFKA-3893
> Project: Kafka
>  Issue Type: Bug
>Reporter: chaitra
>Priority: Critical
>
> Kafka version used : 0.8.2.1 
> Zookeeper version: 3.4.6
> We have a scenario where the Kafka broker's entry in the ZooKeeper path 
> /brokers/ids just disappears.
> We see the zookeeper connection active and no network issue.
> The zookeeper connection timeout is set to 6000ms in server.properties.
> Hence Kafka does not participate in the cluster.





[jira] [Updated] (KAFKA-3925) Default log.dir=/tmp/kafka-logs is unsafe

2016-07-02 Thread Peter Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Davis updated KAFKA-3925:
---
Description: 
Many operating systems are configured to delete files under /tmp.  For example 
Ubuntu has 
[tmpreaper|http://manpages.ubuntu.com/manpages/wily/man8/tmpreaper.8.html], 
others use tmpfs, others delete /tmp on startup. 

Defaults are OK to make getting started easier but should not be unsafe 
(risking data loss). 

Something under /var would be a better default log.dir under *nix.  Or relative 
to the Kafka bin directory to avoid needing root.  

If the default cannot be changed, I would suggest that a special warning be 
printed to the console on broker startup if log.dir is under /tmp. 

See [users list 
thread|http://mail-archives.apache.org/mod_mbox/kafka-users/201607.mbox/%3cCAD5tkZb-0MMuWJqHNUJ3i1+xuNPZ4tnQt-RPm65grxE0=0o...@mail.gmail.com%3e].
  I've also been bitten by this. 

  was:
Many operating systems are configured to delete files under /tmp.  For example 
Ubuntu has 
[tmpreaper](http://manpages.ubuntu.com/manpages/wily/man8/tmpreaper.8.html), 
others use tmpfs, others delete /tmp on startup. 

Defaults are OK to make getting started easier but should not be unsafe 
(risking data loss). 

Something under /var would be a better default log.dir under *nix.  Or relative 
to the Kafka bin directory to avoid needing root.  

If the default cannot be changed, I would suggest that a special warning be 
printed to the console on broker startup if log.dir is under /tmp. 

See [users list 
thread](http://mail-archives.apache.org/mod_mbox/kafka-users/201607.mbox/%3cCAD5tkZb-0MMuWJqHNUJ3i1+xuNPZ4tnQt-RPm65grxE0=0o...@mail.gmail.com%3e).
  I've also been bitten by this. 


> Default log.dir=/tmp/kafka-logs is unsafe
> -
>
> Key: KAFKA-3925
> URL: https://issues.apache.org/jira/browse/KAFKA-3925
> Project: Kafka
>  Issue Type: Bug
>  Components: config
>Affects Versions: 0.10.0.0
> Environment: Various, depends on OS and configuration
>Reporter: Peter Davis
>
> Many operating systems are configured to delete files under /tmp.  For 
> example Ubuntu has 
> [tmpreaper|http://manpages.ubuntu.com/manpages/wily/man8/tmpreaper.8.html], 
> others use tmpfs, others delete /tmp on startup. 
> Defaults are OK to make getting started easier but should not be unsafe 
> (risking data loss). 
> Something under /var would be a better default log.dir under *nix.  Or 
> relative to the Kafka bin directory to avoid needing root.  
> If the default cannot be changed, I would suggest that a special warning be 
> printed to the console on broker startup if log.dir is under /tmp. 
> See [users list 
> thread|http://mail-archives.apache.org/mod_mbox/kafka-users/201607.mbox/%3cCAD5tkZb-0MMuWJqHNUJ3i1+xuNPZ4tnQt-RPm65grxE0=0o...@mail.gmail.com%3e].
>   I've also been bitten by this. 





[jira] [Commented] (KAFKA-3925) Default log.dir=/tmp/kafka-logs is unsafe

2016-07-02 Thread Peter Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360291#comment-15360291
 ] 

Peter Davis commented on KAFKA-3925:


Another recent mailing list thread: 
http://mail-archives.apache.org/mod_mbox/kafka-users/201606.mbox/%3ccag4txrjgodvq5_nlb2qt_sskhxttjpuy6d42hdtcz5hjqxk...@mail.gmail.com%3e

> Default log.dir=/tmp/kafka-logs is unsafe
> -
>
> Key: KAFKA-3925
> URL: https://issues.apache.org/jira/browse/KAFKA-3925
> Project: Kafka
>  Issue Type: Bug
>  Components: config
>Affects Versions: 0.10.0.0
> Environment: Various, depends on OS and configuration
>Reporter: Peter Davis
>
> Many operating systems are configured to delete files under /tmp.  For 
> example Ubuntu has 
> [tmpreaper](http://manpages.ubuntu.com/manpages/wily/man8/tmpreaper.8.html), 
> others use tmpfs, others delete /tmp on startup. 
> Defaults are OK to make getting started easier but should not be unsafe 
> (risking data loss). 
> Something under /var would be a better default log.dir under *nix.  Or 
> relative to the Kafka bin directory to avoid needing root.  
> If the default cannot be changed, I would suggest that a special warning be 
> printed to the console on broker startup if log.dir is under /tmp. 
> See [users list 
> thread](http://mail-archives.apache.org/mod_mbox/kafka-users/201607.mbox/%3cCAD5tkZb-0MMuWJqHNUJ3i1+xuNPZ4tnQt-RPm65grxE0=0o...@mail.gmail.com%3e).
>   I've also been bitten by this. 





[jira] [Created] (KAFKA-3925) Default log.dir=/tmp/kafka-logs is unsafe

2016-07-02 Thread Peter Davis (JIRA)
Peter Davis created KAFKA-3925:
--

 Summary: Default log.dir=/tmp/kafka-logs is unsafe
 Key: KAFKA-3925
 URL: https://issues.apache.org/jira/browse/KAFKA-3925
 Project: Kafka
  Issue Type: Bug
  Components: config
Affects Versions: 0.10.0.0
 Environment: Various, depends on OS and configuration
Reporter: Peter Davis


Many operating systems are configured to delete files under /tmp.  For example 
Ubuntu has 
[tmpreaper](http://manpages.ubuntu.com/manpages/wily/man8/tmpreaper.8.html), 
others use tmpfs, others delete /tmp on startup. 

Defaults are OK to make getting started easier but should not be unsafe 
(risking data loss). 

Something under /var would be a better default log.dir under *nix.  Or relative 
to the Kafka bin directory to avoid needing root.  

If the default cannot be changed, I would suggest that a special warning be 
printed to the console on broker startup if log.dir is under /tmp. 

See [users list 
thread](http://mail-archives.apache.org/mod_mbox/kafka-users/201607.mbox/%3cCAD5tkZb-0MMuWJqHNUJ3i1+xuNPZ4tnQt-RPm65grxE0=0o...@mail.gmail.com%3e).
  I've also been bitten by this. 
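
A rough sketch of the suggested startup warning (illustrative only, not broker code), flagging any configured log directory that resolves under /tmp:

{code}
// Illustrative sketch of the suggested startup warning, not actual broker code.
import java.nio.file.Path;
import java.nio.file.Paths;

public class TmpLogDirWarning {
    public static void warnIfUnderTmp(String logDirsValue) {
        for (String dir : logDirsValue.split(",")) {
            Path path = Paths.get(dir.trim()).toAbsolutePath().normalize();
            if (path.startsWith(Paths.get("/tmp"))) {
                System.err.println("WARNING: log directory " + path
                        + " is under /tmp and may be deleted by the OS (tmpreaper, tmpfs, reboot).");
            }
        }
    }

    public static void main(String[] args) {
        warnIfUnderTmp("/tmp/kafka-logs");  // the current default
    }
}
{code}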





[jira] [Comment Edited] (KAFKA-3857) Additional log cleaner metrics

2016-07-02 Thread Peter Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360064#comment-15360064
 ] 

Peter Davis edited comment on KAFKA-3857 at 7/2/16 8:16 AM:


Related to KAFKA-3894


was (Author: davispw):
Related to KAFKA-3894 - dup?

> Additional log cleaner metrics
> --
>
> Key: KAFKA-3857
> URL: https://issues.apache.org/jira/browse/KAFKA-3857
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Kiran Pillarisetty
>
> The proposal would be to add a couple of additional log cleaner metrics: 
> 1. Time of last log cleaner run 
> 2. Cumulative number of successful log cleaner runs since last broker restart.
> Existing log cleaner metrics (max-buffer-utilization-percent, 
> cleaner-recopy-percent, max-clean-time-secs, max-dirty-percent) do not 
> differentiate an idle log cleaner from a dead log cleaner. It would be useful 
> to have the above two metrics added, to indicate whether log cleaner is alive 
> (and successfully cleaning) or not.
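
For context, this is roughly how the existing gauges get read today over JMX (the object name and port are assumptions, and the broker must be started with remote JMX enabled); as noted above, none of them distinguishes an idle cleaner from a dead one:

{code}
// Sketch of polling one of the existing cleaner gauges over JMX. The object name
// (kafka.log:type=LogCleaner,name=max-clean-time-secs, attribute "Value") and the
// JMX port are assumptions.
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CleanerMetricPoll {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName cleaner = new ObjectName("kafka.log:type=LogCleaner,name=max-clean-time-secs");
            System.out.println("max-clean-time-secs = " + mbs.getAttribute(cleaner, "Value"));
        } finally {
            connector.close();
        }
    }
}
{code}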





[jira] [Commented] (KAFKA-3857) Additional log cleaner metrics

2016-07-02 Thread Peter Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360064#comment-15360064
 ] 

Peter Davis commented on KAFKA-3857:


Related to KAFKA-3894 - dup?

> Additional log cleaner metrics
> --
>
> Key: KAFKA-3857
> URL: https://issues.apache.org/jira/browse/KAFKA-3857
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Kiran Pillarisetty
>
> The proposal would be to add a couple of additional log cleaner metrics: 
> 1. Time of last log cleaner run 
> 2. Cumulative number of successful log cleaner runs since last broker restart.
> Existing log cleaner metrics (max-buffer-utilization-percent, 
> cleaner-recopy-percent, max-clean-time-secs, max-dirty-percent) do not 
> differentiate an idle log cleaner from a dead log cleaner. It would be useful 
> to have the above two metrics added, to indicate whether log cleaner is alive 
> (and successfully cleaning) or not.





[jira] [Commented] (KAFKA-3894) Log Cleaner thread crashes and never restarts

2016-07-02 Thread Peter Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360062#comment-15360062
 ] 

Peter Davis commented on KAFKA-3894:


A quick improvement would be to increase the severity of the log message when 
the log cleaner stops. Right now there is just an "INFO" message that's easy to 
miss.

> Log Cleaner thread crashes and never restarts
> -
>
> Key: KAFKA-3894
> URL: https://issues.apache.org/jira/browse/KAFKA-3894
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.2, 0.9.0.1
> Environment: Oracle JDK 8
> Ubuntu Precise
>Reporter: Tim Carey-Smith
>  Labels: compaction
>
> The log-cleaner thread can crash if the number of keys in a topic grows to be 
> too large to fit into the dedupe buffer. 
> The result of this is a log line: 
> {quote}
> broker=0 pri=ERROR t=kafka-log-cleaner-thread-0 at=LogCleaner 
> \[kafka-log-cleaner-thread-0\], Error due to  
> java.lang.IllegalArgumentException: requirement failed: 9750860 messages in 
> segment MY_FAVORITE_TOPIC-2/47580165.log but offset map can fit 
> only 5033164. You can increase log.cleaner.dedupe.buffer.size or decrease 
> log.cleaner.threads
> {quote}
> As a result, the broker is left in a potentially dangerous situation where 
> cleaning of compacted topics is not running. 
> It is unclear if the broader strategy for the {{LogCleaner}} is the reason 
> for this upper bound, or if this is a value which must be tuned for each 
> specific use-case. 
> Of more immediate concern is the fact that the thread crash is not visible 
> via JMX or exposed as some form of service degradation. 
> Some short-term remediations we have made are:
> * increasing the size of the dedupe buffer
> * monitoring the log-cleaner threads inside the JVM





[jira] [Commented] (KAFKA-3834) Consumer should not block in poll on coordinator discovery

2016-06-26 Thread Peter Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350268#comment-15350268
 ] 

Peter Davis commented on KAFKA-3834:


I believe we have seen this issue IRL when the new coordinator takes a long 
time to become available after an election.  This can happen if log compaction 
has halted (for example, due to a too-small I/O buffer) and __consumer_offsets 
has grown ridiculously large; in one instance it was taking the coordinators 
several minutes to come online before we realized the problem.  Meanwhile, 
poll() would spin and log red-herring errors every 100ms. 

This also occurs with commitSync(), which I believe does a poll() internally but 
also has a "while" loop of its own.  Should improving the blocking behavior of 
commitSync() be a separate JIRA?

> Consumer should not block in poll on coordinator discovery
> --
>
> Key: KAFKA-3834
> URL: https://issues.apache.org/jira/browse/KAFKA-3834
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>
> Currently we block indefinitely in poll() when discovering the coordinator 
> for the group. Instead, we can return an empty record set when the passed 
> timeout expires. The downside is that it may obscure the underlying problem 
> (which is usually misconfiguration), but users typically have to look at the 
> logs to figure out the problem anyway. 





[jira] [Commented] (KAFKA-2934) Offset storage file configuration in Connect standalone mode is not included in StandaloneConfig

2016-06-09 Thread Peter Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323697#comment-15323697
 ] 

Peter Davis commented on KAFKA-2934:


After upgrading to 0.10, we discovered that this fix causes 
`offset.storage.file.filename` to be a *required* property, even if a 
FileOffsetBackingStore is not used.  We are consuming from JMS (which has no 
need to store offsets as acknowledged messages are removed from the source) so 
we are using a MemoryOffsetBackingStore.

https://github.com/apache/kafka/pull/734 argues there are only two sensible 
choices for a backing store but I think my use case shows that is not true!

Would you be willing to revisit the pull request to make the offset backing 
store class configurable, and make the file optional?

> Offset storage file configuration in Connect standalone mode is not included 
> in StandaloneConfig
> 
>
> Key: KAFKA-2934
> URL: https://issues.apache.org/jira/browse/KAFKA-2934
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.10.0.0
>
>
> The config is coded directly in FileOffsetBackingStore rather than being 
> listed (and validated) in StandaloneConfig. This also means it wouldn't be 
> included if we autogenerated docs from the config classes.





[jira] [Updated] (KAFKA-3710) MemoryOffsetBackingStore creates a non-daemon thread that prevents clean shutdown

2016-05-13 Thread Peter Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Davis updated KAFKA-3710:
---
Labels: easyfix newbie patch  (was: easyfix newbie)

> MemoryOffsetBackingStore creates a non-daemon thread that prevents clean 
> shutdown
> -
>
> Key: KAFKA-3710
> URL: https://issues.apache.org/jira/browse/KAFKA-3710
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.1
>Reporter: Peter Davis
>Assignee: Ewen Cheslack-Postava
>  Labels: easyfix, newbie, patch
>
> MemoryOffsetBackingStore creates an ExecutorService but 
> MemoryOffsetBackingStore.stop() fails to call executor.shutdown().  This 
> creates a zombie non-daemon thread which prevents clean shutdown when running 
> a StandaloneHerder embedded in another application.
> Note that FileOffsetBackingStore extends MemoryOffsetBackingStore so is also 
> affected.





[jira] [Created] (KAFKA-3710) MemoryOffsetBackingStore creates a non-daemon thread that prevents clean shutdown

2016-05-13 Thread Peter Davis (JIRA)
Peter Davis created KAFKA-3710:
--

 Summary: MemoryOffsetBackingStore creates a non-daemon thread that 
prevents clean shutdown
 Key: KAFKA-3710
 URL: https://issues.apache.org/jira/browse/KAFKA-3710
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 0.9.0.1
Reporter: Peter Davis
Assignee: Ewen Cheslack-Postava


MemoryOffsetBackingStore creates an ExecutorService but 
MemoryOffsetBackingStore.stop() fails to call executor.shutdown().  This 
creates a zombie non-daemon thread which prevents clean shutdown when running a 
StandaloneHerder embedded in another application.

Note that FileOffsetBackingStore extends MemoryOffsetBackingStore so is also 
affected.
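
A minimal sketch of the fix being described (an illustrative class, not the actual Connect source): the store owns a single-threaded executor whose worker thread is non-daemon, so stop() must shut it down for the JVM to exit cleanly.

{code}
// Illustrative sketch, not the actual MemoryOffsetBackingStore source: the executor's
// worker thread is non-daemon, so stop() must shut it down or the JVM cannot exit cleanly.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class InMemoryStoreSketch {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    public void set(Runnable writeTask) {
        executor.submit(writeTask);  // async writes, mirroring the store's use of an executor
    }

    public void stop() {
        executor.shutdown();         // the call the issue says is missing
        try {
            executor.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
{code}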





[jira] [Commented] (KAFKA-3627) New consumer doesn't run delayed tasks while under load

2016-05-05 Thread Peter Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272615#comment-15272615
 ] 

Peter Davis commented on KAFKA-3627:


I believe this problem is aggravated by max.poll.messages, but since there are 
timing issues I haven't confirmed it.  I can confirm that this is affecting us and 
that any failure to heartbeat or commit is an extremely serious problem as it 
can result in a "rebalance storm" where no consumers ever make progress.  
Unfortunately, we are banking on max.poll.messages to address other rebalance 
storm problems.

> New consumer doesn't run delayed tasks while under load
> ---
>
> Key: KAFKA-3627
> URL: https://issues.apache.org/jira/browse/KAFKA-3627
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
>Reporter: Rob Underwood
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 0.10.0.0
>
> Attachments: DelayedTaskBugConsumer.java, kafka-3627-output.log
>
>
> If the new consumer receives a steady flow of fetch responses it will not run 
> delayed tasks, which means it will not heartbeat or perform automatic offset 
> commits.
> The main cause is the code that attempts to pipeline fetch responses and keep 
> the consumer fed.  Specifically, in KafkaConsumer::pollOnce() there is a 
> check that skips calling client.poll() if there are fetched records ready 
> (line 903 in the 0.9.0 branch of this writing).  Then in 
> KafkaConsumer::poll(), if records are returned it will initiate another fetch 
> and perform a quick poll, which will send/receive fetch requests/responses 
> but will not run delayed tasks.
> If the timing works out, and the consumer is consistently receiving fetched 
> records, it won't run delayed tasks until it doesn't receive a fetch response 
> during its quick poll.  That leads to a rebalance since the consumer isn't 
> heartbeating, and typically means all the consumed records will be 
> re-delivered since the automatic offset commit wasn't able to run either.
> h5. Steps to reproduce
> # Start up a cluster with *at least 2 brokers*.  This seems to be required to 
> reproduce the issue, I'm guessing because the fetch responses all arrive 
> together when using a single broker.
> # Create a topic with a good number of partitions
> #* bq. bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic 
> delayed-task-bug --partitions 10 --replication-factor 1
> # Generate some test data so the consumer has plenty to consume.  In this 
> case I'm just using uuids
> #* bq. for ((i=0;i<100;++i)); do cat /proc/sys/kernel/random/uuid >>  
> /tmp/test-messages; done
> #* bq. bin/kafka-console-producer.sh --broker-list localhost:9092 --topic 
> delayed-task-bug < /tmp/test-messages
> # Start up a consumer with a small max fetch size to ensure it only pulls a 
> few records at a time.  The consumer can simply sleep for a moment when it 
> receives a record.
> #* I'll attach an example in Java
> # There's a timing aspect to this issue so it may take a few attempts to 
> reproduce
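
A rough consumer-side reproducer along the lines described above (the real attachment is DelayedTaskBugConsumer.java; broker address, topic, fetch size, and sleep are illustrative): a small fetch size plus slow per-record processing keeps each poll() returning records, so the delayed heartbeat and auto-commit tasks are starved.

{code}
// Hedged sketch of a reproducer (not the attached DelayedTaskBugConsumer.java):
// small fetches plus slow per-record processing keep every poll() returning data,
// so the quick polls never run the delayed heartbeat/auto-commit tasks.
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class DelayedTaskRepro {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "delayed-task-bug");
        props.put("enable.auto.commit", "true");
        props.put("max.partition.fetch.bytes", "512");   // keep each fetch small
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("delayed-task-bug"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.offset());
                    Thread.sleep(50);                    // simulate slow processing
                }
            }
        }
    }
}
{code}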


