[jira] [Comment Edited] (KAFKA-5040) Increase number of Streams producer retries from the default of 0
[ https://issues.apache.org/jira/browse/KAFKA-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999309#comment-15999309 ]

Peter Davis edited comment on KAFKA-5040 at 5/6/17 5:58 AM:
------------------------------------------------------------

Just confirming that per [this comment on KAFKA-3197|https://issues.apache.org/jira/browse/KAFKA-3197?focusedCommentId=15130761&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15130761], this change does not break any ordering guarantees?

was (Author: davispw):
Just confirming that per [this comment on KAFKA-3197](https://issues.apache.org/jira/browse/KAFKA-3197?focusedCommentId=15130761&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15130761), this change does not break any ordering guarantees?

> Increase number of Streams producer retries from the default of 0
> -----------------------------------------------------------------
>
> Key: KAFKA-5040
> URL: https://issues.apache.org/jira/browse/KAFKA-5040
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 0.10.2.0
> Reporter: Eno Thereska
> Assignee: Eno Thereska
> Priority: Blocker
> Fix For: 0.11.0.0, 0.10.2.1
>
> In Kafka Streams, the default value for the producer retries is not changed
> from the default of 0. That leads to situations where a streams instance
> fails when a broker is temporarily unavailable. Increasing this number to
> > 0 would help.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (KAFKA-5040) Increase number of Streams producer retries from the default of 0
[ https://issues.apache.org/jira/browse/KAFKA-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999309#comment-15999309 ]

Peter Davis commented on KAFKA-5040:
------------------------------------

Just confirming that per [this comment on KAFKA-3197](https://issues.apache.org/jira/browse/KAFKA-3197?focusedCommentId=15130761&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15130761), this change does not break any ordering guarantees?

> Increase number of Streams producer retries from the default of 0
> -----------------------------------------------------------------
>
> Key: KAFKA-5040
> URL: https://issues.apache.org/jira/browse/KAFKA-5040
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 0.10.2.0
> Reporter: Eno Thereska
> Assignee: Eno Thereska
> Priority: Blocker
> Fix For: 0.11.0.0, 0.10.2.1
>
> In Kafka Streams, the default value for the producer retries is not changed
> from the default of 0. That leads to situations where a streams instance
> fails when a broker is temporarily unavailable. Increasing this number to
> > 0 would help.
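Until a release with the raised default, the retries setting can be overridden by hand via the "producer." prefix that Streams uses to forward configs to its internal producer. A minimal sketch — the application id, bootstrap server, and the value 10 are illustrative, not taken from this issue:

```java
import java.util.Properties;

public class StreamsProducerRetries {
    // Builds a Streams config that overrides the internal producer's
    // retries (default 0 at the time of this issue).
    static Properties streamsConfig() {
        Properties props = new Properties();
        props.put("application.id", "my-streams-app");   // illustrative
        props.put("bootstrap.servers", "broker:9092");   // illustrative
        // Any key with the "producer." prefix is stripped and passed to
        // the embedded producer, so this sets retries=10 on it.
        props.put("producer.retries", "10");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsConfig().getProperty("producer.retries"));
    }
}
```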
[jira] [Commented] (KAFKA-1641) Log cleaner exits if last cleaned offset is lower than earliest offset
[ https://issues.apache.org/jira/browse/KAFKA-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832266#comment-15832266 ]

Peter Davis commented on KAFKA-1641:
------------------------------------

"Me too" on 0.10.0.1 -- does this issue need to be reopened?

java.lang.IllegalArgumentException: requirement failed: Last clean offset is 43056300 but segment base offset is 42738384 for log -redacted- -0.
        at scala.Predef$.require(Predef.scala:224)
        at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:604)
        at kafka.log.Cleaner.clean(LogCleaner.scala:329)
        at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:237)
        at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:215)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)

> Log cleaner exits if last cleaned offset is lower than earliest offset
> ----------------------------------------------------------------------
>
> Key: KAFKA-1641
> URL: https://issues.apache.org/jira/browse/KAFKA-1641
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8.1.1
> Reporter: Joel Koshy
> Assignee: Guozhang Wang
> Fix For: 0.9.0.0
>
> Attachments: KAFKA-1641_2014-10-09_13:04:15.patch, KAFKA-1641.patch
>
> Encountered this recently: the log cleaner exited a while ago (I think
> because the topic had compressed messages). That issue was subsequently
> addressed by having the producer only send uncompressed. However, on a
> subsequent restart of the broker we see this:
> In this scenario I think it is reasonable to just emit a warning and have the
> cleaner round up its first dirty offset to the base offset of the first
> segment.
> {code}
> [kafka-server] [] [kafka-log-cleaner-thread-0], Error due to
> java.lang.IllegalArgumentException: requirement failed: Last clean offset is
> 54770438 but segment base offset is 382844024 for log testtopic-0.
>         at scala.Predef$.require(Predef.scala:145)
>         at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:491)
>         at kafka.log.Cleaner.clean(LogCleaner.scala:288)
>         at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:202)
>         at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:187)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
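The remediation proposed in the description — round the first dirty offset up to the base offset of the first segment and warn, instead of crashing the cleaner thread — amounts to a one-line clamp. A minimal sketch, using the offsets from the quoted stack trace (the method and class names here are illustrative, not Kafka's actual code):

```java
public class DirtyOffsetClamp {
    // Instead of requiring that the checkpointed clean offset fall inside
    // the first segment, clamp it up to the earliest offset still on disk.
    static long firstDirtyOffset(long checkpointedCleanOffset, long firstSegmentBaseOffset) {
        if (checkpointedCleanOffset < firstSegmentBaseOffset) {
            System.err.printf(
                "WARN: last clean offset %d is before segment base offset %d; rounding up%n",
                checkpointedCleanOffset, firstSegmentBaseOffset);
            return firstSegmentBaseOffset;
        }
        return checkpointedCleanOffset;
    }

    public static void main(String[] args) {
        // Offsets from the {code} block in the issue description.
        System.out.println(firstDirtyOffset(54770438L, 382844024L)); // prints 382844024
    }
}
```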
[jira] [Commented] (KAFKA-3894) Log Cleaner thread crashes and never restarts
[ https://issues.apache.org/jira/browse/KAFKA-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382511#comment-15382511 ]

Peter Davis commented on KAFKA-3894:
------------------------------------

Re: "the broker seems to be working"

You may regret not taking action now. As Tim mentioned from the talk at the Kafka Summit (http://www.slideshare.net/jjkoshy/kafkaesque-days-at-linked-in-in-2015/49), if __consumer_offsets is not compacted and has accumulated millions (or billions!) of messages, it can take many minutes for the broker to elect a new coordinator after any kind of hiccup. *Your new consumers may be hung during this time!* However, even shutting down brokers to change the configuration will cause coordinator elections, which will cause an outage. It seems like not having a "hot spare" for Offset Managers is a liability here…

We were bitten by this bug and it caused all kinds of headaches until we managed to get __consumer_offsets cleaned up again.

> Log Cleaner thread crashes and never restarts
> ---------------------------------------------
>
> Key: KAFKA-3894
> URL: https://issues.apache.org/jira/browse/KAFKA-3894
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.8.2.2, 0.9.0.1
> Environment: Oracle JDK 8, Ubuntu Precise
> Reporter: Tim Carey-Smith
> Labels: compaction
>
> The log-cleaner thread can crash if the number of keys in a topic grows to be
> too large to fit into the dedupe buffer.
> The result of this is a log line:
> {quote}
> broker=0 pri=ERROR t=kafka-log-cleaner-thread-0 at=LogCleaner
> \[kafka-log-cleaner-thread-0\], Error due to
> java.lang.IllegalArgumentException: requirement failed: 9750860 messages in
> segment MY_FAVORITE_TOPIC-2/47580165.log but offset map can fit
> only 5033164. You can increase log.cleaner.dedupe.buffer.size or decrease
> log.cleaner.threads
> {quote}
> As a result, the broker is left in a potentially dangerous situation where
> cleaning of compacted topics is not running.
> It is unclear if the broader strategy for the {{LogCleaner}} is the reason
> for this upper bound, or if this is a value which must be tuned for each
> specific use-case.
> Of more immediate concern is the fact that the thread crash is not visible
> via JMX or exposed as some form of service degradation.
> Some short-term remediations we have made are:
> * increasing the size of the dedupe buffer
> * monitoring the log-cleaner threads inside the JVM
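The "offset map can fit only 5033164" figure in the quoted error is consistent with the offset map's sizing rule. A back-of-the-envelope sketch — the assumptions here (the default 128 MB dedupe buffer, a 0.9 load factor, and 24 bytes per entry: a 16-byte key hash plus an 8-byte offset) come from general knowledge of the log cleaner, not from this thread:

```java
public class DedupeBufferMath {
    // Estimate how many unique keys the log cleaner's offset map can hold
    // for a given dedupe buffer size.
    static long offsetMapCapacity(long dedupeBufferBytes, double loadFactor, int bytesPerEntry) {
        return (long) (dedupeBufferBytes * loadFactor) / bytesPerEntry;
    }

    public static void main(String[] args) {
        long defaultBuffer = 128L * 1024 * 1024; // log.cleaner.dedupe.buffer.size default
        // Reproduces the "offset map can fit only 5033164" figure from the error.
        System.out.println(offsetMapCapacity(defaultBuffer, 0.9, 24));
    }
}
```

So a segment with 9,750,860 distinct keys overflows the default map by roughly a factor of two, matching the error's advice to raise log.cleaner.dedupe.buffer.size or lower log.cleaner.threads.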
[jira] [Comment Edited] (KAFKA-3893) Kafka Borker ID disappears from /borkers/ids
[ https://issues.apache.org/jira/browse/KAFKA-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360306#comment-15360306 ]

Peter Davis edited comment on KAFKA-3893 at 7/2/16 7:36 PM:
------------------------------------------------------------

Sriharsha, I have witnessed this too, and it very much seems like a bug in Kafka -- when a zookeeper connection is lost, it seems any other changes in the cluster during the loss (which would be expected if an outage affects multiple brokers) are not recognized when it reconnects. We see the same loop of "Shrinking ISR" and "Cached zkVersion [###] not equal to that in zookeeper", and the broker never recovers until manually restarted.

For us this happened almost daily when running on a cluster of virtual machines that would get paused for a few seconds every night for a snapshot backup. We disabled the backup, but it's very concerning that Kafka won't recover after a pause! Seen with 0.9 and 0.10.

was (Author: davispw):
Sriharsha, I have witnessed this too and it very much seems like a bug in Kafka -- when a zookeeper connection is lost, any other changes in the cluster during the loss are not recognized when it reconnects. We see the same loop of "Shrinking ISR" and "Cached zkVersion [###] not equal to that in zookeeper", and the broker never recovers. For us this happened almost daily when running on a cluster of virtual machines that would get paused for a few seconds every night for a snapshot backup. We disabled the backup but it's very concerning that Kafka won't recover after a pause!

> Kafka Borker ID disappears from /borkers/ids
> --------------------------------------------
>
> Key: KAFKA-3893
> URL: https://issues.apache.org/jira/browse/KAFKA-3893
> Project: Kafka
> Issue Type: Bug
> Reporter: chaitra
> Priority: Critical
>
> Kafka version used: 0.8.2.1
> Zookeeper version: 3.4.6
> We have a scenario where kafka's broker in zookeeper path /brokers/ids just
> disappears.
> We see the zookeeper connection active and no network issue.
> The zookeeper connection timeout is set to 6000ms in server.properties.
> Hence Kafka is not participating in the cluster.
[jira] [Commented] (KAFKA-3893) Kafka Borker ID disappears from /borkers/ids
[ https://issues.apache.org/jira/browse/KAFKA-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360306#comment-15360306 ]

Peter Davis commented on KAFKA-3893:
------------------------------------

Sriharsha, I have witnessed this too and it very much seems like a bug in Kafka -- when a zookeeper connection is lost, any other changes in the cluster during the loss are not recognized when it reconnects. We see the same loop of "Shrinking ISR" and "Cached zkVersion [###] not equal to that in zookeeper", and the broker never recovers.

For us this happened almost daily when running on a cluster of virtual machines that would get paused for a few seconds every night for a snapshot backup. We disabled the backup, but it's very concerning that Kafka won't recover after a pause!

> Kafka Borker ID disappears from /borkers/ids
> --------------------------------------------
>
> Key: KAFKA-3893
> URL: https://issues.apache.org/jira/browse/KAFKA-3893
> Project: Kafka
> Issue Type: Bug
> Reporter: chaitra
> Priority: Critical
>
> Kafka version used: 0.8.2.1
> Zookeeper version: 3.4.6
> We have a scenario where kafka's broker in zookeeper path /brokers/ids just
> disappears.
> We see the zookeeper connection active and no network issue.
> The zookeeper connection timeout is set to 6000ms in server.properties.
> Hence Kafka is not participating in the cluster.
[jira] [Updated] (KAFKA-3925) Default log.dir=/tmp/kafka-logs is unsafe
[ https://issues.apache.org/jira/browse/KAFKA-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Davis updated KAFKA-3925:
-------------------------------
Description:

Many operating systems are configured to delete files under /tmp. For example, Ubuntu has [tmpreaper|http://manpages.ubuntu.com/manpages/wily/man8/tmpreaper.8.html], others use tmpfs, others delete /tmp on startup.

Defaults are OK to make getting started easier, but they should not be unsafe (risking data loss). Something under /var would be a better default log.dir under *nix. Or a path relative to the Kafka bin directory, to avoid needing root.

If the default cannot be changed, I would suggest a special warning be printed to the console on broker startup if log.dir is under /tmp.

See [users list thread|http://mail-archives.apache.org/mod_mbox/kafka-users/201607.mbox/%3cCAD5tkZb-0MMuWJqHNUJ3i1+xuNPZ4tnQt-RPm65grxE0=0o...@mail.gmail.com%3e]. I've also been bitten by this.

was:

Many operating systems are configured to delete files under /tmp. For example Ubuntu has [tmpreaper](http://manpages.ubuntu.com/manpages/wily/man8/tmpreaper.8.html), others use tmpfs, others delete /tmp on startup.

Defaults are OK to make getting started easier but should not be unsafe (risking data loss). Something under /var would be a better default log.dir under *nix. Or relative to the Kafka bin directory to avoid needing root.

If the default cannot be changed, I would suggest a special warning print to the console on broker startup if log.dir is under /tmp.

See [users list thread](http://mail-archives.apache.org/mod_mbox/kafka-users/201607.mbox/%3cCAD5tkZb-0MMuWJqHNUJ3i1+xuNPZ4tnQt-RPm65grxE0=0o...@mail.gmail.com%3e). I've also been bitten by this.

> Default log.dir=/tmp/kafka-logs is unsafe
> -----------------------------------------
>
> Key: KAFKA-3925
> URL: https://issues.apache.org/jira/browse/KAFKA-3925
> Project: Kafka
> Issue Type: Bug
> Components: config
> Affects Versions: 0.10.0.0
> Environment: Various, depends on OS and configuration
> Reporter: Peter Davis
>
> Many operating systems are configured to delete files under /tmp. For
> example Ubuntu has
> [tmpreaper|http://manpages.ubuntu.com/manpages/wily/man8/tmpreaper.8.html],
> others use tmpfs, others delete /tmp on startup.
> Defaults are OK to make getting started easier but should not be unsafe
> (risking data loss).
> Something under /var would be a better default log.dir under *nix. Or
> relative to the Kafka bin directory to avoid needing root.
> If the default cannot be changed, I would suggest a special warning print to
> the console on broker startup if log.dir is under /tmp.
> See [users list
> thread|http://mail-archives.apache.org/mod_mbox/kafka-users/201607.mbox/%3cCAD5tkZb-0MMuWJqHNUJ3i1+xuNPZ4tnQt-RPm65grxE0=0o...@mail.gmail.com%3e].
> I've also been bitten by this.
[jira] [Commented] (KAFKA-3925) Default log.dir=/tmp/kafka-logs is unsafe
[ https://issues.apache.org/jira/browse/KAFKA-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360291#comment-15360291 ]

Peter Davis commented on KAFKA-3925:
------------------------------------

Another recent mailing list thread: http://mail-archives.apache.org/mod_mbox/kafka-users/201606.mbox/%3ccag4txrjgodvq5_nlb2qt_sskhxttjpuy6d42hdtcz5hjqxk...@mail.gmail.com%3e

> Default log.dir=/tmp/kafka-logs is unsafe
> -----------------------------------------
>
> Key: KAFKA-3925
> URL: https://issues.apache.org/jira/browse/KAFKA-3925
> Project: Kafka
> Issue Type: Bug
> Components: config
> Affects Versions: 0.10.0.0
> Environment: Various, depends on OS and configuration
> Reporter: Peter Davis
>
> Many operating systems are configured to delete files under /tmp. For
> example Ubuntu has
> [tmpreaper](http://manpages.ubuntu.com/manpages/wily/man8/tmpreaper.8.html),
> others use tmpfs, others delete /tmp on startup.
> Defaults are OK to make getting started easier but should not be unsafe
> (risking data loss).
> Something under /var would be a better default log.dir under *nix. Or
> relative to the Kafka bin directory to avoid needing root.
> If the default cannot be changed, I would suggest a special warning print to
> the console on broker startup if log.dir is under /tmp.
> See [users list
> thread](http://mail-archives.apache.org/mod_mbox/kafka-users/201607.mbox/%3cCAD5tkZb-0MMuWJqHNUJ3i1+xuNPZ4tnQt-RPm65grxE0=0o...@mail.gmail.com%3e).
> I've also been bitten by this.
[jira] [Created] (KAFKA-3925) Default log.dir=/tmp/kafka-logs is unsafe
Peter Davis created KAFKA-3925:
----------------------------------

Summary: Default log.dir=/tmp/kafka-logs is unsafe
Key: KAFKA-3925
URL: https://issues.apache.org/jira/browse/KAFKA-3925
Project: Kafka
Issue Type: Bug
Components: config
Affects Versions: 0.10.0.0
Environment: Various, depends on OS and configuration
Reporter: Peter Davis

Many operating systems are configured to delete files under /tmp. For example Ubuntu has [tmpreaper](http://manpages.ubuntu.com/manpages/wily/man8/tmpreaper.8.html), others use tmpfs, others delete /tmp on startup.

Defaults are OK to make getting started easier but should not be unsafe (risking data loss). Something under /var would be a better default log.dir under *nix. Or relative to the Kafka bin directory to avoid needing root.

If the default cannot be changed, I would suggest a special warning print to the console on broker startup if log.dir is under /tmp.

See [users list thread](http://mail-archives.apache.org/mod_mbox/kafka-users/201607.mbox/%3cCAD5tkZb-0MMuWJqHNUJ3i1+xuNPZ4tnQt-RPm65grxE0=0o...@mail.gmail.com%3e). I've also been bitten by this.
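The startup warning suggested in the description could be as small as a path check against the configured property. A minimal sketch — the property name log.dir and its /tmp/kafka-logs default are real broker config, while the class and method names are illustrative:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class TmpLogDirCheck {
    // True when the configured log.dir resolves under /tmp, i.e. when
    // stored partitions risk deletion by tmpreaper, tmpfs, or a reboot.
    static boolean logDirUnderTmp(Properties brokerProps) {
        Path logDir = Paths.get(brokerProps.getProperty("log.dir", "/tmp/kafka-logs"));
        return logDir.toAbsolutePath().normalize().startsWith("/tmp");
    }

    public static void main(String[] args) {
        Properties props = new Properties(); // no log.dir set -> unsafe default
        if (logDirUnderTmp(props)) {
            System.err.println("WARN: log.dir is under /tmp; the OS may delete log data");
        }
    }
}
```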
[jira] [Comment Edited] (KAFKA-3857) Additional log cleaner metrics
[ https://issues.apache.org/jira/browse/KAFKA-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360064#comment-15360064 ]

Peter Davis edited comment on KAFKA-3857 at 7/2/16 8:16 AM:
------------------------------------------------------------

Related to KAFKA-3894

was (Author: davispw):
Related to KAFKA-3894 - dup?

> Additional log cleaner metrics
> ------------------------------
>
> Key: KAFKA-3857
> URL: https://issues.apache.org/jira/browse/KAFKA-3857
> Project: Kafka
> Issue Type: Improvement
> Reporter: Kiran Pillarisetty
>
> The proposal would be to add a couple of additional log cleaner metrics:
> 1. Time of last log cleaner run
> 2. Cumulative number of successful log cleaner runs since last broker restart.
> Existing log cleaner metrics (max-buffer-utilization-percent,
> cleaner-recopy-percent, max-clean-time-secs, max-dirty-percent) do not
> differentiate an idle log cleaner from a dead log cleaner. It would be useful
> to have the above two metrics added, to indicate whether the log cleaner is alive
> (and successfully cleaning) or not.
[jira] [Commented] (KAFKA-3857) Additional log cleaner metrics
[ https://issues.apache.org/jira/browse/KAFKA-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360064#comment-15360064 ]

Peter Davis commented on KAFKA-3857:
------------------------------------

Related to KAFKA-3894 - dup?

> Additional log cleaner metrics
> ------------------------------
>
> Key: KAFKA-3857
> URL: https://issues.apache.org/jira/browse/KAFKA-3857
> Project: Kafka
> Issue Type: Improvement
> Reporter: Kiran Pillarisetty
>
> The proposal would be to add a couple of additional log cleaner metrics:
> 1. Time of last log cleaner run
> 2. Cumulative number of successful log cleaner runs since last broker restart.
> Existing log cleaner metrics (max-buffer-utilization-percent,
> cleaner-recopy-percent, max-clean-time-secs, max-dirty-percent) do not
> differentiate an idle log cleaner from a dead log cleaner. It would be useful
> to have the above two metrics added, to indicate whether the log cleaner is alive
> (and successfully cleaning) or not.
[jira] [Commented] (KAFKA-3894) Log Cleaner thread crashes and never restarts
[ https://issues.apache.org/jira/browse/KAFKA-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360062#comment-15360062 ]

Peter Davis commented on KAFKA-3894:
------------------------------------

A quick improvement would be to increase the severity of the log message when the log cleaner stops. Right now there is just an "INFO" message that's easy to miss.

> Log Cleaner thread crashes and never restarts
> ---------------------------------------------
>
> Key: KAFKA-3894
> URL: https://issues.apache.org/jira/browse/KAFKA-3894
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.8.2.2, 0.9.0.1
> Environment: Oracle JDK 8, Ubuntu Precise
> Reporter: Tim Carey-Smith
> Labels: compaction
>
> The log-cleaner thread can crash if the number of keys in a topic grows to be
> too large to fit into the dedupe buffer.
> The result of this is a log line:
> {quote}
> broker=0 pri=ERROR t=kafka-log-cleaner-thread-0 at=LogCleaner
> \[kafka-log-cleaner-thread-0\], Error due to
> java.lang.IllegalArgumentException: requirement failed: 9750860 messages in
> segment MY_FAVORITE_TOPIC-2/47580165.log but offset map can fit
> only 5033164. You can increase log.cleaner.dedupe.buffer.size or decrease
> log.cleaner.threads
> {quote}
> As a result, the broker is left in a potentially dangerous situation where
> cleaning of compacted topics is not running.
> It is unclear if the broader strategy for the {{LogCleaner}} is the reason
> for this upper bound, or if this is a value which must be tuned for each
> specific use-case.
> Of more immediate concern is the fact that the thread crash is not visible
> via JMX or exposed as some form of service degradation.
> Some short-term remediations we have made are:
> * increasing the size of the dedupe buffer
> * monitoring the log-cleaner threads inside the JVM
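The "monitoring the log-cleaner threads inside the JVM" remediation quoted in the issue can be done with a small in-process liveness probe. A sketch — the thread-name prefix kafka-log-cleaner-thread matches the log lines in this issue, but everything else (class, method, alerting) is an illustrative assumption:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class LogCleanerLivenessCheck {
    // True if at least one log-cleaner thread is still alive in this JVM.
    static boolean cleanerThreadAlive() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadName().startsWith("kafka-log-cleaner-thread")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(cleanerThreadAlive()
                ? "cleaner alive"
                : "ALERT: no log-cleaner thread running");
    }
}
```

Run from a scheduled task inside the broker JVM (or adapted to a JMX query from outside), this catches the silent crash the issue describes.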
Re: Brokers are crash due to __consumer_offsets folder are deleted
Dear 黄杰斌:

I am guessing your operating system is configured to delete your /tmp directory when you restart the server. You will need to change the "log.dir" property in your broker's server.properties file to someplace permanent. Unfortunately, your data is lost unless you had a backup or had configured replication.

From the broker configuration table:

    log.dir | The directory in which the log data is kept (supplemental for log.dirs property) | string | /tmp/kafka-logs | high

Dear Community: why does log.dir default to a location under /tmp? It is unsafe as a default.

-Peter

> On Jun 30, 2016, at 11:19 PM, 黄杰斌 wrote:
>
> Hi All,
>
> Do you encounter the below issue when using kafka_2.11-0.10.0.0?
> All brokers crash because the __consumer_offsets folders are deleted.
> sample log:
> [2016-06-30 12:46:32,579] FATAL [Replica Manager on Broker 2]: Halting due
> to unrecoverable I/O error while handling produce request:
> (kafka.server.ReplicaManager)
> kafka.common.KafkaStorageException: I/O exception in append to log
> '__consumer_offsets-32'
>     at kafka.log.Log.append(Log.scala:329)
>     at kafka.cluster.Partition$$anonfun$11.apply(Partition.scala:443)
>     at kafka.cluster.Partition$$anonfun$11.apply(Partition.scala:429)
>     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
>     at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:237)
>     at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429)
>     at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:406)
>     at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:392)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>     at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:392)
>     at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:328)
>     at kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:232)
>     at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:424)
>     at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:424)
>     at scala.Option.foreach(Option.scala:257)
>     at kafka.coordinator.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:424)
>     at kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:310)
>     at kafka.server.KafkaApis.handle(KafkaApis.scala:84)
>     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException:
> /tmp/kafka2-logs/__consumer_offsets-32/.index (No such file or directory)
>     at java.io.RandomAccessFile.open0(Native Method)
>     at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
>     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
>     at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:286)
>     at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:285)
>     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
>     at kafka.log.OffsetIndex.resize(OffsetIndex.scala:285)
>     at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:274)
>     at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:274)
>     at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:274)
>     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
>     at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:273)
>     at kafka.log.Log.roll(Log.scala:655)
>     at kafka.log.Log.maybeRoll(Log.scala:630)
>     at kafka.log.Log.append(Log.scala:383)
>     ... 23 more
>
> No one removed those folders, and topic __consumer_offsets is handled by the
> broker, so no one can remove this topic.
> Do you know why this happened? And how to avoid it?
>
> Best Regards,
> Ben
[jira] [Commented] (KAFKA-3834) Consumer should not block in poll on coordinator discovery
[ https://issues.apache.org/jira/browse/KAFKA-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350268#comment-15350268 ]

Peter Davis commented on KAFKA-3834:
------------------------------------

I believe we have seen this issue IRL when the new coordinator takes a long time to become available after an election. This can happen if log compaction has halted (for example due to a too-small I/O buffer); then __consumer_offsets will grow ridiculously large. In one instance it was taking the coordinators several minutes to come online before we realized the problem. Meanwhile, poll() would spin and log red-herring errors every 100ms.

This also occurs on commitSync(), which I believe does a poll() internally, but also has a "while" loop of its own. Should improving the blocking behavior of commitSync() be a separate JIRA?

> Consumer should not block in poll on coordinator discovery
> ----------------------------------------------------------
>
> Key: KAFKA-3834
> URL: https://issues.apache.org/jira/browse/KAFKA-3834
> Project: Kafka
> Issue Type: Improvement
> Components: consumer
> Reporter: Jason Gustafson
> Assignee: Jason Gustafson
>
> Currently we block indefinitely in poll() when discovering the coordinator
> for the group. Instead, we can return an empty record set when the passed
> timeout expires. The downside is that it may obscure the underlying problem
> (which is usually misconfiguration), but users typically have to look at the
> logs to figure out the problem anyway.
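The improvement described — return an empty record set when the poll timeout expires instead of blocking on coordinator discovery — is essentially a deadline check inside the wait loop. A hedged sketch, with a BooleanSupplier standing in for the real coordinator lookup and a sleep standing in for one round of network I/O (none of these names are Kafka's actual API):

```java
import java.util.Collections;
import java.util.List;
import java.util.function.BooleanSupplier;

public class BoundedCoordinatorWait {
    // Wait for the coordinator up to timeoutMs; rather than blocking
    // indefinitely, give up at the deadline and surface no records.
    static List<String> pollOnce(BooleanSupplier coordinatorKnown, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!coordinatorKnown.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return Collections.emptyList(); // timeout: empty set, caller retries
            }
            try {
                Thread.sleep(10); // stand-in for one round of client I/O
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Collections.emptyList();
            }
        }
        return List.of("records...");
    }

    public static void main(String[] args) {
        // Coordinator never found within 50 ms -> empty list, not a hang.
        System.out.println(pollOnce(() -> false, 50));
    }
}
```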
[jira] [Commented] (KAFKA-2934) Offset storage file configuration in Connect standalone mode is not included in StandaloneConfig
[ https://issues.apache.org/jira/browse/KAFKA-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323697#comment-15323697 ]

Peter Davis commented on KAFKA-2934:
------------------------------------

After upgrading to 0.10, discovered that this fix causes `offset.storage.file.filename` to be a *required* property, even if a FileOffsetBackingStore is not used. We are consuming from JMS (which has no need to store offsets, as acknowledged messages are removed from the source), so we are using a MemoryOffsetBackingStore.

https://github.com/apache/kafka/pull/734 argues there are only two sensible choices for a backing store, but I think my use case shows that is not true! Would you be willing to revisit the pull request to make the offset backing store class configurable, and make the file optional?

> Offset storage file configuration in Connect standalone mode is not included
> in StandaloneConfig
> ----------------------------------------------------------------------------
>
> Key: KAFKA-2934
> URL: https://issues.apache.org/jira/browse/KAFKA-2934
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect
> Affects Versions: 0.9.0.0
> Reporter: Ewen Cheslack-Postava
> Assignee: Ewen Cheslack-Postava
> Fix For: 0.10.0.0
>
> The config is coded directly in FileOffsetBackingStore rather than being
> listed (and validated) in StandaloneConfig. This also means it wouldn't be
> included if we autogenerated docs from the config classes.
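One way to make the backing store pluggable, as the comment requests, is a class-name config resolved by reflection, with the file property validated only when a file-based store is chosen. A self-contained sketch — the property name offset.storage.class and the tiny interface here are hypothetical illustrations, not Connect's actual API:

```java
import java.util.Properties;

public class PluggableOffsetStore {
    // Hypothetical minimal backing-store contract, for illustration only.
    public interface OffsetBackingStore {
        void start();
        void stop();
    }

    public static class MemoryOffsetBackingStore implements OffsetBackingStore {
        public void start() { }
        public void stop() { }
    }

    // Resolve the store implementation from config; only a file-based
    // store would then need to require offset.storage.file.filename.
    static OffsetBackingStore createStore(Properties props) throws Exception {
        String cls = props.getProperty("offset.storage.class",
                MemoryOffsetBackingStore.class.getName());
        return (OffsetBackingStore) Class.forName(cls)
                .getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(createStore(new Properties()).getClass().getSimpleName());
    }
}
```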
[jira] [Updated] (KAFKA-3710) MemoryOffsetBackingStore creates a non-daemon thread that prevents clean shutdown
[ https://issues.apache.org/jira/browse/KAFKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Davis updated KAFKA-3710:
-------------------------------
Labels: easyfix newbie patch (was: easyfix newbie)

> MemoryOffsetBackingStore creates a non-daemon thread that prevents clean
> shutdown
> --------------------------------------------------------------------------------
>
> Key: KAFKA-3710
> URL: https://issues.apache.org/jira/browse/KAFKA-3710
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect
> Affects Versions: 0.9.0.1
> Reporter: Peter Davis
> Assignee: Ewen Cheslack-Postava
> Labels: easyfix, newbie, patch
>
> MemoryOffsetBackingStore creates an ExecutorService but
> MemoryOffsetBackingStore.stop() fails to call executor.shutdown(). This
> creates a zombie non-daemon thread which prevents clean shutdown when running
> a StandaloneHerder embedded in another application.
> Note that FileOffsetBackingStore extends MemoryOffsetBackingStore so is also
> affected.
[jira] [Created] (KAFKA-3710) MemoryOffsetBackingStore creates a non-daemon thread that prevents clean shutdown
Peter Davis created KAFKA-3710:
----------------------------------

Summary: MemoryOffsetBackingStore creates a non-daemon thread that prevents clean shutdown
Key: KAFKA-3710
URL: https://issues.apache.org/jira/browse/KAFKA-3710
Project: Kafka
Issue Type: Bug
Components: KafkaConnect
Affects Versions: 0.9.0.1
Reporter: Peter Davis
Assignee: Ewen Cheslack-Postava

MemoryOffsetBackingStore creates an ExecutorService but MemoryOffsetBackingStore.stop() fails to call executor.shutdown(). This creates a zombie non-daemon thread which prevents clean shutdown when running a StandaloneHerder embedded in another application.

Note that FileOffsetBackingStore extends MemoryOffsetBackingStore so is also affected.
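The fix described amounts to shutting the executor down in stop(). A minimal self-contained sketch of the pattern (not the actual Connect class; the 5-second grace period is an arbitrary choice):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StoreShutdownSketch {
    // Mirrors the bug: an ExecutorService whose worker is a non-daemon
    // thread, so the JVM cannot exit until the executor is shut down.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    public void stop() throws InterruptedException {
        executor.shutdown(); // the call MemoryOffsetBackingStore.stop() was missing
        if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
            executor.shutdownNow(); // force-stop stragglers after the grace period
        }
    }

    public boolean isShutdown() {
        return executor.isShutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        StoreShutdownSketch store = new StoreShutdownSketch();
        store.stop(); // without this, an embedded herder would hang the host app
        System.out.println(store.isShutdown());
    }
}
```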
[jira] [Commented] (KAFKA-3627) New consumer doesn't run delayed tasks while under load
[ https://issues.apache.org/jira/browse/KAFKA-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272615#comment-15272615 ]

Peter Davis commented on KAFKA-3627:
------------------------------------

I believe this problem is aggravated by max.poll.messages, but since there are timing issues I haven't confirmed. I can confirm that this is affecting us and that any failure to heartbeat or commit is an extremely serious problem, as it can result in a "rebalance storm" where no consumers ever make progress. Unfortunately, we are banking on max.poll.messages to address other rebalance-storm problems.

> New consumer doesn't run delayed tasks while under load
> -------------------------------------------------------
>
> Key: KAFKA-3627
> URL: https://issues.apache.org/jira/browse/KAFKA-3627
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 0.9.0.1
> Reporter: Rob Underwood
> Assignee: Jason Gustafson
> Priority: Blocker
> Fix For: 0.10.0.0
>
> Attachments: DelayedTaskBugConsumer.java, kafka-3627-output.log
>
> If the new consumer receives a steady flow of fetch responses it will not run
> delayed tasks, which means it will not heartbeat or perform automatic offset
> commits.
> The main cause is the code that attempts to pipeline fetch responses and keep
> the consumer fed. Specifically, in KafkaConsumer::pollOnce() there is a
> check that skips calling client.poll() if there are fetched records ready
> (line 903 in the 0.9.0 branch as of this writing). Then in
> KafkaConsumer::poll(), if records are returned it will initiate another fetch
> and perform a quick poll, which will send/receive fetch requests/responses
> but will not run delayed tasks.
> If the timing works out, and the consumer is consistently receiving fetched
> records, it won't run delayed tasks until it doesn't receive a fetch response
> during its quick poll. That leads to a rebalance since the consumer isn't
> heartbeating, and typically means all the consumed records will be
> re-delivered since the automatic offset commit wasn't able to run either.
> h5. Steps to reproduce
> # Start up a cluster with *at least 2 brokers*. This seems to be required to
> reproduce the issue, I'm guessing because the fetch responses all arrive
> together when using a single broker.
> # Create a topic with a good number of partitions
> #* bq. bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic delayed-task-bug --partitions 10 --replication-factor 1
> # Generate some test data so the consumer has plenty to consume. In this
> case I'm just using uuids
> #* bq. for ((i=0;i<100;++i)) do; cat /proc/sys/kernel/random/uuid >> /tmp/test-messages; done
> #* bq. bin/kafka-console-producer.sh --broker-list localhost:9092 --topic delayed-task-bug < /tmp/test-messages
> # Start up a consumer with a small max fetch size to ensure it only pulls a
> few records at a time. The consumer can simply sleep for a moment when it
> receives a record.
> #* I'll attach an example in Java
> # There's a timing aspect to this issue so it may take a few attempts to
> reproduce

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
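The starvation mechanism in the description can be illustrated without a cluster. In this sketch, time only advances through record processing (the quick poll keeps returning records, so the full client.poll() that would fire delayed tasks never runs), and we count how many heartbeat deadlines pass unserved. This is an illustration of the mechanism, not Kafka's actual code; all numbers are arbitrary:

```java
public class DelayedTaskStarvation {
    // Counts heartbeat deadlines that pass while the consumer is busy
    // draining a steady stream of pre-fetched records. In the real bug,
    // each missed deadline risks the coordinator evicting the consumer.
    static int lateHeartbeats(int records, int heartbeatIntervalMs, int processingMsPerRecord) {
        int now = 0;
        int lastHeartbeat = 0;
        int late = 0;
        for (int i = 0; i < records; i++) {
            now += processingMsPerRecord; // quick poll keeps returning records
            if (now - lastHeartbeat >= heartbeatIntervalMs) {
                late++;               // a heartbeat deadline passed without firing
                lastHeartbeat = now;  // it only fires once records stop arriving
            }
        }
        return late;
    }

    public static void main(String[] args) {
        // 100 records at 1 ms each against a 30 ms heartbeat interval.
        System.out.println(lateHeartbeats(100, 30, 1)); // prints 3
    }
}
```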