[jira] [Comment Edited] (KAFKA-2729) Cached zkVersion not equal to that in zookeeper, broker not recovering.
[ https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648157#comment-16648157 ] adam keyser edited comment on KAFKA-2729 at 10/12/18 5:12 PM: -- Still seeing it here as well - 1.1.1 was (Author: adamkeyser): Still seeing it here as well. > Cached zkVersion not equal to that in zookeeper, broker not recovering. > --- > > Key: KAFKA-2729 > URL: https://issues.apache.org/jira/browse/KAFKA-2729 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.8.2.1, 0.9.0.0, 0.10.0.0, 0.10.1.0, 0.11.0.0 >Reporter: Danil Serdyuchenko >Assignee: Onur Karaman >Priority: Major > Fix For: 1.1.0 > > > After a small network wobble where zookeeper nodes couldn't reach each other, > we started seeing a large number of undereplicated partitions. The zookeeper > cluster recovered, however we continued to see a large number of > undereplicated partitions. Two brokers in the kafka cluster were showing this > in the logs: > {code} > [2015-10-27 11:36:00,888] INFO Partition > [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for > partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 > (kafka.cluster.Partition) > [2015-10-27 11:36:00,891] INFO Partition > [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] > not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition) > {code} > For all of the topics on the effected brokers. Both brokers only recovered > after a restart. Our own investigation yielded nothing, I was hoping you > could shed some light on this issue. Possibly if it's related to: > https://issues.apache.org/jira/browse/KAFKA-1382 , however we're using > 0.8.2.1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-2729) Cached zkVersion not equal to that in zookeeper, broker not recovering.
[ https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648157#comment-16648157 ] adam keyser commented on KAFKA-2729: Still seeing it here as well. > Cached zkVersion not equal to that in zookeeper, broker not recovering. > --- > > Key: KAFKA-2729 > URL: https://issues.apache.org/jira/browse/KAFKA-2729 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.8.2.1, 0.9.0.0, 0.10.0.0, 0.10.1.0, 0.11.0.0 >Reporter: Danil Serdyuchenko >Assignee: Onur Karaman >Priority: Major > Fix For: 1.1.0 > > > After a small network wobble where zookeeper nodes couldn't reach each other, > we started seeing a large number of undereplicated partitions. The zookeeper > cluster recovered, however we continued to see a large number of > undereplicated partitions. Two brokers in the kafka cluster were showing this > in the logs: > {code} > [2015-10-27 11:36:00,888] INFO Partition > [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for > partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 > (kafka.cluster.Partition) > [2015-10-27 11:36:00,891] INFO Partition > [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] > not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition) > {code} > For all of the topics on the effected brokers. Both brokers only recovered > after a restart. Our own investigation yielded nothing, I was hoping you > could shed some light on this issue. Possibly if it's related to: > https://issues.apache.org/jira/browse/KAFKA-1382 , however we're using > 0.8.2.1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-6982) java.lang.ArithmeticException: / by zero
[ https://issues.apache.org/jira/browse/KAFKA-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506806#comment-16506806 ] adam keyser commented on KAFKA-6982: Can this get added to 1.1.1 ? > java.lang.ArithmeticException: / by zero > > > Key: KAFKA-6982 > URL: https://issues.apache.org/jira/browse/KAFKA-6982 > Project: Kafka > Issue Type: Bug > Components: network >Affects Versions: 1.1.0 > Environment: Environment: Windows 10. >Reporter: wade wu >Priority: Major > Fix For: 2.0.0 > > > Producer keeps sending messages to Kafka, Kafka is down. > Server.log shows: > .. > [2018-06-01 17:01:33,945] WARN [Log partition=__consumer_offsets-6, > dir=D:\data\Kafka\kafka-logs] Found a corrupted index file corresponding to > log file > D:\data\Kafka\kafka-logs__consumer_offsets-6\.log due to > Corrupt index found, index file > (D:\data\Kafka\kafka-logs__consumer_offsets-6\.index) has > non-zero size but the last offset is 0 which is no greater than the base > offset 0.}, recovering segment and rebuilding index files... (kafka.log.Log) > [2018-06-01 17:01:33,945] WARN [Log partition=__consumer_offsets-6, > dir=D:\data\Kafka\kafka-logs] Found a corrupted index file corresponding to > log file > D:\data\Kafka\kafka-logs__consumer_offsets-6\.log due to > Corrupt index found, index file > (D:\data\Kafka\kafka-logs__consumer_offsets-6\.index) has > non-zero size but the last offset is 0 which is no greater than the base > offset 0.}, recovering segment and rebuilding index files... (kafka.log.Log) > [2018-06-01 17:01:34,664] ERROR Error while accepting connection > (kafka.network.Acceptor) > java.lang.ArithmeticException: / by zero > at kafka.network.Acceptor.run(SocketServer.scala:354) > at java.lang.Thread.run(Thread.java:748) > [2018-06-01 17:01:34,664] ERROR Error while accepting connection > (kafka.network.Acceptor) > java.lang.ArithmeticException: / by zero > at kafka.network.Acceptor.run(SocketServer.scala:354) > at java.lang.Thread.run(Thread.java:748) > [2018-06-01 17:01:34,664] ERROR Error while accepting connection > (kafka.network.Acceptor) > java.lang.ArithmeticException: / by zero > at kafka.network.Acceptor.run(SocketServer.scala:354) > at java.lang.Thread.run(Thread.java:748) > .. > > This line of code in SocketServer.scala causing the error: > {color:#33} currentProcessor = > currentProcessor{color:#d04437} % processors.size{color}{color} > > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (KAFKA-5413) Log cleaner fails due to large offset in segment file
[ https://issues.apache.org/jira/browse/KAFKA-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250419#comment-16250419 ] adam keyser edited comment on KAFKA-5413 at 11/13/17 10:49 PM: --- We are encountering this bug - Is 0.10.2.2 released? I don't see a link in the releases page. was (Author: adamkeyser): We are encountering this bug - Is 10.2.2 released? I don't see a link in the releases page. > Log cleaner fails due to large offset in segment file > - > > Key: KAFKA-5413 > URL: https://issues.apache.org/jira/browse/KAFKA-5413 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.2.0, 0.10.2.1 > Environment: Ubuntu 14.04 LTS, Oracle Java 8u92, kafka_2.11-0.10.2.0 >Reporter: Nicholas Ngorok >Assignee: Kelvin Rutt >Priority: Critical > Labels: reliability > Fix For: 0.10.2.2, 0.11.0.0 > > Attachments: .index.cleaned, > .log, .log.cleaned, > .timeindex.cleaned, 002147422683.log, > kafka-5413.patch > > > The log cleaner thread in our brokers is failing with the trace below > {noformat} > [2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: > Cleaning segment 0 in log __consumer_offsets-12 (largest timestamp Thu Jun 08 > 15:48:59 PDT 2017) into 0, retaining deletes. (kafka.log.LogCleaner) > [2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: > Cleaning segment 2147343575 in log __consumer_offsets-12 (largest timestamp > Thu Jun 08 15:49:06 PDT 2017) into 0, retaining deletes. > (kafka.log.LogCleaner) > [2017-06-08 15:49:54,834] ERROR {kafka-log-cleaner-thread-0} > [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner) > java.lang.IllegalArgumentException: requirement failed: largest offset in > message set can not be safely converted to relative offset. > at scala.Predef$.require(Predef.scala:224) > at kafka.log.LogSegment.append(LogSegment.scala:109) > at kafka.log.Cleaner.cleanInto(LogCleaner.scala:478) > at > kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:405) > at > kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:401) > at scala.collection.immutable.List.foreach(List.scala:381) > at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:401) > at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363) > at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362) > at scala.collection.immutable.List.foreach(List.scala:381) > at kafka.log.Cleaner.clean(LogCleaner.scala:362) > at > kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241) > at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63) > [2017-06-08 15:49:54,835] INFO {kafka-log-cleaner-thread-0} > [kafka-log-cleaner-thread-0], Stopped (kafka.log.LogCleaner) > {noformat} > This seems to point at the specific line [here| > https://github.com/apache/kafka/blob/0.11.0/core/src/main/scala/kafka/log/LogSegment.scala#L92] > in the kafka src where the difference is actually larger than MAXINT as both > baseOffset and offset are of type long. It was introduced in this [pr| > https://github.com/apache/kafka/pull/2210/files/56d1f8196b77a47b176b7bbd1e4220a3be827631] > These were the outputs of dumping the first two log segments > {noformat} > :~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration > --files /kafka-logs/__consumer_offsets-12/000 > 0.log > Dumping /kafka-logs/__consumer_offsets-12/.log > Starting offset: 0 > offset: 1810054758 position: 0 NoTimestampType: -1 isvalid: true payloadsize: > -1 magic: 0 compresscodec: NONE crc: 3127861909 keysize: 34 > :~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration > --files /kafka-logs/__consumer_offsets-12/000 > 0002147343575.log > Dumping /kafka-logs/__consumer_offsets-12/002147343575.log > Starting offset: 2147343575 > offset: 2147539884 position: 0 NoTimestampType: -1 isvalid: true paylo > adsize: -1 magic: 0 compresscodec: NONE crc: 2282192097 keysize: 34 > {noformat} > My guess is that since 2147539884 is larger than MAXINT, we are hitting this > exception. Was there a specific reason, this check was added in 0.10.2? > E.g. if the first offset is a key = "key 0" and then we have MAXINT + 1 of > "key 1" following, wouldn't we run into this situation whenever the log > cleaner runs? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5413) Log cleaner fails due to large offset in segment file
[ https://issues.apache.org/jira/browse/KAFKA-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250419#comment-16250419 ] adam keyser commented on KAFKA-5413: We are encountering this bug - Is 10.2.2 released? I don't see a link in the releases page. > Log cleaner fails due to large offset in segment file > - > > Key: KAFKA-5413 > URL: https://issues.apache.org/jira/browse/KAFKA-5413 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.2.0, 0.10.2.1 > Environment: Ubuntu 14.04 LTS, Oracle Java 8u92, kafka_2.11-0.10.2.0 >Reporter: Nicholas Ngorok >Assignee: Kelvin Rutt >Priority: Critical > Labels: reliability > Fix For: 0.10.2.2, 0.11.0.0 > > Attachments: .index.cleaned, > .log, .log.cleaned, > .timeindex.cleaned, 002147422683.log, > kafka-5413.patch > > > The log cleaner thread in our brokers is failing with the trace below > {noformat} > [2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: > Cleaning segment 0 in log __consumer_offsets-12 (largest timestamp Thu Jun 08 > 15:48:59 PDT 2017) into 0, retaining deletes. (kafka.log.LogCleaner) > [2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: > Cleaning segment 2147343575 in log __consumer_offsets-12 (largest timestamp > Thu Jun 08 15:49:06 PDT 2017) into 0, retaining deletes. > (kafka.log.LogCleaner) > [2017-06-08 15:49:54,834] ERROR {kafka-log-cleaner-thread-0} > [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner) > java.lang.IllegalArgumentException: requirement failed: largest offset in > message set can not be safely converted to relative offset. > at scala.Predef$.require(Predef.scala:224) > at kafka.log.LogSegment.append(LogSegment.scala:109) > at kafka.log.Cleaner.cleanInto(LogCleaner.scala:478) > at > kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:405) > at > kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:401) > at scala.collection.immutable.List.foreach(List.scala:381) > at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:401) > at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363) > at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362) > at scala.collection.immutable.List.foreach(List.scala:381) > at kafka.log.Cleaner.clean(LogCleaner.scala:362) > at > kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241) > at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63) > [2017-06-08 15:49:54,835] INFO {kafka-log-cleaner-thread-0} > [kafka-log-cleaner-thread-0], Stopped (kafka.log.LogCleaner) > {noformat} > This seems to point at the specific line [here| > https://github.com/apache/kafka/blob/0.11.0/core/src/main/scala/kafka/log/LogSegment.scala#L92] > in the kafka src where the difference is actually larger than MAXINT as both > baseOffset and offset are of type long. It was introduced in this [pr| > https://github.com/apache/kafka/pull/2210/files/56d1f8196b77a47b176b7bbd1e4220a3be827631] > These were the outputs of dumping the first two log segments > {noformat} > :~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration > --files /kafka-logs/__consumer_offsets-12/000 > 0.log > Dumping /kafka-logs/__consumer_offsets-12/.log > Starting offset: 0 > offset: 1810054758 position: 0 NoTimestampType: -1 isvalid: true payloadsize: > -1 magic: 0 compresscodec: NONE crc: 3127861909 keysize: 34 > :~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration > --files /kafka-logs/__consumer_offsets-12/000 > 0002147343575.log > Dumping /kafka-logs/__consumer_offsets-12/002147343575.log > Starting offset: 2147343575 > offset: 2147539884 position: 0 NoTimestampType: -1 isvalid: true paylo > adsize: -1 magic: 0 compresscodec: NONE crc: 2282192097 keysize: 34 > {noformat} > My guess is that since 2147539884 is larger than MAXINT, we are hitting this > exception. Was there a specific reason, this check was added in 0.10.2? > E.g. if the first offset is a key = "key 0" and then we have MAXINT + 1 of > "key 1" following, wouldn't we run into this situation whenever the log > cleaner runs? -- This message was sent by Atlassian JIRA (v6.4.14#64029)