[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804713#comment-15804713 ]

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 3:00 PM:
--
[~ijuma], [~becket_qin], [~junrao] - thanks and sorry again for your time, effort and patience.

was (Author: michael.andre.pearce):
[~ijuma] [~becket_qin] [~junrao] - thanks and sorry again for your time, effort and patience.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
> Issue Type: Bug
> Components: log
> Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
> Reporter: Robert Schumann
> Assignee: Jiangjie Qin
> Priority: Critical
> Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2,
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz,
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with the latest Kafka 0.10.1 and the log cleaner thread
> with regard to the timeindex files. From the log of the log cleaner we see
> after startup that it tries to clean up a topic called xdc_listing-status-v2
> [1]. The topic is set up with log compaction [2] and the Kafka cluster
> configuration has log.cleaner enabled [3]. Looking at the log and the newly
> created file [4], the cleaner seems to refer to tombstones prior to
> epoch_time=0 - maybe because it finds messages which don't have a timestamp
> at all (?). All producers and consumers are using 0.10.1 and the topics have
> been created completely new, so I'm not sure where this issue would come
> from. The original timeindex file [5] seems to show only valid timestamps for
> the mentioned offsets.
> I would also like to mention that the issue happened in two independent
> datacenters at the same time, so I would rather expect an application/producer
> issue instead of random disk failures. We didn't have the problem with 0.10.0
> for around half a year; it appeared shortly after the upgrade to 0.10.1.
> The puzzling message from the cleaner "cleaning prior to Fri Dec 02 16:35:50
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970" also
> confuses me a bit. Does that mean it does not find any log segments which can
> be cleaned up, or that the last timestamp of the last log segment is somehow
> broken/missing?
> I'm also wondering why the log cleaner thread stops completely after an error
> with one topic. I would at least expect that it keeps on cleaning up other
> topics, but apparently it doesn't do that, e.g. it's not even cleaning
> __consumer_offsets anymore.
> Does anybody have the same issues or can explain what's going on? Thanks for
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)...
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 9 no larger than the last offset appended (11832) to /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.sc
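The InvalidOffsetException above comes from TimeIndex.maybeAppend rejecting an entry whose offset is not larger than the last one written. For context, a Kafka .timeindex file is a flat array of 12-byte entries: an 8-byte big-endian timestamp followed by a 4-byte relative offset. The sketch below fabricates a single entry and decodes its offset field with GNU od (the timestamp bytes are invented; 11832 echoes the offset in the error above). Against a real broker segment you would run kafka.tools.DumpLogSegments with --files instead of hand-decoding.

```shell
f=$(mktemp)
# One fabricated time-index entry: 8 timestamp bytes, then relative offset
# 11832 (0x2E38) as a 4-byte big-endian integer.
printf '\x00\x00\x01\x58\xc4\xe2\xab\xd0\x00\x00\x2e\x38' > "$f"

# Skip the 8 timestamp bytes (-j 8) and decode the 4-byte offset field.
offset=$(od -A n -t d4 -j 8 -N 4 --endian=big "$f" | tr -d ' ')
echo "$offset"   # 11832
```

The cleaner's invariant is exactly this: each appended entry's offset must exceed the previous one, which is why an offset of -1 after 11832 aborts the thread.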
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804660#comment-15804660 ]

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 2:46 PM:
--
[~junrao] Sorry about this - it does indeed seem to be our patching that caused the issue. Now that we have removed the jar in question, all seems to be good. We have been testing all morning and have not seen the issue again.

To explain what occurred, our patching of the Confluent package was done by replacing all jars:
1. Download the latest release from the Apache site (2.11 Scala version).
2. Expand it and copy all jars from ./kafka_2.11-0.10.1.1/libs into confluent-3.1.0/share/java/kafka.
3. Remove all files in confluent-3.1.0/share/java/kafka matching *-0.10.1.0-cp1*.jar.

What caught us out is that in the Confluent bundle the jar "kafka_2.11-0.10.1.0.jar" seems to be renamed "kafka.jar", without any versioning - unlike all the other Apache jars, which are versioned and carry the -cp suffix. So on doing the above we ended up with both kafka_2.11-0.10.1.1.jar and kafka.jar, as kafka.jar was not removed. We didn't easily spot this: we saw no 0.10.1.0-versioned jars on the class path, and likewise we did see 0.10.1.1 versions of all the expected jars.

Whilst this is obviously the Apache JIRA, we're aware the Confluent bundle is produced by yourselves - is there a reason why this jar is not versioned like all the other jars and as per the Apache release? This obviously tripped us up.
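The jar-swap procedure described above can be sketched as a short shell script. The layout below is simulated with empty files so the steps are runnable anywhere; on a real host KAFKA would be the unpacked Apache release and DEST would be confluent-3.1.0/share/java/kafka, and the Apache download/unpack step is omitted.

```shell
# Simulated stand-ins for the real directories.
KAFKA=$(mktemp -d)/kafka_2.11-0.10.1.1
DEST=$(mktemp -d)   # stand-in for confluent-3.1.0/share/java/kafka
mkdir -p "$KAFKA/libs"
touch "$KAFKA/libs/kafka_2.11-0.10.1.1.jar" "$KAFKA/libs/kafka-clients-0.10.1.1.jar"
touch "$DEST/kafka_2.11-0.10.1.0-cp1.jar" "$DEST/kafka.jar"   # Confluent-built jars

# Step 1: copy the new Apache jars into the Confluent jar directory.
cp "$KAFKA"/libs/*.jar "$DEST"/

# Step 2: remove the Confluent-built 0.10.1.0 jars.
rm -f "$DEST"/*-0.10.1.0-cp1*.jar

# Step 3 (the one that was missed): the bundle also ships an UNVERSIONED
# kafka.jar, which the wildcard above never matches, so the old broker
# classes stay on the classpath unless it is removed explicitly.
rm -f "$DEST"/kafka.jar

ls "$DEST"
```

Without step 3, both kafka_2.11-0.10.1.1.jar and kafka.jar sit side by side, which is precisely the state described in the comment.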
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802030#comment-15802030 ]

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 8:35 AM:
--
When we got the issue on 0.10.1.1 we were just as surprised, and as such this was the very first thing we checked. Our internal mail at the time, when this occurred in UAT (the first environment we saw it in after the 0.10.1.1 upgrade):

> Date: Tuesday, 3 January 2017 at 17:07
> Subject: Re: Kafka Issues fixed in 0.10.1.1 how we can patch until confluent release.
>
> Yeah I already double checked that, all the cp1 jars are gone and the 0.10.1.1's exist, pasted below, hope ive missed something..?
>
> Class path: :/opt/projects/confluent/bin/../share/java/kafka/jetty-continuation-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/aopalliance-repackaged-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-file-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/argparse4j-0.5.0.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-json-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/avro-1.7.7.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-beanutils-1.8.3.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-codec-1.9.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-http-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-collections-3.2.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-compress-1.4.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-digester-1.8.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-lang3-3.1.jar:/opt/projects/confluent/bin/../share/java/kafka/log4j-1.2.17.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-logging-1.2.jar:/opt/projects/confluent/bin/../share/java/kafka/lz4-1.3.0.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-validator-1.4.1.jar:/opt/projects/confluent/bin/../share/java/kafka/metrics-core-2.2.0.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-io-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-util-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/jopt-simple-4.9.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-runtime-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/guava-18.0.jar:/opt/projects/confluent/bin/../share/java/kafka/osgi-resource-locator-1.0.1.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-api-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/paranamer-2.3.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-locator-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/reflections-0.9.10.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-utils-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-clients-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/httpclient-4.5.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-streams-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/httpcore-4.4.3.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-tools-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/httpmime-4.5.1.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-annotations-2.6.0.jar:/opt/projects/confluent/bin/../share/java/kafka/rocksdbjni-4.9.0.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-core-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/scala-library-2.11.8.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-core-asl-1.9.13.jar:/opt/projects/confluent/bin/../share/java/kafka/slf4j-api-1.7.21.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-databind-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/slf4j-log4j12-1.7.21.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-jaxrs-base-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-jaxrs-json-provider-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-mapper-asl-1.9.13.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-media-jaxb-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-module-jaxb-annotations-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/snappy-java-1.1.2.6.jar:/opt/projects/confluent/bin/../share/java/kafka/javassist-3.18.2-GA.jar:/opt/projects/confluent/bin/../share/java/kafka/support-metrics-client-3.1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.annotation-api-1.2.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-test.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.inject-1.jar:/opt/projects/confluent/bin/../share/java/kafka/support-metrics-common-3.1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.inject-2.4.0-b34.jar:/opt/projects/confluent/bin/../s
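A quick way to catch this class of problem is to scan the jar directory for kafka jars that carry no version number at all - exactly the unversioned kafka.jar that hid here. A minimal sketch (the directory and jar names below are fabricated for the demo; in practice you would point it at share/java/kafka):

```shell
# Demo directory with a mix of versioned and unversioned jars.
dir=$(mktemp -d)
touch "$dir/kafka_2.11-0.10.1.1.jar" "$dir/kafka-clients-0.10.1.1.jar" "$dir/kafka.jar"

# Any kafka jar whose name ends without a version suffix is suspect,
# e.g. a leftover kafka.jar shadowing the patched build.
unversioned=$(ls "$dir" | grep -E '^kafka([_-][a-z]+)*\.jar$')
echo "$unversioned"   # kafka.jar
```

Note that a classpath listing like the one above will not show such a jar under any version string, which is why grepping the listing for 0.10.1.0 turned up nothing.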
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803984#comment-15803984 ]

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 8:31 AM:
--
[~junrao] As noted, we are patching the Confluent 3.1.0 build with 0.10.1.1, just as we will deploy 3.1.2 when it's released (we assume you're ensuring all bugs really are fixed before releasing). To be sure: is there a SNAPSHOT build/package of 3.1.2 with Kafka 0.10.1.1 that you have? This way we can avoid any possible bad patching or discrepancies on our side.
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803919#comment-15803919 ] Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 8:17 AM:
--
We found one discrepancy: an old unversioned jar on the classpath. We are going to remove it and retest. If you hold off on any further investigation, we will let you know once we have re-deployed and checked. This might indeed be our bad.
was (Author: michael.andre.pearce):
we have one discrepancy found, going to remove it and retest
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 9 no larger than the last offset appended (11832) to /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
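The `InvalidOffsetException` quoted above is raised by `kafka.log.TimeIndex.maybeAppend`, which requires each appended (timestamp, offset) entry to carry an offset strictly larger than the last one written; an offset of -1 therefore always fails once any entry exists. A minimal sketch of that invariant (illustrative names, not Kafka's actual API):

```python
class TimeIndexSketch:
    """Illustrative model of a Kafka time index: a list of
    (timestamp, offset) entries whose offsets must be strictly increasing."""

    def __init__(self):
        self.entries = []  # list of (timestamp, offset) tuples

    def maybe_append(self, timestamp, offset):
        if self.entries and offset <= self.entries[-1][1]:
            # Mirrors the spirit of kafka.common.InvalidOffsetException
            raise ValueError(
                f"Attempt to append an offset ({offset}) to slot "
                f"{len(self.entries)} no larger than the last offset "
                f"appended ({self.entries[-1][1]})")
        self.entries.append((timestamp, offset))


idx = TimeIndexSketch()
idx.maybe_append(1480688150000, 11832)
try:
    idx.maybe_append(0, -1)  # the cleaner's bad entry: offset -1
except ValueError as e:
    print(e)  # prints a message like the InvalidOffsetException above
```

This suggests the cleaner produced a time index entry with offset -1 (no valid timestamped message found) and tried to append it after a real entry, tripping the monotonicity check.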
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803829#comment-15803829 ] Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 8:03 AM:
--
[~junrao] We are fairly confident the jars are good, based on the process classpath; see above. We double-checked the process classpath again when we hit the issue last night, after reducing min.cleanable.dirty.ratio to make the cleaner kick in. Note that all the Kafka jars are 0.10.1.1. We can deploy a custom-built jar containing further logging/debug for you if you wish (as we have this in a testing env), or we can even host a WebEx or similar so you can look at the system yourself.
was (Author: michael.andre.pearce):
[~junrao] We are 99.9% jars are good, see above we double checked again the process class path on getting the issue last night, post reducing min.cleanable.dirty.ratio to make the cleaner kick in. If you note all jars are 0.10.1.1 that are kafka. We can offer to deploy a custom built jar, which can contain further logging/debug in for you if you wish (as we have this in a testing env) or we can offer to even host a webex or something so you can look at the system your self.
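For context on why min.cleanable.dirty.ratio was reduced: the log cleaner only selects a compacted log once the fraction of "dirty" (not yet compacted) bytes exceeds this ratio (default 0.5), so lowering it forces the cleaner to kick in sooner and reproduce the bug. A simplified sketch of that selection rule (the real broker weighs additional constraints):

```python
def is_cleanable(dirty_bytes, total_bytes, min_cleanable_dirty_ratio=0.5):
    """Return True if a compacted log qualifies for cleaning, i.e. its
    dirty fraction exceeds the configured ratio. Simplified sketch only."""
    if total_bytes == 0:
        return False
    return dirty_bytes / total_bytes > min_cleanable_dirty_ratio


# With the default ratio of 0.5, a log that is 30% dirty is skipped;
# lowering the ratio (as done above to reproduce the bug) makes it eligible.
print(is_cleanable(30, 100))                                  # False
print(is_cleanable(30, 100, min_cleanable_dirty_ratio=0.01))  # True
```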
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802233#comment-15802233 ] Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/5/17 7:14 PM:
--
[~junrao], we took it from the website, via the links. (Edited to remove the exact mirror link, as I'm actually not 100% sure which mirror we downloaded from; I will confirm exactly tomorrow if it makes any difference.)
Will look to reduce that down then.
was (Author: michael.andre.pearce):
[~junrao] , we took it from the website, via the links, downloading from the first mirror site, http://apache.mirror.anlx.net/kafka/0.10.1.1/kafka_2.11-0.10.1.1.tgz
Will look to reduce that down then.
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801230#comment-15801230 ] Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/5/17 12:37 PM:
---
[~becket_qin] We have managed to recover yesterday's log4j cleaner log from the same node we supplied the log segments from, from when we had it occur on 0.10.1.1. We have now rolled that testing environment forward onto 0.10.1.1 again and are waiting for it to re-occur. (Bear in mind we had been running six weeks before we saw the initial issue on 0.10.1.0; while we have subsequently seen it re-occur much faster, we cannot tell how long we will have to wait.)
was (Author: michael.andre.pearce):
[~becket_qin] we have managed to recover the log4j cleaner log from yesterdays, from the same node we supplied the log segments, when we had it occur on 0.10.1.1. We have now re rolled back forward that testing environment back onto 0.10.1.1 and waiting for it to re-occur. (bear in mind we had been running 6 weeks before we saw the initial issue on 0.10.1.0 issue, whilst subsequently we have seen it re-occur much faster, we cannot tell how long we will have to wait)
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801230#comment-15801230 ] Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/5/17 12:36 PM: --- [~becket_qin] we have managed to recover the log4j cleaner log from yesterdays, from the same node we supplied the log segments, when we had it occur on 0.10.1.1. We have now re rolled back forward that testing environment back onto 0.10.1.1 and waiting for it to re-occur. (bear in mind we had been running 6 weeks before we saw the initial issue on 0.10.1.0 issue, whilst subsequently we have seen it re-occur much faster, we cannot tell how long we will have to wait) was (Author: michael.andre.pearce): [~becket_qin] we have managed to recover the log4j cleaner log from yesterdays, from the same node we supplied the log segments. We have now re rolled back forward that testing environment back onto 0.10.1.1 and waiting for it to re-occur. (bear in mind we had been running 6 weeks before we saw the initial issue on 0.10.1.0 issue, whilst subsequently we have seen it re-occur much faster, we cannot tell how long we will have to wait) > log cleaner breaks on timeindex > --- > > Key: KAFKA-4497 > URL: https://issues.apache.org/jira/browse/KAFKA-4497 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 0.10.1.0 > Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0 >Reporter: Robert Schumann >Assignee: Jiangjie Qin >Priority: Critical > Labels: reliability > Fix For: 0.10.1.1 > > Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, > vrtstokf005_4thJan > > > _created from ML entry by request of [~ijuma]_ > Hi all, > we are facing an issue with latest kafka 0.10.1 and the log cleaner thread > with regards to the timeindex files. From the log of the log-cleaner we see > after startup that it tries to cleanup a topic called xdc_listing-status-v2 > [1]. 
The topic is setup with log compaction [2] and the kafka cluster > configuration has log.cleaner enabled [3]. Looking at the log and the newly > created file [4], the cleaner seems to refer to tombstones prior to > epoch_time=0 - maybe because he finds messages, which don’t have a timestamp > at all (?). All producers and consumers are using 0.10.1 and the topics have > been created completely new, so I’m not sure, where this issue would come > from. The original timeindex file [5] seems to show only valid timestamps for > the mentioned offsets. I would also like to mention that the issue happened > in two independent datacenters at the same time, so I would rather expect an > application/producer issue instead of random disk failures. We didn’t have > the problem with 0.10.0 for around half a year, it appeared shortly after the > upgrade to 0.10.1. > The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 > CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also > confuses me a bit. Does that mean, it does not find any log segments which > can be cleaned up or the last timestamp of the last log segment is somehow > broken/missing? > I’m also a bit wondering, why the log cleaner thread stops completely after > an error with one topic. I would at least expect that it keeps on cleaning up > other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning > the __consumer_offsets anymore. > Does anybody have the same issues or can explain, what’s going on? Thanks for > any help or suggestions. > Cheers > Robert > [1] > {noformat} > [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner) > [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting > (kafka.log.LogCleaner) > [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log > xdc_listing-status-v2-1. 
(kafka.log.LogCleaner) > [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for > xdc_listing-status-v2-1... (kafka.log.LogCleaner) > [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log > xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). > (kafka.log.LogCleaner) > [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log > xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner) > [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log > xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, > discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... > (kafka.log.LogCleaner) > [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log > xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into > 0, retaining deletes. (kafka.log.LogCleaner)
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801230#comment-15801230 ] Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/5/17 12:34 PM: --- [~becket_qin] we have managed to recover the log4j cleaner log from yesterdays, from the same node we supplied the log segments. We have now re rolled back forward that testing environment back onto 0.10.1.1 and waiting for it to re-occur. (bear in mind we had been running 6 weeks before we saw the initial issue on 0.10.1.0 issue, whilst subsequently we have seen it re-occur much faster, we cannot tell how long we will have to wait) was (Author: michael.andre.pearce): [~becket_qin] we have managed to recover the log4j cleaner log from yesterdays, from the same node we supplied the log segments. We have now re rolled back forward that testing environment back onto 0.10.1.1 and waiting for it to re-occur. (bear in mind we had been running 6 weeks before we saw the initial issue on 0.10.1.0 issue, whilst subsequently we have seen it much faster, we cannot tell how long we will have to wait) > log cleaner breaks on timeindex > --- > > Key: KAFKA-4497 > URL: https://issues.apache.org/jira/browse/KAFKA-4497 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 0.10.1.0 > Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0 >Reporter: Robert Schumann >Assignee: Jiangjie Qin >Priority: Critical > Labels: reliability > Fix For: 0.10.1.1 > > Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, > vrtstokf005_4thJan > > > _created from ML entry by request of [~ijuma]_ > Hi all, > we are facing an issue with latest kafka 0.10.1 and the log cleaner thread > with regards to the timeindex files. From the log of the log-cleaner we see > after startup that it tries to cleanup a topic called xdc_listing-status-v2 > [1]. 
The topic is setup with log compaction [2] and the kafka cluster > configuration has log.cleaner enabled [3]. Looking at the log and the newly > created file [4], the cleaner seems to refer to tombstones prior to > epoch_time=0 - maybe because he finds messages, which don’t have a timestamp > at all (?). All producers and consumers are using 0.10.1 and the topics have > been created completely new, so I’m not sure, where this issue would come > from. The original timeindex file [5] seems to show only valid timestamps for > the mentioned offsets. I would also like to mention that the issue happened > in two independent datacenters at the same time, so I would rather expect an > application/producer issue instead of random disk failures. We didn’t have > the problem with 0.10.0 for around half a year, it appeared shortly after the > upgrade to 0.10.1. > The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 > CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also > confuses me a bit. Does that mean, it does not find any log segments which > can be cleaned up or the last timestamp of the last log segment is somehow > broken/missing? > I’m also a bit wondering, why the log cleaner thread stops completely after > an error with one topic. I would at least expect that it keeps on cleaning up > other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning > the __consumer_offsets anymore. > Does anybody have the same issues or can explain, what’s going on? Thanks for > any help or suggestions. > Cheers > Robert > [1] > {noformat} > [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner) > [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting > (kafka.log.LogCleaner) > [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log > xdc_listing-status-v2-1. 
(kafka.log.LogCleaner) > [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for > xdc_listing-status-v2-1... (kafka.log.LogCleaner) > [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log > xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). > (kafka.log.LogCleaner) > [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log > xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner) > [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log > xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, > discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... > (kafka.log.LogCleaner) > [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log > xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into > 0, retaining deletes. (kafka.log.LogCleaner) > [2016-12-06 12:49:24,968] ERROR [kafka-l
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801230#comment-15801230 ] Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/5/17 12:33 PM: --- [~becket_qin] we have managed to recover the log4j cleaner log from yesterdays, from the same node we supplied the log segments. We have now re rolled back forward that testing environment back onto 0.10.1.1 and waiting for it to re-occur. (bear in mind we had been running 6 weeks before we saw the initial issue on 0.10.1.0 issue, whilst subsequently we have seen it much faster, we cannot tell how long we will have to wait) was (Author: michael.andre.pearce): [~becket_qin] we have managed to recover the log4j cleaner log from yesterdays, from the same node we supplied the log segments. We have now re rolled back forward that testing environment back onto 0.10.1.1 and waiting for it to re-occur. (bear in mind we had been running 6 weeks before we saw the issue, we cannot tell how long we will have to wait) > log cleaner breaks on timeindex > --- > > Key: KAFKA-4497 > URL: https://issues.apache.org/jira/browse/KAFKA-4497 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 0.10.1.0 > Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0 >Reporter: Robert Schumann >Assignee: Jiangjie Qin >Priority: Critical > Labels: reliability > Fix For: 0.10.1.1 > > Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, > vrtstokf005_4thJan > > > _created from ML entry by request of [~ijuma]_ > Hi all, > we are facing an issue with latest kafka 0.10.1 and the log cleaner thread > with regards to the timeindex files. From the log of the log-cleaner we see > after startup that it tries to cleanup a topic called xdc_listing-status-v2 > [1]. The topic is setup with log compaction [2] and the kafka cluster > configuration has log.cleaner enabled [3]. 
Looking at the log and the newly > created file [4], the cleaner seems to refer to tombstones prior to > epoch_time=0 - maybe because he finds messages, which don’t have a timestamp > at all (?). All producers and consumers are using 0.10.1 and the topics have > been created completely new, so I’m not sure, where this issue would come > from. The original timeindex file [5] seems to show only valid timestamps for > the mentioned offsets. I would also like to mention that the issue happened > in two independent datacenters at the same time, so I would rather expect an > application/producer issue instead of random disk failures. We didn’t have > the problem with 0.10.0 for around half a year, it appeared shortly after the > upgrade to 0.10.1. > The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 > CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also > confuses me a bit. Does that mean, it does not find any log segments which > can be cleaned up or the last timestamp of the last log segment is somehow > broken/missing? > I’m also a bit wondering, why the log cleaner thread stops completely after > an error with one topic. I would at least expect that it keeps on cleaning up > other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning > the __consumer_offsets anymore. > Does anybody have the same issues or can explain, what’s going on? Thanks for > any help or suggestions. > Cheers > Robert > [1] > {noformat} > [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner) > [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting > (kafka.log.LogCleaner) > [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log > xdc_listing-status-v2-1. (kafka.log.LogCleaner) > [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for > xdc_listing-status-v2-1... 
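The puzzling "discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970" message is simply what a tombstone-retention lower bound of epoch millisecond 0 looks like when rendered as a wall-clock date in CET. A minimal sketch of that rendering (Python; the fixed UTC+1 offset stands in for the broker's CET locale):

```python
from datetime import datetime, timezone, timedelta

# CET is UTC+1; epoch second 0 rendered in CET produces exactly the
# "Thu Jan 01 01:00:00 CET 1970" string seen in the cleaner log.
cet = timezone(timedelta(hours=1), "CET")
rendered = datetime.fromtimestamp(0, cet).strftime("%a %b %d %H:%M:%S %Z %Y")
print(rendered)  # Thu Jan 01 01:00:00 CET 1970
```

So the message does not mean the segments are broken by itself; it means the cleaner computed a tombstone cutoff of timestamp 0, consistent with messages that carry no timestamp.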
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799490#comment-15799490 ] Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/4/17 10:09 PM: --- [~junrao] We have now had this occur in our testing env also (so now 4 envs affected), so we can get the log segments from there. We are currently sorting out another environment with our operations team tonight, trying to handle the fact that disks are filling rapidly; once done we will get the segments out of the testing env and onto this ticket tonight, so you have them to look at during your day/our night. RE our patch: apart from it being naive, and obviously a plaster rather than a cure, since the time index code is new and I assume you have better knowledge of that code, would you foresee any knock-on issues with applying it? As far as we can tell it doesn't affect anything else, but we don't use the time index feature yet.
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798847#comment-15798847 ] Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/4/17 5:43 PM: -- [~junrao] This is causing us a production issue on the current version; we are also now looking at rolling back to 0.10.0.1, as eBay has, unless we can get this fixed asap. This is a terminal issue. While we agree with the purist view that we should fix the root cause, and are happy to add logging later to track that down, is there any reason not to accept this PR to treat the symptom and avoid the production issue? I would imagine others will start hitting the issue too; we're not alone, and as such it would be a blocker to anyone upgrading to 0.10.1.0/0.10.1.1 now.
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743650#comment-15743650 ] Jiangjie Qin edited comment on KAFKA-4497 at 12/13/16 12:40 AM: The bit representation of the unsigned integer 2^32-1 is the same as that of the signed integer -1. The negative bit is the MSB, which is 1 in 0X. The entry Robert saw was {{[0158 9c490ae1 ]}}, so it is a [Long + an Integer]. The long is the timestamp, and the integer is the relative offset, which is -1 in our case because the base offset was 0.
(kafka.log.LogCleaner) > [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log > xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, > discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... > (kafka.log.LogCleaner) > [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log > xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into > 0, retaining deletes. (kafka.log.LogCleaner) > [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to > (kafka.log.LogCleaner) > kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot > 9 no larger than the last offset appended (11832) to > /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned. > at > kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117) > at > kafka.log.TimeIndex$$anonfu
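The entry layout Jiangjie describes can be checked mechanically: a time index entry is an 8-byte big-endian signed long (timestamp) followed by a 4-byte big-endian signed int (offset relative to the segment's base offset), and an all-ones tail decodes to -1 when read as signed. A minimal sketch (Python; the 12-byte entry is a hypothetical reconstruction of the {{[0158 9c490ae1 ]}} dump, assuming the trailing relative-offset bytes were 0xffffffff):

```python
import struct

# A 12-byte time index entry: big-endian signed long (timestamp, ms)
# followed by a big-endian signed int (relative offset).
# Hypothetical reconstruction of the dumped entry, assuming the
# relative-offset field was all ones.
entry = bytes.fromhex("000001589c490ae1" + "ffffffff")
timestamp_ms, relative_offset = struct.unpack(">qi", entry)

# Unsigned 2^32-1 and signed -1 share the same bit pattern.
assert relative_offset == -1
assert relative_offset & 0xFFFFFFFF == 2**32 - 1
print(timestamp_ms, relative_offset)
```

With a base offset of 0, that signed interpretation is exactly the offset (-1) the cleaner then tried to append, triggering the InvalidOffsetException above.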
[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex
[ https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743650#comment-15743650 ] Jiangjie Qin edited comment on KAFKA-4497 at 12/13/16 12:38 AM: The negative bit is the MSB, which is 1 in 0X. The entry Robert saw was {{[0158 9c490ae1 ]}}, so it is a [Long + an Integer]. The long is the timestamp, and the integer is the relative offset, which is -1 in our case because the base offset was 0.
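The failure mode in the stack trace is the index's monotonicity guard: each appended offset must be larger than the last one written, so a decoded relative offset of -1 immediately trips it. A simplified, hypothetical sketch of that check (a toy stand-in, not the actual kafka.log.TimeIndex code):

```python
class InvalidOffsetError(Exception):
    pass

class TimeIndexSketch:
    """Toy stand-in for the offset-monotonicity guard in a time index."""

    def __init__(self):
        self.entries = []  # list of (timestamp_ms, offset)

    def maybe_append(self, timestamp_ms, offset):
        # Offsets must strictly increase; a corrupt entry decoding to -1
        # after a valid one (e.g. 11832) raises, as in the cleaner log.
        if self.entries and offset <= self.entries[-1][1]:
            raise InvalidOffsetError(
                f"Attempt to append an offset ({offset}) no larger than "
                f"the last offset appended ({self.entries[-1][1]})")
        self.entries.append((timestamp_ms, offset))

idx = TimeIndexSketch()
idx.maybe_append(1480690550000, 11832)
try:
    idx.maybe_append(1480690550000, -1)  # mirrors the cleaner's failure
except InvalidOffsetError as e:
    print(e)
```

Because this exception propagates out of the cleaner thread, that single bad entry is enough to stop cleaning for every topic, matching Robert's observation that even __consumer_offsets stops being cleaned.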