[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-06 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804713#comment-15804713
 ] 

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 3:00 PM:
--

[~ijuma], [~becket_qin], [~junrao] - thanks and sorry again for your time, 
effort and patience. 



was (Author: michael.andre.pearce):
[~ijuma][~becket_qin][~junrao] - thanks and sorry again for your time, effort 
and patience. 


> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.sc

[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-06 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804660#comment-15804660
 ] 

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 2:46 PM:
--

[~junrao] Sorry about this, it does indeed seem to be our patching that caused 
the issue, now we have removed the jar in question all seems to be good.

We have been testing all morning. We have not seen the issue exist again.

To explain what occurred - Our patching of the confluent package was done by 
replacing all jars. 

download latest from apache site. (2.11 scala version)

expand and copy all jars from:
./kafka_2.11-0.10.1.1/libs

into 
confluent-3.1.0/share/java/kafka

then remove all files in
confluent-3.1.0/share/java/kafka
with *-0.10.1.0-cp1*.jar

What has caught us out was that in the confluent bundle the jar 
"kafka_2.11-0.10.1.0.jar" seems to be renamed "kafka.jar" without any 
versioning, unlike all the other Apache jars, which are versioned and are 
appended with -CP so on doing the above we ended up with both

kafka_2.11-0.10.1.1.jar and kafka.jar

as the kafka.jar was not removed.

We didn't easily spot this, as obviously we didn't see any 0.10.1.0 versioned 
jars on the class path, and like wise we did see 0.10.1.1 versions for all the 
expected jars.

Whilst this is obviously the apache jira, we're aware that confluent bundle is 
by yourselves, is there a reason why this is not versioned like all the other 
jars and as per the apache release in the confluent bundle? This obviously has 
tripped us up.



was (Author: michael.andre.pearce):
[~junrao] Sorry about this, it does indeed seem to be our patching that caused 
the issue, now we have removed the jar in question all seems to be good.

We have been testing all morning. We have not seen the issue exist again.

To explain what occurred - Our patching of the confluent package was done by 
replacing all jars. 

download latest from apache site. (2.11 scala version)

expand and copy all jars from:
./kafka_2.11.-0.10.1.1/libs

into 
confluent-3.1.0/share/java/kafka

then remove all files in
confluent-3.1.0/share/java/kafka
with *-0.10.1.0-cp1*.jar

What has caught us out was that in the confluent bundle the jar 
"kafka_2.11-0.10.1.0.jar" seems to be renamed "kafka.jar" without any 
versioning, unlike all the other Apache jars, which are versioned and are 
appended with -CP so on doing the above we ended up with both

kafka_2.11-0.10.1.1.jar and kafka.jar

as the kafka.jar was not removed.

We didn't easily spot this, as obviously we didn't see any 0.10.1.0 versioned 
jars on the class path, and like wise we did see 0.10.1.1 versions for all the 
expected jars.

Whilst this is obviously the apache jira, we're aware that confluent bundle is 
by yourselves, is there a reason why this is not versioned like all the other 
jars and as per the apache release in the confluent bundle? This obviously has 
tripped us up.


> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cl

[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-06 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804660#comment-15804660
 ] 

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 2:45 PM:
--

[~junrao] Sorry about this, it does indeed seem to be our patching that caused 
the issue, now we have removed the jar in question all seems to be good.

We have been testing all morning. We have not seen the issue exist again.

To explain what occurred - Our patching of the confluent package was done by 
replacing all jars. 

download latest from apache site. (2.11 scala version)

expand and copy all jars from:
./kafka_2.11.-0.10.1.1/libs

into 
confluent-3.1.0/share/java/kafka

then remove all files in
confluent-3.1.0/share/java/kafka
with *-0.10.1.0-cp1*.jar

What has caught us out was that in the confluent bundle the jar 
"kafka_2.11-0.10.1.0.jar" seems to be renamed "kafka.jar" without any 
versioning, unlike all the other Apache jars, which are versioned and are 
appended with -CP so on doing the above we ended up with both

kafka_2.11-0.10.1.1.jar and kafka.jar

as the kafka.jar was not removed.

We didn't easily spot this, as obviously we didn't see any 0.10.1.0 versioned 
jars on the class path, and like wise we did see 0.10.1.1 versions for all the 
expected jars.

Whilst this is obviously the apache jira, we're aware that confluent bundle is 
by yourselves, is there a reason why this is not versioned like all the other 
jars and as per the apache release in the confluent bundle? This obviously has 
tripped us up.



was (Author: michael.andre.pearce):
[~junrao] Sorry about this, it does indeed seem to be our patching that caused 
the issue, now we have removed the jar in question all seems to be good.

We have been testing all morning. We have not seen the issue exist again.

To explain what occurred - Our patching of the confluent package was done by 
replacing all jars. 

download latest from apache site. (2.11 scala version)

expand and copy all jars from:
./kafka_2.11.-0.10.1.1/libs

into 
confluent-3.1-2.1/share/java/kafka

then remove all files in
confluent-3.1-2.1/share/java/kafka
with *-0.10.1.0-cp2*.jar

What has caught us out was that in the confluent bundle the jar 
"kafka_2.11-0.10.1.0.jar" seems to be renamed "kafka.jar" without any 
versioning, unlike all the other Apache jars, which are versioned and are 
appended with -CP so on doing the above we ended up with both

kafka_2.11-0.10.1.1.jar and kafka.jar

as the kafka.jar was not removed.

We didn't easily spot this, as obviously we didn't see any 0.10.1.0 versioned 
jars on the class path, and like wise we did see 0.10.1.1 versions for all the 
expected jars.

Whilst this is obviously the apache jira, we're aware that confluent bundle is 
by yourselves, is there a reason why this is not versioned like all the other 
jars and as per the apache release in the confluent bundle? This obviously has 
tripped us up.


> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from t

[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-06 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802030#comment-15802030
 ] 

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 8:35 AM:
--

When we got the issue on 0.10.1.1 we were just as surprised and as such that 
was our very first thing we checked.

our internal mail at the time when this occured in UAT (first env we saw it 
post 0.10.1.1 upgrade):

>

Date: Tuesday, 3 January 2017 at 17:07
Subject: Re: Kafka Issues fixed in 0.10.1.1 how we can patch until confluent 
release.

Yeah I already double checked that, all the cp1 jars are gone and the 
0.10.1.1’s exist, pasted below, hope ive missed something..?
 
Class path: 
:/opt/projects/confluent/bin/../share/java/kafka/jetty-continuation-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/aopalliance-repackaged-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-file-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/argparse4j-0.5.0.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-json-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/avro-1.7.7.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-beanutils-1.8.3.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-codec-1.9.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-http-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-collections-3.2.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-compress-1.4.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-digester-1.8.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-lang3-3.1.jar:/opt/projects/confluent/bin/../share/java/kafka/log4j-1.2.17.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-logging-1.2.jar:/opt/projects/confluent/bin/../share/java/kafka/lz4-1.3.0.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-validator-1.4.1.jar:/opt/projects/confluent/bin/../share/java/kafka/metrics-core-2.2.0.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-io-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-util-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/jopt-simple-4.9.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-runtime-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/guava-18.0.jar:/opt/projects/confluent/bin/../share/java/kafka/osgi-resource-locator-1.0.1.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-api-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/paranamer-2.3.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-locator-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/reflections-0.9.10.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-utils-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-clients-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/httpclient-4.5.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-streams-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/httpcore-4.4.3.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-tools-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/httpmime-4.5.1.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-annotations-2.6.0.jar:/opt/projects/confluent/bin/../share/java/kafka/rocksdbjni-4.9.0.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-core-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/scala-library-2.11.8.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-core-asl-1.9.13.jar:/opt/projects/confluent/bin/../share/java/kafka/slf4j-api-1.7.21.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-databind-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/slf4j-log4j12-1.7.21.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-jaxrs-base-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-jaxrs-json-provider-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-mapper-asl-1.9.13.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-media-jaxb-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-module-jaxb-annotations-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/snappy-java-1.1.2.6.jar:/opt/projects/confluent/bin/../share/java/kafka/javassist-3.18.2-GA.jar:/opt/projects/confluent/bin/../share/java/kafka/support-metrics-client-3.1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.annotation-api-1.2.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-test.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.inject-1.jar:/opt/projects/confluent/bin/../share/java/kafka/support-metrics-common-3.1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.inject-2.4.0-b34.jar:/opt/projects/confluent/bin/../s

[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-06 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802030#comment-15802030
 ] 

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 8:33 AM:
--

When we got the issue on 0.10.1.1 we were just as surprised and as such that 
was our very first thing we checked.

our internal mail at the time when this occured in UAT (first env we saw it 
post 0.10.1.1 upgrade):

>

From: Andrew Holford 
Date: Tuesday, 3 January 2017 at 17:07
Subject: Re: Kafka Issues fixed in 0.10.1.1 how we can patch until confluent 
release.

Yeah I already double checked that, all the cp1 jars are gone and the 
0.10.1.1’s exist, pasted below, hope ive missed something..?
 
Class path: 
:/opt/projects/confluent/bin/../share/java/kafka/jetty-continuation-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/aopalliance-repackaged-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-file-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/argparse4j-0.5.0.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-json-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/avro-1.7.7.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-beanutils-1.8.3.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-codec-1.9.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-http-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-collections-3.2.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-compress-1.4.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-digester-1.8.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-lang3-3.1.jar:/opt/projects/confluent/bin/../share/java/kafka/log4j-1.2.17.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-logging-1.2.jar:/opt/projects/confluent/bin/../share/java/kafka/lz4-1.3.0.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-validator-1.4.1.jar:/opt/projects/confluent/bin/../share/java/kafka/metrics-core-2.2.0.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-io-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-util-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/jopt-simple-4.9.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-runtime-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/guava-18.0.jar:/opt/projects/confluent/bin/../share/java/kafka/osgi-resource-locator-1.0.1.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-api-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/paranamer-2.3.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-locator-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/reflections-0.9.10.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-utils-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-clients-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/httpclient-4.5.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-streams-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/httpcore-4.4.3.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-tools-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/httpmime-4.5.1.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-annotations-2.6.0.jar:/opt/projects/confluent/bin/../share/java/kafka/rocksdbjni-4.9.0.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-core-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/scala-library-2.11.8.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-core-asl-1.9.13.jar:/opt/projects/confluent/bin/../share/java/kafka/slf4j-api-1.7.21.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-databind-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/slf4j-log4j12-1.7.21.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-jaxrs-base-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-jaxrs-json-provider-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-mapper-asl-1.9.13.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-media-jaxb-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-module-jaxb-annotations-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/snappy-java-1.1.2.6.jar:/opt/projects/confluent/bin/../share/java/kafka/javassist-3.18.2-GA.jar:/opt/projects/confluent/bin/../share/java/kafka/support-metrics-client-3.1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.annotation-api-1.2.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-test.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.inject-1.jar:/opt/projects/confluent/bin/../share/java/kafka/support-metrics-common-3.1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.inject-2.4.0-b34.jar:/opt/proje

[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-06 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803984#comment-15803984
 ] 

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 8:31 AM:
--

[~junrao] As noted, we are patching the confluent 3.1.0 build with 0.10.1.1, 
just as anyway we will deploy 3.1.2 when its released (we assume you're 
ensuring all bugs really are fixed before releasing) to be sure, is there an 
SNAPSHOT build / package of the 3.1.2 with kafka 0.10.1.1 you have this way can 
avoid any possible bad patching or discrepancies by ourselves.


was (Author: michael.andre.pearce):
As noted, we are patching the confluent 3.1.0 build with 0.10.1.1, just as 
anyway we will deploy 3.1.2 when its released (we assume you're ensuring all 
bugs really are fixed before releasing) to be sure, is there an SNAPSHOT build 
/ package of the 3.1.2 with kafka 0.10.1.1 you have this way can avoid any 
possible bad patching or discrepancies by ourselves.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogClean

[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-06 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803984#comment-15803984
 ] 

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 8:30 AM:
--

As noted, we are patching the confluent 3.1.0 build with 0.10.1.1, just as 
anyway we will deploy 3.1.2 when its released (we assume you're ensuring all 
bugs really are fixed before releasing) to be sure, is there an SNAPSHOT build 
/ package of the 3.1.2 with kafka 0.10.1.1 you have this way can avoid any 
possible bad patching or discrepancies by ourselves.


was (Author: michael.andre.pearce):
As noted, we are patching the confluent 3.1.0 build with 0.10.1.1, just as 
anyway we will deploy 3.1.2 when its released (we assume you're ensuring all 
bugs really are fixed before releasing) to be sure, is there an SNAPSHOT build 
/ package of the 3.1.2 you guys have this way can avoid any possible bad 
patching or discrepancies by ourselves.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.Invali

[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-06 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803919#comment-15803919
 ] 

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 8:17 AM:
--

we have one discrepancy found with an old unversioned jar on cp, going to 
remove it and retest, if you hold of doing any further investigations we will 
let you know once we have re-deployed and checked. This indeed might be our bad.


was (Author: michael.andre.pearce):
we have one discrepancy found, going to remove it and retest

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$an

[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-06 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803829#comment-15803829
 ] 

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/6/17 8:03 AM:
--

[~junrao] We are fairly confident jars are good based on process cp, see above 
we double checked again the process class path on getting the issue last night, 
post reducing min.cleanable.dirty.ratio to make the cleaner kick in. If you 
note all jars are 0.10.1.1 that are kafka.

We can offer to deploy a custom built jar, which can contain further 
logging/debug in for you if you wish (as we have this in a testing env) or we 
can offer to even host a webex or something so you can look at the system your 
self.

 


was (Author: michael.andre.pearce):
[~junrao] We are 99.9% jars are good, see above we double checked again the 
process class path on getting the issue last night, post reducing 
min.cleanable.dirty.ratio to make the cleaner kick in. If you note all jars are 
0.10.1.1 that are kafka.

We can offer to deploy a custom built jar, which can contain further 
logging/debug in for you if you wish (as we have this in a testing env) or we 
can offer to even host a webex or something so you can look at the system your 
self.

 

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580

[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802233#comment-15802233
 ] 

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/5/17 7:14 PM:
--

[~junrao] , we took it from the website, via the links. (edited remove exact 
mirror link, as actually im not 100% i know we downloaded from website though, 
will confirm which mirror exactly tomorrow if it makes any difference)

Will look to reduce that down then.


was (Author: michael.andre.pearce):
[~junrao] , we took it from the website, via the links, downloading from the 
first mirror site, 
http://apache.mirror.anlx.net/kafka/0.10.1.1/kafka_2.11-0.10.1.1.tgz

Will look to reduce that down then.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$

[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801230#comment-15801230
 ] 

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/5/17 12:37 PM:
---

[~becket_qin] we have managed to recover the log4j cleaner log from yesterdays, 
from the same node we supplied the log segments, when we had it occur on 
0.10.1.1.

We have now re rolled back forward that testing environment back onto 0.10.1.1 
and waiting for it to re-occur. (bear in mind we had been running 6 weeks 
before we saw the initial issue on 0.10.1.0 version, whilst subsequently we 
have seen it re-occur much faster, we cannot tell how long we will have to wait)


was (Author: michael.andre.pearce):
[~becket_qin] we have managed to recover the log4j cleaner log from yesterdays, 
from the same node we supplied the log segments, when we had it occur on 
0.10.1.1.

We have now re rolled back forward that testing environment back onto 0.10.1.1 
and waiting for it to re-occur. (bear in mind we had been running 6 weeks 
before we saw the initial issue on 0.10.1.0 issue, whilst subsequently we have 
seen it re-occur much faster, we cannot tell how long we will have to wait)

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retai

[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801230#comment-15801230
 ] 

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/5/17 12:36 PM:
---

[~becket_qin] we have managed to recover the log4j cleaner log from yesterdays, 
from the same node we supplied the log segments, when we had it occur on 
0.10.1.1.

We have now re rolled back forward that testing environment back onto 0.10.1.1 
and waiting for it to re-occur. (bear in mind we had been running 6 weeks 
before we saw the initial issue on 0.10.1.0 issue, whilst subsequently we have 
seen it re-occur much faster, we cannot tell how long we will have to wait)


was (Author: michael.andre.pearce):
[~becket_qin] we have managed to recover the log4j cleaner log from yesterdays, 
from the same node we supplied the log segments. 

We have now re rolled back forward that testing environment back onto 0.10.1.1 
and waiting for it to re-occur. (bear in mind we had been running 6 weeks 
before we saw the initial issue on 0.10.1.0 issue, whilst subsequently we have 
seen it re-occur much faster, we cannot tell how long we will have to wait)

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)

[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801230#comment-15801230
 ] 

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/5/17 12:34 PM:
---

[~becket_qin] we have managed to recover the log4j cleaner log from yesterdays, 
from the same node we supplied the log segments. 

We have now re rolled back forward that testing environment back onto 0.10.1.1 
and waiting for it to re-occur. (bear in mind we had been running 6 weeks 
before we saw the initial issue on 0.10.1.0 issue, whilst subsequently we have 
seen it re-occur much faster, we cannot tell how long we will have to wait)


was (Author: michael.andre.pearce):
[~becket_qin] we have managed to recover the log4j cleaner log from yesterdays, 
from the same node we supplied the log segments. 

We have now re rolled back forward that testing environment back onto 0.10.1.1 
and waiting for it to re-occur. (bear in mind we had been running 6 weeks 
before we saw the initial issue on 0.10.1.0 issue, whilst subsequently we have 
seen it much faster, we cannot tell how long we will have to wait)

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-l

[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801230#comment-15801230
 ] 

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/5/17 12:33 PM:
---

[~becket_qin] we have managed to recover the log4j cleaner log from yesterdays, 
from the same node we supplied the log segments. 

We have now re rolled back forward that testing environment back onto 0.10.1.1 
and waiting for it to re-occur. (bear in mind we had been running 6 weeks 
before we saw the initial issue on 0.10.1.0 issue, whilst subsequently we have 
seen it much faster, we cannot tell how long we will have to wait)


was (Author: michael.andre.pearce):
[~becket_qin] we have managed to recover the log4j cleaner log from yesterdays, 
from the same node we supplied the log segments. 

We have now re rolled back forward that testing environment back onto 0.10.1.1 
and waiting for it to re-occur. (bear in mind we had been running 6 weeks 
before we saw the issue, we cannot tell how long we will have to wait)

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidO

[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-04 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799490#comment-15799490
 ] 

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/4/17 10:09 PM:
---

[~junrao]We had this occur now in our testing env also (so now 4 envs 
affected), so we can get the log segments from there. Currently just sorting 
out another environment with our operations team tonight trying to handle the 
fact disks are filling rapidly, once done we will go try get them out of 
testing env, onto this ticket tonight, so you have them to look at during your 
day/our night.

RE: our patch apart from it being naive, and obviously a plaster not a cure, as 
obviously the time index code is new, and i assume you have better knowledge of 
that code, would you for see any knock on issues with applying that? As we can 
tell it does look like it doesn't affect anything else, but we don't use time 
index feature yet. 


was (Author: michael.andre.pearce):
[~junrao]We had this occur now in our testing env also (so now 4 envs 
affected), so we can get the log segments from there. Currently just sorting 
out another environment with our operations team tonight trying to handle the 
fact disks are filling rapidly, once done we will go try get them out of 
testing env, onto this ticket tonight, so you have them to look at during your 
day/our night.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-

[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-04 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798847#comment-15798847
 ] 

Michael Andre Pearce (IG) edited comment on KAFKA-4497 at 1/4/17 5:43 PM:
--

[~junrao] This is causing us a production issue on the current version, 
currently also looking now at rolling back like eBay has to 0.10.0.1, unless we 
get this fixed asap. This is a terminal issue.

Whilst we agree in the purist view that we should fix the root cause, and happy 
to add logging later to further find that, is there any reason not to accept 
this PR to fix/treating the symptom to avoid production issue.

I would imagine others also will start hitting the issue too, we're not alone, 
as such would cause blocker to anyone upgrading to 0.10.1.0/0.10.1.1 now.


was (Author: michael.andre.pearce):
[~junrao] This is causing us a production issue on the current version, 
currently also looking now at rolling back like eBay has to 0.10.0.1, unless we 
get this fixed asap.

Whilst we agree in the purist view that we should fix the root cause, and happy 
to add logging later to further find that, is there any reason not to accept 
this PR to fix/treating the symptom to avoid production issue.

I would imagine others also will start hitting the issue too, we're not alone, 
as such would cause blocker to anyone upgrading to 0.10.1.0/0.10.1.1 now.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status

[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-12 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743650#comment-15743650
 ] 

Jiangjie Qin edited comment on KAFKA-4497 at 12/13/16 12:40 AM:


The bit representation of unsigned integer 2^32-1 is the same as that of signed 
integer -1.
The negative bit is the MSB which is 1 in 0X. The entry Robert saw was 
{{[0158 9c490ae1 ]}}, so it is a [Long + an Integer]. The long is 
the timestamp, and the integer is the relative offset which is -1 in our case 
because the base offset was 0.


was (Author: becket_qin):
The negative bit is the MSB which is 1 in 0X. The entry Robert saw was 
{{[0158 9c490ae1 ]}}, so it is a [Long + an Integer]. The long is 
the timestamp, and the integer is the relative offset which is -1 in our case 
because the base offset was 0.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfu

[jira] [Comment Edited] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-12 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743650#comment-15743650
 ] 

Jiangjie Qin edited comment on KAFKA-4497 at 12/13/16 12:38 AM:


The negative bit is the MSB which is 1 in 0X. The entry Robert saw was 
{{[0158 9c490ae1 ]}}, so it is a [Long + an Integer]. The long is 
the timestamp, and the integer is the relative offset which is -1 in our case 
because the base offset was 0.


was (Author: becket_qin):
The negative bit is the MSB which is 1 in 0X. The entry Robert saw was 
[0158 9c490ae1 ], so it is a [Long + an Integer]. The long is the 
timestamp, and the integer is the relative offset which is -1 in our case 
because the base offset was 0.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppe