[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-06 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805165#comment-15805165
 ] 

Jiangjie Qin commented on KAFKA-4497:
-

[~michael.andre.pearce] Thanks for the update. Glad to know it was not a code 
issue :)
[~junrao] Thanks a lot for helping look into this!

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Clean

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804767#comment-15804767
 ] 

ASF GitHub Bot commented on KAFKA-4497:
---

Github user ig-michaelpearce closed the pull request at:

https://github.com/apache/kafka/pull/2310


> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-06 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804723#comment-15804723
 ] 

Ismael Juma commented on KAFKA-4497:


[~michael.andre.pearce], I am not sure about the reason. I shared the feedback 
internally and will let you know privately. Best to stick to Apache Kafka 
discussions here. :)

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scal

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-06 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804713#comment-15804713
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

[~ijuma][~becket_qin][~junrao] - thanks and sorry again for your time, effort 
and patience. 


> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCl

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-06 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804705#comment-15804705
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

[~ijuma]
Yes we all work together. 

Re what tripped us up, re the un-versioned kafka.jar in the confluent package 
is there a reason this is done? (I know this is Apache but worth the ask)

We are happy for closing this. We cannot see or repeat the issue on 0.10.1.1

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> 

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-06 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804681#comment-15804681
 ] 

Ismael Juma commented on KAFKA-4497:


Thanks [~michael.andre.pearce]. I assume that [~andyh1983] is also in your 
team? If so, I think we can close this, right?

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCl

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-06 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804660#comment-15804660
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

[~junrao] Sorry about this, it does indeed seem to be our patching that caused 
the issue, now we have removed the jar in question all seems to be good.

We have been testing all morning. We have not seen the issue exist again.

To explain what occurred - Our patching of the confluent package was done by 
replacing all jars. 

download latest from apache site. (2.11 scala version)

expand and copy all jars from:
./kafka_2.11.-0.10.1.1/libs

into 
confluent-3.1-2.1/share/java/kafka

then remove all files in
confluent-3.1-2.1/share/java/kafka
with *-0.10.1.0-cp2*.jar

What has caught us out was that in the confluent bundle the jar 
"kafka_2.11-0.10.1.0.jar" seems to be renamed "kafka.jar" without any 
versioning, unlike all the other Apache jars, which are versioned and are 
appended with -CP so on doing the above we ended up with both

kafka_2.11-0.10.1.1.jar and kafka.jar

as the kafka.jar was not removed.

We didn't easily spot this, as obviously we didn't see any 0.10.1.0 versioned 
jars on the class path, and like wise we did see 0.10.1.1 versions for all the 
expected jars.

Whilst this is obviously the apache jira, we're aware that confluent bundle is 
by yourselves, is there a reason why this is not versioned like all the other 
jars and as per the apache release in the confluent bundle? This obviously has 
tripped us up.


> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogClea

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-06 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803984#comment-15803984
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

As noted, we are patching the confluent 3.1.0 build with 0.10.1.1, just as 
anyway we will deploy 3.1.2 when its released (we assume you're ensuring all 
bugs really are fixed before releasing) to be sure, is there an SNAPSHOT build 
/ package of the 3.1.2 you guys have this way can avoid any possible bad 
patching or discrepancies by ourselves.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeInd

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-06 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803919#comment-15803919
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

we have one discrepancy found, going to remove it and retest

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> k

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803893#comment-15803893
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

we are going to md5 checksum check every file this morning.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> ka

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803829#comment-15803829
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

[~junrao] We are 99.9% jars are good, see above we double checked again the 
process class path on getting the issue last night, post reducing 
min.cleanable.dirty.ratio to make the cleaner kick in. If you note all jars are 
0.10.1.1 that are kafka.

We can offer to deploy a custom built jar, which can contain further 
logging/debug in for you if you wish (as we have this in a testing env) or we 
can offer to even host a webex or something so you can look at the system your 
self.

 

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> k

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803830#comment-15803830
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

[~junrao] We are 99.9% jars are good, see above we double checked again the 
process class path on getting the issue last night, post reducing 
min.cleanable.dirty.ratio to make the cleaner kick in. If you note all jars are 
0.10.1.1 that are kafka.

We can offer to deploy a custom built jar, which can contain further 
logging/debug in for you if you wish (as we have this in a testing env) or we 
can offer to even host a webex or something so you can look at the system your 
self.

 

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> k

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803184#comment-15803184
 ] 

Jun Rao commented on KAFKA-4497:


[~michael.andre.pearce], I did the following test. (1) Run Kafka 0.10.1.1 
binary from the website on the latest log you provided. The cleaner ran 
successfully. (2) Run Kafka 0.10.1.0 binary from the website on the same log. 
The cleaner failed on the same exception your reported. So, I am wondering if 
you can check your environment again to make sure the right jar is picked up at 
runtime. Also, you mentioned that someone else reported the same issue on 
0.10.1.1. Do you know where it is reported?

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$

Re: [jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread James Cheng

> On Jan 5, 2017, at 3:47 AM, Edoardo Comar  wrote:
> 
> "Jiangjie Qin (JIRA)"  wrote on 05/01/2017 08:06:58:
> 
>>[ https://issues.apache.org/jira/browse/KAFKA-4497?
>> page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
>> tabpanel&focusedCommentId=15800696#comment-15800696 ] 
>> 
>> Jiangjie Qin commented on KAFKA-4497:
>> -
>> 
>> [~michael.andre.pearce] Thanks for helping test this again. 
>> 
>> If possible, could you enable trace level logging on 
>> kafka.log.LogSegment. This will print out the timestamp and the 
>> offset of the message with the largest timestamp, but it will 
>> probably flood the log4j log. So enable this logging is optional and
>> preferred only when the issue can be reproduced pretty quickly.
>> 
> 
> Is it possible to *dynamically* reconfigure log4j so that one could turn 
> tracing on/off for selected components 
> without editing the log4j.properties and restarting Kafka ? 
> 

I've done it before using JMX. 

If you have JMX enabled, you can use jconsole, connect to kafka, and do a 
setLoggerLevel() to change the logging level of a particular package. It 
changes it on a live server, without needing to reboot. It does not save the 
change past a reboot.

I found this page that has the steps (with screenshots) that show how to do it:
http://www.javaworld.com/article/2073316/dynamic-java-log-levels-with-jmx-loggingmxbean--jconsole--visualvm--and-groovy.html
 


-James


> The log4j API supports a watch-file-and-reload
> http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/PropertyConfigurator.html#configureAndWatch(java.lang.String)
> but apparently that cannot happen declaratively by just putting the 
> properties file on the classpath.
> 
> Worth a new JIRA/KIP ?
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number 
> 741598. 
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU



[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802412#comment-15802412
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

confirming class path via the process: 

conflue+ 14435 14401  1 13:21 ?00:05:53 
/opt/projects/java/sun-jdk8-1.8.0.66/jre/bin/java -cp 
:/opt/projects/confluent/bin/../share/java/kafka/commons-beanutils-1.8.3.jar:/opt/projects/confluent/bin/../share/java/kafka/support-metrics-client-3.1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/argparse4j-0.5.0.jar:/opt/projects/confluent/bin/../share/java/kafka/avro-1.7.7.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-codec-1.9.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka.jar:/opt/projects/confluent/bin/../share/java/kafka/support-metrics-common-3.1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-collections-3.2.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-digester-1.8.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-lang3-3.1.jar:/opt/projects/confluent/bin/../share/java/kafka/lz4-1.3.0.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-test.jar:/opt/projects/confluent/bin/../share/java/kafka/xz-1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-api-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-runtime-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-log4j-appender-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/snappy-java-1.1.2.6.jar:/opt/projects/confluent/bin/../share/java/kafka/zkclient-0.9.jar:/opt/projects/confluent/bin/../share/java/kafka/httpclient-4.5.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-scaladoc.jar:/opt/projects/confluent/bin/../share/java/kafka/httpcore-4.4.3.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-file-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-clients-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-streams-examples-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-sources.jar:/opt/projects/confluent/bin/../share/java/kafka/zookeeper-3.4.8.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-mapper-asl-1.9.13.jar:/opt/projects/confluent/bin/../share/java/kafka/httpmime-4.5.1.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.annotation-api-1.2.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-tools-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-json-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-streams-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.inject-1.jar:/opt/projects/confluent/bin/../share/java/kafka/rocksdbjni-4.9.0.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.servlet-api-3.1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-core-asl-1.9.13.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-logging-1.2.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.ws.rs-api-2.0.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-validator-1.4.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-javadoc.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-test-sources.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-locator-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-utils-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-util-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-databind-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-jaxrs-base-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-jaxrs-json-provider-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-module-jaxb-annotations-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/javassist-3.18.2-GA.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.inject-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/reflections-0.9.10.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-client-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-common-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jopt-simple-4.9.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-container-servlet-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-container-servlet-core-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-guava-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-media-jaxb-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/scala-library-2.11.8.jar:/opt/projects/confluent/bin/../share/java/kafka/log4j-1.2.17.jar:/opt/projects/confluent/bin/../share/java/kafka/metr

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802394#comment-15802394
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

[~junrao] [~becket_qin] anything else you guys need? As its just happened and 
more chance of capturing anything extra.

[2017-01-05 19:37:07,144] ERROR [kafka-log-cleaner-thread-0], Error due to  
(kafka.log.LogCleaner)
kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
20 no larger than the last offset appended (50346) to /v
ar/kafka/logs/com_ig_trade_v1_position_event--demo--compacted-14/.timeindex.cleaned.
at 
kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
at kafka.log.LogSegment.append(LogSegment.scala:106)
at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
at 
kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
at 
kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
at scala.collection.immutable.List.foreach(List.scala:381)
at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:400)
at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:364)
at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
at scala.collection.immutable.List.foreach(List.scala:381)
at kafka.log.Cleaner.clean(LogCleaner.scala:363)
at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:239)
at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:218)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
[2017-01-05 19:37:07,144] INFO [kafka-log-cleaner-thread-0], Stopped  
(kafka.log.LogCleaner)

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,9

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802390#comment-15802390
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

log4j  log-cleaner.log with TRACE, 
https://issues.apache.org/jira/secure/attachment/12845849/log-cleaner.log.1.zip

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, 
> log-cleaner.log.1.zip, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cl

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802365#comment-15802365
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

https://issues.apache.org/jira/secure/attachment/12845847/com_ig_trade_v1_position_event--demo--compacted-14.tar.gz

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> com_ig_trade_v1_position_event--demo--compacted-14.tar.gz, vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802326#comment-15802326
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

[2017-01-05 19:37:07,137] TRACE Inserting 88493 bytes at offset 49647 at 
position 4393661 with largest timestamp 1480518430821 at offset 49859 
(kafka.log.LogSegment)
[2017-01-05 19:37:07,138] INFO Cleaner 0: Cleaning segment 49860 in log 
com_ig_trade_v1_position_event--demo--compacted-14 (largest timestamp Wed Dec 
07 15:06:26 GMT 2016) into 0, retaining deletes. (kafka.log.LogCleaner)
[2017-01-05 19:37:07,138] TRACE Inserting 15951 bytes at offset 49863 at 
position 4482154 with largest timestamp 1481123186887 at offset 49971 
(kafka.log.LogSegment)
[2017-01-05 19:37:07,138] INFO Cleaner 0: Cleaning segment 49972 in log 
com_ig_trade_v1_position_event--demo--compacted-14 (largest timestamp Wed Dec 
14 14:22:00 GMT 2016) into 0, retaining deletes. (kafka.log.LogCleaner)
[2017-01-05 19:37:07,138] TRACE Inserting 7496 bytes at offset 49972 at 
position 4498105 with largest timestamp 1481722619382 at offset 50151 
(kafka.log.LogSegment)
[2017-01-05 19:37:07,138] INFO Cleaner 0: Cleaning segment 50155 in log 
com_ig_trade_v1_position_event--demo--compacted-14 (largest timestamp Wed Dec 
21 15:26:33 GMT 2016) into 0, retaining deletes. (kafka.log.LogCleaner)
[2017-01-05 19:37:07,138] TRACE Inserting 12632 bytes at offset 50158 at 
position 4505601 with largest timestamp 1482321739128 at offset 50346 
(kafka.log.LogSegment)
[2017-01-05 19:37:07,138] INFO Cleaner 0: Cleaning segment 50349 in log 
com_ig_trade_v1_position_event--demo--compacted-14 (largest timestamp Wed Dec 
28 16:07:21 GMT 2016) into 0, retaining deletes. (kafka.log.LogCleaner)
[2017-01-05 19:37:07,140] TRACE Inserting 11743 bytes at offset 50350 at 
position 4518233 with largest timestamp 1482933300393 at offset -1 
(kafka.log.LogSegment)
[2017-01-05 19:37:07,144] ERROR [kafka-log-cleaner-thread-0], Error due to  
(kafka.log.LogCleaner)
kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
20 no larger than the last offset appended (50346) to 
/var/kafka/logs/com_ig_trade_v1_position_event--demo--compacted-14/.timeindex.cleaned.
at 
kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
at kafka.log.LogSegment.append(LogSegment.scala:106)
at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
at 
kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
at 
kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
at scala.collection.immutable.List.foreach(List.scala:381)
at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:400)
at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:364)
at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
at scala.collection.immutable.List.foreach(List.scala:381)
at kafka.log.Cleaner.clean(LogCleaner.scala:363)
at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:239)
at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:218)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created comple

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802319#comment-15802319
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

Perfect, that triggered it to run, and now has occured again, just going to 
capture all the bits for you again.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSe

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802233#comment-15802233
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

[~junrao] , we took it from the website, via the links, downloading from the 
first mirror site, 
http://apache.mirror.anlx.net/kafka/0.10.1.1/kafka_2.11-0.10.1.1.tgz

Will look to reduce that down then.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802131#comment-15802131
 ] 

Jun Rao commented on KAFKA-4497:


[~michael.andre.pearce], thanks for the update. Did you get the 0.10.1.1 binary 
from the Apache Kafka website or from maven? Also, if you reduce 
min.cleanable.dirty.ratio (can be done dynamically on a topic), it will force 
log cleaner to kick in sooner.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
>

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802034#comment-15802034
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

Btw since re-rolling forward the 0.10.1.1 back into a testing env, we're still 
waiting for re-occurance.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802030#comment-15802030
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

When we got the issue on 0.10.1.1 we were just as surprised and as such that 
was our very first thing we checked.

our internal mail at the time when this occured in UAT (first env we saw it 
post 0.10.1.1 upgrade):

>

From: Andrew Holford 
Date: Tuesday, 3 January 2017 at 17:07
To: Michael Pearce , Thomas Brown 
Cc: William Hargrove , Anton Goldenfarb 
, Neil Laurance , Gavin Sandie 

Subject: Re: Kafka Issues fixed in 0.10.1.1 how we can patch until confluent 
release.

Yeah I already double checked that, all the cp1 jars are gone and the 
0.10.1.1’s exist, pasted below, hope ive missed something..?
 
Class path: 
:/opt/projects/confluent/bin/../share/java/kafka/jetty-continuation-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/aopalliance-repackaged-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-file-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/argparse4j-0.5.0.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-json-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/avro-1.7.7.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-beanutils-1.8.3.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-codec-1.9.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-http-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-collections-3.2.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-compress-1.4.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-digester-1.8.1.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-lang3-3.1.jar:/opt/projects/confluent/bin/../share/java/kafka/log4j-1.2.17.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-logging-1.2.jar:/opt/projects/confluent/bin/../share/java/kafka/lz4-1.3.0.jar:/opt/projects/confluent/bin/../share/java/kafka/commons-validator-1.4.1.jar:/opt/projects/confluent/bin/../share/java/kafka/metrics-core-2.2.0.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-io-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/jetty-util-9.2.15.v20160210.jar:/opt/projects/confluent/bin/../share/java/kafka/jopt-simple-4.9.jar:/opt/projects/confluent/bin/../share/java/kafka/connect-runtime-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/guava-18.0.jar:/opt/projects/confluent/bin/../share/java/kafka/osgi-resource-locator-1.0.1.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-api-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/paranamer-2.3.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-locator-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/reflections-0.9.10.jar:/opt/projects/confluent/bin/../share/java/kafka/hk2-utils-2.4.0-b34.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-clients-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/httpclient-4.5.1.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-streams-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/httpcore-4.4.3.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka-tools-0.10.1.1.jar:/opt/projects/confluent/bin/../share/java/kafka/httpmime-4.5.1.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-annotations-2.6.0.jar:/opt/projects/confluent/bin/../share/java/kafka/rocksdbjni-4.9.0.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-core-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/scala-library-2.11.8.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-core-asl-1.9.13.jar:/opt/projects/confluent/bin/../share/java/kafka/slf4j-api-1.7.21.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-databind-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/slf4j-log4j12-1.7.21.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-jaxrs-base-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-jaxrs-json-provider-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-mapper-asl-1.9.13.jar:/opt/projects/confluent/bin/../share/java/kafka/jersey-media-jaxb-2.22.2.jar:/opt/projects/confluent/bin/../share/java/kafka/jackson-module-jaxb-annotations-2.6.3.jar:/opt/projects/confluent/bin/../share/java/kafka/snappy-java-1.1.2.6.jar:/opt/projects/confluent/bin/../share/java/kafka/javassist-3.18.2-GA.jar:/opt/projects/confluent/bin/../share/java/kafka/support-metrics-client-3.1.0.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.annotation-api-1.2.jar:/opt/projects/confluent/bin/../share/java/kafka/kafka_2.11-0.10.1.1-test.jar:/opt/projects/confluent/bin/../share/java/kafka/javax.inject-1.jar:/opt/projects/confluent/bin/../share/java/kafka/support-metrics-common-3.1.0.jar:/opt/projects/confluent

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801977#comment-15801977
 ] 

Jun Rao commented on KAFKA-4497:


[~michael.andre.pearce], similar to [~becket_qin], I couldn't reproduce the 
issue locally. I copied the Kafka log you provided to a local directory, 
started the 0.10.1.1 broker and waited for the log cleaner to kick in. The 
cleaner was able to complete the cleaning up to the offset in the last segment 
successfully. Could you double check that you have applied the correct 0.10.1.1 
jar in your environment (make sure there are no old Kafka jars also in the 
class path)?

[2017-01-04 17:22:03,502] INFO [kafka-log-cleaner-thread-0], Starting  
(kafka.log.LogCleaner)
[2017-01-04 17:22:03,517] INFO Cleaner 0: Beginning cleaning of log test-0. 
(kafka.log.LogCleaner)
[2017-01-04 17:22:03,518] INFO Cleaner 0: Building offset map for test-0... 
(kafka.log.LogCleaner)
[2017-01-04 17:22:03,544] INFO Cleaner 0: Building offset map for log test-0 
for 3 segments in offset range [0, 4961). (kafka.log.LogCleaner)
[2017-01-04 17:22:03,775] INFO Cleaner 0: Offset map for log test-0 complete. 
(kafka.log.LogCleaner)
[2017-01-04 17:22:03,780] INFO Cleaner 0: Cleaning log test-0 (cleaning prior 
to Wed Jan 04 17:21:10 PST 2017, discarding tombstones prior to Wed Dec 31 
16:00:00 PST 1969)... (kafka.log.LogCleaner)
[2017-01-04 17:22:03,783] INFO Cleaner 0: Cleaning segment 0 in log test-0 
(largest timestamp Wed Dec 21 04:37:54 PST 2016) into 0, retaining deletes. 
(kafka.log.LogCleaner)
[2017-01-04 17:22:03,829] INFO Cleaner 0: Cleaning segment 3374 in log test-0 
(largest timestamp Wed Dec 28 07:22:50 PST 2016) into 0, retaining deletes. 
(kafka.log.LogCleaner)
[2017-01-04 17:22:03,921] INFO Cleaner 0: Cleaning segment 4172 in log test-0 
(largest timestamp Wed Jan 04 07:37:48 PST 2017) into 0, retaining deletes. 
(kafka.log.LogCleaner)
[2017-01-04 17:22:04,065] INFO Cleaner 0: Swapping in cleaned segment 0 for 
segment(s) 0,3374,4172 in log test-0. (kafka.log.LogCleaner)
[2017-01-04 17:22:04,095] INFO [kafka-log-cleaner-thread-0], 
Log cleaner thread 0 cleaned log test-0 (dirty section = [0, 4961])
1.0 MB of log processed in 0.6 seconds (1.7 MB/sec).
Indexed 1.0 MB in 0.3 seconds (3.7 Mb/sec, 46.2% of total time)
Buffer utilization: 0.0%
Cleaned 1.0 MB in 0.3 seconds (3.2 Mb/sec, 53.8% of total time)
Start size: 1.0 MB (2,909 messages)
End size: 0.4 MB (1,325 messages)
62.2% size reduction (54.5% fewer messages)
 (kafka.log.LogCleaner)


> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other top

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801244#comment-15801244
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

https://issues.apache.org/jira/secure/attachment/12845787/vrtstokf005_4thJan

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2, 
> vrtstokf005_4thJan
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)

Re: [jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Edoardo Comar
"Jiangjie Qin (JIRA)"  wrote on 05/01/2017 08:06:58:

> [ https://issues.apache.org/jira/browse/KAFKA-4497?
> page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
> tabpanel&focusedCommentId=15800696#comment-15800696 ] 
> 
> Jiangjie Qin commented on KAFKA-4497:
> -
> 
> [~michael.andre.pearce] Thanks for helping test this again. 
> 
> If possible, could you enable trace level logging on 
> kafka.log.LogSegment. This will print out the timestamp and the 
> offset of the message with the largest timestamp, but it will 
> probably flood the log4j log. So enable this logging is optional and
> preferred only when the issue can be reproduced pretty quickly.
> 

Is it possible to *dynamically* reconfigure log4j so that one could turn 
tracing on/off for selected components 
without editing the log4j.properties and restarting Kafka ? 

The log4j API supports a watch-file-and-reload
http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/PropertyConfigurator.html#configureAndWatch(java.lang.String)
but apparently that cannot happen declaratively by just putting the 
properties file on the classpath.

Worth a new JIRA/KIP ?
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15800869#comment-15800869
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

re configs, i can give these now.

log.cleaner.backoff.ms = 15000
log.cleaner.dedupe.buffer.size = 134217728
log.cleaner.delete.retention.ms = 8640
log.cleaner.enable = true
log.cleaner.io.buffer.load.factor = 0.9
log.cleaner.io.buffer.size = 524288
log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
log.cleaner.min.cleanable.ratio = 0.5
log.cleaner.min.compaction.lag.ms = 0
log.cleaner.threads = 1
log.cleanup.policy = [delete]

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-05 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15800696#comment-15800696
 ] 

Jiangjie Qin commented on KAFKA-4497:
-

[~michael.andre.pearce] Thanks for helping test this again. 

I was just asking for the log4j log because you mentioned the time when the 
issue occurred. I think the raw log segment files when log cleaner thread 
throws exception should be sufficient. Also could you let us know the 
{{log.cleaner.io.buffer.size}} config value? 

Not sure if you are going to test in production environment or not. If 
possible, could you enable trace level logging on kafka.log.LogSegment. This 
will print out the timestamp and the offset of the message with the largest 
timestamp, but it will probably flood the log4j log. So enable this logging is 
optional and preferred only when the issue can be reproduced pretty quickly.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> k

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-04 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15800617#comment-15800617
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

Hi [~becket_qin] Sorry we thought [~junrao] only asked for the segment logs so 
we just packaged up that partition logs not the log4j also. We've now rolled 
back that environment. 

I will discuss today to see if we can roll back forwards and see if we can 
re-create again on 0.10.1.1, so we're 100% clear, as if we do it, we will 
probably roll back once we detect the issue again, can you list what else you 
will need or want precisely so we capture everything.

Cheers
Mike

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$may

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-04 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15800498#comment-15800498
 ] 

Jiangjie Qin commented on KAFKA-4497:
-

[~michael.andre.pearce] Sorry for the late response, just got time to look at 
this. I have been trying to reproduce the issue with the log segments files you 
uploaded but was not able to reproduce. I did not find the log4j logs in the 
attached gz file. Am I missing something?

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-04 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799617#comment-15799617
 ] 

Ismael Juma commented on KAFKA-4497:


Yes, the plan is to include the fix in Confluent  3.1.2.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCl

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-04 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799599#comment-15799599
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

Btw, we are using actually normally the Confluent build. though we decided to 
patch 3.1.x with 0.10.1.1 manually to see if it solved the issue in our testing 
and uat envs, thus we found it didn't resolve the issue. 

If you find the issue and have a fix, would you imagine you'd get it patched 
into the upcoming confluent build based on 0.10.1.1 , so we wouldn't have to 
immediately hot patch that?

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inL

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-04 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799594#comment-15799594
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

[~junrao] just attached, with everything. As its testing env its all non real 
data, so we can share.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
> Attachments: com_ig_trade_v1_order_event--demo--compacted-7.tar.bz2
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:40

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-04 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799565#comment-15799565
 ] 

Jun Rao commented on KAFKA-4497:


[~michael.andre.pearce], if you could attach the problematic .log file to the 
jira, that would be great. The patch you provided is a good temporary fix. So, 
if you want to patch it locally to stabilize your local environment, it should 
be fine. Once we have identified the root cause and have it fixed, we probably 
should assert that offsetOfMaxTimestamp is always non-negative (when 
maxTimestamp is >= 0), instead of ignore it.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-04 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799490#comment-15799490
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

[~junrao]We had this occur now in our testing env also (so now 4 envs 
affected), so we can get the log segments from there. Currently just sorting 
out another environment with our operations team tonight trying to handle the 
fact disks are filling rapidly, once done we will go try get them out of 
testing env, onto this ticket tonight, so you have them to look at during your 
day/our night.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-04 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799192#comment-15799192
 ] 

Jun Rao commented on KAFKA-4497:


[~michael.andre.pearce], sorry for the production issue this is causing. My 
only concern of hiding the problem is that it may lead to other issues down the 
road. Since you can reliably reproduce this issue, it shouldn't take too long 
for us to figure out the exact cause. To avoid impacting the production system, 
perhaps you can just copy the affected log segment to a separate cluster, add a 
bit instrumentation to the code on offsetOfMaxTimestamp and force log cleaning. 
Alternatively, if you are ok with attaching the log segment to the jira, we can 
also debug it for you.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.a

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-04 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798847#comment-15798847
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

[~junrao] This is causing us a production issue on the current version, 
currently also looking now at rolling back like eBay has to 0.10.0.1, unless we 
get this fixed asap.

Whilst we agree in the purist view that we should fix the root cause, and happy 
to add logging later to further find that, is there any reason not to accept 
this PR to fix/treating the symptom to avoid production issue.

I would imagine others also will start hitting the issue too, we're not alone, 
as such would cause blocker to anyone upgrading to 0.10.1.0/0.10.1.1 now.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppen

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-04 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798702#comment-15798702
 ] 

Jun Rao commented on KAFKA-4497:


[~michael.andre.pearce], thanks for reporting this. It would be good to fix the 
cause instead of ignoring the -1 offset passed to the time index. Looking at 
the code, there isn't anything obvious that can lead to this. The only case 
when offsetOfMaxTimestamp can be -1 is when maxTimestamp is NoTimestamp. In 
that case, the time index won't be updated. Could you patch the code to log the 
maxTimestamp when offsetOfMaxTimestamp is -1? The value of maxTimestamp may 
give us some clue on what the issue might be.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLoc

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-04 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798007#comment-15798007
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

[~ijuma] it seems exactly the same, we note that its blowing up because -1 is 
still being sent to the TimeIndex, as noted above when -1 it means there is no 
offset for the messageset, as noted above it can be empty,  also as noted 
earlier a naive approach to solve is to simply avoid it throwing the exception 
so the compaction thread isn't killed. 

As such can you review and comment on our PR which is simply in the maybeAppend 
method to not do anything if offset is -1 (special case). Whilst naive it seems 
this avoids the issue. Can we confirm there is nothing adversely missed with 
this approach.



> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:1

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-04 Thread Michael Andre Pearce (IG) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798006#comment-15798006
 ] 

Michael Andre Pearce (IG) commented on KAFKA-4497:
--

[~ijuma] it seems exactly the same, we note that its blowing up because -1 is 
still being sent to the TimeIndex, as noted above when -1 it means there is no 
offset for the messageset, as noted above it can be empty,  also as noted 
earlier a naive approach to solve is to simply avoid it throwing the exception 
so the compaction thread isn't killed. 

As such can you review and comment on our PR which is simply in the maybeAppend 
method to not do anything if offset is -1 (special case). Whilst naive it seems 
this avoids the issue. Can we confirm there is nothing adversely missed with 
this approach.



> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:1

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15797997#comment-15797997
 ] 

ASF GitHub Bot commented on KAFKA-4497:
---

GitHub user ig-michaelpearce opened a pull request:

https://github.com/apache/kafka/pull/2310

KAFKA-4497 - Ignore offset -1 in TimeIndex avoiding exception issue

Ignore offset -1 in TimeIndex avoiding exception issue Killing compaction 
thread.

As it seems issue remains, in 0.10.1.1 that during compaction -1 is
being sent to TimeIndex, as anyhow this means a messages with no
messages, we should ignore -1 offset, to avoid exception being thrown
and killing the compaction thread.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/IG-Group/kafka KAFKA-4497

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2310.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2310


commit 64c428f231d5c8be98d5a6a649e0e09d6cd341ce
Author: Michael Pearce 
Date:   2017-01-04T11:18:50Z

KAFKA-4497 - Ignore offset -1 in TimeIndex avoiding exception issue killing 
compaction thread.

As it seems issue remains, in 0.10.1.1 that during compaction -1 is
being sent to TimeIndex, as anyhow this means a messages with no
messages, we should ignore -1 offset, to avoid exception being thrown
and killing the compaction thread.




> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones pr

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-04 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15797900#comment-15797900
 ] 

Ismael Juma commented on KAFKA-4497:


[~andyh1983], is the error exactly the same? Can you perhaps provide the logs 
so that we can double-check? Regarding `inter.broker.protocol.version` and 
`log.message.format.version`, `0.10.1.0` is fne. Downgrading is only safe if 
you haven't bumped `log.message.format.version`. Once that is done, it's hard.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogC

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2017-01-03 Thread Andrew Holford (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795598#comment-15795598
 ] 

Andrew Holford commented on KAFKA-4497:
---

Hi,

We have hit this issue in one of our test environments with version 0.10.1.0 so 
have been able to upgrade to 0.10.1.1, unfortunately the issue does not seem to 
be resolved and the log cleaner crashes with the same error.

We still have the following set post upgrade:

inter.broker.protocol.version=0.10.1.0
log.message.format.version=0.10.1.0

Do we need to move these up to 0.10.1.1 for the fix to apply?

If not is there a safe way / procedure we could follow to downgrade to 
0.10.0.1, where we didn't experience this issue.

Thanks,
Andy

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.Core

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-29 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15786213#comment-15786213
 ] 

Jiangjie Qin commented on KAFKA-4497:
-

[~sandieg] I cannot think of a good workaround on this without deploying new 
code. If the log cleaner thread is dead, bouncing the broker seems the best we 
can do although the log cleaner thread may die again after the bounce. If 
possible, you can try reduce the log.cleaner.io.buffer.size a little to reduce 
the likelihood of hitting this issue, but notice that the IO buffer size should 
be at least 2x of max message size.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-28 Thread Gavin Sandie (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783222#comment-15783222
 ] 

Gavin Sandie commented on KAFKA-4497:
-

Hello,

We are experiencing this problem in one of our environments. Exactly the same 
error and symptoms.

Are there any manual actions we can take to work around the problem? Right now 
we're not able to upgrade to 0.10.1.1 and we're using disk space quite quickly.

Cheers, Gavin

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$c

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-14 Thread Robert Schumann (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747829#comment-15747829
 ] 

Robert Schumann commented on KAFKA-4497:


[~becket_qin] [~junrao]: Many thanks for fixing it! Due to time pressure in our 
project we decided to downgrade to 0.10.0.1, so I'm currently not really able 
to test this. However, I think we will give 0.10.1.1 a try soon in some 
integration system and report back.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegm

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746249#comment-15746249
 ] 

ASF GitHub Bot commented on KAFKA-4497:
---

Github user becketqin closed the pull request at:

https://github.com/apache/kafka/pull/2242


> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
> at scala.collection.immutable.List.foreac

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745742#comment-15745742
 ] 

ASF GitHub Bot commented on KAFKA-4497:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2243


> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
> at scala.collection.immutable.List.foreach(L

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-12 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743756#comment-15743756
 ] 

Jun Rao commented on KAFKA-4497:


Got it. I thought Robert was showing the DumpLogSegment output and I was 
wondering why it didn't show the offset as -1.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
> at scala.collection.immutable.Li

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-12 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743650#comment-15743650
 ] 

Jiangjie Qin commented on KAFKA-4497:
-

The negative bit is the MSB which is 1 in 0X. The entry Robert saw was 
[0158 9c490ae1 ], so it is a [Long + an Integer]. The long is the 
timestamp, and the integer is the relative offset which is -1 in our case 
because the base offset was 0.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.app

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-12 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743297#comment-15743297
 ] 

Jun Rao commented on KAFKA-4497:


[~becket_qin], it could be. Where do we lose the negative sign bit?

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.Cle

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-12 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743122#comment-15743122
 ] 

Jiangjie Qin commented on KAFKA-4497:
-

[~junrao] I think 0x is actually -1, right? In the time index, we are 
not checking negative offset. Maybe we can add that check.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
> at sc

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-12 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742677#comment-15742677
 ] 

Jun Rao commented on KAFKA-4497:


[~becket_qin], thanks for providing the fix. The fix seems to address the issue 
with inserting a -1 offset to time index. Do you know what might cause an 
offset of 2^32-1 to be inserted into .timeindex.cleaned as Robert described 
earlier?

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
>   

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15740688#comment-15740688
 ] 

ASF GitHub Bot commented on KAFKA-4497:
---

GitHub user becketqin opened a pull request:

https://github.com/apache/kafka/pull/2243

KAFKA-4497: LogCleaner appended the wrong offset to time index. Backport 
the fix to 0.10.1.

The code in trunk has been changed, so we need to provide a different fix 
for this problem in 0.10.1. 

Some explanations. I was able to reproduce the initial issue with a unit 
test. I did not include the unit test here because the tests requires a lot of 
hard coded constants to precisely control the message size after compression, 
read buffer and which message is filtered out. In trunk I have added a unit 
test to `ByteBufferMessageSet.fliterInto()` which holds the current buggy 
logic. Given no one will be developing on 0.10.1 branch anymore except bug 
fixes, I think it is fine to leave the ugly unit test out of this patch. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/becketqin/kafka KAFKA-4497-0.10.1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2243.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2243


commit 5b869e7ee4c6b554787330c386988b5447b127ca
Author: Jiangjie Qin 
Date:   2016-12-12T01:32:05Z

KAFKA-4497: LogCleaner appended the wrong offset to time index. Backport 
the fix to 0.10.1.




> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15740662#comment-15740662
 ] 

ASF GitHub Bot commented on KAFKA-4497:
---

GitHub user becketqin opened a pull request:

https://github.com/apache/kafka/pull/2242

KAFKA-4497: Fix the ByteBufferMessageSet.filterInto

There seems a few bugs in `ByteBufferMessageSet.filterInto`, including 
incorrect offsets, messages read and retained. This patch fixes them and added 
unit test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/becketqin/kafka KAFKA-4497

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2242.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2242


commit b01f88466461f04c683eb35207b6ec5d4fa04698
Author: Jiangjie Qin 
Date:   2016-12-12T00:38:54Z

KAFKA-4497: Fix the ByteBufferMessageSet.filterInto




> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
> Fix For: 0.10.1.1
>
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-10 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15739035#comment-15739035
 ] 

Jiangjie Qin commented on KAFKA-4497:
-

To be more clear, this part of the code has been changed in the trunk (the bug 
remains). 

I will provide two patches, one for trunk the other one for 0.10.1 branch.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Assignee: Jiangjie Qin
>Priority: Critical
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
> at scala.c

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-10 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15739018#comment-15739018
 ] 

Jiangjie Qin commented on KAFKA-4497:
-

I looked at the log cleaner code in 0.10.1.  There is a bug that could cause 
this problem. For each compressed message set, The time index should have the 
offset of the exact inner message with the largest timestamp, but in the log 
cleaner we are using the last message's offset in the last message set it reads 
into the read buffer. 

The current logic has a problem of generating a -1 offset for the compacted 
message sets even though the write buffer is not empty.

In log cleaner, we may read a few compressed message set into read buffer at a 
time, and compact out some messages from each of the compressed message set. 
Consider the following case:

The read buffer reads two message sets, MessageSet 1 and MessageSet 2. And 
after compaction, 
1. MessageSet 1 has retained messages [M1, M2]
2. MessageSet 2 does not have retained messages.

What Log Cleaner will do is to append [M1, M2] to the log but the 
OffsetOfTheLargestTimestamp will be -1 because the last message set is 
MessageSet 2 and it has no retained messages.

The fix is to update the {{OffsetOfLargestTimestamp}} together with the update 
of {{maxTimestamp}} (the same as we are doing in 
{{ByteBufferMessageSet.validateMessagesAndAssignOffsets()}}. 

I'll provide a patch.



> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Priority: Critical
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-09 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736646#comment-15736646
 ] 

Ismael Juma commented on KAFKA-4497:


There's no way to disable the time index. [~becket_qin], any ideas?

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Priority: Critical
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:400)
> at kafka.lo

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-09 Thread Robert Schumann (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736628#comment-15736628
 ] 

Robert Schumann commented on KAFKA-4497:


Catching the right exception and continuing is a valid point, but I would like 
to understand, how the cleaner writes such a broken timeindex.cleaned file from 
an obviously ok-ish timeindex file, e.g. comparing the real timeindex file:
{noformat}
file:[/var/lib/kafka/xdc_listing-status-v2-1/.timeindex] 
raw:[01589c46b0f507bb] timestamp:[Fri Nov 25 17:17:08 CET 2016] 
offset:[1979]
file:[/var/lib/kafka/xdc_listing-status-v2-1/.timeindex] 
raw:[01589c474b8a07c3] timestamp:[Fri Nov 25 17:17:47 CET 2016] 
offset:[1987]
file:[/var/lib/kafka/xdc_listing-status-v2-1/.timeindex] 
raw:[01589c48921807cd] timestamp:[Fri Nov 25 17:19:11 CET 2016] 
offset:[1997]
file:[/var/lib/kafka/xdc_listing-status-v2-1/.timeindex] 
raw:[01589c48db0c07d5] timestamp:[Fri Nov 25 17:19:30 CET 2016] 
offset:[2005]
file:[/var/lib/kafka/xdc_listing-status-v2-1/.timeindex] 
raw:[01589c49283807de] timestamp:[Fri Nov 25 17:19:49 CET 2016] 
offset:[2014]
file:[/var/lib/kafka/xdc_listing-status-v2-1/.timeindex] 
raw:[01589c496fcf07e6] timestamp:[Fri Nov 25 17:20:08 CET 2016] 
offset:[2022]
file:[/var/lib/kafka/xdc_listing-status-v2-1/.timeindex] 
raw:[01589c4a47a507ef] timestamp:[Fri Nov 25 17:21:03 CET 2016] 
offset:[2031]
{noformat}

To the one the log cleaner produced:
{noformat}
file:[/var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned]
 raw:[01589c490ae1] timestamp:[Fri Nov 25 17:19:42 CET 2016] 
offset:[4294967295]
{noformat}

Why is it ending up writing a 2^32-1 as offset? This is obviously wrong and 
it's also not written in the original file. Does that mean the .log file is 
somehow unreadable to him? Does it even look at the old timeindex file?

Is it possible to disable the timeindex feature in 0.10.1.0? I'm stressing 
this, because a broken log cleaning for compacted topics is basically a 
production blocker. It fills up disks quickly and deleting something manually 
does not feel very consistency-inspiring.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> x

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-09 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734657#comment-15734657
 ] 

huxi commented on KAFKA-4497:
-

A safe but naive solution is to reject this invalid index entry and captures 
InvalidOffsetException and IllegalStateException in order to not halt the 
cleaner thread. What do you think? [~becket_qin]

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
> at scala.collection.immutable.List.foreach(List.scala:381)

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-08 Thread Jan Ehrhardt (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15732126#comment-15732126
 ] 

Jan Ehrhardt commented on KAFKA-4497:
-

[~huxi_2b]: The producer is sending data lz4 compressed. These properties are 
used:

{code}
"acks" = "all"
"compression.type" = "lz4"
"key.serializer" = "org.apache.kafka.common.serialization.StringSerializer"
"value.serializer" = "org.apache.kafka.common.serialization.StringSerializer"
{code}

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
> at 
> kafka.log.Cleaner$$anonfun$cle

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-08 Thread Robert Schumann (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15732091#comment-15732091
 ] 

Robert Schumann commented on KAFKA-4497:


[~huxi_2b]: Yes, AFAIK we enabled lz4 compression on the producer side (but not 
on the cluster itself). I think [~jehrhardt] can help out with more infos.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.C

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-08 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15731654#comment-15731654
 ] 

huxi commented on KAFKA-4497:
-

[~roschumann] Did you enable compression for topic 'xdc_listing-status-v2' ?

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:400)
> at kafka.log.Cleaner$$anonfun$clean$4.apply(LogC

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-08 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15731588#comment-15731588
 ] 

huxi commented on KAFKA-4497:
-

LogCleaner.cleanOrSleep only captures LogCleaningAbortedException exception, so 
it has to throw up the InvalidOffsetException to the caller on the stack and it 
fially halts the log cleaner thread. Still figure out why you run into this 
problem.

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
> at scala.coll

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-06 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726651#comment-15726651
 ] 

Ismael Juma commented on KAFKA-4497:


cc [~becket_qin]

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:400)
> at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:364)
> at kafka.log.Clean