Re: kafka 0.10.1 / log-cleaner stopped / timeindex issues

2016-12-06 Thread Ismael Juma
Hi, can you please file a JIRA ticket so this doesn't get lost?

Thanks,
Ismael

On 6 Dec 2016 5:06 pm, "Schumann,Robert" wrote:

kafka 0.10.1 / log-cleaner stopped / timeindex issues

2016-12-06 Thread Schumann,Robert
Hi all,

we are facing an issue with the latest Kafka 0.10.1 and the log cleaner thread
with regard to the timeindex files. From the log of the log cleaner we see
after startup that it tries to clean up a topic called xdc_listing-status-v2
[1]. The topic is set up with log compaction [2] and the Kafka cluster
configuration has log.cleaner enabled [3]. Looking at the log and the newly
created file [4], the cleaner seems to refer to tombstones prior to
epoch_time=0 - maybe because it finds messages which don’t have a timestamp at
all(?). All producers and consumers are on 0.10.1 and the topics have been
created completely new, so I’m not sure where this issue would come from. The
original timeindex file [5] seems to show only valid timestamps for the
mentioned offsets. I would also like to mention that the issue happened in two
independent datacenters at the same time, so I would rather expect an
application/producer issue than random disk failures. We didn’t have the
problem with 0.10.0 for around half a year; it appeared shortly after the
upgrade to 0.10.1.
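
In case anybody wants to double-check their own timeindex files: my reading of
TimeIndex.scala in 0.10.1 is that a .timeindex file is just a sequence of
12-byte entries, an 8-byte timestamp followed by a 4-byte offset relative to
the segment’s base offset. A little sketch along these lines (names are mine,
treat it as a sketch, not a reference implementation) should dump the entries:

    import java.io.RandomAccessFile
    import java.nio.channels.FileChannel.MapMode

    // Dump a 0.10.1 .timeindex file, assuming 12-byte entries:
    // 8 bytes timestamp + 4 bytes offset relative to the segment's
    // base offset (the number in the file name). Trailing all-zero
    // entries are preallocated padding, not real data.
    object DumpTimeIndex extends App {
      val path = args(0)               // e.g. 00000000000000000000.timeindex
      val baseOffset = args(1).toLong  // base offset from the file name
      val raf = new RandomAccessFile(path, "r")
      try {
        val buf = raf.getChannel.map(MapMode.READ_ONLY, 0, raf.length)
        while (buf.remaining >= 12) {
          val timestamp = buf.getLong  // -1 would mean "no timestamp"
          val relOffset = buf.getInt
          println(s"timestamp=$timestamp offset=${baseOffset + relOffset}")
        }
      } finally raf.close()
    }

(kafka.tools.DumpLogSegments with --files should print much the same
information, if I remember correctly.)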

The puzzling message from the cleaner, “cleaning prior to Fri Dec 02 16:35:50
CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970”, also
confuses me a bit. Does it mean that the cleaner does not find any log
segments which can be cleaned up, or that the last timestamp of the last log
segment is somehow broken/missing?
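
From a quick look at LogCleaner.scala in 0.10.1 (so take this with a grain of
salt), the 1970 date may simply mean that nothing in the partition counts as
already cleaned yet: the tombstone horizon seems to be derived from the last
segment below the first dirty offset and falls back to 0 when there is no such
segment. A toy paraphrase of that logic (my names, not the actual Kafka code):

    import java.util.Date

    object DeleteHorizonSketch extends App {
      // Horizon = mtime of the newest already-cleaned segment minus
      // delete.retention.ms; with no cleaned segment it falls back to 0L,
      // which formats as the 1970 epoch in the cleaner's log line.
      def deleteHorizonMs(lastCleanSegmentMtimeMs: Option[Long],
                          deleteRetentionMs: Long): Long =
        lastCleanSegmentMtimeMs match {
          case None        => 0L
          case Some(mtime) => mtime - deleteRetentionMs
        }

      println(new Date(deleteHorizonMs(None, 86400000L)))
      // prints Thu Jan 01 01:00:00 CET 1970 in a CET timezone
    }

If that reading is right, the 1970 message alone would be harmless on a first
cleaning pass and not the actual error.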

I’m also wondering why the log cleaner thread stops completely after an error
with a single topic. I would at least expect it to keep cleaning up other
topics, but apparently it doesn’t; e.g. it’s not even cleaning
__consumer_offsets anymore.
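
Judging by the bottom of the stack trace in [1], the cleaner runs as a
kafka.utils.ShutdownableThread, and my reading of that class in 0.10.1 is that
any exception escaping doWork() ends the run loop for good, with nothing
restarting the thread afterwards. That would explain both the oddly empty
“Error due to” message and why cleaning stops for all topics at once. A
simplified paraphrase (not the verbatim source):

    // Sketch of the 0.10.1 ShutdownableThread run loop, simplified.
    abstract class ShutdownableThreadSketch(name: String) extends Thread(name) {
      @volatile private var running = true

      def doWork(): Unit  // CleanerThread overrides this with one clean pass

      override def run(): Unit = {
        try {
          while (running) doWork()
        } catch {
          // one escaped exception ends the loop for every partition;
          // logged as "Error due to " plus the throwable, hence the
          // seemingly empty message in [1]
          case e: Throwable => println(s"Error due to $e")
        }
        println("Stopped")  // no supervisor ever restarts the thread
      }

      def shutdown(): Unit = running = false
    }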

Does anybody have the same issue, or can anybody explain what’s going on?
Thanks for any help or suggestions.

Cheers
Robert

[1]
[2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
[2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting (kafka.log.LogCleaner)
[2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log xdc_listing-status-v2-1. (kafka.log.LogCleaner)
[2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for xdc_listing-status-v2-1... (kafka.log.LogCleaner)
[2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). (kafka.log.LogCleaner)
[2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
[2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... (kafka.log.LogCleaner)
[2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 0, retaining deletes. (kafka.log.LogCleaner)
[2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 9 no larger than the last offset appended (11832) to /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
    at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
    at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
    at kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
    at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
    at kafka.log.LogSegment.append(LogSegment.scala:106)
    at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
    at kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
    at kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:400)
    at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:364)
    at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at kafka.log.Cleaner.clean(LogCleaner.scala:363)
    at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:239)
    at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:218)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
[2016-12-06 12:49:24,971] INFO [kafka-log-cleaner-thread-0], Stopped (kafka.log.LogCleaner)

[2]
$ kafka-configs-topics xdc_listing-status-v2 --describe
Configs for topic 'xdc_listing-status-v2' are cleanup.policy=compact

[3]
broker.id=74
listeners = PLAINTEXT://0.0.0.0:9092,TRACE://localhost:9093
advertised.listeners = PLAINTEXT://broker2:9092
num.network.threads=2
num.io.threads=2
inter.broker.protocol.version=0.10.1.0
socket.send.buffer.bytes=1048576
socket.receive.b