[jira] [Commented] (KAFKA-5431) LogCleaner stopped due to org.apache.kafka.common.errors.CorruptRecordException

Swathi Mocharla (JIRA) Sun, 16 Dec 2018 22:56:04 -0800


    [ 
https://issues.apache.org/jira/browse/KAFKA-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722728#comment-16722728
 ]


Swathi Mocharla commented on KAFKA-5431:
----------------------------------------

Hi [~huxi_2b], we are currently on 0.11.0.0 and are seeing this issue with the 
default value of log.preallocate, which is false. We have a large number of 
segment files in __consumer_offsets that are not getting compacted. 

{{[2018-12-12 00:11:04,597] INFO Cleaner 0: Building offset map for log 
__consumer_offsets-45 for 124 segments in offset range [16446991, 85239736). 
(kafka.log.LogCleaner)}}
{{[2018-12-12 00:11:04,831] ERROR [kafka-log-cleaner-thread-0]: Error due to 
(kafka.log.LogCleaner)}}
{{org.apache.kafka.common.errors.CorruptRecordException: Record size is less 
than the minimum record overhead (14)}}
{{[2018-12-12 00:11:04,837] INFO [kafka-log-cleaner-thread-0]: Stopped 
(kafka.log.LogCleaner)}}

 

We previously deleted the segment files and restarted our consumers, but this 
didn't help, and we are heading towards a full disk. Can you please help?
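
For anyone triaging the same symptom, here is a minimal Python sketch (not part 
of Kafka) that pulls out the partition the cleaner was building an offset map 
for just before it stopped, based on the log lines shown above. The function 
name and the regular expressions are illustrative assumptions derived from 
those lines only. With more than one cleaner thread the last "Building offset 
map" line may belong to a different thread, so treat the result as a hint.

```python
import re

# Matches the cleaner's "Building offset map" line as it appears in the excerpt
# above; \s+ tolerates lines that were wrapped by a mail client.
BUILD_RE = re.compile(
    r"Building offset map for log\s+(\S+)\s+for\s+(\d+)\s+segments"
)
# Matches "[kafka-log-cleaner-thread-N]: Stopped" (0.11 style) and
# "[kafka-log-cleaner-thread-N], Stopped" (0.10 style).
STOP_RE = re.compile(r"\[kafka-log-cleaner-thread-\d+\][:,]\s+Stopped")


def last_partition_before_stop(log_text):
    """Return (partition, segment_count) for the last offset-map build that
    precedes the first 'Stopped' line, or None if no such pair is found."""
    stop = STOP_RE.search(log_text)
    if stop is None:
        return None
    last = None
    # Only consider build lines that occur before the cleaner stopped.
    for match in BUILD_RE.finditer(log_text, 0, stop.start()):
        last = match
    if last is None:
        return None
    return last.group(1), int(last.group(2))


if __name__ == "__main__":
    sample = (
        "[2018-12-12 00:11:04,597] INFO Cleaner 0: Building offset map for log "
        "__consumer_offsets-45 for 124 segments in offset range "
        "[16446991, 85239736). (kafka.log.LogCleaner)\n"
        "[2018-12-12 00:11:04,831] ERROR [kafka-log-cleaner-thread-0]: Error "
        "due to (kafka.log.LogCleaner)\n"
        "[2018-12-12 00:11:04,837] INFO [kafka-log-cleaner-thread-0]: Stopped "
        "(kafka.log.LogCleaner)\n"
    )
    print(last_partition_before_stop(sample))
```

To use it on a real broker, read whichever file your log4j configuration 
routes kafka.log.LogCleaner output to and pass its contents as log_text.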

> LogCleaner stopped due to 
> org.apache.kafka.common.errors.CorruptRecordException
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-5431
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5431
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.10.2.1
>            Reporter: Carsten Rietz
>            Assignee: huxihx
>            Priority: Major
>              Labels: reliability
>             Fix For: 0.11.0.1, 1.0.0
>
>
> Hey all,
> I have a strange problem with our UAT cluster of 3 Kafka brokers.
> The __consumer_offsets topic was replicated to two instances and our disks 
> ran full due to a wrong configuration of the log cleaner. We fixed the 
> configuration and updated from 0.10.1.1 to 0.10.2.1.
> Today I increased the replication of the __consumer_offsets topic to 3 and 
> triggered replication to the third cluster via kafka-reassign-partitions.sh. 
> That went well, but I get many errors like
> {code}
> [2017-06-12 09:59:50,342] ERROR Found invalid messages during fetch for 
> partition [__consumer_offsets,18] offset 0 error Record size is less than the 
> minimum record overhead (14) (kafka.server.ReplicaFetcherThread)
> [2017-06-12 09:59:50,342] ERROR Found invalid messages during fetch for 
> partition [__consumer_offsets,24] offset 0 error Record size is less than the 
> minimum record overhead (14) (kafka.server.ReplicaFetcherThread)
> {code}
> I think these are due to the disk-full event.
> The log cleaner threads died on these wrong messages:
> {code}
> [2017-06-12 09:59:50,722] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> org.apache.kafka.common.errors.CorruptRecordException: Record size is less 
> than the minimum record overhead (14)
> [2017-06-12 09:59:50,722] INFO [kafka-log-cleaner-thread-0], Stopped  
> (kafka.log.LogCleaner)
> {code}
> Looking at the files, I see that some are truncated and some are just empty:
> $ ls -lsh 00000000000000594653.log
> 0 -rw-r--r-- 1 user user 100M Jun 12 11:00 00000000000000594653.log
> Sadly, I no longer have the logs from the disk-full event itself.
> I have three questions:
> * What is the best way to clean this up? Deleting the old log files and 
> restarting the brokers?
> * Why did Kafka not handle the disk-full event well? Does this only affect 
> the cleanup, or may we also lose data?
> * Is this maybe caused by the combination of upgrade and disk full?
> And last but not least: Keep up the good work. Kafka is really performing 
> well while being easy to administer and has good documentation!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
