[ https://issues.apache.org/jira/browse/KAFKA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167223#comment-14167223 ]
Sriharsha Chintalapani commented on KAFKA-1670:
-----------------------------------------------

[~guozhang] Sorry, will fix those.

> Corrupt log files for segment.bytes values close to Int.MaxInt
> ---------------------------------------------------------------
>
>                 Key: KAFKA-1670
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1670
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.1.1
>            Reporter: Ryan Berdeen
>            Assignee: Sriharsha Chintalapani
>            Priority: Blocker
>             Fix For: 0.8.2
>
>         Attachments: KAFKA-1670.patch, KAFKA-1670_2014-10-04_20:17:46.patch,
> KAFKA-1670_2014-10-06_09:48:25.patch, KAFKA-1670_2014-10-07_13:39:13.patch,
> KAFKA-1670_2014-10-07_13:49:10.patch, KAFKA-1670_2014-10-07_18:39:31.patch
>
>
> The maximum value for the topic-level config {{segment.bytes}} is {{Int.MaxValue}} (2147483647). *Using this value causes brokers to corrupt their log files, leaving them unreadable.*
>
> We set {{segment.bytes}} to {{2122317824}}, which is well below the maximum. One by one, the ISR of all partitions shrank to 1. Brokers would crash when restarted, attempting to read from a negative offset in a log file. After discovering that many segment files had grown to 4GB or more, we were forced to shut down our *entire production Kafka cluster* for several hours while we split all segment files into 1GB chunks.
>
> Looking into the {{kafka.log}} code, the {{segment.bytes}} parameter is used inconsistently. It is treated as a *soft* maximum for the size of the segment file (https://github.com/apache/kafka/blob/0.8.1.1/core/src/main/scala/kafka/log/LogConfig.scala#L26), with logs rolled only after they exceed this value (https://github.com/apache/kafka/blob/0.8.1.1/core/src/main/scala/kafka/log/Log.scala#L246). However, much of the code that deals with log files uses *ints* to store the size of the file and the position in the file. Overflow of these ints leads the broker to append to the segments indefinitely, and to fail to read these segments for consuming or recovery (see the sketch below).
>
> This is trivial to reproduce:
> {code}
> $ bin/kafka-topics.sh --topic segment-bytes-test --create --replication-factor 2 --partitions 1 --zookeeper zkhost:2181
> $ bin/kafka-topics.sh --topic segment-bytes-test --alter --config segment.bytes=2147483647 --zookeeper zkhost:2181
> $ yes "Int.MaxValue is a ridiculous bound on file size in 2014" | bin/kafka-console-producer.sh --broker-list localhost:6667 zkhost:2181 --topic segment-bytes-test
> {code}
>
> After running for a few minutes, the log file is corrupt:
> {code}
> $ ls -lh data/segment-bytes-test-0/
> total 9.7G
> -rw-r--r-- 1 root root  10M Oct  3 19:39 00000000000000000000.index
> -rw-r--r-- 1 root root 9.7G Oct  3 19:39 00000000000000000000.log
> {code}
>
> We recovered the data from the log files using a simple Python script: https://gist.github.com/also/9f823d9eb9dc0a410796
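For reference, here is a minimal, self-contained sketch (plain Scala, not kafka.log code; the object and variable names are made up) of why an Int-typed size check can never trip a roll once segment.bytes is set to Int.MaxValue, and why the tracked size eventually goes negative:

{code}
// Hypothetical illustration only -- not broker code. It mirrors the arithmetic:
// the segment size is tracked in an Int, and the roll check requires size > segment.bytes.
object SegmentOverflowSketch {
  def main(args: Array[String]): Unit = {
    val segmentBytes: Int = Int.MaxValue   // segment.bytes = 2147483647, as in the repro above
    var sizeInBytes: Int = 0               // file size tracked as an Int

    // Soft-limit roll check: roll only after the size exceeds segment.bytes.
    // With segmentBytes == Int.MaxValue no Int can ever exceed it, so this never fires.
    def shouldRoll: Boolean = sizeInBytes > segmentBytes

    val batch = 1024 * 1024                // append ~1 MB at a time
    for (_ <- 1 to 2100) {                 // ~2.1 GB total, past Int.MaxValue
      sizeInBytes += batch                 // silently wraps negative past 2147483647
      if (shouldRoll) println(s"rolling at $sizeInBytes")            // never printed
    }

    // The broker keeps appending, while any reader that seeks using this Int position
    // sees a negative offset, matching the crash-on-restart described above.
    println(s"tracked size after ~2.1 GB of appends: $sizeInBytes")  // prints a negative number
  }
}
{code}

The same wraparound is consistent with segments growing to 4GB+ on disk even though the configured limit is roughly 2GB: the roll condition is never satisfied, so the file just keeps growing.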