[
https://issues.apache.org/jira/browse/KAFKA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sriharsha Chintalapani updated KAFKA-1670:
------------------------------------------
Fix Version/s: 0.8.2
> Corrupt log files for segment.bytes values close to Int.MaxInt
> --------------------------------------------------------------
>
> Key: KAFKA-1670
> URL: https://issues.apache.org/jira/browse/KAFKA-1670
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8.1.1
> Reporter: Ryan Berdeen
> Assignee: Sriharsha Chintalapani
> Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: KAFKA-1670.patch
>
>
> The maximum value for the topic-level config {{segment.bytes}} is
> {{Int.MaxValue}} (2147483647). *Using this value causes brokers to corrupt
> their log files, leaving them unreadable.*
> We set {{segment.bytes}} to {{2122317824}}, which is below the maximum.
> One by one, the ISR of every partition shrank to 1. Brokers crashed when
> restarted, attempting to read from a negative offset in a log file. After
> discovering that many segment files had grown to 4 GB or more, we were forced
> to shut down our *entire production Kafka cluster* for several hours while we
> split all segment files into 1 GB chunks.
> Looking into the {{kafka.log}} code, the {{segment.bytes}} parameter is used
> inconsistently. It is treated as a *soft* maximum for the size of the segment
> file
> (https://github.com/apache/kafka/blob/0.8.1.1/core/src/main/scala/kafka/log/LogConfig.scala#L26),
> with a segment rolled only after it exceeds this value
> (https://github.com/apache/kafka/blob/0.8.1.1/core/src/main/scala/kafka/log/Log.scala#L246).
> However, much of the code that deals with log files uses *ints* to store the
> size of the file and the position in the file. When these ints overflow, the
> broker appends to the segment indefinitely and then fails to read it back for
> consumption or recovery.
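> The wrap-around itself is easy to demonstrate. Below is a minimal sketch in
> Python that emulates JVM signed 32-bit arithmetic; the position and append
> sizes are illustrative values, not taken from the broker code:

```python
def to_int32(n):
    """Wrap a Python integer to a signed 32-bit value, as a JVM Int would."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

# A segment that has already grown to the configured cap (a soft limit),
# about to receive one more large append. Values are illustrative.
position = 2122317824      # the segment.bytes value from this report
append_size = 26000000     # hypothetical batch of messages

# With 64-bit arithmetic the new position is simply past 2 GB:
print(position + append_size)            # 2148317824

# With the broker's 32-bit ints it wraps negative, which matches the
# "negative offset" the brokers crashed on during restart:
print(to_int32(position + append_size))  # -2146649472
```

> Once a stored position goes negative, every check that assumes a non-negative
> size or offset misbehaves, so the broker keeps appending and can no longer
> seek to valid message boundaries.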
> This is trivial to reproduce:
> {code}
> $ bin/kafka-topics.sh --topic segment-bytes-test --create \
>     --replication-factor 2 --partitions 1 --zookeeper zkhost:2181
> $ bin/kafka-topics.sh --topic segment-bytes-test --alter \
>     --config segment.bytes=2147483647 --zookeeper zkhost:2181
> $ yes "Int.MaxValue is a ridiculous bound on file size in 2014" | \
>     bin/kafka-console-producer.sh --broker-list localhost:6667 \
>     --topic segment-bytes-test
> {code}
> After running for a few minutes, the log file is corrupt:
> {code}
> $ ls -lh data/segment-bytes-test-0/
> total 9.7G
> -rw-r--r-- 1 root root 10M Oct 3 19:39 00000000000000000000.index
> -rw-r--r-- 1 root root 9.7G Oct 3 19:39 00000000000000000000.log
> {code}
> We recovered the data from the log files using a simple Python script:
> https://gist.github.com/also/9f823d9eb9dc0a410796
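> For context, a 0.8 segment file is a sequence of entries, each an 8-byte
> offset followed by a 4-byte message size and the message bytes, so a recovery
> pass can walk entries until it hits a size that is negative or truncated. The
> sketch below is a simplified illustration of that idea, not the linked gist:

```python
import struct

def scan_entries(path):
    """Yield (offset, message_bytes) entries from a Kafka 0.8 segment file,
    stopping at the first entry that looks corrupt (negative size) or
    truncated."""
    with open(path, "rb") as f:
        while True:
            header = f.read(12)          # 8-byte offset + 4-byte size
            if len(header) < 12:
                return
            offset, size = struct.unpack(">qi", header)
            if size <= 0:
                return                   # corrupt size field: stop here
            message = f.read(size)
            if len(message) < size:
                return                   # truncated final entry
            yield offset, message
```

> The recovered entries can then be rewritten into fresh segment files kept
> safely under a 1 GB boundary.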
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)