[ https://issues.apache.org/jira/browse/KAFKA-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gwen Shapira updated KAFKA-2189:
--------------------------------
    Fix Version/s: 0.8.2.2

> Snappy compression of message batches less efficient in 0.8.2.1
> ---------------------------------------------------------------
>
> Key: KAFKA-2189
> URL: https://issues.apache.org/jira/browse/KAFKA-2189
> Project: Kafka
> Issue Type: Bug
> Components: build, compression, log
> Affects Versions: 0.8.2.1
> Reporter: Olson,Andrew
> Assignee: Ismael Juma
> Priority: Blocker
> Labels: trivial
> Fix For: 0.8.3, 0.8.2.2
>
> Attachments: KAFKA-2189.patch
>
>
> We are using snappy compression and noticed a fairly substantial increase
> (about 2.25x) in log filesystem space consumption after upgrading a Kafka
> cluster from 0.8.1.1 to 0.8.2.1. We found that this is caused by messages
> being seemingly recompressed individually (or possibly with a much smaller
> buffer or dictionary?) instead of as a batch as sent by producers. We
> eventually tracked down the change in compression ratio/scope to this [1]
> commit that updated the snappy version from 1.0.5 to 1.1.1.3. The Kafka
> client version does not appear to be relevant as we can reproduce this with
> both the 0.8.1.1 and 0.8.2.1 Producer.
> Here are the log files from our troubleshooting that contain the same set of
> 1000 messages, for batch sizes of 1, 10, 100, and 1000. f9d9b was the last
> commit with 0.8.1.1-like behavior prior to f5ab8 introducing the issue.
> {noformat}
> -rw-rw-r-- 1 kafka kafka 404967 May 12 11:45 /var/kafka2/f9d9b-batch-1-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 119951 May 12 11:45 /var/kafka2/f9d9b-batch-10-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 89645 May 12 11:45 /var/kafka2/f9d9b-batch-100-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 88279 May 12 11:45 /var/kafka2/f9d9b-batch-1000-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 402837 May 12 11:41 /var/kafka2/f5ab8-batch-1-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 382437 May 12 11:41 /var/kafka2/f5ab8-batch-10-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 364791 May 12 11:41 /var/kafka2/f5ab8-batch-100-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 380693 May 12 11:41 /var/kafka2/f5ab8-batch-1000-0/00000000000000000000.log
> {noformat}
> [1] https://github.com/apache/kafka/commit/f5ab8e1780cf80f267906e3259ad4f9278c32d28
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
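
To make the size difference in the listing easier to read, here is a minimal sketch that computes, for each batch size, how much larger the f5ab8 log is than the f9d9b log. The byte counts are copied from the listing; the variable and function names are illustrative assumptions and are not part of Kafka or of the attached patch.

```python
# File sizes in bytes, copied from the directory listing in the report.
# Keys are the producer batch sizes (1, 10, 100, 1000 messages).
# The names below are illustrative only; nothing here is Kafka API.
sizes_f9d9b = {1: 404967, 10: 119951, 100: 89645, 1000: 88279}    # last commit with 0.8.1.1-like behavior
sizes_f5ab8 = {1: 402837, 10: 382437, 100: 364791, 1000: 380693}  # commit that introduced the issue


def growth_by_batch_size(before, after):
    """Return {batch_size: after_bytes / before_bytes} for each batch size."""
    return {batch: after[batch] / before[batch] for batch in sorted(before)}


growth = growth_by_batch_size(sizes_f9d9b, sizes_f5ab8)

for batch, factor in growth.items():
    print(f"batch size {batch:>4}: {factor:.2f}x")
```

With a batch size of 1 the two logs are nearly the same size (about 0.99x), while for batch sizes of 10, 100, and 1000 the f5ab8 logs are roughly 3.2x, 4.1x, and 4.3x larger. That pattern matches the report's description of batches no longer being compressed as a unit, though the listing by itself does not establish the cause.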