[ https://issues.apache.org/jira/browse/KAFKA-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olson,Andrew updated KAFKA-2189:
--------------------------------
    Component/s: compression
                 build

> Snappy compression of message batches less efficient in 0.8.2.1
> ---------------------------------------------------------------
>
>                 Key: KAFKA-2189
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2189
>             Project: Kafka
>          Issue Type: Bug
>          Components: build, compression, log
>    Affects Versions: 0.8.2.1
>            Reporter: Olson,Andrew
>            Assignee: Jay Kreps
>
> We are using snappy compression and noticed a substantial increase
> (about 2.25x) in log filesystem space consumption after upgrading a Kafka
> cluster from 0.8.1.1 to 0.8.2.1. This appears to be caused by messages
> being recompressed individually (or possibly with a much smaller buffer
> or dictionary) instead of as the batch sent by the producer. We
> eventually traced the change in compression ratio/scope to this [1]
> commit, which updated the snappy version from 1.0.5 to 1.1.1.3. The Kafka
> client version does not appear to be relevant, as we can reproduce this
> with both the 0.8.1.1 and 0.8.2.1 Producer.
> Below are the log files from our troubleshooting. Each contains the same
> set of 1000 messages, sent with batch sizes of 1, 10, 100, and 1000.
> Commit f9d9b was the last with 0.8.1.1-like behavior, before f5ab8
> introduced the issue.
> {noformat}
> -rw-rw-r-- 1 kafka kafka 404967 May 12 11:45 /var/kafka2/f9d9b-batch-1-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 119951 May 12 11:45 /var/kafka2/f9d9b-batch-10-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka  89645 May 12 11:45 /var/kafka2/f9d9b-batch-100-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka  88279 May 12 11:45 /var/kafka2/f9d9b-batch-1000-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 402837 May 12 11:41 /var/kafka2/f5ab8-batch-1-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 382437 May 12 11:41 /var/kafka2/f5ab8-batch-10-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 364791 May 12 11:41 /var/kafka2/f5ab8-batch-100-0/00000000000000000000.log
> -rw-rw-r-- 1 kafka kafka 380693 May 12 11:41 /var/kafka2/f5ab8-batch-1000-0/00000000000000000000.log
> {noformat}
> [1] https://github.com/apache/kafka/commit/f5ab8e1780cf80f267906e3259ad4f9278c32d28
>  
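As a quick arithmetic check on the listing in the report, the Python sketch below (not part of the original issue) re-types the byte counts from the `ls -l` output and computes, for each commit, how many times smaller the batched log is than the batch-size-1 log. The `sizes` dictionary and the `shrink_factor` helper are hypothetical names introduced only for illustration.

```python
# Log file sizes in bytes, copied from the ls -l output in the report.
# Outer key: commit; inner key: producer batch size.
sizes = {
    "f9d9b": {1: 404967, 10: 119951, 100: 89645, 1000: 88279},
    "f5ab8": {1: 402837, 10: 382437, 100: 364791, 1000: 380693},
}


def shrink_factor(commit, batch):
    """Return how many times smaller the log for `batch` is than the
    batch-size-1 log written by the same commit."""
    return sizes[commit][1] / sizes[commit][batch]


for batch in (10, 100, 1000):
    before = shrink_factor("f9d9b", batch)
    after = shrink_factor("f5ab8", batch)
    # Ratio of the f5ab8 log size to the f9d9b log size at this batch size.
    growth = sizes["f5ab8"][batch] / sizes["f9d9b"][batch]
    print(f"batch={batch:>4}: f9d9b {before:.2f}x smaller, "
          f"f5ab8 {after:.2f}x smaller, f5ab8/f9d9b size {growth:.2f}x")
```

With these numbers, batching 1000 messages shrinks the f9d9b log roughly 4.6x relative to unbatched sends, but the f5ab8 log only about 1.06x, so at batch size 1000 the f5ab8 log is roughly 4.3x larger. That is consistent with the report's reading that batching stopped helping compression after the snappy upgrade.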



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
