[ 
https://issues.apache.org/jira/browse/KAFKA-562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473463#comment-13473463
 ] 

Jay Kreps commented on KAFKA-562:
---------------------------------

Okay, this is not exactly a bug; I was mistaken. Here is what is happening:

The leader receives one message at a time, gzip'd. The follower fetches chunks 
of multiple gzip'd messages.

The current logic is that when appending a message set we check whether it 
contains any compressed messages. If it does, we need to decompress all the 
messages and re-compress them with new offsets assigned. Because the follower 
is getting chunks of five messages at a time, it compresses these together. 
The follower logs are so much smaller because they are batch compressed.
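
To illustrate the size difference, here is a rough Python sketch (not Kafka 
code). It assumes made-up message payloads and plain gzip over concatenated 
bytes, ignoring the real message set framing and offsets. The helper names 
compress_individually and compress_in_batches are hypothetical. It compares 
gzip'ing each message on its own, as the leader effectively does, with 
gzip'ing chunks of five together, as the follower does:

```python
import gzip

def compress_individually(messages):
    """Total bytes when every message is gzip'd on its own (leader-like case)."""
    return sum(len(gzip.compress(m)) for m in messages)

def compress_in_batches(messages, batch_size):
    """Total bytes when messages are gzip'd in chunks of batch_size (follower-like case)."""
    total = 0
    for start in range(0, len(messages), batch_size):
        # Concatenate the chunk and compress it as one unit.
        batch = b"".join(messages[start:start + batch_size])
        total += len(gzip.compress(batch))
    return total

# Hypothetical payloads, only for illustration.
messages = [b"message payload number %03d for topic test_1" % i for i in range(100)]

individual_bytes = compress_individually(messages)
batched_bytes = compress_in_batches(messages, batch_size=5)

print("one message per gzip stream:", individual_bytes, "bytes")
print("five messages per gzip stream:", batched_bytes, "bytes")
```

The batched total comes out smaller because each gzip stream carries fixed 
header and trailer overhead and because similar messages compress better 
together. The same data stored both ways would also produce different file 
checksums, which matches what the system test observed.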

Not sure what the best thing to do here is. On one hand it is much nicer if the 
follower has byte-for-byte identical logs. On the other hand batch compression 
is a good thing.
                
> Non-failure System Test Log Segment File Checksums mismatched
> -------------------------------------------------------------
>
>                 Key: KAFKA-562
>                 URL: https://issues.apache.org/jira/browse/KAFKA-562
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: John Fung
>         Attachments: kafka-562-reproduce-issue.patch
>
>
> To reproduce this issue
> 1. Download 0.8 branch (reproduced in r1396343)
> 2. Apply the patch attached
> 3. Build Kafka under <kafka_home> by running "./sbt update package"
> 4. In the directory <kafka_home>/system_test, run "python -B 
> system_test_runner.py" and it will run the case "testcase_0002" which will 
> reproduce this issue.
> 5. The log segment files will be located in /tmp
