[ https://issues.apache.org/jira/browse/KAFKA-562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473463#comment-13473463 ]
Jay Kreps commented on KAFKA-562:
---------------------------------

Okay, this is not exactly a bug; I was mistaken. Here is what is happening:

The leader receives one message at a time, gzip'd. The follower fetches chunks of multiple gzip'd messages. The current logic is that when appending a message set, we check whether it contains any compressed messages. If it does, we need to uncompress all the messages and re-compress them with new offsets assigned. Because the follower is getting chunks of five messages at a time, it compresses these together. The follower logs are so much smaller because they are batch compressed.

I'm not sure what the best thing to do here is. On one hand, it is much nicer if the follower has byte-for-byte identical logs. On the other hand, batch compression is a good thing.

> Non-failure System Test Log Segment File Checksums mismatched
> -------------------------------------------------------------
>
>                 Key: KAFKA-562
>                 URL: https://issues.apache.org/jira/browse/KAFKA-562
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: John Fung
>         Attachments: kafka-562-reproduce-issue.patch
>
>
> To reproduce this issue
> 1. Download 0.8 branch (reproduced in r1396343)
> 2. Apply the patch attached
> 3. Build Kafka under <kafka_home> by running "./sbt update package"
> 4. In the directory <kafka_home>/system_test, run "python -B system_test_runner.py" and it will run the case "testcase_0002" which will reproduce this issue.
> 5. The log segment files will be located in /tmp
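
The size difference described in the comment (messages gzip'd one at a time on the leader versus re-compressed as a batch on the follower) can be illustrated with a minimal Python sketch. This is not Kafka code: the payloads and the function name compressed_sizes are hypothetical, and the sketch ignores Kafka's message format and offsets. It only compares gzipping five messages separately against gzipping them as one chunk.

```python
import gzip


def compressed_sizes(messages):
    """Return (total bytes when each message is gzip'd alone,
    bytes when all messages are gzip'd together as one batch)."""
    individual = sum(len(gzip.compress(m)) for m in messages)
    batched = len(gzip.compress(b"".join(messages)))
    return individual, batched


# Hypothetical payloads: five similar messages, mirroring the
# "chunks of five messages" mentioned in the comment.
messages = [("message-%04d " % i).encode() + b"x" * 100 for i in range(5)]

individual, batched = compressed_sizes(messages)
print("per-message gzip total:", individual)
print("batch gzip total:      ", batched)
```

Each separate gzip stream pays its own header and trailer and cannot reuse redundancy across messages, so the batched total should come out smaller for similar payloads. That is consistent with the smaller follower segments, and also with the checksums no longer matching the leader's.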