mjsax commented on code in PR #14322: URL: https://github.com/apache/kafka/pull/14322#discussion_r1317996699
########## docs/design.html: ##########
@@ -136,8 +136,10 @@ <h4 class="anchor-heading"><a id="design_compression" class="anchor-link"></a><a
     the user can always compress its messages one at a time without any support needed from Kafka, but this can lead to very poor compression ratios as much of the redundancy is due to repetition between messages of the same type (e.g. field names in JSON or user agents in web logs or common string values). Efficient compression requires compressing multiple messages together rather than compressing each message individually.
     <p>
-    Kafka supports this with an efficient batching format. A batch of messages can be clumped together compressed and sent to the server in this form. This batch of messages will be written in compressed form and will
-    remain compressed in the log and will only be decompressed by the consumer.
+    Kafka supports this with an efficient batching format. A batch of messages can be grouped together, compressed, and sent to the server in this form. The broker decompresses the batch in order to validate it. For
+    example, it validates that the number of records in the batch is same as what batch header states. The broker may also potentially modify the batch (e.g., if the topic is compacted, the broker will filter out

Review Comment:
Yeah, the sentence sounds as if the broker would perform a compaction, which, from my understanding, won't be the case. My understanding is that the broker would never _modify_ a batch (it might re-compress it with a different compression format, though, depending on broker/topic configs). For compacted topics and null keys, the batch would be rejected with an error message back to the producer.
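For reference, the broker/topic config that controls re-compression is `compression.type`. The fragment below is a minimal sketch of a topic-level override. It assumes a hypothetical topic on which the operator wants zstd regardless of the producer's codec. With the default value `producer`, the broker keeps the codec the producer used. With a specific codec, the broker stores batches in that codec and re-compresses any batch that arrives with a different one.

```properties
# Topic-level override for a hypothetical topic.
# Accepted values: uncompressed, gzip, snappy, lz4, zstd, or producer
# (the default, which retains the producer's original codec).
compression.type=zstd
```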