Hey folks,

I noticed that the code path for compressed messages makes a very large number of data copies:

1. One to turn the message contents into Messages.
2. Another to write all the Messages into a ByteBuffer.
3. This ByteBuffer is then copied into an intermediate buffer, and from there into an unsized ByteArrayOutputStream. Since it is unsized, it may internally resize and copy several times over as the internal buffer grows.
4. Another to copy the final contents of the ByteArrayOutputStream into a Message.
5. Another to copy that into a second ByteBuffer in order to add the 4-byte size delimiter.
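
To make the list above concrete, here is a rough sketch of how several of those copies could be avoided. It is not the actual Kafka code and not necessarily what KAFKA-527 will end up doing. The class and method names (CompressedSetSketch, ExposedBuffer, compressWithSizePrefix, decompress) are made up for illustration, GZIP is assumed as the codec, and the count-plus-length-prefixed framing is a stand-in rather than the real message format. The idea is to write payloads straight into the compressor, pre-size the output buffer, reserve the 4 size bytes up front, and wrap the backing array instead of copying it out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, not Kafka's API or wire format.
public class CompressedSetSketch {

    // Exposes the protected backing array so the result can be wrapped
    // rather than copied out with toByteArray().
    static final class ExposedBuffer extends ByteArrayOutputStream {
        ExposedBuffer(int capacity) {
            super(capacity);
        }

        // Patches the 4-byte size prefix in place and returns a view
        // over the bytes written so far (no copy).
        ByteBuffer toFramedBuffer() {
            ByteBuffer view = ByteBuffer.wrap(buf, 0, count);
            view.putInt(0, count - 4);
            return view;
        }
    }

    static ByteBuffer compressWithSizePrefix(List<byte[]> payloads) throws IOException {
        int rawSize = 4; // payload count
        for (byte[] p : payloads) {
            rawSize += 4 + p.length;
        }
        // Pre-size using the uncompressed size as an estimate. The stream
        // can still grow if the estimate turns out to be too small.
        ExposedBuffer out = new ExposedBuffer(4 + Math.max(64, rawSize));
        out.write(new byte[4], 0, 4); // placeholder for the size prefix

        // Write each payload directly into the compressor, with no
        // intermediate ByteBuffer or staging array.
        try (DataOutputStream data = new DataOutputStream(new GZIPOutputStream(out))) {
            data.writeInt(payloads.size());
            for (byte[] p : payloads) {
                data.writeInt(p.length);
                data.write(p);
            }
        }
        return out.toFramedBuffer();
    }

    // Reverses compressWithSizePrefix; assumes an array-backed buffer.
    static List<byte[]> decompress(ByteBuffer framed) throws IOException {
        int size = framed.getInt(framed.position());
        int start = framed.arrayOffset() + framed.position() + 4;
        ByteArrayInputStream in = new ByteArrayInputStream(framed.array(), start, size);
        List<byte[]> result = new ArrayList<>();
        try (DataInputStream data = new DataInputStream(new GZIPInputStream(in))) {
            int n = data.readInt();
            for (int i = 0; i < n; i++) {
                byte[] p = new byte[data.readInt()];
                data.readFully(p);
                result.add(p);
            }
        }
        return result;
    }
}
```

This still leaves the copy inside the compressor itself, which is unavoidable, and the size estimate is only a guess, so treat it as a direction rather than a patch.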

Since this is really on the core data path, I would like to ask people to be a little more careful! A few of these are easy to fix and can be eliminated as part of KAFKA-506. I filed a bug to optimize this further: https://issues.apache.org/jira/browse/KAFKA-527

-Jay