[
https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352290#comment-14352290
]
Yasuhiro Matsuda commented on KAFKA-527:
----------------------------------------
>>This patch is mainly aimed at #1 above
If you read the patch carefully, it does more for the compression part than
that. It avoids a copy to an intermediate buffer (byte array) when we go from
a ByteArrayOutputStream to a ByteBuffer, and also a copy from ByteBuffer to
ByteBuffer when we create a MessageSet from a Message at the end of compression.
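The first copy mentioned above comes from ByteArrayOutputStream.toByteArray(),
which always duplicates the stream's internal array before it can be wrapped in
a ByteBuffer. A common way to avoid that copy is to subclass the stream and
expose its internal buffer directly; this is only a sketch of the technique,
and the class name here is illustrative, not the one from the patch:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// toByteArray() copies the internal array on every call; this subclass
// wraps the internal array [0, count) in a ByteBuffer with no copy.
// (Illustrative sketch only -- not the actual class from the patch.)
class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
    ExposedByteArrayOutputStream(int initialSize) {
        super(initialSize);
    }

    // buf and count are protected fields of ByteArrayOutputStream,
    // so a subclass can hand them out without an intermediate byte[].
    ByteBuffer toByteBuffer() {
        return ByteBuffer.wrap(buf, 0, count);
    }
}
```

The trade-off is that the returned ByteBuffer aliases the stream's live
buffer, so the caller must not keep writing to the stream while reading it.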
For the decompression part, your iterator patch looks nice. It seems to make
ByteBufferMessageSet.decompress obsolete, once all callers are cleaned up to
use your iterator.
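The idea behind a streaming iterator is to decode one message at a time off
the decompressed stream instead of inflating the whole set into an
intermediate buffer first. A toy sketch of that shape, assuming GZIP
compression and a 4-byte length prefix per message (the class name and wire
format here are illustrative, not Kafka's actual ones):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.zip.GZIPInputStream;

// Toy lazy iterator: yields length-prefixed messages one at a time from a
// compressed block, rather than materializing the whole decompressed set.
// (Illustrative sketch, not the actual iterator from the patch.)
class LazyMessageIterator implements Iterator<byte[]> {
    private final DataInputStream in;
    private byte[] next;

    LazyMessageIterator(byte[] compressed) throws IOException {
        this.in = new DataInputStream(
            new GZIPInputStream(new ByteArrayInputStream(compressed)));
        advance();
    }

    // Read the next [size][payload] record, or mark end-of-stream.
    private void advance() throws IOException {
        try {
            int len = in.readInt();
            byte[] buf = new byte[len];
            in.readFully(buf);
            next = buf;
        } catch (EOFException e) {
            next = null;
        }
    }

    @Override public boolean hasNext() {
        return next != null;
    }

    @Override public byte[] next() {
        if (next == null) throw new NoSuchElementException();
        byte[] current = next;
        try {
            advance();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return current;
    }
}
```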
> Compression support does numerous byte copies
> ---------------------------------------------
>
> Key: KAFKA-527
> URL: https://issues.apache.org/jira/browse/KAFKA-527
> Project: Kafka
> Issue Type: Bug
> Components: compression
> Reporter: Jay Kreps
> Assignee: Yasuhiro Matsuda
> Priority: Critical
> Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch,
> java.hprof.no-compression.txt, java.hprof.snappy.text
>
>
> The data path for compressing or decompressing messages is extremely
> inefficient. We do something like 7 (?) complete copies of the data, often
> for simple things like adding a 4 byte size to the front. I am not sure how
> this went by unnoticed.
> This is likely the root cause of the performance issues we saw in doing bulk
> recompression of data in mirror maker.
> The mismatch between the InputStream and OutputStream interfaces and the
> Message/MessageSet interfaces which are based on byte buffers is the cause of
> many of these.
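The stream/buffer mismatch described above is typically bridged with a thin
adapter that reads from a ByteBuffer through the InputStream interface
without first copying the buffer into a byte array. A minimal sketch of such
an adapter (the class name is illustrative; this is not necessarily the
code Kafka uses):

```java
import java.io.InputStream;
import java.nio.ByteBuffer;

// Thin adapter: exposes a ByteBuffer through the InputStream interface,
// reading from the buffer in place rather than copying it to a byte[].
// (Illustrative sketch only.)
class ByteBufferInputStream extends InputStream {
    private final ByteBuffer buf;

    ByteBufferInputStream(ByteBuffer buf) {
        this.buf = buf;
    }

    @Override public int read() {
        return buf.hasRemaining() ? (buf.get() & 0xff) : -1;
    }

    @Override public int read(byte[] dst, int off, int len) {
        if (!buf.hasRemaining()) return -1;
        int n = Math.min(len, buf.remaining());
        buf.get(dst, off, n);  // bulk get advances the buffer's position
        return n;
    }
}
```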
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)