[ 
https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347478#comment-14347478
 ] 

Yasuhiro Matsuda commented on KAFKA-527:
----------------------------------------

This patch introduces BufferingOutputStream, an alternative to
ByteArrayOutputStream. It is backed by a chain of byte arrays, so it does not
copy bytes when increasing its capacity. It also has a method that writes the
content directly to a ByteBuffer, so there is no need to create an intermediate
array to transfer the content. Lastly, it supports deferred writes: you reserve
a number of bytes before the value is known and fill them in later.
MessageWriter (a new class) uses this for writing the CRC value and the
payload length.
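
To illustrate the three ideas above (chained arrays, direct transfer to a
ByteBuffer, and deferred writes), here is a minimal sketch. It is not the code
from the patch: the class name ChunkedOutputStream and the methods reserveInt,
fillInt, and writeTo are hypothetical names chosen for this example, and the
big-endian layout of the reserved slot is an assumption.

```java
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; NOT the BufferingOutputStream from the patch. */
class ChunkedOutputStream extends OutputStream {
    private final int chunkSize;
    private final List<byte[]> chunks = new ArrayList<>();
    private int size = 0; // total bytes written or reserved so far

    ChunkedOutputStream(int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        this.chunkSize = chunkSize;
    }

    @Override
    public void write(int b) {
        if (size == chunks.size() * chunkSize) {
            // Grow by appending a new chunk; existing bytes are never copied.
            chunks.add(new byte[chunkSize]);
        }
        chunks.get(size / chunkSize)[size % chunkSize] = (byte) b;
        size++;
    }

    @Override
    public void write(byte[] src, int off, int len) {
        // Kept simple for clarity; a real implementation would copy in bulk.
        for (int i = 0; i < len; i++) write(src[off + i]);
    }

    /** Deferred write, step 1: reserve 4 bytes and return their offset. */
    int reserveInt() {
        int offset = size;
        for (int i = 0; i < 4; i++) write(0);
        return offset;
    }

    /** Deferred write, step 2: fill a reserved slot (big-endian, by assumption). */
    void fillInt(int offset, int value) {
        if (offset < 0 || offset + 4 > size) throw new IndexOutOfBoundsException();
        for (int i = 0; i < 4; i++) {
            int pos = offset + i; // the slot may span two chunks
            chunks.get(pos / chunkSize)[pos % chunkSize] = (byte) (value >>> (24 - 8 * i));
        }
    }

    int size() { return size; }

    /** Copies each chunk straight into the target, with no intermediate array. */
    void writeTo(ByteBuffer target) {
        int remaining = size;
        for (byte[] chunk : chunks) {
            int n = Math.min(remaining, chunkSize);
            target.put(chunk, 0, n);
            remaining -= n;
        }
    }

    public static void main(String[] args) {
        ChunkedOutputStream out = new ChunkedOutputStream(3);
        int lengthSlot = out.reserveInt();      // length not known yet
        byte[] payload = {10, 20, 30, 40, 50};
        out.write(payload, 0, payload.length);
        out.fillInt(lengthSlot, payload.length); // fill it in afterwards
        ByteBuffer buf = ByteBuffer.allocate(out.size());
        out.writeTo(buf);
        buf.flip();
        System.out.println("length field = " + buf.getInt());
    }
}
```

A CRC field could be handled the same way as the length field in this sketch:
reserve the slot, write the bytes it covers, then fill in the computed value.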

On my laptop, I tested the performance using TestLinearWriteSpeed with snappy.

Previously
26.64786026813998 MB per sec

With the patch
35.78401869390889 MB per sec

That is roughly a 34% improvement in throughput.

> Compression support does numerous byte copies
> ---------------------------------------------
>
>                 Key: KAFKA-527
>                 URL: https://issues.apache.org/jira/browse/KAFKA-527
>             Project: Kafka
>          Issue Type: Bug
>          Components: compression
>            Reporter: Jay Kreps
>            Assignee: Yasuhiro Matsuda
>            Priority: Critical
>         Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, 
> java.hprof.no-compression.txt, java.hprof.snappy.text
>
>
> The data path for compressing or decompressing messages is extremely 
> inefficient. We do something like 7 (?) complete copies of the data, often 
> for simple things like adding a 4 byte size to the front. I am not sure how 
> this went by unnoticed.
> This is likely the root cause of the performance issues we saw in doing bulk 
> recompression of data in mirror maker.
> The mismatch between the InputStream and OutputStream interfaces and the 
> Message/MessageSet interfaces which are based on byte buffers is the cause of 
> many of these.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
