[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347478#comment-14347478 ]
Yasuhiro Matsuda commented on KAFKA-527:
----------------------------------------

This patch introduces BufferingOutputStream, an alternative to ByteArrayOutputStream. It is backed by a chain of byte arrays, so it does not copy bytes when increasing its capacity. It also has a method that writes its content directly to a ByteBuffer, so there is no need to allocate an intermediate array to transfer the content. Finally, it supports deferred writes: a number of bytes can be reserved before the value is known and filled in later. MessageWriter (a new class) uses this for writing the CRC value and the payload length.

On my laptop, I tested the performance using TestLinearWriteSpeed with snappy:

Before the patch: 26.64786026813998 MB per sec
With the patch:   35.78401869390889 MB per sec

This is roughly a 34% improvement in throughput.

> Compression support does numerous byte copies
> ---------------------------------------------
>
>                 Key: KAFKA-527
>                 URL: https://issues.apache.org/jira/browse/KAFKA-527
>             Project: Kafka
>          Issue Type: Bug
>          Components: compression
>            Reporter: Jay Kreps
>            Assignee: Yasuhiro Matsuda
>            Priority: Critical
>         Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, java.hprof.no-compression.txt, java.hprof.snappy.text
>
>
> The data path for compressing or decompressing messages is extremely inefficient. We do something like 7 (?) complete copies of the data, often for simple things like adding a 4-byte size to the front. I am not sure how this went by unnoticed.
> This is likely the root cause of the performance issues we saw when doing bulk recompression of data in mirror maker.
> The mismatch between the InputStream/OutputStream interfaces and the Message/MessageSet interfaces, which are based on byte buffers, is the cause of many of these copies.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
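The buffering scheme the comment describes can be sketched as follows. This is a minimal illustration, not the actual KAFKA-527 patch code: the class and method names (ChainedBufferOutputStream, reserve, fillInt, writeTo) are hypothetical stand-ins for the ideas of chained segments, a reserve-then-fill slot for CRC/length fields, and a direct copy into a ByteBuffer.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a chained-buffer output stream; not the real Kafka class.
public class ChainedBufferOutputStream extends java.io.OutputStream {
    private final int segmentSize;
    private final List<byte[]> segments = new ArrayList<>();
    private byte[] current;   // segment currently being written
    private int pos;          // write position within the current segment
    private int total;        // total bytes written

    public ChainedBufferOutputStream(int segmentSize) {
        this.segmentSize = segmentSize;
        this.current = new byte[segmentSize];
        segments.add(current);
    }

    @Override
    public void write(int b) {
        if (pos == current.length) {
            // Grow by linking a new segment; existing bytes are never copied.
            current = new byte[segmentSize];
            segments.add(current);
            pos = 0;
        }
        current[pos++] = (byte) b;
        total++;
    }

    public int size() { return total; }

    // Reserve n bytes at the current position and return a handle to fill later,
    // e.g. for a length or CRC that is only known after the payload is written.
    public Reservation reserve(int n) {
        Reservation r = new Reservation(segments.size() - 1, pos);
        for (int i = 0; i < n; i++) write(0);
        return r;
    }

    public class Reservation {
        private final int segmentIndex;
        private final int offset;
        Reservation(int segmentIndex, int offset) {
            this.segmentIndex = segmentIndex;
            this.offset = offset;
        }
        // Fill the reserved slot with a big-endian int (assumes 4 bytes reserved),
        // walking across segment boundaries if the slot spans two segments.
        public void fillInt(int value) {
            int seg = segmentIndex, off = offset;
            for (int i = 3; i >= 0; i--) {
                if (off == segmentSize) { seg++; off = 0; }
                segments.get(seg)[off++] = (byte) (value >>> (8 * i));
            }
        }
    }

    // Copy the content into a ByteBuffer without materializing one big array.
    public void writeTo(ByteBuffer buffer) {
        for (int i = 0; i < segments.size(); i++) {
            int len = (i == segments.size() - 1) ? pos : segmentSize;
            buffer.put(segments.get(i), 0, len);
        }
    }
}
```

A typical use, mirroring the CRC/length case from the comment: write a header, reserve four bytes for the payload length, write the payload, then fill the reserved slot once the length is known, and finally drain everything into a ByteBuffer with a single pass over the segments.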