[ 
https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355133#comment-14355133
 ] 

Guozhang Wang commented on KAFKA-527:
-------------------------------------

Hi Yasuhiro,

I thought that for compressed writes, the linked-list buffers in 
BufferingOutputStream still need to be copied to a newly allocated buffer (in 
line 54/55 of ByteBufferMessageSet), whereas MemoryRecord appends 
messages to the compressed stream in place, so no extra copy is required at the 
end of the writes. I may have misunderstood Scala's function-parameter syntax, 
though, so please let me know if I did.
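
As a rough illustration of the two write paths contrasted above, here is a 
minimal sketch. It is not the actual Kafka code: ChunkedOutputStream, 
DirectBufferOutputStream, and both compress* methods are hypothetical 
stand-ins, GZIP is used only because it ships with the JDK, and the target 
buffer is assumed to be large enough for the compressed output.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

final class CopySketch {

    // Hypothetical chunked stream: grows by adding fixed-size chunks,
    // so the bytes end up scattered across several buffers.
    static final class ChunkedOutputStream extends OutputStream {
        private final int chunkSize;
        private final List<ByteBuffer> chunks = new ArrayList<>();

        ChunkedOutputStream(int chunkSize) { this.chunkSize = chunkSize; }

        @Override
        public void write(int b) {
            if (chunks.isEmpty() || !chunks.get(chunks.size() - 1).hasRemaining()) {
                chunks.add(ByteBuffer.allocate(chunkSize));
            }
            chunks.get(chunks.size() - 1).put((byte) b);
        }

        // The extra copy: every chunk is copied into one new buffer.
        ByteBuffer toContiguous() {
            int total = 0;
            for (ByteBuffer c : chunks) total += c.position();
            ByteBuffer out = ByteBuffer.allocate(total);
            for (ByteBuffer c : chunks) {
                ByteBuffer view = c.duplicate();
                view.flip();
                out.put(view);
            }
            out.flip();
            return out;
        }
    }

    // Hypothetical stream that writes straight into a caller-owned buffer.
    // Throws BufferOverflowException if the buffer is too small.
    static final class DirectBufferOutputStream extends OutputStream {
        private final ByteBuffer target;

        DirectBufferOutputStream(ByteBuffer target) { this.target = target; }

        @Override
        public void write(int b) { target.put((byte) b); }

        @Override
        public void write(byte[] b, int off, int len) { target.put(b, off, len); }
    }

    // Path 1: compress into chunks, then copy the chunks into a new buffer.
    static ByteBuffer compressThenCopy(byte[] payload) {
        ChunkedOutputStream chunks = new ChunkedOutputStream(64);
        try (GZIPOutputStream gz = new GZIPOutputStream(chunks)) {
            gz.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks.toContiguous();
    }

    // Path 2: compress directly into the final buffer; no copy at the end.
    static ByteBuffer compressInPlace(byte[] payload, int capacity) {
        ByteBuffer target = ByteBuffer.allocate(capacity);
        try (GZIPOutputStream gz =
                 new GZIPOutputStream(new DirectBufferOutputStream(target))) {
            gz.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        target.flip();
        return target;
    }
}
```

Both methods should yield the same compressed bytes; the difference is only 
that the first one pays for an additional allocation and copy once the 
writes are finished.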

As for the migration plan, I agree that the ByteBufferMessageSet replacement 
will not come in the near future, and we can definitely commit the patches now, 
as compression / decompression has been a pain for us.

> Compression support does numerous byte copies
> ---------------------------------------------
>
>                 Key: KAFKA-527
>                 URL: https://issues.apache.org/jira/browse/KAFKA-527
>             Project: Kafka
>          Issue Type: Bug
>          Components: compression
>            Reporter: Jay Kreps
>            Assignee: Yasuhiro Matsuda
>            Priority: Critical
>         Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, 
> java.hprof.no-compression.txt, java.hprof.snappy.text
>
>
> The data path for compressing or decompressing messages is extremely 
> inefficient. We do something like 7 (?) complete copies of the data, often 
> for simple things like adding a 4 byte size to the front. I am not sure how 
> this went by unnoticed.
> This is likely the root cause of the performance issues we saw in doing bulk 
> recompression of data in mirror maker.
> The mismatch between the InputStream and OutputStream interfaces and the 
> Message/MessageSet interfaces which are based on byte buffers is the cause of 
> many of these.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
