[jira] [Commented] (KAFKA-527) Compression support does numerous byte copies

Jay Kreps (JIRA) Sun, 08 Mar 2015 13:12:07 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352227#comment-14352227
 ]


Jay Kreps commented on KAFKA-527:
---------------------------------

The clients already use MemoryRecords, so 0.8.2 and 0.8.3 will give the 
speed-up to people who use the clients. I think the question is how best to get 
the perf improvement to the server, which should be largely independent.

Guozhang is correct that moving the server to MemoryRecords should be our 
long-term plan and is the end-state we want. However, the Message interface is 
fairly heavily used inside kafka.log, so this would be a very large change to 
those classes. We haven't had a real discussion about how we would go about 
this and I don't think there is really a timeline. Several options I see:
1. We could do Yasu and Guozhang's fixes now: they are limited in scope, 
compression is a pain point now, and we have lots of things in flight right now.
2. We could do a larger conversion of kafka.log to move it off 
Message/MessageSet/FileMessageSet/ByteBufferMessageSet as Guozhang proposes. 
This would be a fairly big refactoring: a number of things tied to the 
MessageSet interface would all have to move, and there is a significant amount 
of test code to update as well. However, this is certainly where we want to 
end up.
3. We could decide that we actually prefer Java code, and given that a 
significant chunk of the common code has to be in Java, we should start moving 
chunks of the server as well. We had talked about this before, but I don't 
think we should start until we have a real plan to finish. But anyhow, if we 
did that, then instead of just migrating the server off 
Message/MessageSet/FileMessageSet/ByteBufferMessageSet we would also move the 
log subpackage wholesale to Java as the first step in a larger migration. The 
argument both for and against this would be that instead of doing two 
rewrites, one to change interfaces and a second to move Scala=>Java, we could 
just do both at the same time.

> Compression support does numerous byte copies
> ---------------------------------------------
>
>                 Key: KAFKA-527
>                 URL: https://issues.apache.org/jira/browse/KAFKA-527
>             Project: Kafka
>          Issue Type: Bug
>          Components: compression
>            Reporter: Jay Kreps
>            Assignee: Jay Kreps
>            Priority: Critical
>         Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, 
> java.hprof.no-compression.txt, java.hprof.snappy.text
>
>
> The data path for compressing or decompressing messages is extremely 
> inefficient. We do something like 7 (?) complete copies of the data, often 
> for simple things like adding a 4 byte size to the front. I am not sure how 
> this went by unnoticed.
> This is likely the root cause of the performance issues we saw in doing bulk 
> recompression of data in mirror maker.
> The mismatch between the InputStream and OutputStream interfaces and the 
> Message/MessageSet interfaces which are based on byte buffers is the cause of 
> many of these.
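
To illustrate the kind of copy the description is talking about, here is a 
minimal sketch (not the actual Kafka code, and not the attached patch) of 
compressing straight into a byte buffer while reserving the 4 byte size up 
front and backfilling it afterwards, rather than compressing into one array 
and then copying it behind a size header. The class and method names 
(SizePrefixedCompression, ByteBufferOutputStream, compressWithSizePrefix, 
decompress) are hypothetical and exist only for this example; GZIP is used 
only because it ships with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class SizePrefixedCompression {

    /** Hypothetical helper: an OutputStream that appends directly into a growable heap ByteBuffer. */
    static final class ByteBufferOutputStream extends OutputStream {
        private ByteBuffer buffer;

        ByteBufferOutputStream(int initialCapacity) {
            buffer = ByteBuffer.allocate(initialCapacity);
        }

        @Override
        public void write(int b) {
            ensureRemaining(1);
            buffer.put((byte) b);
        }

        @Override
        public void write(byte[] src, int off, int len) {
            ensureRemaining(len);
            buffer.put(src, off, len);
        }

        /** Current backing buffer; callers must re-read it after writes because it may be reallocated. */
        ByteBuffer buffer() {
            return buffer;
        }

        // Growing still copies, but only when capacity runs out (amortized), not once per layer.
        private void ensureRemaining(int needed) {
            if (buffer.remaining() >= needed) {
                return;
            }
            int newCapacity = Math.max(buffer.capacity() * 2, buffer.position() + needed);
            ByteBuffer bigger = ByteBuffer.allocate(newCapacity);
            buffer.flip();
            bigger.put(buffer);
            buffer = bigger;
        }
    }

    /** Returns a buffer laid out as [4 byte size][compressed bytes], ready for reading. */
    static ByteBuffer compressWithSizePrefix(byte[] payload) throws IOException {
        ByteBufferOutputStream out = new ByteBufferOutputStream(64);
        out.buffer().putInt(0); // reserve 4 bytes for the size; the real value is filled in below
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(payload);
        }
        ByteBuffer buf = out.buffer();
        buf.putInt(0, buf.position() - 4); // backfill the size in place, no second copy
        buf.flip();
        return buf;
    }

    /** Reads the size prefix and decompresses the bytes that follow it. Assumes a heap buffer. */
    static byte[] decompress(ByteBuffer framed) throws IOException {
        int size = framed.getInt(framed.position());
        // Wrap the backing array in place instead of copying the compressed bytes out first.
        InputStream in = new ByteArrayInputStream(
                framed.array(), framed.arrayOffset() + framed.position() + 4, size);
        try (GZIPInputStream gzip = new GZIPInputStream(in)) {
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                result.write(chunk, 0, n);
            }
            return result.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "some message bytes, some message bytes".getBytes(StandardCharsets.UTF_8);
        ByteBuffer framed = compressWithSizePrefix(payload);
        System.out.println("size prefix matches body: " + (framed.getInt(0) == framed.remaining() - 4));
        System.out.println("round trip ok: " + Arrays.equals(payload, decompress(framed)));
    }
}
```

The point of the sketch is only that the stream side and the byte buffer side 
can share one backing buffer, so a header can be written in place instead of 
forcing another full copy of the data.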



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
