[
https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352227#comment-14352227
]
Jay Kreps commented on KAFKA-527:
---------------------------------
The clients already use MemoryRecords, so 0.8.2 and 0.8.3 will give the
speed-up to people who use the clients. I think the question is how best to get
the perf improvement to the server, which should be largely independent.
Guozhang is correct that moving the server to MemoryRecords should be our long
term plan and is the end-state we want. However, the Message interface is fairly
heavily used inside kafka.log so this would be a very large change to those
classes. We haven't had a real discussion about how we would go about this and
I don't think there is really a timeline. Several options I see:
1. We could do Yasu and Guozhang's fixes now: they are limited in scope,
compression is a pain point now, and we have lots of things in flight right now.
2. We could do a larger conversion of kafka.log to move it off
Message/MessageSet/FileMessageSet/ByteBufferMessageSet as Guozhang proposes.
This would be a fairly big refactoring, as there are a number of things tied to
the MessageSet interface that would all have to move, and there is a
significant amount of test code so this would be a big change. However this is
certainly where we want to end up.
3. We could decide that we actually prefer Java code, and given that a
significant chunk of the common code has to be in Java, we should start moving
chunks of the server as well. We had talked about this before, but I don't think
we should start until we have a real plan to finish. But anyhow, if we did that,
then instead of just migrating the server off
Message/MessageSet/FileMessageSet/ByteBufferMessageSet we would also move the
log subpackage wholesale to Java as the first step in a larger migration. The
argument both for and against this would be that instead of doing two rewrites,
one to change interfaces and a second to move Scala to Java, we could just do
both at the same time.
> Compression support does numerous byte copies
> ---------------------------------------------
>
> Key: KAFKA-527
> URL: https://issues.apache.org/jira/browse/KAFKA-527
> Project: Kafka
> Issue Type: Bug
> Components: compression
> Reporter: Jay Kreps
> Assignee: Jay Kreps
> Priority: Critical
> Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch,
> java.hprof.no-compression.txt, java.hprof.snappy.text
>
>
> The data path for compressing or decompressing messages is extremely
> inefficient. We do something like 7 (?) complete copies of the data, often
> for simple things like adding a 4 byte size to the front. I am not sure how
> this went by unnoticed.
> This is likely the root cause of the performance issues we saw in doing bulk
> recompression of data in mirror maker.
> The mismatch between the InputStream and OutputStream interfaces and the
> Message/MessageSet interfaces which are based on byte buffers is the cause of
> many of these.
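
As an illustration of the stream/buffer mismatch described above, here is a
minimal sketch (not Kafka's actual code) of one way to avoid the extra copy
needed just to prepend a 4-byte size: reserve the size slot in the target
ByteBuffer, let the compressor write through a thin OutputStream adapter
directly into that buffer, then backfill the size. All class and method names
below are hypothetical, and GZIP from the JDK stands in for whichever codec is
actually in use.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative only: hypothetical names, not part of Kafka's API.
final class SizePrefixedCompression {

    /** OutputStream adapter that writes straight into a caller-supplied ByteBuffer. */
    static final class BufferOutputStream extends OutputStream {
        private final ByteBuffer buffer;
        BufferOutputStream(ByteBuffer buffer) { this.buffer = buffer; }
        @Override public void write(int b) { buffer.put((byte) b); }
        @Override public void write(byte[] src, int off, int len) { buffer.put(src, off, len); }
    }

    /** InputStream adapter that reads from a ByteBuffer without first copying it to an array. */
    static final class BufferInputStream extends InputStream {
        private final ByteBuffer buffer;
        BufferInputStream(ByteBuffer buffer) { this.buffer = buffer; }
        @Override public int read() { return buffer.hasRemaining() ? (buffer.get() & 0xFF) : -1; }
        @Override public int read(byte[] dst, int off, int len) {
            if (len == 0) return 0;
            if (!buffer.hasRemaining()) return -1;
            int n = Math.min(len, buffer.remaining());
            buffer.get(dst, off, n);
            return n;
        }
    }

    /**
     * Writes [4-byte size][compressed payload] at the buffer's current position.
     * Returns the compressed size. Throws BufferOverflowException if the buffer is too small.
     */
    static int writeCompressed(byte[] payload, ByteBuffer buffer) throws IOException {
        int sizePos = buffer.position();
        buffer.position(sizePos + 4); // reserve the size slot instead of copying later
        try (GZIPOutputStream gzip = new GZIPOutputStream(new BufferOutputStream(buffer))) {
            gzip.write(payload);
        }
        int compressedSize = buffer.position() - sizePos - 4;
        buffer.putInt(sizePos, compressedSize); // absolute put: position is unchanged
        return compressedSize;
    }

    /** Reads a size-prefixed compressed block starting at sizePos and returns the payload. */
    static byte[] readCompressed(ByteBuffer buffer, int sizePos) throws IOException {
        int size = buffer.getInt(sizePos);
        ByteBuffer view = buffer.duplicate(); // shares the bytes, independent position/limit
        view.limit(sizePos + 4 + size);
        view.position(sizePos + 4);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new BufferInputStream(view))) {
            byte[] chunk = new byte[1024];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return out.toByteArray();
    }
}
```

Calling writeCompressed(payload, buffer) on a freshly allocated buffer leaves
the compressed size at offset 0 and the compressed bytes immediately after it.
The codec still buffers internally, so this does not eliminate every copy; it
only removes the intermediate byte array and the re-copy that would otherwise
be needed to put the size in front.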