[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13728452#comment-13728452 ]
Jay Kreps commented on KAFKA-527:
---------------------------------
To test performance, check out trunk and do:
./sbt package test:package
./bin/kafka-run-class.sh kafka.TestLinearWriteSpeed --bytes 1007483648 --files 1 --log --message-size 4096 --size 109600 --compression snappy
./bin/kafka-run-class.sh kafka.TestLinearWriteSpeed --bytes 1007483648 --files 1 --log --message-size 4096 --size 109600 --compression none
Performance (on my laptop):
none: 75 MB/sec
snappy: 18 MB/sec
Why is this so slow? After all, Snappy claims round-trip performance of around
200 MB/sec...
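As a sanity check on that claim, you can measure the raw codec outside Kafka
with snappy-java's block API. This is just a throwaway sketch (the class name
is made up, and the result depends heavily on how compressible the input is):

import java.util.Random;
import org.xerial.snappy.Snappy;

// Throwaway benchmark of snappy-java's block API: no streams, no Kafka.
public class SnappyRoundTrip {
    public static void main(String[] args) throws Exception {
        byte[] input = new byte[4096];
        new Random(42).nextBytes(input); // worst case: random bytes barely compress
        long total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < 250_000; i++) {          // ~1 GB round-tripped
            byte[] compressed = Snappy.compress(input);
            total += Snappy.uncompress(compressed).length;
        }
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("round-trip: %.1f MB/sec%n", total / 1e6 / secs);
    }
}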
If you look at the attached hprof trace for the run without compression, you
see the following:
1 79.87% 79.87% 2004 300968 sun.nio.ch.FileDispatcher.write0
2 5.38% 85.25% 135 300978 kafka.utils.Utils$.crc32
3 5.06% 90.31% 127 301074 sun.nio.ch.FileChannelImpl.force0
4 1.79% 92.11% 45 301075 java.nio.MappedByteBuffer.force0
I.e., about 80% goes to writing, 5% to computing CRCs, and about 7% to
flushing, followed by a long tail.
If you look at the same trace with compression, you expect to see all the time
going to compressing stuff, right? Wrong:
1 16.44% 16.44% 807 301044 org.xerial.snappy.SnappyNative.arrayCopy
2 14.81% 31.24% 727 301073 java.util.Arrays.copyOf
3 9.61% 40.86% 472 301053 org.xerial.snappy.SnappyNative.rawCompress
4 7.45% 48.31% 366 301084 org.xerial.snappy.SnappyNative.rawUncompress
5 5.60% 53.91% 275 301063 java.io.ByteArrayOutputStream.<init>
6 5.13% 59.04% 252 301090 sun.nio.ch.FileDispatcher.write0
7 4.32% 63.36% 212 301074 java.nio.HeapByteBuffer.<init>
8 3.97% 67.33% 195 301049 java.util.Arrays.copyOf
9 3.83% 71.16% 188 301070 org.xerial.snappy.SnappyNative.arrayCopy
10 3.22% 74.38% 158 301068 org.xerial.snappy.SnappyNative.rawUncompress
11 2.10% 76.48% 103 301065 org.xerial.snappy.SnappyNative.arrayCopy
12 1.91% 78.39% 94 301089 java.nio.HeapByteBuffer.<init>
If you tally this up, you see that about 50% of the time goes to array copying
and allocation, a mere 22% to actual compression and decompression, and the
cost of the write itself is knocked down to 5%.
Pretty sad.
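To make the copy chain concrete, here is a simplified sketch in Java of the
shape of that path (hypothetical class and method names, not the literal Kafka
code); nearly every step is a full copy of the message bytes:

import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import org.xerial.snappy.SnappyOutputStream;

public class CopyHeavyPath {
    // Compress a batch of message bytes and prepend a 4-byte size, the way a
    // stream-based path forces you to do it.
    public static ByteBuffer compress(byte[] messages) throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream(); // grows via Arrays.copyOf
        SnappyOutputStream out = new SnappyOutputStream(baos);
        out.write(messages);  // copy 1: into snappy's internal block buffer
        out.close();          // copy 2: compressed block written into baos
        byte[] compressed = baos.toByteArray(); // copy 3: baos buffer -> fresh array
        ByteBuffer buffer = ByteBuffer.allocate(4 + compressed.length); // HeapByteBuffer.<init>
        buffer.putInt(compressed.length); // the "4 byte size" on the front
        buffer.put(compressed);           // copy 4: array -> heap buffer
        buffer.rewind();
        return buffer;
    }
}

The entries in the trace above (arrayCopy, Arrays.copyOf,
ByteArrayOutputStream.<init>, HeapByteBuffer.<init>) map onto steps like
these, and the decompression side repeats the pattern in reverse.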
> Compression support does numerous byte copies
> ---------------------------------------------
>
> Key: KAFKA-527
> URL: https://issues.apache.org/jira/browse/KAFKA-527
> Project: Kafka
> Issue Type: Bug
> Reporter: Jay Kreps
> Attachments: java.hprof.no-compression.txt, java.hprof.snappy.text
>
>
> The data path for compressing or decompressing messages is extremely
> inefficient. We do something like 7 (?) complete copies of the data, often
> for simple things like adding a 4-byte size to the front. I am not sure how
> this went unnoticed.
> This is likely the root cause of the performance issues we saw in doing bulk
> recompression of data in mirror maker.
> The mismatch between the InputStream and OutputStream interfaces and the
> ByteBuffer-based Message/MessageSet interfaces is the cause of many of
> these copies.
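On that last point: snappy-java also exposes a ByteBuffer-to-ByteBuffer API
that sidesteps the stream adapters entirely. A minimal sketch of what a
buffer-native path could look like, assuming direct buffers (which that API
requires) and a hypothetical class name:

import java.nio.ByteBuffer;
import org.xerial.snappy.Snappy;

// Sketch of a buffer-to-buffer compression path with no intermediate arrays
// or stream adapters.
public class BufferCompress {
    public static ByteBuffer compress(ByteBuffer uncompressed) throws Exception {
        // Snappy.compress(ByteBuffer, ByteBuffer) works only on direct buffers.
        ByteBuffer compressed =
            ByteBuffer.allocateDirect(Snappy.maxCompressedLength(uncompressed.remaining()));
        int written = Snappy.compress(uncompressed, compressed);
        compressed.limit(written); // compressed data is in [0, written)
        return compressed;
    }
}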