[
https://issues.apache.org/jira/browse/KAFKA-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016146#comment-16016146
]
ASF GitHub Bot commented on KAFKA-5150:
---------------------------------------
GitHub user xvrl opened a pull request:
https://github.com/apache/kafka/pull/3090
KAFKA-5150 reduce lz4 decompression overhead - Backport to 0.10.2.x
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xvrl/kafka kafka-5150-0.10
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/kafka/pull/3090.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3090
----
commit 265ee89ec76b223eb97c6b364260443346e6dac1
Author: Xavier Léauté <[email protected]>
Date: 2017-05-03T17:03:44Z
KAFKA-5150 reduce lz4 decompression overhead
- reuse decompression buffers, keeping one per thread
- switch lz4 input stream to operate directly on ByteBuffers
- more tests with both compressible / incompressible data, multiple
blocks, and various other combinations to increase code coverage
- fixes bug that would cause EOFException instead of invalid block size
for invalid incompressible blocks
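The per-thread buffer reuse described in the first commit can be sketched roughly as follows (a minimal illustration only; `BufferPool` and its methods are hypothetical names, not Kafka's actual classes):

```java
import java.nio.ByteBuffer;

// Sketch of per-thread decompression buffer reuse: instead of allocating a
// fresh 64 KiB block on every input-stream instantiation, each thread keeps
// one buffer and hands it out repeatedly.
public class BufferPool {
    private static final int BLOCK_SIZE = 64 * 1024; // default lz4 block size

    // One buffer per thread, allocated lazily on first use.
    private static final ThreadLocal<ByteBuffer> DECOMPRESSION_BUFFER =
            ThreadLocal.withInitial(() -> ByteBuffer.allocate(BLOCK_SIZE));

    // Returns the calling thread's buffer, cleared and ready for reuse.
    public static ByteBuffer acquire() {
        ByteBuffer buf = DECOMPRESSION_BUFFER.get();
        buf.clear();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer a = acquire();
        ByteBuffer b = acquire();
        // Same thread, so both calls return the same underlying buffer:
        // no repeated allocation on the hot path.
        System.out.println(a == b); // true
    }
}
```

Because each thread owns its buffer, no synchronization is needed, at the cost of one block-sized buffer per consumer thread.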
commit cef091d0353a8a1f45ac913750f1e0dba04d7ab1
Author: Xavier Léauté <[email protected]>
Date: 2017-05-05T22:18:55Z
avoid exception when reaching end of batch
----
> LZ4 decompression is 4-5x slower than Snappy on small batches / messages
> ------------------------------------------------------------------------
>
> Key: KAFKA-5150
> URL: https://issues.apache.org/jira/browse/KAFKA-5150
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 0.8.2.2, 0.9.0.1, 0.11.0.0, 0.10.2.1
> Reporter: Xavier Léauté
> Assignee: Xavier Léauté
> Fix For: 0.11.0.0
>
>
> I benchmarked RecordsIterator.DeepRecordsIterator instantiation on small batch
> sizes with small messages after observing some performance bottlenecks in the
> consumer.
> For batch sizes of 1 with messages of 100 bytes, LZ4 heavily underperforms
> compared to Snappy (see benchmark below). Most of our time is currently spent
> allocating memory blocks in KafkaLZ4BlockInputStream, because we default to
> the larger 64 kB block size. Some quick testing shows we could improve
> performance by almost an order of magnitude for small batches and messages if
> we reused buffers between instantiations of the input stream.
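The second change in the patch, reading lz4 framing directly from ByteBuffers, avoids copying bytes through an InputStream. As a small hedged illustration (not the actual KafkaLZ4BlockInputStream code), the 4-byte little-endian block-size field of the lz4 frame format can be read in place:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: read an lz4 frame's 4-byte little-endian block-size
// field directly at the buffer's current position, with no intermediate
// byte[] copy.
public class ByteBufferReads {
    static int readBlockSize(ByteBuffer in) {
        return in.order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        // 0x00000010 stored little-endian
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {0x10, 0x00, 0x00, 0x00});
        System.out.println(readBlockSize(buf)); // 16
    }
}
```

Operating on the ByteBuffer directly also means the decompressor can consume the batch payload without first materializing it as a separate array.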
> [Benchmark
> Code|https://github.com/xvrl/kafka/blob/small-batch-lz4-benchmark/clients/src/test/java/org/apache/kafka/common/record/DeepRecordsIteratorBenchmark.java#L86]
> {code}
> Benchmark                                          (compressionType)  (messageSize)   Mode  Cnt       Score       Error  Units
> DeepRecordsIteratorBenchmark.measureSingleMessage                LZ4            100  thrpt   20   84802.279 ±  1983.847  ops/s
> DeepRecordsIteratorBenchmark.measureSingleMessage             SNAPPY            100  thrpt   20  407585.747 ±  9877.073  ops/s
> DeepRecordsIteratorBenchmark.measureSingleMessage               NONE            100  thrpt   20  579141.634 ± 18482.093  ops/s
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)