[ https://issues.apache.org/jira/browse/KAFKA-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016146#comment-16016146 ]

ASF GitHub Bot commented on KAFKA-5150:
---------------------------------------

GitHub user xvrl opened a pull request:

    https://github.com/apache/kafka/pull/3090

    KAFKA-5150 reduce lz4 decompression overhead - Backport to 0.10.2.x

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xvrl/kafka kafka-5150-0.10

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3090.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3090
    
----
commit 265ee89ec76b223eb97c6b364260443346e6dac1
Author: Xavier Léauté <xav...@confluent.io>
Date:   2017-05-03T17:03:44Z

    KAFKA-5150 reduce lz4 decompression overhead
    
    - reuse decompression buffers, keeping one per thread
    - switch lz4 input stream to operate directly on ByteBuffers
    - more tests with both compressible / incompressible data, multiple
      blocks, and various other combinations to increase code coverage
    - fixes bug that would cause EOFException instead of invalid block size
      for invalid incompressible blocks

commit cef091d0353a8a1f45ac913750f1e0dba04d7ab1
Author: Xavier Léauté <xav...@confluent.io>
Date:   2017-05-05T22:18:55Z

    avoid exception when reaching end of batch

----
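The buffer-reuse idea described in the first commit above — keeping one decompression buffer per thread instead of allocating a fresh 64 kB block for every input-stream instantiation — can be sketched roughly as follows. This is a hypothetical illustration (the class and field names are invented, not Kafka's actual KafkaLZ4BlockInputStream code):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of per-thread decompression buffer reuse.
// Instead of allocating a new 64 KiB block per stream, each thread keeps
// one buffer and hands it out to successive stream instantiations.
public class BufferPool {
    // Counts real allocations, to show how few occur under reuse.
    static final AtomicInteger allocations = new AtomicInteger();

    // One buffer per thread, created lazily on first use.
    private static final ThreadLocal<byte[]> DECOMPRESSION_BUFFER =
        ThreadLocal.withInitial(() -> {
            allocations.incrementAndGet();
            return new byte[64 * 1024]; // default LZ4 block size
        });

    // Return the thread's buffer, growing it only if a larger block appears.
    static byte[] acquire(int blockSize) {
        byte[] buf = DECOMPRESSION_BUFFER.get();
        if (buf.length < blockSize) {
            allocations.incrementAndGet();
            buf = new byte[blockSize];
            DECOMPRESSION_BUFFER.set(buf);
        }
        return buf;
    }

    public static void main(String[] args) {
        // Simulate many small batches on one thread: the buffer is
        // allocated once and reused for every subsequent stream.
        byte[] first = acquire(64 * 1024);
        for (int i = 0; i < 1000; i++) {
            if (acquire(64 * 1024) != first)
                throw new AssertionError("buffer not reused");
        }
        if (allocations.get() != 1)
            throw new AssertionError("expected a single allocation");
        System.out.println("allocations=" + allocations.get());
    }
}
```

Under this scheme the per-stream cost for small batches drops from one 64 kB allocation (plus GC pressure) to a thread-local lookup, which is where the order-of-magnitude improvement cited in the issue comes from.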


> LZ4 decompression is 4-5x slower than Snappy on small batches / messages
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-5150
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5150
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.8.2.2, 0.9.0.1, 0.11.0.0, 0.10.2.1
>            Reporter: Xavier Léauté
>            Assignee: Xavier Léauté
>             Fix For: 0.11.0.0
>
>
> I benchmarked RecordsIterator.DeepRecordsIterator instantiation on small batch 
> sizes with small messages after observing some performance bottlenecks in the 
> consumer. 
> For batch sizes of 1 with messages of 100 bytes, LZ4 heavily underperforms 
> compared to Snappy (see benchmark below). Most of our time is currently spent 
> allocating memory blocks in KafkaLZ4BlockInputStream, due to the fact that we 
> default to larger 64kB block sizes. Some quick testing shows we could improve 
> performance by almost an order of magnitude for small batches and messages if 
> we re-used buffers between instantiations of the input stream.
> [Benchmark 
> Code|https://github.com/xvrl/kafka/blob/small-batch-lz4-benchmark/clients/src/test/java/org/apache/kafka/common/record/DeepRecordsIteratorBenchmark.java#L86]
> {code}
> Benchmark                                          (compressionType)  (messageSize)   Mode  Cnt       Score       Error  Units
> DeepRecordsIteratorBenchmark.measureSingleMessage                LZ4            100  thrpt   20   84802.279 ±  1983.847  ops/s
> DeepRecordsIteratorBenchmark.measureSingleMessage             SNAPPY            100  thrpt   20  407585.747 ±  9877.073  ops/s
> DeepRecordsIteratorBenchmark.measureSingleMessage               NONE            100  thrpt   20  579141.634 ± 18482.093  ops/s
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
