[jira] [Created] (LUCENE-10112) Improve LZ4 Compression performance with direct primitive read/writes

Tim Brooks (Jira) Sat, 18 Sep 2021 11:08:09 -0700

Tim Brooks created LUCENE-10112:
-----------------------------------

             Summary: Improve LZ4 Compression performance with direct primitive 
read/writes
                 Key: LUCENE-10112
                 URL: https://issues.apache.org/jira/browse/LUCENE-10112
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Tim Brooks

*Summary*

Java9 introduced VarHandles as a tool to quickly read and write primitive types
directly to byte arrays without bound checks. The LZ4 compressor class must
consistently read ints from a byte array to analyze matches. The performance
can be improved by reading these using a VarHandle.

Additionally, the LZ4 compressor/decompressor methods currently individually
read/write the bytes for LE shorts. Lucene's DataOutput/DataInput abstractions
already have dedicated methods for reading/writing LE shorts. These methods are
selectively optimized in certain implementations and will provide superior
performance than individual byte reads.

*Concerns*

The DataOutput/DataInput readShort() and writeShort() methods do not call out
that they are LE. It just looks to me that the DataOutput/DataInput are LE?
Since this particular change does not appear to provide significant performance
wins, maybe the patch is better leaving the explicit individual byte reads?

Additionally, this patch changes read ints to read them in the platform native
order which should be fine since it is just matching bytes. But I can change it
to only read in the order the previous version did.

*Benchmarks*

I created JMH benchmarks which compresses 1MB of highly compressible JSON
observability data. And compresses it 64KB at a time. In order to simulate the
"short" changes, I use a forked version `ByteArrayDataOutput` which writes
shorts using a VarHandle (to simulate fast writes that the ByteBuffer versions
would get.) I also ran a benchmark without the short changes, just the reading
ints using a VarHandle.

{noformat}
Benchmark Mode Cnt Score Error
Units
MyBenchmark.testCompressLuceneLZ4 thrpt 9 712.430 ± 3.616
ops/s
MyBenchmark.testCompressLuceneLZ4Forked thrpt 9 945.380 ± 4.776
ops/s
MyBenchmark.testCompressLuceneLZ4ForkedNoShort thrpt 9 940.812 ± 3.868
ops/s
MyBenchmark.testCompressLuceneLZ4HC thrpt 9 147.432 ± 4.730
ops/s
MyBenchmark.testCompressLuceneLZ4HCForked thrpt 9 183.954 ± 2.534
ops/s
MyBenchmark.testCompressLuceneLZ4HCForkedNoShort thrpt 9 188.065 ± 0.727
ops/s{noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Created] (LUCENE-10112) Improve LZ4 Compression performance with direct primitive read/writes

Reply via email to