Hi dev@,

I’ve been working on performance improvements across the main
encoding/decoding hot paths of Apache Parquet Java. I presented this
work during last week’s Parquet community sync and I am sharing a
summary here for broader visibility, in line with Apache best
practices.

Using AI-assisted tools and JMH, I extended the existing
microbenchmarks to cover the critical hot paths. I then iterated on a
series of optimizations, validated them for correctness, and reviewed
them with other AI tools. The results are promising.

The improvements focus on eliminating per-value overhead in the hot
loops without changing the file format or public API. Key changes:

- Plain INT32/INT64: bulk System.arraycopy instead of per-value
ByteBuffer.putInt (~4x encode, ~3x decode)
- ByteStreamSplit: zero-allocation batch scatter/gather (3-5x encode, 2x decode)
- Dictionary encoding: custom open-addressing hash map replacing
java.util.HashMap (up to 80x for low-cardinality string columns)
- RLE dictionary index decoder: direct ByteBuffer access bypassing InputStream
- New batch read APIs: readIntegers()/readLongs() for vectorized consumers
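
As a rough illustration of the first item (not the code from the pull
requests), the sketch below contrasts a per-value ByteBuffer.putInt
loop with a single bulk transfer through a little-endian IntBuffer
view. The class and method names are hypothetical, and the IntBuffer
view is only a stand-in for the bulk copy; the actual patches may do
this differently.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch: names are illustrative, not Parquet Java APIs.
public class PlainInt32Sketch {

    // Baseline: one putInt call (with its bounds check) per value.
    public static byte[] encodePerValue(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // Bulk variant: a single IntBuffer.put(int[]) over a little-endian view
    // of the same backing array, so the per-value call overhead disappears.
    public static byte[] encodeBulk(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.asIntBuffer().put(values);
        return buf.array();
    }

    // Bulk decode: read all values back in one IntBuffer.get(int[]) call.
    public static int[] decodeBulk(byte[] bytes) {
        int[] out = new int[bytes.length / Integer.BYTES];
        ByteBuffer.wrap(bytes)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asIntBuffer()
                .get(out);
        return out;
    }

    public static void main(String[] args) {
        int[] values = {1, -2, 300, Integer.MAX_VALUE};
        byte[] perValue = encodePerValue(values);
        byte[] bulk = encodeBulk(values);
        // Both paths should produce identical PLAIN-style little-endian bytes.
        System.out.println(Arrays.equals(perValue, bulk));
        System.out.println(Arrays.equals(values, decodeBulk(bulk)));
    }
}
```

Both variants produce the same bytes; the difference is only in how
many calls sit inside the hot loop, which is what the JMH numbers
above measure.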

End-to-end file read/write throughput improves by ~13–14% on average
across codecs in my test suite (Java 11, AMD EPYC). Full JMH results
(303 benchmarks) and a more detailed write-up will follow.

Most changes have been grouped and tracked under the following issue,
which provides background and links to the related pull requests:
https://github.com/apache/parquet-java/issues/3530

The first set of pull requests is ready for review. Feedback and
comments from Java committers would be greatly appreciated.

Thanks,
Ismaël

P.S. Kudos to Fokko Driesprong, who has already started reviewing some of them.
