Thanks Ismaël for working on this. I did a first round of reviews with great interest, and I'll do another one soon.
I noticed that there is some overlap with the work by André
(https://github.com/apache/parquet-java/issues?q=is%3Apr+is%3Aopen+author%3Aarouel);
it might be good to align the efforts.

Thanks!

Kind regards,
Fokko

On 2026/04/29 10:13:43 Steve Loughran wrote:
> there's a JMH comparer tool at https://github.com/JohnTortugo/jmh-tabulate
> ...
>
> Even though it comes from an AWS engineer I did review that code for
> security, and even got claude to (dynamically) generate the config file
> needed to run the project in a chroot-style sandbox on macos. Only tangible
> risk is the chart.js file, and now that's cryptographically locked down.
>
> https://github.com/steveloughran/jmh-tabulate/tree/hardened
>
> Nobody should be pulling head dependencies from NPM repos, hard coded
> version numbers can be subverted by new tags. Hash codes are the only thing
> to trust for something you run on file://
> Even if you bypass the sandbox, the .html file generated does enforce
> chart.js version integrity. So all should be good.
>
> Given all that, what do your numbers look like?
>
>
> On Wed, 29 Apr 2026 at 08:28, Ismaël Mejía <[email protected]> wrote:
>
> > Hi dev@,
> >
> > I’ve been working on performance improvements across the main
> > encoding/decoding hot paths of Apache Parquet Java. I presented this
> > work during last week’s Parquet community sync and I am sharing a
> > summary here for broader visibility, in line with Apache best
> > practices.
> >
> > Using AI assisted tools and JMH, I expanded the existing coverage of
> > microbenchmarks covering critical hot paths. I then iterated on a
> > series of optimizations, validated for correctness, and reviewed with
> > other AI tools. The results are promising.
> >
> > The improvements focus on eliminating per-value overhead in the hot
> > loops without changing the file format or public API.
> >
> > Key changes:
> >
> > - Plain INT32/LONG: bulk System.arraycopy instead of per-value
> >   ByteBuffer.putInt (~4x encode, ~3x decode)
> > - ByteStreamSplit: zero-allocation batch scatter/gather (3-5x encode, 2x
> >   decode)
> > - Dictionary encoding: custom open-addressing hash map replacing
> >   java.util.HashMap (up to 80x for low-cardinality string columns)
> > - RLE dictionary index decoder: direct ByteBuffer access bypassing
> >   InputStream
> > - New batch read APIs: readIntegers()/readLongs() for vectorized consumers
> >
> > End-to-end file read/write throughput improves by ~13–14% on average
> > across codecs in my test suite (Java 11, AMD EPYC). Full JMH results
> > (303 benchmarks) and a more detailed write-up will follow.
> >
> > Most changes have been grouped and tracked under the following issue,
> > which provides background and links to the related pull requests
> > https://github.com/apache/parquet-java/issues/3530
> >
> > The first set of pull requests is ready for review. Feedback and
> > comments from Java committers would be greatly appreciated.
> >
> > Thanks,
> > Ismaël
> >
> > ps. Kudos to Fokko Driesprong who already started reviewing some of them.
> >
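
For anyone following the thread who is less familiar with the ByteStreamSplit
item in the quoted list of key changes, here is a minimal sketch of what
"batch scatter/gather" means for 4-byte values: byte k of every value is
written to stream k, and the streams are stored back to back. This is only an
illustration of the layout, not the code from the pull requests; the class
name ByteStreamSplitSketch and its encode/decode methods are hypothetical, and
it makes no attempt to be zero-allocation.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of parquet-java.
public class ByteStreamSplitSketch {
    static final int WIDTH = 4; // bytes per 4-byte value (INT32 here)

    // Scatter: byte k (little-endian order) of value i lands at out[k * n + i].
    static byte[] encode(int[] values) {
        int n = values.length;
        byte[] out = new byte[n * WIDTH];
        for (int i = 0; i < n; i++) {
            int v = values[i];
            out[i] = (byte) v;
            out[n + i] = (byte) (v >>> 8);
            out[2 * n + i] = (byte) (v >>> 16);
            out[3 * n + i] = (byte) (v >>> 24);
        }
        return out;
    }

    // Gather: rebuild value i from the i-th byte of each of the four streams.
    static int[] decode(byte[] encoded) {
        int n = encoded.length / WIDTH;
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = (encoded[i] & 0xFF)
                    | (encoded[n + i] & 0xFF) << 8
                    | (encoded[2 * n + i] & 0xFF) << 16
                    | (encoded[3 * n + i] & 0xFF) << 24;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] values = {1, 256, -1, 0x11223344};
        byte[] encoded = encode(values);
        // Round-trip check: decoding the scattered bytes restores the input.
        System.out.println(Arrays.equals(decode(encoded), values));
    }
}
```

Working on a whole batch at once, as above, lets the loop write into one
preallocated array instead of going through a per-value call, which is the
kind of per-value overhead the quoted summary says it targets.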
