Re: Java performance optimization work — seeking code reviews

Ismaël Mejía Sun, 17 May 2026 15:52:16 -0700

I initially submitted a series of small, focused PRs thinking they'd
be easier to review. In practice the sheer number (~16 PRs, with more
pending) made things harder to follow — even for me. I've regrouped
the changes by encoding type / performance area so that each PR is
self-contained with its own benchmarks and test coverage, which should
make review and performance analysis much more straightforward.


Apologies for the churn. If you've been reviewing this PR, please
continue the discussion on #3566 which supersedes it. Thank you.

For those interested, the issue summarising the ongoing work is now
much cleaner. Feel free to take a look and leave any comments.
https://github.com/apache/parquet-java/issues/3530

Regards,
Ismaël

On Mon, May 11, 2026 at 8:55 PM Fokko Driesprong <[email protected]> wrote:
>
> Brilliant, thanks for the reminder. I just merged the PR.
>
> Kind regards,
> Fokko
>
> On 2026/05/11 18:43:00 Steve Loughran wrote:
> > Fokko, could you merge my benchmark test PR, you've already approved it.
> > Benchmark only as Neelesh picked up the only performance change I'd done in
> > my PR
> >
> > https://github.com/apache/parquet-java/pull/3452
> >
> > thanks
> >
> > On Mon, 11 May 2026 at 17:12, Fokko Driesprong <[email protected]> wrote:
> >
> > > Thanks Ismaël for working on this. I did a first round of reviews with
> > > great interest, and I'll do another one soon.
> > >
> > > I noticed that there is some overlap with the work by André (
> > > https://github.com/apache/parquet-java/issues?q=is%3Apr+is%3Aopen+author%3Aarouel)
> > > maybe it would be good to align the effort.
> > >
> > > Thanks!
> > >
> > > Kind regards,
> > > Fokko
> > >
> > > On 2026/04/29 10:13:43 Steve Loughran wrote:
> > > > there's a JMH comparer tool at
> > > > https://github.com/JohnTortugo/jmh-tabulate
> > > > ...
> > > >
> > > > Even though it comes from an AWS engineer I did review that code for
> > > > security, and even got Claude to (dynamically) generate the config file
> > > > needed to run the project in a chroot-style sandbox on macOS. Only
> > > > tangible risk is the chart.js file, and now that's cryptographically
> > > > locked down.
> > > >
> > > > https://github.com/steveloughran/jmh-tabulate/tree/hardened
> > > >
> > > > Nobody should be pulling head dependencies from NPM repos, hard coded
> > > > version numbers can be subverted by new tags. Hash codes are the only
> > > > thing to trust for something you run on file://
> > > > Even if you bypass the sandbox, the .html file generated does enforce
> > > > chart.js version integrity. So all should be good.
> > > >
> > > > Given all that, what do your numbers look like?
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, 29 Apr 2026 at 08:28, Ismaël Mejía <[email protected]> wrote:
> > > >
> > > > > Hi dev@,
> > > > >
> > > > > I’ve been working on performance improvements across the main
> > > > > encoding/decoding hot paths of Apache Parquet Java. I presented this
> > > > > work during last week’s Parquet community sync and I am sharing a
> > > > > summary here for broader visibility, in line with Apache best
> > > > > practices.
> > > > >
> > > > > Using AI-assisted tools and JMH, I expanded the existing coverage of
> > > > > microbenchmarks covering critical hot paths. I then iterated on a
> > > > > series of optimizations, validated for correctness, and reviewed with
> > > > > other AI tools. The results are promising.
> > > > >
> > > > > The improvements focus on eliminating per-value overhead in the hot
> > > > > loops without changing the file format or public API. Key changes:
> > > > >
> > > > > - Plain INT32/LONG: bulk System.arraycopy instead of per-value
> > > > > ByteBuffer.putInt (~4x encode, ~3x decode)
> > > > > - ByteStreamSplit: zero-allocation batch scatter/gather (3-5x encode,
> > > > > 2x decode)
> > > > > - Dictionary encoding: custom open-addressing hash map replacing
> > > > > java.util.HashMap (up to 80x for low-cardinality string columns)
> > > > > - RLE dictionary index decoder: direct ByteBuffer access bypassing
> > > > > InputStream
> > > > > - New batch read APIs: readIntegers()/readLongs() for vectorized
> > > > > consumers
> > > > >
> > > > > End-to-end file read/write throughput improves by ~13–14% on average
> > > > > across codecs in my test suite (Java 11, AMD EPYC). Full JMH results
> > > > > (303 benchmarks) and a more detailed write-up will follow.
> > > > >
> > > > > Most changes have been grouped and tracked under the following issue,
> > > > > which provides background and links to the related pull requests
> > > > > https://github.com/apache/parquet-java/issues/3530
> > > > >
> > > > > The first set of pull requests is ready for review. Feedback and
> > > > > comments from Java committers would be greatly appreciated.
> > > > >
> > > > > Thanks,
> > > > > Ismaël
> > > > >
> > > > > ps. Kudos to Fokko Driesprong who already started reviewing some of
> > > > > them.
> > > > >
> > > >
> > >
> >
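
For readers skimming the thread, the "Plain INT32/LONG" item in the original message (bulk copy instead of per-value ByteBuffer.putInt) can be illustrated roughly as follows. This is only a sketch and not the code from the pull requests: the class and method names are hypothetical, and it uses a bulk IntBuffer.put over a little-endian view as a stand-in for whatever bulk-copy mechanism the actual change uses.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration; not taken from parquet-java.
public class PlainIntBulkSketch {

    // Baseline: one putInt call (with its own bounds check) per value.
    static byte[] encodePerValue(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // Bulk variant: a single put(int[]) through an IntBuffer view that
    // inherits the little-endian order of the backing ByteBuffer.
    static byte[] encodeBulk(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.asIntBuffer().put(values);
        return buf.array();
    }

    public static void main(String[] args) {
        int[] values = {1, -2, 300, Integer.MAX_VALUE};
        // Both paths should produce identical little-endian bytes.
        System.out.println(Arrays.equals(encodePerValue(values), encodeBulk(values)));
    }
}
```

Both methods produce the same bytes; the difference is only in how many calls sit inside the hot loop, which is the kind of per-value overhead the original message describes. Any speed-up would need to be measured with JMH rather than assumed from this sketch.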

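The ByteStreamSplit item in the original message refers to an encoding that scatters byte k of every value into stream k and concatenates the streams. A minimal sketch of that scatter/gather for 4-byte floats is shown below, assuming little-endian byte order within each value. The class and method names are hypothetical and this is not the implementation under review; it only shows the data layout that a batch, allocation-free loop would operate on.

```java
// Hypothetical illustration of the BYTE_STREAM_SPLIT layout for floats.
public class ByteStreamSplitSketch {

    // Scatter: byte k of value j lands at position k * n + j.
    static byte[] encode(float[] values) {
        int n = values.length;
        byte[] out = new byte[n * 4];
        for (int j = 0; j < n; j++) {
            int bits = Float.floatToRawIntBits(values[j]);
            out[j] = (byte) bits;
            out[n + j] = (byte) (bits >>> 8);
            out[2 * n + j] = (byte) (bits >>> 16);
            out[3 * n + j] = (byte) (bits >>> 24);
        }
        return out;
    }

    // Gather: reassemble each value from one byte of each of the 4 streams.
    static float[] decode(byte[] encoded) {
        int n = encoded.length / 4;
        float[] values = new float[n];
        for (int j = 0; j < n; j++) {
            int bits = (encoded[j] & 0xFF)
                    | ((encoded[n + j] & 0xFF) << 8)
                    | ((encoded[2 * n + j] & 0xFF) << 16)
                    | ((encoded[3 * n + j] & 0xFF) << 24);
            values[j] = Float.intBitsToFloat(bits);
        }
        return values;
    }

    public static void main(String[] args) {
        float[] values = {1.0f, 2.0f, -3.5f};
        float[] roundTrip = decode(encode(values));
        System.out.println(java.util.Arrays.equals(values, roundTrip));
    }
}
```

Each call here allocates only its output array and handles all values in one pass, which is the general shape of a batch scatter/gather; the actual pull request may structure its buffers differently.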