There's a JMH comparer tool at https://github.com/JohnTortugo/jmh-tabulate
...
Even though it comes from an AWS engineer, I did review that code for
security, and even got Claude to (dynamically) generate the config file
needed to run the project in a chroot-style sandbox on macOS. The only
tangible risk is the chart.js file, and that is now cryptographically
locked down:
https://github.com/steveloughran/jmh-tabulate/tree/hardened
Nobody should be pulling head dependencies from NPM repos, and hard-coded
version numbers can be subverted by new tags. Content hashes are the only
thing to trust for something you run on file://
Even if you bypass the sandbox, the generated .html file does enforce
chart.js version integrity, so all should be good.
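If anyone wants to check a pin like that by hand, here is a rough sketch of
how a Subresource Integrity style value can be computed from a local copy of
a file. It assumes the hash is SHA-384 and that openssl is on the PATH; the
function name sri_hash and the file name in the usage comment are made up for
illustration and are not taken from the hardened branch.

```shell
#!/usr/bin/env bash
# Sketch only: compute an SRI-style "sha384-<base64>" value for a local file.
# Assumptions: SHA-384 is the digest in use; sri_hash is a hypothetical helper.
set -euo pipefail

sri_hash() {
  local file="$1"
  # Raw (binary) SHA-384 digest, base64-encoded on a single line.
  printf 'sha384-%s\n' \
    "$(openssl dgst -sha384 -binary "$file" | openssl base64 -A)"
}

# Usage (chart.umd.js is a placeholder file name):
#   sri_hash chart.umd.js
```

The printed string is what would sit in an integrity="..." attribute on the
script tag. If the file differs by a single byte the value no longer matches,
and a browser that enforces the attribute will refuse to run it.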
Given all that, what do your numbers look like?
On Wed, 29 Apr 2026 at 08:28, Ismaël Mejía <[email protected]> wrote:
> Hi dev@,
>
> I’ve been working on performance improvements across the main
> encoding/decoding hot paths of Apache Parquet Java. I presented this
> work during last week’s Parquet community sync and I am sharing a
> summary here for broader visibility, in line with Apache best
> practices.
>
> Using AI assisted tools and JMH, I expanded the existing coverage of
> microbenchmarks covering critical hot paths. I then iterated on a
> series of optimizations, validated for correctness, and reviewed with
> other AI tools. The results are promising.
>
> The improvements focus on eliminating per-value overhead in the hot
> loops without changing the file format or public API. Key changes:
>
> - Plain INT32/LONG: bulk System.arraycopy instead of per-value
> ByteBuffer.putInt (~4x encode, ~3x decode)
> - ByteStreamSplit: zero-allocation batch scatter/gather (3-5x encode, 2x
> decode)
> - Dictionary encoding: custom open-addressing hash map replacing
> java.util.HashMap (up to 80x for low-cardinality string columns)
> - RLE dictionary index decoder: direct ByteBuffer access bypassing
> InputStream
> - New batch read APIs: readIntegers()/readLongs() for vectorized consumers
>
> End-to-end file read/write throughput improves by ~13–14% on average
> across codecs in my test suite (Java 11, AMD EPYC). Full JMH results
> (303 benchmarks) and a more detailed write-up will follow.
>
> Most changes have been grouped and tracked under the following issue,
> which provides background and links to the related pull requests
> https://github.com/apache/parquet-java/issues/3530
>
> The first set of pull requests is ready for review. Feedback and
> comments from Java committers would be greatly appreciated.
>
> Thanks,
> Ismaël
>
> ps. Kudos to Fokko Driesprong who already started reviewing some of them.
>
#!/usr/bin/env bash
# run_sandboxed.sh — run generate_report.py under macOS sandbox-exec.
#
# Restrictions:
#   - network: DENIED
#   - writes:  only tmpdir + experiment directory (+ /dev/null)
#   - reads:   unrestricted (focus is on network + write safety)
#   - exec:    /bin, /usr/bin, /usr/local/bin, /opt/homebrew, /System
#
# Usage: ./run_sandboxed.sh [-v] <experiment-directory> [extra-write-dir ...]
#   -v   print the sandbox profile before running (for debugging)
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
VERBOSE=0
while [[ $# -gt 0 && "$1" == -* ]]; do
  case "$1" in
    -v) VERBOSE=1 ;;
    *)  echo "Unknown option: $1" >&2; exit 1 ;;
  esac
  shift
done
if [[ $# -lt 1 ]]; then
  echo "Usage: $0 [-v] <experiment-directory> [extra-write-dir ...]" >&2
  exit 1
fi
EXPERIMENT_DIR="$(cd "$1" && pwd -P)" # canonical, no symlinks
shift
PYTHON="$(command -v python3)"
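# Pass TMPDIR as an argument rather than splicing it into the Python source,
# so a path containing a quote cannot break (or inject into) the one-liner.
TMPDIR_REAL="$("$PYTHON" -c 'import os, sys; print(os.path.realpath(sys.argv[1]))' "${TMPDIR:-/tmp}")"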
rm -f /tmp/jmh-sandbox.*.sb
SANDBOX_PROFILE="$(mktemp /tmp/jmh-sandbox.XXXXXX.sb)"
cat > "$SANDBOX_PROFILE" <<SBPL
(version 1)
(deny default)
; reads: unrestricted — we trust the filesystem; focus is network + writes
(allow file-read*)
(allow file-read-metadata)
; writes: /dev/null, tmpdir, experiment dir only
(allow file-write* (literal "/dev/null"))
(allow file-write* (subpath "${TMPDIR_REAL}"))
(allow file-write* (subpath "${EXPERIMENT_DIR}"))
; exec: standard Unix + Homebrew dirs
(allow process-fork)
(allow process-exec
    (subpath "/bin")
    (subpath "/usr/bin")
    (subpath "/usr/local/bin")
    (subpath "/opt/homebrew")
    (subpath "/System"))
; IPC / signals
(allow signal (target self))
(allow sysctl-read)
(allow mach-lookup)
; network: DENIED (covered by default deny)
SBPL
# Append any extra writable directories passed as additional arguments
for extra in "$@"; do
  extra_real="$(cd "$extra" && pwd -P)"
  echo "sandbox: extra-write=${extra_real}"
  printf '(allow file-write* (subpath "%s"))\n' "$extra_real" >> "$SANDBOX_PROFILE"
done
echo "sandbox: experiment=${EXPERIMENT_DIR}"
echo "sandbox: python=${PYTHON}"
if [[ $VERBOSE -eq 1 ]]; then
  echo "── sandbox profile ──────────────────────────────────────"
  cat "$SANDBOX_PROFILE"
  echo "─────────────────────────────────────────────────────────"
fi
export PYTHONDONTWRITEBYTECODE=1
sandbox-exec -f "$SANDBOX_PROFILE" \
  "$PYTHON" "${SCRIPT_DIR}/generate_report.py" "$EXPERIMENT_DIR"