Hi everyone,

The Variant umbrella issue (#10392
<https://github.com/apache/iceberg/issues/10392>) hasn't been updated in a
while, and with active work happening across multiple PRs in Iceberg,
Spark, and Parquet, it's been hard to keep track of where things stand.

Since a few of us are actively working on variant features, I thought it
would help to put together a tracking document so the community has a
single place to see the current state, open work, and benchmark findings. I
plan to update it weekly so it reflects the latest changes to the tracked
issues and PRs.

Iceberg Variant Community Document
<https://docs.google.com/document/d/1IuhLRxw1rcPD_f4jgHuGe3SwFgy7Y5wgEGvLzf6311s/edit?usp=sharing>

The document has three tabs:

   1. Overview - what shipped in 1.10, what's merged to main, open work
   areas, and the dependency graph across Iceberg, Spark, and Parquet
   2. Tracker - all open variant issues and PRs across Iceberg,
   Parquet-Java, Parquet-Format, and Spark with authors and status
   3. Benchmarks - summary of three independent benchmark efforts (details
   below)

*Benchmark findings*

Three independent benchmarks have measured variant performance. All
converge on the same picture: variant is a modest improvement over JSON
strings today (1.1-1.7x faster reads), but 15-17x slower than typed columns.

   1. Qiegang Long - 14 queries on GitHub Archive, 5 configs:
   https://qlong.github.io/posts/2026-03-30-variant-early-results
   2. Steve Loughran - JMH microbenchmarks, profiler-driven optimization:
   https://steveloughran.github.io/benchmarking-variants/benchmarking-variants.html
   3. Neelesh Salian - Controlled baseline, 10M+100M rows, write + read:
   https://github.com/nssalian/iceberg/tree/iceberg-variant-benchmark/benchmark

If you're working on variant-related changes, please chime in or let me
know and I'll add it to the tracker. Feedback on the benchmarks or anything
else is welcome.

I've been giving variant updates during the Iceberg Spark Sync (Tuesdays,
10 AM PT), but given that this work now spans Iceberg, Spark, Parquet, and
Flink, I think it deserves its own forum. I'd like to propose a monthly
Variant Sync: a short call where contributors can share progress, surface
blockers, and coordinate across repos. If there's interest, I'll set one up
and share an invite on this thread.

Thanks,
Neelesh Salian.
