This is great; we need that tracker, as this is a cross-project piece of work, to say "this is ready..."
I did have an agenda item from last month's community call that didn't get covered. If we can retain that open time slot, we could give a very quick summary of where we are (summary slides of Qiegang's results and mine, key outstanding issues, and next steps), then start that monthly session on it.

Meanwhile, I have benchmark PRs for both Parquet and Iceberg which I think are ready for review; please take a look.

Finally, I'm thinking about interop across the many, many variant readers out there. Has anyone explicitly cross-tested their implementations of variant? What about consistent handling of invalid data? That includes iceberg-rust, parquet-cpp, and more...

Steve

On Sun, 19 Apr 2026 at 21:57, Neelesh Salian wrote:

> Hi everyone,
>
> The Variant umbrella issue (#10392
> <https://github.com/apache/iceberg/issues/10392>) hasn't been updated in
> a while, and with active work happening across multiple PRs in Iceberg,
> Spark, and Parquet, it's been hard to keep track of where things stand.
>
> Since a few of us are actively working on variant features, I thought it
> would help to put together a tracking document so the community has a
> single place to see the current state, open work, and benchmark findings. I
> plan to update it weekly to keep track of the issues and PRs that have
> changed.
>
> Iceberg Variant Community Document
> <https://docs.google.com/document/d/1IuhLRxw1rcPD_f4jgHuGe3SwFgy7Y5wgEGvLzf6311s/edit?usp=sharing>
>
> The document has three tabs:
>
> 1. Overview - what shipped in 1.10, what's merged to main, open work
> areas, and the dependency graph across Iceberg, Spark, and Parquet
> 2. Tracker - all open variant issues and PRs across Iceberg,
> Parquet-Java, Parquet-Format, and Spark with authors and status
> 3. Benchmarks - summary of three independent benchmark efforts
> (details below)
>
> *Benchmark findings*
>
> Three independent benchmarks have measured variant performance.
> All converge on the same picture: variant is a modest improvement over JSON
> strings today (1.1-1.7x faster reads), but 15-17x slower than typed columns.
>
> 1. Qiegang Long - 14 queries on GitHub Archive, 5 configs:
> https://qlong.github.io/posts/2026-03-30-variant-early-results
> 2. Steve Loughran - JMH microbenchmarks, profiler-driven optimization:
> https://steveloughran.github.io/benchmarking-variants/benchmarking-variants.html
> 3. Neelesh Salian - Controlled baseline, 10M+100M rows, write + read:
> https://github.com/nssalian/iceberg/tree/iceberg-variant-benchmark/benchmark
>
> If you're working on variant-related changes, please chime in or let me
> know and I'll add it to the tracker. Feedback on the benchmarks or anything
> else is welcome.
>
> I've been giving variant updates during the Iceberg Spark Sync (Tuesdays,
> 10 AM PT), but given that this work now spans Iceberg, Spark, Parquet, and
> Flink, I think it deserves its own forum. I'd like to propose a monthly
> Variant Sync: a short call where contributors can share progress, surface
> blockers, and coordinate across repos. If there's interest, I'll set one up
> and share an invite on this thread.
>
> Thanks,
> Neelesh Salian
