Hi everyone,

The Variant umbrella issue (#10392 <https://github.com/apache/iceberg/issues/10392>) hasn't been updated in a while, and with active work happening across multiple PRs in Iceberg, Spark, and Parquet, it's been hard to keep track of where things stand.
Since a few of us are actively working on variant features, I thought it would help to put together a tracking document so the community has a single place to see the current state, open work, and benchmark findings. I plan to update it weekly to reflect changes to the tracked issues and PRs.

Iceberg Variant Community Document <https://docs.google.com/document/d/1IuhLRxw1rcPD_f4jgHuGe3SwFgy7Y5wgEGvLzf6311s/edit?usp=sharing>

The document has three tabs:

1. Overview - what shipped in 1.10, what's merged to main, open work areas, and the dependency graph across Iceberg, Spark, and Parquet
2. Tracker - all open variant issues and PRs across Iceberg, Parquet-Java, Parquet-Format, and Spark, with authors and status
3. Benchmarks - summary of three independent benchmark efforts (details below)

*Benchmark findings*

Three independent benchmarks have measured variant performance. All converge on the same picture: variant is a modest improvement over JSON strings today (1.1-1.7x faster reads), but 15-17x slower than typed columns.

1. Qiegang Long - 14 queries on GitHub Archive, 5 configs: https://qlong.github.io/posts/2026-03-30-variant-early-results
2. Steve Loughran - JMH microbenchmarks, profiler-driven optimization: https://steveloughran.github.io/benchmarking-variants/benchmarking-variants.html
3. Neelesh Salian - Controlled baseline, 10M+100M rows, write + read: https://github.com/nssalian/iceberg/tree/iceberg-variant-benchmark/benchmark

If you're working on variant-related changes, please chime in or let me know and I'll add them to the tracker. Feedback on the benchmarks or anything else is welcome.

I've been giving variant updates during the Iceberg Spark Sync (Tuesdays, 10 AM PT), but given that this work now spans Iceberg, Spark, Parquet, and Flink, I think it deserves its own forum.
I'd like to propose a monthly Variant Sync: a short call where contributors can share progress, surface blockers, and coordinate across repos. If there's interest, I'll set one up and share an invite on this thread.

Thanks,
Neelesh Salian
