Re: [DISCUSS] Iceberg Variant - Tracking Document & Sync Proposal

Qiegang Long Mon, 20 Apr 2026 11:08:00 -0700

Thanks for the doc to track the status! +1 on the dedicated sync—definitely
feels like there’s a lot of work before we see Variant’s full potential.


Qiegang

On Mon, Apr 20, 2026 at 11:09 AM Steve Loughran <[email protected]> wrote:

>
> This is great; we need that tracker, as it is a cross-project piece of work
> to say "this is ready".
>
> I did have an agenda item from last month's community call which didn't
> get through. If we can retain that open time slot, we could do a very quick
> summary of where we are (summary slides of Qiegang's results and mine, key
> outstanding issues, and next steps); then we can start that monthly session
> on it.
>
> Meanwhile, I have both Parquet and Iceberg PRs for benchmarks which I
> think are ready for review. Please take a look.
>
> Finally, I'm thinking about interop of those many, many variant readers
> out there. Has anyone explicitly cross-tested their implementations of
> variant? What about consistent handling of invalid data? That includes
> iceberg-rust, parquet-cpp and more...
>
> Steve
>
> On Sun, 19 Apr 2026 at 21:57, Neelesh Salian <[email protected]>
> wrote:
>
>> Hi everyone,
>>
>> The Variant umbrella issue (#10392
>> <https://github.com/apache/iceberg/issues/10392>) hasn't been updated in
>> a while, and with active work happening across multiple PRs in Iceberg,
>> Spark, and Parquet, it's been hard to keep track of where things stand.
>>
>> Since a few of us are actively working on variant features, I thought it
>> would help to put together a tracking document so the community has a
>> single place to see the current state, open work, and benchmark findings. I
>> plan to update this on a weekly basis to keep track of the issues and PRs
>> that are updated.
>>
>> Iceberg Variant Community Document
>> <https://docs.google.com/document/d/1IuhLRxw1rcPD_f4jgHuGe3SwFgy7Y5wgEGvLzf6311s/edit?usp=sharing>
>>
>> The document has three tabs:
>>
>>    1. Overview - what shipped in 1.10, what's merged to main, open work
>>    areas, and the dependency graph across Iceberg, Spark, and Parquet
>>    2. Tracker - all open variant issues and PRs across Iceberg,
>>    Parquet-Java, Parquet-Format, and Spark with authors and status
>>    3. Benchmarks - summary of three independent benchmark efforts
>>    (details below)
>>
>> *Benchmark findings*
>>
>> Three independent benchmarks have measured variant performance. All
>> converge on the same picture: variant is a modest improvement over JSON
>> strings today (1.1-1.7x faster reads), but 15-17x slower than typed columns.
>>
>>    1. Qiegang Long - 14 queries on GitHub Archive, 5 configs:
>>    https://qlong.github.io/posts/2026-03-30-variant-early-results
>>    2. Steve Loughran - JMH microbenchmarks, profiler-driven
>>    optimization:
>>    https://steveloughran.github.io/benchmarking-variants/benchmarking-variants.html
>>    3. Neelesh Salian - Controlled baseline, 10M+100M rows, write + read:
>>    https://github.com/nssalian/iceberg/tree/iceberg-variant-benchmark/benchmark
>>
>> If you're working on variant-related changes, please chime in or let me
>> know and I'll add it to the tracker. Feedback on the benchmarks or anything
>> else is welcome.
>>
>> I've been giving variant updates during the Iceberg Spark Sync (Tuesdays,
>> 10 AM PT), but given that this work now spans Iceberg, Spark, Parquet, and
>> Flink, I think it deserves its own forum. I'd like to propose a monthly
>> Variant Sync; a short call where contributors can share progress, surface
>> blockers, and coordinate across repos. If there's interest, I'll set one up
>> and share an invite on this thread.
>>
>> Thanks,
>> Neelesh Salian.
>>
>
