Hi folks,

Here is the link to the Meeting Notes and Recording from today's Variant Sync (July 2, 2026): [Notes <https://docs.google.com/document/d/1IuhLRxw1rcPD_f4jgHuGe3SwFgy7Y5wgEGvLzf6311s/edit?tab=t.g06kaw1fbwhz>].
There are some action items and active work.
Please reach out if you have any questions.
Thanks
On Fri, Jun 5, 2026 at 12:22 PM Neelesh Salian <[email protected]> wrote:
> Hi folks,
>
> Here is the link to the Meeting Notes and Recording from the Variant Sync
> on June 4, 2026: [Notes
> <https://docs.google.com/document/d/1IuhLRxw1rcPD_f4jgHuGe3SwFgy7Y5wgEGvLzf6311s/edit?tab=t.g06kaw1fbwhz#heading=h.r977qio1wsv2>].
> There are some action items and active work.
> Please reach out if you have any questions.
> Thanks.
>
> On Fri, May 8, 2026 at 3:20 PM Neelesh Salian <[email protected]>
> wrote:
>
>> Hi folks,
>>
>> Here is the link to the Meeting Notes and Recording from the Variant Sync
>> on May 7, 2026: [Notes
>> <https://docs.google.com/document/d/1IuhLRxw1rcPD_f4jgHuGe3SwFgy7Y5wgEGvLzf6311s/edit?tab=t.g06kaw1fbwhz>].
>> There are some action items and active work.
>> Please reach out if you have any questions.
>> Thanks.
>>
>> On Thu, Apr 30, 2026 at 1:36 PM Neelesh Salian <[email protected]>
>> wrote:
>>
>>> Hi folks,
>>>
>>> I've set up a time starting next week on Thursday (May 7, 2026) at 10 am
>>> Pacific time for a sync for the active work on Variant.
>>> This will be a monthly sync (on the first Thursday of every month).
>>> You can find it on the dev calendar.
>>> Here is the calendar invite:
>>> https://calendar.app.google/b8ykdTV3EaNnVnkv8
>>> I'll be recording the call and capturing notes in the sync document:
>>> Iceberg - Variant Community Update
>>> <https://docs.google.com/document/d/1IuhLRxw1rcPD_f4jgHuGe3SwFgy7Y5wgEGvLzf6311s/edit?usp=sharing>
>>> (Meeting Notes tab).
>>> Thanks.
>>>
>>> On Mon, Apr 20, 2026 at 1:49 PM Steve Loughran <[email protected]>
>>> wrote:
>>>
>>>> + regarding the rust, go and cpp impls, a status from each team would
>>>> be great!
>>>>
>>>> I've been reviewing arrow parquet variant stuff and it is all there,
>>>> including some benchmarks and optimisations, which may put it ahead of
>>>> the others.
>>>>
>>>> It also has some special handling for sorted variants, as key search
>>>> there is straightforward. AFAIK the others don't do that, nor do I see
>>>> them going to any effort to sort fields in an object. I think sorting
>>>> would be good, but you would have to handle the case where there are
>>>> duplicate keys. It's allowed in the spec, and seems like it could creep
>>>> in from nested variants. Has anyone looked at this?
>>>>
>>>> Also: has anyone created malformed parquet files with a shredded
>>>> variant and a metadata entry of the same name? The requirement is
>>>> "ignore the metadata one", but that's something to test. You'd have to
>>>> write a shredded file and then edit the binary content to achieve this,
>>>> or manually create one and put it into the parquet-testing repository
>>>> under bad-data/
>>>>
>>>>
>>>> On Mon, 20 Apr 2026 at 19:08, Qiegang Long <[email protected]> wrote:
>>>>
>>>>> Thanks for the doc to track the status! +1 on the dedicated sync; it
>>>>> definitely feels like there’s a lot of work before we see Variant’s
>>>>> full potential.
>>>>>
>>>>> Qiegang
>>>>>
>>>>> On Mon, Apr 20, 2026 at 11:09 AM Steve Loughran <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> This is great, we need that tracker as it is a cross-project piece of
>>>>>> work to say "this is ready".
>>>>>>
>>>>>> I did have an agenda item from last month's community call which
>>>>>> didn't get through. If we can retain that open time slot we could do
>>>>>> a very quick summary of where we are (summary slides of Qiegang's
>>>>>> results and mine, key outstanding issues and next steps), then we can
>>>>>> start that monthly session on it.
>>>>>>
>>>>>> Meanwhile, I have both parquet and iceberg PRs for benchmarks which I
>>>>>> think are ready for review - please take a look.
>>>>>>
>>>>>> Finally, I'm thinking about interop of those many, many variant
>>>>>> readers out there. Has anyone explicitly cross-tested their
>>>>>> implementations of variant? What about consistent handling of invalid
>>>>>> data? That includes iceberg-rust, parquet-cpp and more...
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> On Sun, 19 Apr 2026 at 21:57, Neelesh Salian <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> The Variant umbrella issue (#10392
>>>>>>> <https://github.com/apache/iceberg/issues/10392>) hasn't been
>>>>>>> updated in a while, and with active work happening across multiple
>>>>>>> PRs in Iceberg, Spark, and Parquet, it's been hard to keep track of
>>>>>>> where things stand.
>>>>>>>
>>>>>>> Since a few of us are actively working on variant features, I
>>>>>>> thought it would help to put together a tracking document so the
>>>>>>> community has a single place to see the current state, open work,
>>>>>>> and benchmark findings. I plan to update this on a weekly basis to
>>>>>>> keep track of the issues and PRs that are updated.
>>>>>>>
>>>>>>> Iceberg Variant Community Document
>>>>>>> <https://docs.google.com/document/d/1IuhLRxw1rcPD_f4jgHuGe3SwFgy7Y5wgEGvLzf6311s/edit?usp=sharing>
>>>>>>>
>>>>>>> The document has three tabs:
>>>>>>>
>>>>>>> 1. Overview - what shipped in 1.10, what's merged to main, open
>>>>>>>    work areas, and the dependency graph across Iceberg, Spark, and
>>>>>>>    Parquet
>>>>>>> 2. Tracker - all open variant issues and PRs across Iceberg,
>>>>>>>    Parquet-Java, Parquet-Format, and Spark with authors and status
>>>>>>> 3. Benchmarks - summary of three independent benchmark efforts
>>>>>>>    (details below)
>>>>>>>
>>>>>>> *Benchmark findings*
>>>>>>>
>>>>>>> Three independent benchmarks have measured variant performance. All
>>>>>>> converge on the same picture: variant is a modest improvement over
>>>>>>> JSON strings today (1.1-1.7x faster reads), but 15-17x slower than
>>>>>>> typed columns.
>>>>>>>
>>>>>>> 1. Qiegang Long - 14 queries on GitHub Archive, 5 configs:
>>>>>>>    https://qlong.github.io/posts/2026-03-30-variant-early-results
>>>>>>> 2. Steve Loughran - JMH microbenchmarks, profiler-driven
>>>>>>>    optimization:
>>>>>>>    https://steveloughran.github.io/benchmarking-variants/benchmarking-variants.html
>>>>>>> 3. Neelesh Salian - Controlled baseline, 10M+100M rows, write +
>>>>>>>    read:
>>>>>>>    https://github.com/nssalian/iceberg/tree/iceberg-variant-benchmark/benchmark
>>>>>>>
>>>>>>> If you're working on variant-related changes, please chime in or let
>>>>>>> me know and I'll add it to the tracker. Feedback on the benchmarks
>>>>>>> or anything else is welcome.
>>>>>>>
>>>>>>> I've been giving variant updates during the Iceberg Spark Sync
>>>>>>> (Tuesdays, 10 AM PT), but given that this work now spans Iceberg,
>>>>>>> Spark, Parquet, and Flink, I think it deserves its own forum. I'd
>>>>>>> like to propose a monthly Variant Sync: a short call where
>>>>>>> contributors can share progress, surface blockers, and coordinate
>>>>>>> across repos. If there's interest, I'll set one up and share an
>>>>>>> invite on this thread.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Neelesh Salian.
>>>>>>>
>>>>>>
