Hi all,

Thanks for joining the Iceberg-Spark community sync today. You can find the recording here <https://youtu.be/vNvOHMHXHGw>.

Below are the highlights from the meeting:

- Comet Integration: Thanks to everyone involved in reviewing and merging the FileFormat API PRs to unblock the Comet team. The final required PR is #15328 <https://github.com/apache/iceberg/pull/15328>.
- Variant Support: There are questions regarding the status of the variant support PR (#14297 <https://github.com/apache/iceberg/pull/14297>), specifically whether it completes the full end-to-end implementation for reading and writing variants in Spark and whether it will be included in the next release.
- DSv2 Version Framework: We discussed the DSv2 version framework PRs and whether they are required for the upcoming release.

~ Anurag

On Tue, Jan 20, 2026 at 1:49 PM Aihua Xu <[email protected]> wrote:

> Thanks Anurag for driving this and for including "Support writing shredded variant in Iceberg-Spark" (#14297 <https://github.com/apache/iceberg/pull/14297>). I'd appreciate it if folks from the Spark side could help review the PR.
>
> Thanks,
> Aihua
>
> On Tue, Jan 20, 2026 at 1:17 PM karuppayya <[email protected]> wrote:
>
>> Thanks Anurag for driving this. Unfortunately, I wasn't able to attend today. I reviewed the video.
>>
>> @prashant, regarding
>>
>> - Alpha family aggregate support - #52551 <https://github.com/apache/spark/pull/52551>
>>
>> The Iceberg spec[1] requires that NDV be computed via an Alpha sketch. We currently have a ThetaAgg expression in the Iceberg code that uses the Alpha family[2].
>> Spark recently introduced support for ThetaSketch aggregates[3] (*which currently only supports the quickselect family*); this PR introduces *Alpha family support*.
>> Once we have this, we no longer need to maintain the code in Iceberg, and we also get the benefit of improvements from the Spark community (either to Spark Catalyst in general or to the expression specifically).
>>
>> With regard to
>>
>> - Codegen for MergeRowsExec - #52399 <https://github.com/apache/spark/pull/52399>
>>
>> This would help speed up merge execution (and also simplify SplitIterator <https://github.com/apache/spark/pull/52399/changes#diff-a572ff40254b26b4a903f101ee466dd2dff9b8c7954a3b957fe5fc25b87ee10aR241-R242> logic).
>>
>> - Karuppayya
>>
>> [1] - https://iceberg.apache.org/puffin-spec/#apache-datasketches-theta-v1-blob-type
>> [2] - https://github.com/apache/iceberg/blob/main/spark/v4.1/spark/src/main/scala/org/apache/spark/sql/stats/ThetaSketchAgg.scala#L66-L68
>> [3] - https://github.com/karuppayya/spark/commit/6ff9edcaf16c90007508f15de98fac361e234381
>>
>> On Tue, Jan 20, 2026 at 12:28 PM Anurag Mantripragada <[email protected]> wrote:
>>
>>> Thanks everyone for joining the first Iceberg/Spark community sync.
>>>
>>> Here is the recording: https://youtu.be/g4n2hwdFosE?si=n9hVRhCThshuOqd5
>>>
>>> Below are the discussion highlights.
>>>
>>> - Datafusion Comet integration
>>>   - Spark: Encapsulate parquet objects for Comet (#13786 <https://github.com/apache/iceberg/pull/13786>)
>>>   - Future of Iceberg support in Comet (datafusion-comet#2921 <https://github.com/apache/datafusion-comet/issues/2921>)
>>>   - Mailing List Discussion <https://lists.apache.org/thread/vr9nsbd5nhg3d20nmtyj4b3zsw9229gd>
>>>   - Notes:
>>>     - Rust vs Java - Discuss and vote in the dev list
>>>     - To move forward with (#13786 <https://github.com/apache/iceberg/pull/13786>) - Discuss in FileFormat API sync whether there are any pending items on which this PR needs updates.
>>>     - Make a decision to merge the PR vs waiting for FileFormat API
>>>
>>> - Spark 3.4 Deprecation
>>>   - Spark: Remove Spark 3.4 support (#14122 <https://github.com/apache/iceberg/pull/14122>)
>>>   - Notes:
>>>     - Wait until Comet integration is resolved.
>>>
>>> - Spark 4.1/4.2
>>>   - Spark: Add support for 4.2.0-preview (#14984 <https://github.com/apache/iceberg/pull/14984>)
>>>   - Spark 4.1: Initial support for MERGE INTO schema evolution (#14970 <https://github.com/apache/iceberg/pull/14970>)
>>>   - Notes:
>>>     - 4.1 is the current latest version. New PRs must go to it.
>>>     - Spark 4.1 introduces a version framework. Anton is working on integrating it with Iceberg. This greatly simplifies Iceberg lifecycle management but requires non-trivial integration work.
>>>     - Prefer not to make any releases with 4.1 until this is in.
>>>
>>> - DSv2 and sort order reporting
>>>   - Spark (4.0, 3.5): Set data file sort_order_id in manifest for writes from Spark (#14683 <https://github.com/apache/iceberg/pull/14683>)
>>>     - The rebase has many changes. Ask author to fix.
>>>   - Spark 4.0: Implement SupportsReportOrdering DSv2 API (#14948 <https://github.com/apache/iceberg/pull/14948>)
>>>     - Move to 4.1 for easier review
>>>
>>> - Compaction/Table maintenance/DR
>>>   - Spark 4.0: RewriteTablePath support for multiple source and destination prefixes (#14355 <https://github.com/apache/iceberg/pull/14355>)
>>>   - Spark 4.0: Optional switch to log expire data files during ExpireSnapshots action (#14354 <https://github.com/apache/iceberg/pull/14354>)
>>>   - Notes:
>>>     - Trace level logging
>>>     - How about logging it to another Iceberg table?
>>>     - Use the dataframe of files and log separately?
>>>
>>> - V3 spec implementation
>>>   - Spark: Support writing shredded variant in Iceberg-Spark (#14297 <https://github.com/apache/iceberg/pull/14297>)
>>>   - Notes:
>>>     - Status of Variant type support - consolidate and track somewhere
>>>     - Filter pushdown not implemented
>>>     - The write support PR is new, will review. It should have Iceberg metadata changes to indicate the variant shredding so Spark can use it.
>>>     - #14297 <https://github.com/apache/iceberg/pull/14297> will be reviewed
>>>
>>> - Spark UDF Support
>>>   - SQL UDF support Stage 1 (#14954 <https://github.com/apache/iceberg/pull/14954>) (The corresponding Spark SPIP: SPIP: Catalog-backed Code-Literal Functions (SQL and Python) with Catalog SPI and CRUD <https://docs.google.com/document/d/186cTAZxoXp1p8vaSunIaJmVLXcPR-FxSiLiDUl8kK8A/edit?tab=t.0#heading=h.for1fb3tezo3>)
>>>   - Notes:
>>>     - Waiting for the proposal vote and the Spark-side SPIP related to this.
>>>   - Spark 4.0: Spark UDF POC (#14505 <https://github.com/apache/iceberg/pull/14505>)
>>>     - Huaxin to delete this PR. This version is hacky.
>>>
>>> - DDL and schema evolution
>>>   - CREATE TABLE LIKE support (#14269 <https://github.com/apache/iceberg/pull/14269>)
>>>   - Notes:
>>>     - Recommend not to add SQL extensions in Iceberg code anymore.
>>>     - They are fragile, need maintenance, and have to work well with Spark.
>>>     - Alternatively, consider writing a procedure to do this until Spark has native support.
>>>     - Native Spark support for CREATE TABLE LIKE is not yet implemented.
>>>
>>> - Spark PRs
>>>   - Alpha family aggregate support - #52551 <https://github.com/apache/spark/pull/52551>
>>>   - Notes:
>>>     - Okay to have Spark-only changes that can potentially help in Iceberg use cases
>>>     - Elaborate on the use of this? How does this integrate with Iceberg?
>>>   - Codegen for MergeRowsExec - #52399 <https://github.com/apache/spark/pull/52399>
>>>   - Notes:
>>>     - This is a heavily used Exec node in Iceberg, so this is good to have.
>>>     - The community will review this
>>>
>>> Thanks,
>>> ~Anurag
>>>
>>> On Thu, Jan 15, 2026 at 6:48 PM Anton Okolnychyi <[email protected]> wrote:
>>>
>>>> If anyone has long-standing PRs related to Spark, it may be a good forum to get some reviews and help from the community.
>>>>
>>>> On Wed, Jan 14, 2026 at 11:23 AM Anurag Mantripragada <[email protected]> wrote:
>>>>
>>>>> Thanks Kevin,
>>>>>
>>>>> All, please review the doc <https://docs.google.com/document/d/19nno1RoPznbbxKOZZddZNHHafa7XULjbN6RPExdr2n4/edit?tab=t.0> and add any agenda items I may have missed. See you on Tuesday.
>>>>>
>>>>> ~ Anurag
>>>>>
>>>>> On Wed, Jan 14, 2026 at 11:20 AM Kevin Liu <[email protected]> wrote:
>>>>>
>>>>>> Connected with Anurag on Slack. I just added a new event to the Iceberg Dev calendar for Tuesday of next week, Jan 20th, from 10 AM to 11 AM PT, "*Iceberg - Spark Community Sync*". It's a monthly recurring meeting, and the Google Meet link is open to the public.
>>>>>> Happy to make changes based on feedback.
>>>>>>
>>>>>> Best,
>>>>>> Kevin Liu
>>>>>>
>>>>>> On Wed, Jan 14, 2026 at 10:57 AM Kevin Liu <[email protected]> wrote:
>>>>>>
>>>>>>> Looking at the current Iceberg dev calendar schedule, we have a slot next week on Tuesday or Friday for a monthly recurring sync. Wednesday coincides with the main Community Sync in some weeks.
>>>>>>> Please let me know the preferred day and time and I can help set it up!
>>>>>>>
>>>>>>> Best,
>>>>>>> Kevin Liu
>>>>>>>
>>>>>>> On Tue, Jan 13, 2026 at 10:58 AM Anurag Mantripragada <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Kevin,
>>>>>>>>
>>>>>>>> I'm open to ideas, but I think we could start with a monthly cadence for Spark syncs and increase the frequency if the community feels we need to meet more often. Could you please set up a time on the Iceberg dev calendar?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Anurag
>>>>>>>>
>>>>>>>> On Fri, Jan 9, 2026 at 10:16 AM Anurag Mantripragada <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks Anton and Kevin,
>>>>>>>>>
>>>>>>>>> I wrote a doc with general themes from the Spark PRs and Issues I browsed in the repo. Please feel free to add more if I missed anything.
>>>>>>>>>
>>>>>>>>> https://docs.google.com/document/d/19nno1RoPznbbxKOZZddZNHHafa7XULjbN6RPExdr2n4/edit?tab=t.0
>>>>>>>>>
>>>>>>>>> Looking forward to meeting you all and talking about all things Spark!
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Anurag
>>>>>>>>>
>>>>>>>>> On Fri, Jan 9, 2026 at 10:03 AM Kevin Liu <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> +1 great idea!
>>>>>>>>>> Let's start a doc with potential discussion items and find a time on the calendar. I have permission to add events to the "iceberg dev events" calendar. Happy to help with the logistics once the time and cadence are decided.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Kevin Liu
>>>>>>>>>>
>>>>>>>>>> On Wed, Jan 7, 2026 at 4:35 PM Anton Okolnychyi <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> YES! I have been meaning to suggest the same.
>>>>>>>>>>>
>>>>>>>>>>> Can you start a doc with the pool of items to which everyone can contribute?
>>>>>>>>>>>
>>>>>>>>>>> - Anton
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jan 7, 2026 at 3:30 PM Anurag Mantripragada <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi folks, happy new year!
>>>>>>>>>>>>
>>>>>>>>>>>> (Sorry if I sent this email more than once; my attempts at sending this from a different email failed.)
>>>>>>>>>>>>
>>>>>>>>>>>> There are a few Spark changes the community is working on, including
>>>>>>>>>>>> - Sort order reporting [1], [2]
>>>>>>>>>>>> - Spark 4.1 support [3]
>>>>>>>>>>>> - Future of Datafusion-Comet support [4] [5]
>>>>>>>>>>>>
>>>>>>>>>>>> Community members interested in the Spark integration have been discussing it in smaller groups. However, we believe that the general community sync should include all updates, and discussing Spark-specific matters may not be the most effective use of that sync. I was wondering if it would be useful to create a Spark-Iceberg integration-specific sync on the calendar, similar to what we have for individual proposals. This sync will not replace the community sync, which will still be used for broader discussions, including any new Spark topics that come out of the Spark sync.
>>>>>>>>>>>>
>>>>>>>>>>>> If there’s interest in doing these Spark breakout syncs, I’m happy to volunteer to run them. Please let me know what you all think.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> ~ Anurag
>>>>>>>>>>>>
>>>>>>>>>>>> [1] - https://github.com/apache/iceberg/pull/14683
>>>>>>>>>>>> [2] - https://github.com/apache/iceberg/pull/14948
>>>>>>>>>>>> [3] - https://github.com/apache/iceberg/pull/14970
>>>>>>>>>>>> [4] - https://github.com/apache/datafusion-comet/issues/2921
>>>>>>>>>>>> [5] - https://lists.apache.org/thread/vr9nsbd5nhg3d20nmtyj4b3zsw9229gd
