Thanks everyone for joining the first Iceberg/Spark community sync.
Here is the recording: https://youtu.be/g4n2hwdFosE?si=n9hVRhCThshuOqd5
Below are the discussion highlights.
- DataFusion Comet integration
  - Spark: Encapsulate parquet objects for Comet (#13786
    <https://github.com/apache/iceberg/pull/13786>)
  - Future of Iceberg support in Comet (datafusion-comet#2921
    <https://github.com/apache/datafusion-comet/issues/2921>)
  - Mailing List Discussion
    <https://lists.apache.org/thread/vr9nsbd5nhg3d20nmtyj4b3zsw9229gd>
  - Notes:
    - Rust vs. Java: discuss and vote on the dev list.
    - To move forward with #13786
      <https://github.com/apache/iceberg/pull/13786>, discuss in the
      FileFormat API sync whether the PR still needs updates for any
      pending items.
    - Decide whether to merge the PR now or wait for the FileFormat API.
- Spark 3.4 Deprecation
  - Spark: Remove Spark 3.4 support (#14122
    <https://github.com/apache/iceberg/pull/14122>)
  - Notes:
    - Wait until the Comet integration is resolved.
- Spark 4.1/4.2
  - Spark: Add support for 4.2.0-preview (#14984
    <https://github.com/apache/iceberg/pull/14984>)
  - Spark 4.1: Initial support for MERGE INTO schema evolution (#14970
    <https://github.com/apache/iceberg/pull/14970>)
  - Notes:
    - 4.1 is currently the latest version. New PRs must target it.
    - Spark 4.1 introduces a version framework. Anton is working on
      integrating it with Iceberg. This greatly simplifies Iceberg
      lifecycle management but requires non-trivial integration work.
    - Prefer not to make any releases with 4.1 until this integration
      is in.
- DSv2 and sort order reporting
  - Spark (4.0, 3.5): Set data file sort_order_id in manifest for
    writes from Spark (#14683
    <https://github.com/apache/iceberg/pull/14683>)
    - The rebase has many changes. Ask the author to fix it.
  - Spark 4.0: Implement SupportsReportOrdering DSv2 API (#14948
    <https://github.com/apache/iceberg/pull/14948>)
    - Move to 4.1 for easier review.
- Compaction/Table maintenance/DR
  - Spark 4.0: RewriteTablePath support for multiple source and
    destination prefixes (#14355
    <https://github.com/apache/iceberg/pull/14355>)
  - Spark 4.0: Optional switch to log expire data files during
    ExpireSnapshots action (#14354
    <https://github.com/apache/iceberg/pull/14354>)
  - Notes:
    - Trace-level logging
    - How about logging it to another Iceberg table?
    - Use the DataFrame of files and log separately?
- V3 spec implementation
  - Spark: Support writing shredded variant in Iceberg-Spark (#14297
    <https://github.com/apache/iceberg/pull/14297>)
  - Notes:
    - Status of Variant type support: consolidate and track somewhere.
    - Filter pushdown is not implemented.
    - The write support PR is new and will be reviewed. It should
      include Iceberg metadata changes that indicate the variant
      shredding so Spark can use it.
    - #14297 <https://github.com/apache/iceberg/pull/14297> will be
      reviewed.
- Spark UDF Support
  - SQL UDF support Stage 1 (#14954
    <https://github.com/apache/iceberg/pull/14954>) (The corresponding
    Spark SPIP: SPIP: Catalog-backed Code-Literal Functions (SQL and
    Python) with Catalog SPI and CRUD
    <https://docs.google.com/document/d/186cTAZxoXp1p8vaSunIaJmVLXcPR-FxSiLiDUl8kK8A/edit?tab=t.0#heading=h.for1fb3tezo3>)
  - Notes:
    - Waiting for the proposal vote and the related Spark-side SPIP.
  - Spark 4.0: Spark UDF POC (#14505
    <https://github.com/apache/iceberg/pull/14505>)
    - Huaxin to delete this PR. This version is hacky.
- DDL and schema evolution
  - CREATE TABLE LIKE support (#14269
    <https://github.com/apache/iceberg/pull/14269>)
  - Notes:
    - Recommend not adding SQL extensions to Iceberg code anymore.
    - They are fragile, need maintenance, and have to work well with
      Spark.
    - Alternatively, consider writing a procedure to do this until
      Spark has native support.
    - Native Spark support for CREATE TABLE LIKE is not yet
      implemented.
- Spark PRs
  - Alpha family aggregate support - #52551
    <https://github.com/apache/spark/pull/52551>
    - Notes:
      - It is okay to have Spark-only changes that can potentially
        help Iceberg use cases.
      - Elaborate on the use of this. How does it integrate with
        Iceberg?
  - Codegen for MergeRowsExec - #52399
    <https://github.com/apache/spark/pull/52399>
    - Notes:
      - This is a heavily used Exec node in Iceberg, so this is good
        to have.
      - The community will review this.
Thanks,
~Anurag
On Thu, Jan 15, 2026 at 6:48 PM Anton Okolnychyi <[email protected]>
wrote:
> If anyone has long-standing PRs related to Spark, it may be a good forum
> to get some reviews and help from the community.
>
> On Wed, Jan 14, 2026 at 11:23 AM Anurag Mantripragada <
> [email protected]> wrote:
>
>> Thanks Kevin,
>>
>> All, please review the doc
>> <https://docs.google.com/document/d/19nno1RoPznbbxKOZZddZNHHafa7XULjbN6RPExdr2n4/edit?tab=t.0>
>> and
>> add any agenda items I may have missed. See you on Tuesday.
>>
>> ~ Anurag
>>
>> On Wed, Jan 14, 2026 at 11:20 AM Kevin Liu <[email protected]> wrote:
>>
>>> Connected with Anurag on Slack. I just added a new event to the Iceberg
>>> Dev calendar for next Tuesday, Jan 20th, from 10 AM to 11 AM PT:
>>> "*Iceberg - Spark Community Sync*". It's a monthly recurring meeting, and
>>> the Google Meet link is open to the public.
>>> Happy to make changes based on feedback.
>>>
>>> Best,
>>> Kevin Liu
>>>
>>>
>>> On Wed, Jan 14, 2026 at 10:57 AM Kevin Liu <[email protected]>
>>> wrote:
>>>
>>>> Looking at the current Iceberg dev calendar schedule, we have a slot
>>>> next week on Tuesday or Friday for a monthly recurring sync. Wednesday
>>>> coincides with the main Community Sync in some weeks.
>>>> Please let me know the preferred day and time and I can help set it up!
>>>>
>>>> Best,
>>>> Kevin Liu
>>>>
>>>> On Tue, Jan 13, 2026 at 10:58 AM Anurag Mantripragada <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Kevin,
>>>>>
>>>>> I'm open to ideas, but I think we could start with a monthly cadence for
>>>>> Spark syncs and increase the frequency if the community feels we need to
>>>>> meet more often. Could you please set up a time on the Iceberg dev
>>>>> calendar?
>>>>>
>>>>> Thanks,
>>>>> Anurag
>>>>>
>>>>> On Fri, Jan 9, 2026 at 10:16 AM Anurag Mantripragada <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Thanks Anton and Kevin,
>>>>>>
>>>>>> I wrote a doc with general themes from the Spark PRs and issues I
>>>>>> browsed in the repo. Please feel free to add more if I missed
>>>>>> anything.
>>>>>>
>>>>>> https://docs.google.com/document/d/19nno1RoPznbbxKOZZddZNHHafa7XULjbN6RPExdr2n4/edit?tab=t.0
>>>>>>
>>>>>> Looking forward to meeting you all and talking about all things Spark!
>>>>>>
>>>>>> Thanks,
>>>>>> Anurag
>>>>>>
>>>>>> On Fri, Jan 9, 2026 at 10:03 AM Kevin Liu <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> +1 great idea!
>>>>>>> Let's start a doc with potential discussion items and find a time on
>>>>>>> the calendar. I have permission to add events to the "iceberg dev
>>>>>>> events" calendar. Happy to help with the logistics once the time and
>>>>>>> cadence are decided.
>>>>>>>
>>>>>>> Best,
>>>>>>> Kevin Liu
>>>>>>>
>>>>>>> On Wed, Jan 7, 2026 at 4:35 PM Anton Okolnychyi <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> YES! I have been meaning to suggest the same.
>>>>>>>>
>>>>>>>> Can you start a doc with a pool of items to which everyone can
>>>>>>>> contribute?
>>>>>>>>
>>>>>>>> - Anton
>>>>>>>>
>>>>>>>> On Wed, Jan 7, 2026 at 3:30 PM Anurag Mantripragada <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi folks, happy new year!
>>>>>>>>>
>>>>>>>>> (Sorry if I sent this email more than once; my attempts at
>>>>>>>>> sending it from a different email address failed.)
>>>>>>>>>
>>>>>>>>> There are a few Spark changes the community is working on, including:
>>>>>>>>> - Sort order reporting [1], [2]
>>>>>>>>> - Spark 4.1 support [3]
>>>>>>>>> - Future of DataFusion Comet support [4], [5]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Community members interested in the Spark integration have been
>>>>>>>>> discussing it in smaller groups. However, we believe that the general
>>>>>>>>> community sync should include all updates, and discussing
>>>>>>>>> Spark-specific matters may not be the most effective use of that
>>>>>>>>> sync. I was wondering if it would be useful to create a
>>>>>>>>> Spark-Iceberg integration-specific sync on the calendar, similar to
>>>>>>>>> what we have for individual proposals. This sync will not replace
>>>>>>>>> the community sync, which will still be used for broader discussions,
>>>>>>>>> including any new Spark topics that come out of the Spark sync.
>>>>>>>>>
>>>>>>>>> If there’s interest in doing these Spark breakout syncs, I’m happy
>>>>>>>>> to volunteer to run them. Please let me know what you all think.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> ~ Anurag
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1] - https://github.com/apache/iceberg/pull/14683
>>>>>>>>> [2] - https://github.com/apache/iceberg/pull/14948
>>>>>>>>> [3] - https://github.com/apache/iceberg/pull/14970
>>>>>>>>> [4] - https://github.com/apache/datafusion-comet/issues/2921
>>>>>>>>> [5] - https://lists.apache.org/thread/vr9nsbd5nhg3d20nmtyj4b3zsw9229gd
>>>>>>>>>
>>>>>>>>