Hey Anurag,

I wasn't able to make it to the sync but was hoping to watch the recording
afterwards.
I'm curious about the reasons for discarding the Parquet-native approach.
Could you please share a summary of what was discussed on that topic in the
sync?

On Tue, Feb 10, 2026 at 8:20 PM Anurag Mantripragada <
[email protected]> wrote:

> Hi all,
>
> Thank you for attending today's sync. Please find the meeting notes below.
> Apologies that we were unable to record the session, as attendees did not
> have recording access.
>
> Key updates and discussion points:
>
> *Decisions:*
>
>    - Table Format vs. Parquet: There is a general consensus that column
>    update support should reside in the table format. Consequently, we have
>    discarded the Parquet-native approach.
>    - Metadata Representation: To maintain clean metadata and avoid
>    complex resolution logic for readers, the goal is to keep only one metadata
>    file per column. However, achieving this is challenging if we support
>    partial updates, as multiple column files may exist for the same column
>    (See open questions).
>    - Data Representation: Sparse column files are preferred for their
>    compact representation and are better suited for partial column updates.
>    We can optimize the sparse representation for vectorized reads by filling
>    in null or default values at read time for base-file positions that are
>    missing from the column file, which avoids joins during reads (see the
>    sketch below).
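>
> To make the read semantics concrete, here is a minimal SQL sketch of the
> logical stitch for an updated column b, assuming hypothetical
> position-keyed relations base_positions and update_positions; the
> read-time fill described above achieves the same result without the join:
>
> SELECT base.pos, base.a,
>        COALESCE(upd.b, base.b) AS b  -- updated value wins when present
> FROM base_positions AS base
> LEFT JOIN update_positions AS upd ON upd.pos = base.pos;
> -- note: COALESCE conflates a missing position with an explicit NULL
> -- update; a real reader would distinguish the two.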
>
>
> *Open Questions:*
>
>    - We are still determining what restrictions are necessary when
>    supporting partial updates. For instance, we need to decide whether to
>    allow adding a new column and then applying partial updates to it, which
>    would involve managing both a base column file and subsequent update files.
>    - We need a better understanding of the use cases for partial updates.
>    - We need to further discuss the handling of equality deletes.
>
> If I missed anything, or if others took notes, please share them here.
> Thanks!
>
> I will go ahead and update the doc with what we have discussed so we can
> continue next time from where we left off.
>
> ~ Anurag
>
> On Mon, Feb 9, 2026 at 11:55 AM Anurag Mantripragada <
> [email protected]> wrote:
>
>> Hi all,
>>
>> This design
>> <https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0>
>> will be discussed tomorrow in a dedicated sync.
>>
>> Efficient column updates sync
>> Tuesday, February 10 · 9:00 – 10:00am
>> Time zone: America/Los_Angeles
>> Google Meet joining info
>> Video call link: https://meet.google.com/xsd-exug-tcd
>>
>> ~ Anurag
>>
>> On Fri, Feb 6, 2026 at 8:30 AM Anurag Mantripragada <
>> [email protected]> wrote:
>>
>>> Hi Gabor,
>>>
>>> Thanks for the detailed example.
>>>
>>> I agree with Steven that Option 2 seems reasonable. I will add a section
>>> to the design doc regarding equality delete handling, and we can discuss
>>> this further during our meeting on Tuesday.
>>>
>>> ~Anurag
>>>
>>> On Fri, Feb 6, 2026 at 7:08 AM Steven Wu <[email protected]> wrote:
>>>
>>>> > 1) When deleting with eq-deletes: If there is a column update on the
>>>> equality-field ID we use for the delete, reject the deletion
>>>> > 2) When adding a column update on a column that is part of the
>>>> equality field IDs in some delete, we reject the column update
>>>>
>>>> Gabor, this is a good scenario. The 2nd option makes sense to me, since
>>>> equality IDs are like primary-key fields. If the 2nd rule is enforced, the
>>>> first option is no longer applicable.
>>>>
>>>> On Fri, Feb 6, 2026 at 3:13 AM Gábor Kaszab <[email protected]>
>>>> wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> Thank you for the proposal, Anurag! I made a pass recently and I think
>>>>> there is some interference between column updates and equality deletes.
>>>>> Let me describe it below:
>>>>>
>>>>> Steps:
>>>>>
>>>>> CREATE TABLE tbl (a INT, b INT);
>>>>>
>>>>> INSERT INTO tbl VALUES (1, 11), (2, 22);  -- creates the base data file
>>>>>
>>>>> DELETE FROM tbl WHERE b=11;               -- creates an equality delete file
>>>>>
>>>>> UPDATE tbl SET b=11;                      -- writes a column update
>>>>>
>>>>>
>>>>>
>>>>> SELECT * FROM tbl;
>>>>>
>>>>> Expected result:
>>>>>
>>>>> (2, 11)
>>>>>
>>>>>
>>>>>
>>>>> Data and metadata created after the above steps:
>>>>>
>>>>> Base file: (1, 11), (2, 22); seqnum=1
>>>>>
>>>>> EQ-delete: b=11; seqnum=2
>>>>>
>>>>> Column update: field ids: [field_id_for_col_b]; seqnum=3; data file
>>>>> content: (dummy_value),(11)
>>>>>
>>>>>
>>>>>
>>>>> Read steps:
>>>>>
>>>>>    1. Stitch base file with column updates in reader:
>>>>>
>>>>> Rows: (1, dummy_value), (2, 11) (note: dummy_value can be either null
>>>>> or 11; see the proposal for more details)
>>>>>
>>>>> Seqnum for base file=1
>>>>>
>>>>> Seqnum for column update=3
>>>>>
>>>>>    2. Apply eq-delete b=11, seqnum=2 on the stitched result
>>>>>    3. The query result depends on which seqnum we carry forward to
>>>>>    compare with the eq-delete's seqnum, but it is not correct in either
>>>>>    case (see the SQL rendering after this list):
>>>>>       1. Use the seqnum from the base file: we get either an empty result
>>>>>       if 'dummy_value' is 11, or (1, null) otherwise
>>>>>       2. Use the seqnum from the last update file: no rows are deleted,
>>>>>       and the result set is (1, dummy_value),(2,11)
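>>>>>
>>>>> A rough SQL rendering of that ambiguity, assuming a hypothetical
>>>>> stitched view that carries a per-row seqnum (eq-deletes apply only to
>>>>> rows whose seqnum is lower than the delete's):
>>>>>
>>>>> SELECT * FROM stitched
>>>>> WHERE NOT (b = 11 AND seqnum < 2);  -- the eq-delete has seqnum=2
>>>>>
>>>>> Carrying seqnum=1 (base) wrongly deletes the freshly updated row
>>>>> (2, 11); carrying seqnum=3 (update) wrongly keeps row 1, which the
>>>>> eq-delete should have removed.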
>>>>>
>>>>>
>>>>>
>>>>> Problem:
>>>>>
>>>>> Based on sequence numbers, the eq-delete would have to be applied midway
>>>>> through the stitching process, after the base file but before the column
>>>>> update. If I'm not mistaken, this is not feasible with the way readers
>>>>> work.
>>>>>
>>>>>
>>>>> Proposal:
>>>>>
>>>>> Don't allow equality deletes together with column updates.
>>>>>
>>>>>   1) When deleting with eq-deletes: If there is a column update on the
>>>>> equality-field ID we use for the delete, reject the deletion
>>>>>
>>>>>   2) When adding a column update on a column that is part of the
>>>>> equality field IDs in some delete, reject the column update
>>>>>
>>>>> Alternatively, column updates could be controlled by an (immutable) table
>>>>> property, and eq-deletes rejected if the property indicates that column
>>>>> updates are turned on for the table.
>>>>>
>>>>>
>>>>> Let me know what you think!
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Gabor
>>>>>
>>>>> Anurag Mantripragada <[email protected]> wrote (on Wed, Jan
>>>>> 28, 2026 at 3:31):
>>>>>
>>>>>> Thank you everyone for the initial review comments. It is exciting to
>>>>>> see so much interest in this proposal.
>>>>>>
>>>>>> I am currently reviewing and responding to each comment. The general
>>>>>> themes of the feedback so far include:
>>>>>> - Including partial updates (column updates on a subset of rows in a
>>>>>> table).
>>>>>> - Adding details on how SQL engines will write the update files.
>>>>>> - Adding details on split planning and row alignment for update files.
>>>>>>
>>>>>> I will think through these points and update the design accordingly.
>>>>>>
>>>>>> Best
>>>>>> Anurag
>>>>>>
>>>>>> On Tue, Jan 27, 2026 at 6:25 PM Anurag Mantripragada <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Xianjin,
>>>>>>>
>>>>>>> I'm happy to learn from your experience supporting backfill use-cases.
>>>>>>> Please feel free to review the proposal and add your comments. I will
>>>>>>> wait a couple more days to ensure everyone has a chance to review it.
>>>>>>>
>>>>>>> ~ Anurag
>>>>>>>
>>>>>>> On Tue, Jan 27, 2026 at 6:42 AM Xianjin Ye <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Anurag and Peter,
>>>>>>>>
>>>>>>>> It’s great to see that partial column updates have gained such interest
>>>>>>>> in the community. I internally built a BackfillColumns action to
>>>>>>>> efficiently backfill columns (by writing only the partial columns and
>>>>>>>> copying the binary data of the other columns into a new DataFile). The
>>>>>>>> speedup can be 10x for wide tables, but the write amplification is still
>>>>>>>> there. I would be happy to collaborate on the work and eliminate the
>>>>>>>> write amplification.
>>>>>>>>
>>>>>>>> On 2026/01/27 10:12:54 Péter Váry wrote:
>>>>>>>> > Hi Anurag,
>>>>>>>> >
>>>>>>>> > It’s great to see how much interest there is in the community around
>>>>>>>> > this potential new feature. Gábor and I have actually submitted an
>>>>>>>> > Iceberg Summit talk proposal on this topic, and we would be very happy
>>>>>>>> > to collaborate on the work. I was mainly waiting for the File Format
>>>>>>>> > API to be finalized, as I believe this feature should build on top of
>>>>>>>> > it.
>>>>>>>> >
>>>>>>>> > For reference, our related work includes:
>>>>>>>> >
>>>>>>>> >    - *Dev list thread:*
>>>>>>>> >    https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9
>>>>>>>> >    - *Proposal document:*
>>>>>>>> >    https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww
>>>>>>>> >    (not shared widely yet)
>>>>>>>> >    - *Performance testing PR for readers and writers:*
>>>>>>>> >    https://github.com/apache/iceberg/pull/13306
>>>>>>>> >
>>>>>>>> > During earlier discussions about possible metadata changes, another
>>>>>>>> > option came up that hasn’t been documented yet: separating planner
>>>>>>>> > metadata from reader metadata. Since the planner does not need to know
>>>>>>>> > about the actual files, we could store the file composition in a
>>>>>>>> > separate file (potentially a Puffin file). This file could hold the
>>>>>>>> > column_files metadata, while the manifest would reference the Puffin
>>>>>>>> > file and blob position instead of the data filename.
>>>>>>>> > This approach has the advantage of keeping the existing metadata
>>>>>>>> > largely intact, and it could also give us a natural place later to add
>>>>>>>> > file-level indexes or Bloom filters for use during reads or secondary
>>>>>>>> > filtering. The downsides are the additional files and the increased
>>>>>>>> > complexity of identifying files that are no longer referenced by the
>>>>>>>> > table, so this may not be an ideal solution.
>>>>>>>> >
>>>>>>>> > I do have some concerns about the MoR metadata proposal described in
>>>>>>>> > the document. At first glance, it seems to complicate distributed
>>>>>>>> > planning, as all entries for a given file would need to be collected
>>>>>>>> > and merged to provide the information required by both the planner and
>>>>>>>> > the reader. Additionally, when a new column is added or updated, we
>>>>>>>> > would still need to add a new metadata entry for every existing data
>>>>>>>> > file. If we immediately write out the merged metadata, the total
>>>>>>>> > number of entries remains the same. The main benefit is avoiding
>>>>>>>> > rewriting statistics, which can be significant, but this comes at the
>>>>>>>> > cost of increased planning complexity. If we choose to store the
>>>>>>>> > merged statistics in the column_families entry, I don’t see much
>>>>>>>> > benefit in excluding the rest of the metadata, especially since
>>>>>>>> > including it would simplify the planning process.
>>>>>>>> >
>>>>>>>> > As Anton already pointed out, we should also discuss how this change
>>>>>>>> > would affect split handling, particularly how to avoid double reads
>>>>>>>> > when row groups are not aligned between the original data files and
>>>>>>>> > the new column files.
>>>>>>>> >
>>>>>>>> > Finally, I’d like to see some discussion around the Java API
>>>>>>>> > implications: in particular, what API changes are required and how SQL
>>>>>>>> > engines would perform updates. Since the new column files must have
>>>>>>>> > the same number of rows as the original data files, with a strict
>>>>>>>> > one-to-one relationship, SQL engines would need access to the source
>>>>>>>> > filename, position, and deletion status in the DataFrame in order to
>>>>>>>> > generate the new files. This is more involved than a simple update and
>>>>>>>> > deserves some explicit consideration.
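>>>>>>>> >
>>>>>>>> > As one illustration of that last point (a sketch only, not part of the
>>>>>>>> > proposal): Iceberg's Spark integration already exposes the _file, _pos,
>>>>>>>> > and _deleted metadata columns, which an engine could carry through to
>>>>>>>> > keep new column files row-aligned with their source data files. The
>>>>>>>> > recompute_score function below is a hypothetical UDF:
>>>>>>>> >
>>>>>>>> > SELECT _file, _pos, _deleted, recompute_score(features) AS score
>>>>>>>> > FROM tbl;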
>>>>>>>> >
>>>>>>>> > Looking forward to your thoughts.
>>>>>>>> > Best regards,
>>>>>>>> > Peter
>>>>>>>> >
>>>>>>>> > On Tue, Jan 27, 2026, 03:58 Anurag Mantripragada <
>>>>>>>> > [email protected]> wrote:
>>>>>>>> >
>>>>>>>> > > Thanks Anton and others for providing some initial feedback. I will
>>>>>>>> > > address all your comments soon.
>>>>>>>> > >
>>>>>>>> > > On Mon, Jan 26, 2026 at 11:10 AM Anton Okolnychyi <
>>>>>>>> > > [email protected]> wrote:
>>>>>>>> > >
>>>>>>>> > >> I had a chance to see the proposal before it landed, and I think
>>>>>>>> > >> it is a cool idea; both presented approaches would likely work. I
>>>>>>>> > >> am looking forward to discussing the tradeoffs and would encourage
>>>>>>>> > >> everyone to push/polish each approach to see which issues can be
>>>>>>>> > >> mitigated and which are fundamental.
>>>>>>>> > >>
>>>>>>>> > >> [1] Iceberg-native approach: better visibility into column files
>>>>>>>> > >> from the metadata, potentially better concurrency for
>>>>>>>> > >> non-overlapping column updates, no dependency on Parquet.
>>>>>>>> > >> [2] Parquet-native approach: almost no changes to the table format
>>>>>>>> > >> metadata beyond tracking of base files.
>>>>>>>> > >>
>>>>>>>> > >> I think [1] sounds a bit better on paper, but I am worried about
>>>>>>>> > >> the complexity in writers and readers (especially around keeping
>>>>>>>> > >> row groups aligned and split planning). It would be great to cover
>>>>>>>> > >> this in detail in the proposal.
>>>>>>>> > >>
>>>>>>>> > >> On Mon, Jan 26, 2026 at 09:00 Anurag Mantripragada <
>>>>>>>> > >> [email protected]> wrote:
>>>>>>>> > >>
>>>>>>>> > >>> Hi all,
>>>>>>>> > >>>
>>>>>>>> > >>> "Wide tables" with thousands of columns present significant
>>>>>>>> challenges
>>>>>>>> > >>> for AI/ML workloads, particularly when only a subset of
>>>>>>>> columns needs to be
>>>>>>>> > >>> added or updated. Current Copy-on-Write (COW) and
>>>>>>>> Merge-on-Read (MOR)
>>>>>>>> > >>> operations in Iceberg apply at the row level, which leads to
>>>>>>>> substantial
>>>>>>>> > >>> write amplification in scenarios such as:
>>>>>>>> > >>>
>>>>>>>> > >>>    - Feature Backfilling & Column Updates: Adding new feature
>>>>>>>> > >>>    columns (e.g., model embeddings) to petabyte-scale tables.
>>>>>>>> > >>>    - Model Score Updates: Refreshing prediction scores after
>>>>>>>> > >>>    retraining.
>>>>>>>> > >>>    - Embedding Refresh: Updating vector embeddings, which
>>>>>>>> > >>>    currently triggers a rewrite of the entire row.
>>>>>>>> > >>>    - Incremental Feature Computation: Daily updates to a small
>>>>>>>> > >>>    fraction of features in wide tables (see the example below).
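>>>>>>>> > >>>
>>>>>>>> > >>> For example (an illustrative sketch only; wide_tbl, new_scores,
>>>>>>>> > >>> and their columns are hypothetical), a single-column refresh like
>>>>>>>> > >>> the following rewrites every column of each touched row under COW
>>>>>>>> > >>> today:
>>>>>>>> > >>>
>>>>>>>> > >>> MERGE INTO wide_tbl t USING new_scores s ON t.id = s.id
>>>>>>>> > >>> WHEN MATCHED THEN UPDATE SET t.model_score = s.score;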
>>>>>>>> > >>>
>>>>>>>> > >>> With the Iceberg V4 proposal introducing single-file commits and
>>>>>>>> > >>> column stats improvements, this is an ideal time to address
>>>>>>>> > >>> column-level updates to better support these use cases.
>>>>>>>> > >>>
>>>>>>>> > >>> I have drafted a proposal that explores both table-format
>>>>>>>> > >>> enhancements and file-format (Parquet) changes to enable more
>>>>>>>> > >>> efficient updates.
>>>>>>>> > >>>
>>>>>>>> > >>> Proposal Details:
>>>>>>>> > >>> - GitHub Issue: #15146
>>>>>>>> > >>> <https://github.com/apache/iceberg/issues/15146>
>>>>>>>> > >>> - Design Document: Efficient Column Updates in Iceberg
>>>>>>>> > >>> <https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0>
>>>>>>>> > >>>
>>>>>>>> > >>> Next Steps:
>>>>>>>> > >>> I plan to create POCs to benchmark the approaches described in
>>>>>>>> > >>> the document.
>>>>>>>> > >>>
>>>>>>>> > >>> Please review the proposal and share your feedback.
>>>>>>>> > >>>
>>>>>>>> > >>> Thanks,
>>>>>>>> > >>> Anurag
>>>>>>>> > >>>
>>>>>>>> > >>
>>>>>>>> >
>>>>>>>>
>>>>>>>
