Hi Gabor,

Thanks for the detailed example.
I agree with Steven that Option 2 seems reasonable. I will add a section to the design doc regarding equality delete handling, and we can discuss this further during our meeting on Tuesday.
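To make the option 2 rule concrete, below is a minimal, illustrative sketch of what such a writer-side check could look like. It is not taken from the design doc; the class and method are hypothetical placeholders, and the only assumption is that the writer can enumerate the equality field IDs referenced by live equality delete files and the field IDs a pending column update rewrites.

import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only (hypothetical helper, not an Iceberg API):
// reject a column update if any field it rewrites is used as an equality
// field ID by a live equality delete in the table.
final class ColumnUpdateConflictCheck {

  static void validate(Set<Integer> updatedFieldIds, List<Set<Integer>> liveEqualityFieldIdSets) {
    Set<Integer> conflicts = new TreeSet<>();
    for (Set<Integer> equalityFieldIds : liveEqualityFieldIdSets) {
      for (Integer fieldId : equalityFieldIds) {
        if (updatedFieldIds.contains(fieldId)) {
          conflicts.add(fieldId);
        }
      }
    }
    if (!conflicts.isEmpty()) {
      throw new IllegalArgumentException(
          "Rejecting column update: field IDs " + conflicts
              + " are used as equality field IDs by existing equality deletes");
    }
  }
}

The symmetric check for option 1 (rejecting an eq-delete whose equality field IDs overlap with fields that already have column updates) would have the same shape with the two inputs swapped.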

~Anurag

On Fri, Feb 6, 2026 at 7:08 AM Steven Wu <[email protected]> wrote:

> > 1) When deleting with eq-deletes: if there is a column update on the equality field ID we use for the delete, reject the deletion.
> >
> > 2) When adding a column update on a column that is part of the equality field IDs of some delete, we reject the column update.
>
> Gabor, this is a good scenario. The 2nd option makes sense to me, since equality ids are like primary key fields. If we have the 2nd rule enforced, the first option is not applicable anymore.
>
> On Fri, Feb 6, 2026 at 3:13 AM Gábor Kaszab <[email protected]> wrote:
>
>> Hey,
>>
>> Thank you for the proposal, Anurag! I made a pass recently and I think there is some interference between column updates and equality deletes. Let me describe it below.
>>
>> Steps:
>>
>> CREATE TABLE tbl (a INT, b INT);
>>
>> INSERT INTO tbl VALUES (1, 11), (2, 22);   -- creates the base data file
>>
>> DELETE FROM tbl WHERE b=11;                -- creates an equality delete file
>>
>> UPDATE tbl SET b=11;                       -- writes a column update
>>
>> SELECT * FROM tbl;
>>
>> Expected result:
>>
>> (2, 11)
>>
>> Data and metadata created after the above steps:
>>
>> Base file: (1, 11), (2, 22), seqnum=1
>>
>> EQ-delete: b=11, seqnum=2
>>
>> Column update: field ids = [field_id_for_col_b], seqnum=3, data file content: (dummy_value), (11)
>>
>> Read steps:
>>
>> 1. Stitch the base file with the column updates in the reader:
>>    Rows: (1, dummy_value), (2, 11) (note: the dummy value can be either null or 11, see the proposal for more details)
>>    Seqnum for the base file = 1
>>    Seqnum for the column update = 3
>> 2. Apply the eq-delete b=11, seqnum=2, on the stitched result.
>> 3. The query result depends on which seqnum we carry forward to compare with the eq-delete's seqnum, but it is not correct in either case:
>>    1. Use the seqnum from the base file: we get either an empty result if 'dummy_value' is 11, or (1, null) otherwise.
>>    2. Use the seqnum from the last update file: no rows are deleted, and the result set is (1, dummy_value), (2, 11).
>>
>> Problem:
>>
>> The eq-delete should be applied midway through applying the column updates to the base file, based on sequence number, during the stitching process. If I'm not mistaken, this is not feasible with the way readers work.
>>
>> Proposal:
>>
>> Don't allow equality deletes together with column updates.
>>
>> 1) When deleting with eq-deletes: if there is a column update on the equality field ID we use for the delete, reject the deletion.
>>
>> 2) When adding a column update on a column that is part of the equality field IDs of some delete, we reject the column update.
>>
>> Alternatively, column updates could be controlled by an immutable table property, and eq-deletes could be rejected if the property indicates that column updates are turned on for the table.
>>
>> Let me know what you think!
>>
>> Best Regards,
>>
>> Gabor
>>
>> Anurag Mantripragada <[email protected]> wrote (on Wed, Jan 28, 2026 at 3:31):
>>
>>> Thank you everyone for the initial review comments. It is exciting to see so much interest in this proposal.
>>>
>>> I am currently reviewing and responding to each comment. The general themes of the feedback so far include:
>>>
>>> - Including partial updates (column updates on a subset of rows in a table).
>>> - Adding details on how SQL engines will write the update files.
>>> - Adding details on split planning and row alignment for update files.
>>>
>>> I will think through these points and update the design accordingly.
>>>
>>> Best,
>>> Anurag
>>>
>>> On Tue, Jan 27, 2026 at 6:25 PM Anurag Mantripragada <[email protected]> wrote:
>>>
>>>> Hi Xianjin,
>>>>
>>>> Happy to learn from your experience in supporting backfill use cases. Please feel free to review the proposal and add your comments. I will wait a couple more days to ensure everyone has a chance to review the proposal.
>>>>
>>>> ~ Anurag
>>>>
>>>> On Tue, Jan 27, 2026 at 6:42 AM Xianjin Ye <[email protected]> wrote:
>>>>
>>>>> Hi Anurag and Peter,
>>>>>
>>>>> It's great to see that the partial column update has gained so much interest in the community. I have internally built a BackfillColumns action to efficiently backfill columns (it writes only the partial columns and copies the binary data of the other columns into a new DataFile). The speedup can be 10x for wide tables, but the write amplification is still there. I would be happy to collaborate on the work and eliminate the write amplification.
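For readers unfamiliar with this kind of backfill, here is a rough, self-contained sketch of the idea Xianjin describes above: copy the untouched columns' encoded bytes as-is into the new file and encode only the backfilled column. The interfaces and the backfill method below are hypothetical placeholders, not parquet-mr or Iceberg APIs; the sketch only shows why such a rewrite saves decode/encode work but still writes roughly the full file size.

import java.util.List;

// Hypothetical sketch of a "copy untouched columns, encode only the new one"
// backfill. None of these interfaces exist in Iceberg or parquet-mr; they only
// illustrate the shape of the operation.
interface EncodedColumnChunk {}                  // already-encoded bytes of one column in one row group

interface RowGroupReader {
  List<String> columnNames();
  EncodedColumnChunk readEncoded(String column); // raw encoded bytes, no decode
  long rowCount();
}

interface RowGroupWriter {
  void copyEncoded(String column, EncodedColumnChunk chunk); // byte-for-byte copy
  void writeColumn(String column, List<?> values);           // encode new values
}

final class BackfillSketch {
  // Backfill a single new column across one row group.
  static void backfillRowGroup(RowGroupReader in, RowGroupWriter out,
                               String newColumn, List<?> newValues) {
    if (newValues.size() != in.rowCount()) {
      throw new IllegalArgumentException("backfilled values must align 1:1 with existing rows");
    }
    for (String column : in.columnNames()) {
      // Untouched columns: copy encoded bytes, skipping decode/re-encode work.
      out.copyEncoded(column, in.readEncoded(column));
    }
    // Only the new column is actually encoded.
    out.writeColumn(newColumn, newValues);
    // Every column still ends up in the new file, so the bytes written per
    // backfill are roughly the full file size: faster, but with the same write
    // amplification the proposal aims to remove.
  }
}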
>>>>>
>>>>> On 2026/01/27 10:12:54 Péter Váry wrote:
>>>>> > Hi Anurag,
>>>>> >
>>>>> > It's great to see how much interest there is in the community around this potential new feature. Gábor and I have actually submitted an Iceberg Summit talk proposal on this topic, and we would be very happy to collaborate on the work. I was mainly waiting for the File Format API to be finalized, as I believe this feature should build on top of it.
>>>>> >
>>>>> > For reference, our related work includes:
>>>>> >
>>>>> > - *Dev list thread:* https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9
>>>>> > - *Proposal document:* https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww (not shared widely yet)
>>>>> > - *Performance testing PR for readers and writers:* https://github.com/apache/iceberg/pull/13306
>>>>> >
>>>>> > During earlier discussions about possible metadata changes, another option came up that hasn't been documented yet: separating planner metadata from reader metadata. Since the planner does not need to know about the actual files, we could store the file composition in a separate file (potentially a Puffin file). This file could hold the column_files metadata, while the manifest would reference the Puffin file and blob position instead of the data filename. This approach has the advantage of keeping the existing metadata largely intact, and it could also give us a natural place later to add file-level indexes or Bloom filters for use during reads or secondary filtering. The downsides are the additional files and the increased complexity of identifying files that are no longer referenced by the table, so this may not be an ideal solution.
>>>>> >
>>>>> > I do have some concerns about the MoR metadata proposal described in the document. At first glance, it seems to complicate distributed planning, as all entries for a given file would need to be collected and merged to provide the information required by both the planner and the reader. Additionally, when a new column is added or updated, we would still need to add a new metadata entry for every existing data file. If we immediately write out the merged metadata, the total number of entries remains the same. The main benefit is avoiding rewriting statistics, which can be significant, but this comes at the cost of increased planning complexity. If we choose to store the merged statistics in the column_families entry, I don't see much benefit in excluding the rest of the metadata, especially since including it would simplify the planning process.
>>>>> >
>>>>> > As Anton already pointed out, we should also discuss how this change would affect split handling, particularly how to avoid double reads when row groups are not aligned between the original data files and the new column files.
>>>>> >
>>>>> > Finally, I'd like to see some discussion around the Java API implications: in particular, what API changes are required and how SQL engines would perform updates. Since the new column files must have the same number of rows as the original data files, with a strict one-to-one relationship, SQL engines would need access to the source filename, position, and deletion status in the DataFrame in order to generate the new files. This is more involved than a simple update and deserves some explicit consideration.
>>>>> >
>>>>> > Looking forward to your thoughts.
>>>>> > Best regards,
>>>>> > Peter
>>>>> >
>>>>> > On Tue, Jan 27, 2026, 03:58 Anurag Mantripragada <[email protected]> wrote:
>>>>> >
>>>>> > > Thanks Anton and others, for providing some initial feedback. I will address all your comments soon.
>>>>> > >
>>>>> > > On Mon, Jan 26, 2026 at 11:10 AM Anton Okolnychyi <[email protected]> wrote:
>>>>> > >
>>>>> > >> I had a chance to see the proposal before it landed, and I think it is a cool idea; both presented approaches would likely work. I am looking forward to discussing the tradeoffs and would encourage everyone to push/polish each approach to see which issues can be mitigated and which are fundamental.
>>>>> > >>
>>>>> > >> [1] Iceberg-native approach: better visibility into column files from the metadata, potentially better concurrency for non-overlapping column updates, no dependency on Parquet.
>>>>> > >> [2] Parquet-native approach: almost no changes to the table format metadata beyond tracking of base files.
>>>>> > >>
>>>>> > >> I think [1] sounds a bit better on paper, but I am worried about the complexity in writers and readers (especially around keeping row groups aligned and split planning). It would be great to cover this in detail in the proposal.
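To make the row-alignment requirement that Peter and Anton raise above concrete, here is a minimal, illustrative sketch of stitching a base data file with a column-update file by position. The class below is hypothetical, not an Iceberg API; it only assumes the column file carries exactly one value per base row, in the same order, which is why engines need the source filename, position, and deletion status when producing update files.

import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch (not an Iceberg API): overlay a column-update file onto a
// base data file by position. It relies on a strict one-to-one, same-order
// relationship between base rows and update rows.
final class PositionalStitcher implements Iterator<Object[]> {
  private final Iterator<Object[]> baseRows;     // rows from the base data file
  private final Iterator<Object> updatedColumn;  // one value per base row, same order
  private final int updatedOrdinal;              // position of the updated column in the row

  PositionalStitcher(Iterator<Object[]> baseRows, Iterator<Object> updatedColumn, int updatedOrdinal) {
    this.baseRows = baseRows;
    this.updatedColumn = updatedColumn;
    this.updatedOrdinal = updatedOrdinal;
  }

  @Override
  public boolean hasNext() {
    return baseRows.hasNext();
  }

  @Override
  public Object[] next() {
    if (!updatedColumn.hasNext()) {
      // Misalignment: the update file has fewer rows than the base file.
      throw new NoSuchElementException("column update file is not 1:1 aligned with the base file");
    }
    Object[] row = baseRows.next();
    row[updatedOrdinal] = updatedColumn.next(); // overlay the updated value by position
    return row;
  }
}

If row groups in the two files are not aligned, a split covering one row group of the base file may need values from the middle of a row group in the column file, which is the double-read concern mentioned above.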
>>>>> > >>
>>>>> > >> On Mon, Jan 26, 2026 at 09:00 Anurag Mantripragada <[email protected]> wrote:
>>>>> > >>
>>>>> > >>> Hi all,
>>>>> > >>>
>>>>> > >>> "Wide tables" with thousands of columns present significant challenges for AI/ML workloads, particularly when only a subset of columns needs to be added or updated. Current Copy-on-Write (COW) and Merge-on-Read (MOR) operations in Iceberg apply at the row level, which leads to substantial write amplification in scenarios such as:
>>>>> > >>>
>>>>> > >>> - Feature Backfilling & Column Updates: Adding new feature columns (e.g., model embeddings) to petabyte-scale tables.
>>>>> > >>> - Model Score Updates: Refreshing prediction scores after retraining.
>>>>> > >>> - Embedding Refresh: Updating vector embeddings, which currently triggers a rewrite of the entire row.
>>>>> > >>> - Incremental Feature Computation: Daily updates to a small fraction of features in wide tables.
>>>>> > >>>
>>>>> > >>> With the Iceberg V4 proposal introducing single-file commits and column stats improvements, this is an ideal time to address column-level updates to better support these use cases.
>>>>> > >>>
>>>>> > >>> I have drafted a proposal that explores both table-format enhancements and file-format (Parquet) changes to enable more efficient updates.
>>>>> > >>>
>>>>> > >>> Proposal Details:
>>>>> > >>> - GitHub Issue: #15146 <https://github.com/apache/iceberg/issues/15146>
>>>>> > >>> - Design Document: Efficient Column Updates in Iceberg <https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0>
>>>>> > >>>
>>>>> > >>> Next Steps:
>>>>> > >>> I plan to create POCs to benchmark the approaches described in the document.
>>>>> > >>>
>>>>> > >>> Please review the proposal and share your feedback.
>>>>> > >>>
>>>>> > >>> Thanks,
>>>>> > >>> Anurag
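As a rough back-of-the-envelope illustration of the write amplification described in the original proposal above, the sketch below compares the bytes rewritten by a row-level copy-on-write update of a single column with writing only that column. All numbers (table size, column count, uniformly sized columns) are assumed purely for the example.

// Illustrative arithmetic only; the table size, column count, and the
// assumption of uniformly sized columns are made up for this example.
final class WriteAmplificationSketch {
  public static void main(String[] args) {
    double tableBytes = 1e15;        // assume a 1 PB table
    int columnCount = 2000;          // assume 2000 roughly equal-sized columns
    int updatedColumns = 1;          // backfill or update a single column

    double perColumnBytes = tableBytes / columnCount;

    // Row-level COW rewrites every affected row in full; if the update touches
    // all rows, that is essentially the whole table.
    double rowLevelRewrite = tableBytes;

    // A column-level update would only need to write the updated column.
    double columnLevelWrite = perColumnBytes * updatedColumns;

    System.out.printf("row-level COW rewrite: %.0f TB%n", rowLevelRewrite / 1e12);
    System.out.printf("column-level write:    %.1f TB%n", columnLevelWrite / 1e12);
    System.out.printf("write amplification:   ~%.0fx%n", rowLevelRewrite / columnLevelWrite);
  }
}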
