Hi Eduard.

> I'm curious what the reasons were for discarding the Parquet-native
> approach. Could you please share a summary of what was discussed on that
> topic in the sync?


My apologies, I should have been more detailed in the meeting notes. While
we didn't discuss the Parquet approach extensively during the sync itself,
the consensus to focus on the table-format approach formed from feedback on
the doc and was confirmed in the sync. The decision to discard the
Parquet-native approach came down to these main arguments:

   - The major advantage of handling column updates in the Iceberg layer is
   that our manifests will always have a complete, self-contained view of all
   files, including the merged column stats from both base and update files.
   This is critical for efficient file pruning (see the sketch after this
   list). In the Parquet approach, Iceberg would only store a reference to
   the latest logical file, requiring a more complex planning phase to
   discover the full set of stats needed for pruning.
   - The Parquet approach makes it difficult to track file lineage. Because
   the relationship between a base file and its update file is hidden inside a
   Parquet footer, it becomes very tricky to determine which physical files
   belong to the table. This complicates operations like removing orphan
   files, especially with stacked updates to the same column, and would likely
   require complex naming conventions to manage.
   - The Parquet-native approach would require changes to the Parquet
   format and its readers. This would involve collaborating with the Parquet
   community to align on a common goal and tying this feature to their release
   cycle.
   - It would be unfortunate to have such a useful feature tied only to
   Parquet. By building the column update logic into the table format itself,
   we create a design that can extend to other formats like ORC in the future.
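
To make the pruning point concrete, here is a minimal sketch (hypothetical
structures, not the actual Iceberg manifest schema) of how self-contained
manifest entries with merged stats let the planner prune files without
opening any base or update files:

    from dataclasses import dataclass

    @dataclass
    class ManifestEntry:
        path: str
        # Per-column min/max merged from the base file and its update
        # files, kept self-contained in the manifest.
        lower_bounds: dict
        upper_bounds: dict

    def prune(entries, column, value):
        # Keep only files whose merged stats could contain `value`;
        # planning never needs to open a data file.
        return [e for e in entries
                if e.lower_bounds[column] <= value <= e.upper_bounds[column]]

    entries = [ManifestEntry("f1.parquet", {"score": 0.1}, {"score": 0.4}),
               ManifestEntry("f2.parquet", {"score": 0.5}, {"score": 0.9})]
    assert [e.path for e in prune(entries, "score", 0.7)] == ["f2.parquet"]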

Hope this helps.

~ Anurag

On Thu, Feb 12, 2026 at 3:32 AM Eduard Tudenhöfner <[email protected]>
wrote:

> Hey Anurag,
>
> I wasn't able to make it to the sync but was hoping to watch the recording
> afterwards.
> I'm curious what the reasons were for discarding the Parquet-native
> approach. Could you please share a summary of what was discussed on that
> topic in the sync?
>
> On Tue, Feb 10, 2026 at 8:20 PM Anurag Mantripragada <
> [email protected]> wrote:
>
>> Hi all,
>>
>> Thank you for attending today's sync. Please find the meeting notes
>> below. I apologize that we were unable to record the session because
>> attendees did not have recording access.
>>
>> Key updates and discussion points:
>>
>> *Decisions:*
>>
>>    - Table Format vs. Parquet: There is a general consensus that column
>>    update support should reside in the table format. Consequently, we have
>>    discarded the Parquet-native approach.
>>    - Metadata Representation: To maintain clean metadata and avoid
>>    complex resolution logic for readers, the goal is to keep only one
>>    metadata file per column. However, achieving this is challenging if we
>>    support partial updates, as multiple column files may exist for the
>>    same column (see open questions).
>>    - Data Representation: Sparse column files are preferred for compact
>>    representation and are better suited for partial column updates. We can
>>    optimize sparse representation for vectorized reads by filling in null
>>    or default values at read time for missing positions from the base
>>    file, which avoids joins during reads (see the sketch below).
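>>
>> A minimal sketch of the read-time fill (hypothetical code; the actual
>> sparse encoding is described in the design doc):
>>
>>     # A sparse column file stores values only for the positions it
>>     # updates. The reader materializes a dense vector aligned with the
>>     # base file's row count, so no join is needed.
>>     def densify(num_rows, positions, values, fill=None):
>>         dense = [fill] * num_rows      # null (or default) elsewhere
>>         for pos, val in zip(positions, values):
>>             dense[pos] = val           # only updated positions change
>>         return dense
>>
>>     # Base file has 4 rows; the update touches positions 1 and 3 only.
>>     assert densify(4, [1, 3], ["x", "y"]) == [None, "x", None, "y"]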
>>
>>
>> *Open Questions: *
>>
>>    - We are still determining what restrictions are necessary when
>>    supporting partial updates. For instance, we need to decide whether to
>>    allow adding a new column and subsequently applying partial updates to
>>    it, which would involve managing both a base column file and subsequent
>>    update files.
>>    - We need a better understanding of the use cases for partial updates.
>>    - We need to further discuss the handling of equality deletes.
>>
>> If I missed anything, or if others took notes, please share them here.
>> Thanks!
>>
>> I will go ahead and update the doc with what we have discussed so we can
>> continue next time from where we left off.
>>
>> ~ Anurag
>>
>> On Mon, Feb 9, 2026 at 11:55 AM Anurag Mantripragada <
>> [email protected]> wrote:
>>
>>> Hi all,
>>>
>>> This design
>>> <https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0>
>>> will be discussed tomorrow in a dedicated sync.
>>>
>>> Efficient column updates sync
>>> Tuesday, February 10 · 9:00 – 10:00am
>>> Time zone: America/Los_Angeles
>>> Google Meet joining info
>>> Video call link: https://meet.google.com/xsd-exug-tcd
>>>
>>> ~ Anurag
>>>
>>> On Fri, Feb 6, 2026 at 8:30 AM Anurag Mantripragada <
>>> [email protected]> wrote:
>>>
>>>> Hi Gabor,
>>>>
>>>> Thanks for the detailed example.
>>>>
>>>> I agree with Steven that Option 2 seems reasonable. I will add a
>>>> section to the design doc regarding equality delete handling, and we can
>>>> discuss this further during our meeting on Tuesday.
>>>>
>>>> ~Anurag
>>>>
>>>> On Fri, Feb 6, 2026 at 7:08 AM Steven Wu <[email protected]> wrote:
>>>>
>>>>> > 1) When deleting with eq-deletes: If there is a column update on
>>>>> > the equality field ID we use for the delete, reject deletion
>>>>> > 2) When adding a column update on a column that is part of the
>>>>> > equality field IDs in some delete, we reject the column update
>>>>>
>>>>> Gabor, this is a good scenario. The 2nd option makes sense to me,
>>>>> since equality IDs are like primary key fields. If the 2nd rule is
>>>>> enforced, the first one is no longer applicable.
>>>>>
>>>>> On Fri, Feb 6, 2026 at 3:13 AM Gábor Kaszab <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> Thank you for the proposal, Anurag! I made a pass recently and I
>>>>>> think there is some interference between column updates and equality
>>>>>> deletes. Let me describe below:
>>>>>>
>>>>>> Steps:
>>>>>>
>>>>>> CREATE TABLE tbl (a int, b int);
>>>>>>
>>>>>> INSERT INTO tbl VALUES (1, 11), (2, 22);  -- creates the base data file
>>>>>>
>>>>>> DELETE FROM tbl WHERE b=11;               -- creates an equality delete file
>>>>>>
>>>>>> UPDATE tbl SET b=11;                      -- writes a column update
>>>>>>
>>>>>> SELECT * FROM tbl;
>>>>>>
>>>>>> Expected result:
>>>>>>
>>>>>> (2, 11)
>>>>>>
>>>>>>
>>>>>>
>>>>>> Data and metadata created after the above steps:
>>>>>>
>>>>>> Base file:       (1, 11), (2, 22)                    seqnum=1
>>>>>> EQ-delete:       b=11                                seqnum=2
>>>>>> Column update:   field ids: [field_id_for_col_b]     seqnum=3
>>>>>>                  data file content: (dummy_value),(11)
>>>>>>
>>>>>>
>>>>>>
>>>>>> Read steps:
>>>>>>
>>>>>>    1. Stitch the base file with the column update in the reader.
>>>>>>    Rows: (1, dummy_value), (2, 11). (Note: the dummy value can be
>>>>>>    either null or 11, see the proposal for more details.) Seqnum for
>>>>>>    the base file is 1; seqnum for the column update is 3.
>>>>>>    2. Apply the eq-delete b=11, seqnum=2, on the stitched result.
>>>>>>    3. The query result depends on which seqnum we carry forward to
>>>>>>    compare with the eq-delete's seqnum, but it is not correct in
>>>>>>    either case:
>>>>>>       1. Use the seqnum from the base file: we get either an empty
>>>>>>       result (if 'dummy_value' is 11) or (1, null) otherwise.
>>>>>>       2. Use the seqnum from the last update file: no rows are
>>>>>>       deleted, and the result set is (1, dummy_value), (2, 11).
>>>>>>
>>>>>>
>>>>>>
>>>>>> Problem:
>>>>>>
>>>>>> The EQ-delete would have to be applied midway through applying the
>>>>>> column updates to the base file, based on sequence number, during the
>>>>>> stitching process. If I'm not mistaken, this is not feasible with the
>>>>>> way readers work.
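>>>>>>
>>>>>> To make the ambiguity concrete, here is a minimal sketch (hypothetical
>>>>>> reader logic, not actual Iceberg code); neither seqnum choice yields
>>>>>> the expected result (2, 11):
>>>>>>
>>>>>>     # An eq-delete removes rows whose seqnum is lower than the
>>>>>>     # delete's seqnum. After stitching, each row carries one seqnum.
>>>>>>     def apply_eq_delete(rows, row_seqnum, del_value, del_seqnum=2):
>>>>>>         return [(a, b) for a, b in rows
>>>>>>                 if not (row_seqnum < del_seqnum and b == del_value)]
>>>>>>
>>>>>>     stitched = [(1, None), (2, 11)]  # dummy_value taken as null here
>>>>>>     # Carry the base file's seqnum (1): row (2, 11) is wrongly deleted.
>>>>>>     assert apply_eq_delete(stitched, 1, 11) == [(1, None)]
>>>>>>     # Carry the update's seqnum (3): row 1 is wrongly kept.
>>>>>>     assert apply_eq_delete(stitched, 3, 11) == [(1, None), (2, 11)]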
>>>>>>
>>>>>>
>>>>>> Proposal:
>>>>>>
>>>>>> Don't allow equality deletes together with column updates.
>>>>>>
>>>>>>   1) When deleting with eq-deletes: If there is a column update on
>>>>>> the equality field ID we use for the delete, reject deletion
>>>>>>
>>>>>>   2) When adding a column update on a column that is part of the
>>>>>> equality field IDs in some delete, we reject the column update
>>>>>>
>>>>>> Alternatively, column updates could be controlled by an immutable
>>>>>> table property, and eq-deletes would be rejected if the property
>>>>>> indicates that column updates are turned on for the table.
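>>>>>>
>>>>>> A minimal sketch of the write-side check for rule 2 (hypothetical
>>>>>> names, not an actual Iceberg API):
>>>>>>
>>>>>>     def validate_column_update(update_field_ids, eq_delete_field_ids):
>>>>>>         # eq_delete_field_ids: one set of equality field IDs per live
>>>>>>         # equality delete file in the table.
>>>>>>         for eq_ids in eq_delete_field_ids:
>>>>>>             conflict = eq_ids & set(update_field_ids)
>>>>>>             if conflict:
>>>>>>                 raise ValueError(
>>>>>>                     f"Column update rejected: fields {conflict} are "
>>>>>>                     "equality IDs of an existing delete")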
>>>>>>
>>>>>>
>>>>>> Let me know what you think!
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> Gabor
>>>>>>
>>>>>> On Wed, Jan 28, 2026 at 3:31 AM Anurag Mantripragada <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Thank you everyone for the initial review comments. It is exciting
>>>>>>> to see so much interest in this proposal.
>>>>>>>
>>>>>>> I am currently reviewing and responding to each comment. The general
>>>>>>> themes of the feedback so far include:
>>>>>>> - Including partial updates (column updates on a subset of rows in a
>>>>>>> table).
>>>>>>> - Adding details on how SQL engines will write the update files.
>>>>>>> - Adding details on split planning and row alignment for update
>>>>>>> files.
>>>>>>>
>>>>>>> I will think through these points and update the design accordingly.
>>>>>>>
>>>>>>> Best
>>>>>>> Anurag
>>>>>>>
>>>>>>> On Tue, Jan 27, 2026 at 6:25 PM Anurag Mantripragada <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Xianjin,
>>>>>>>>
>>>>>>>> Happy to learn from your experience in supporting
>>>>>>>> backfill use-cases. Please feel free to review the proposal and add 
>>>>>>>> your
>>>>>>>> comments. I will wait for a couple of days more to ensure everyone has 
>>>>>>>> a
>>>>>>>> chance to review the proposal.
>>>>>>>>
>>>>>>>> ~ Anurag
>>>>>>>>
>>>>>>>> On Tue, Jan 27, 2026 at 6:42 AM Xianjin Ye <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Anurag and Peter,
>>>>>>>>>
>>>>>>>>> It’s great to see that partial column updates have gained so much
>>>>>>>>> interest in the community. I internally built a BackfillColumns
>>>>>>>>> action to efficiently backfill columns (by writing only the partial
>>>>>>>>> columns and copying the binary data of the other columns into a new
>>>>>>>>> DataFile). The speedup can be 10x for wide tables, but the write
>>>>>>>>> amplification is still there. I would be happy to collaborate on
>>>>>>>>> the work and eliminate the write amplification.
>>>>>>>>>
>>>>>>>>> On 2026/01/27 10:12:54 Péter Váry wrote:
>>>>>>>>> > Hi Anurag,
>>>>>>>>> >
>>>>>>>>> > It’s great to see how much interest there is in the community
>>>>>>>>> around this
>>>>>>>>> > potential new feature. Gábor and I have actually submitted an
>>>>>>>>> Iceberg
>>>>>>>>> > Summit talk proposal on this topic, and we would be very happy to
>>>>>>>>> > collaborate on the work. I was mainly waiting for the File
>>>>>>>>> Format API to be
>>>>>>>>> > finalized, as I believe this feature should build on top of it.
>>>>>>>>> >
>>>>>>>>> > For reference, our related work includes:
>>>>>>>>> >
>>>>>>>>> >    - *Dev list thread:*
>>>>>>>>> >
>>>>>>>>> https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9
>>>>>>>>> >    - *Proposal document:*
>>>>>>>>> >
>>>>>>>>> https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww
>>>>>>>>> >    (not shared widely yet)
>>>>>>>>> >    - *Performance testing PR for readers and writers:*
>>>>>>>>> >    https://github.com/apache/iceberg/pull/13306
>>>>>>>>> >
>>>>>>>>> > During earlier discussions about possible metadata changes,
>>>>>>>>> another option
>>>>>>>>> > came up that hasn’t been documented yet: separating planner
>>>>>>>>> metadata from
>>>>>>>>> > reader metadata. Since the planner does not need to know about
>>>>>>>>> the actual
>>>>>>>>> > files, we could store the file composition in a separate file
>>>>>>>>> (potentially
>>>>>>>>> > a Puffin file). This file could hold the column_files metadata,
>>>>>>>>> while the
>>>>>>>>> > manifest would reference the Puffin file and blob position
>>>>>>>>> instead of the
>>>>>>>>> > data filename.
>>>>>>>>> > This approach has the advantage of keeping the existing metadata
>>>>>>>>> largely
>>>>>>>>> > intact, and it could also give us a natural place later to add
>>>>>>>>> file-level
>>>>>>>>> > indexes or Bloom filters for use during reads or secondary
>>>>>>>>> filtering. The
>>>>>>>>> > downsides are the additional files and the increased complexity
>>>>>>>>> of
>>>>>>>>> > identifying files that are no longer referenced by the table, so
>>>>>>>>> this may
>>>>>>>>> > not be an ideal solution.
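>>>>>>>>> >
>>>>>>>>> > As a rough sketch of the planner/reader split described above
>>>>>>>>> > (hypothetical field names, just to illustrate the idea):
>>>>>>>>> >
>>>>>>>>> >     # Planner-side: the manifest entry carries stats and a
>>>>>>>>> >     # pointer to reader-side metadata, not the data filename.
>>>>>>>>> >     manifest_entry = {
>>>>>>>>> >         "content_ref": {"puffin_file": "meta-001.puffin",
>>>>>>>>> >                         "blob_offset": 1024},
>>>>>>>>> >         "record_count": 1_000_000,
>>>>>>>>> >         "column_stats": {"b": {"min": 0, "max": 100}},
>>>>>>>>> >     }
>>>>>>>>> >     # Reader-side: the Puffin blob holds the file composition.
>>>>>>>>> >     puffin_blob = {
>>>>>>>>> >         "base_file": "data-001.parquet",
>>>>>>>>> >         "column_files": [{"field_ids": [2],
>>>>>>>>> >                           "path": "update-001.parquet"}],
>>>>>>>>> >     }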
>>>>>>>>> >
>>>>>>>>> > I do have some concerns about the MoR metadata proposal
>>>>>>>>> described in the
>>>>>>>>> > document. At first glance, it seems to complicate distributed
>>>>>>>>> planning, as
>>>>>>>>> > all entries for a given file would need to be collected and
>>>>>>>>> merged to
>>>>>>>>> > provide the information required by both the planner and the
>>>>>>>>> reader.
>>>>>>>>> > Additionally, when a new column is added or updated, we would
>>>>>>>>> still need to
>>>>>>>>> > add a new metadata entry for every existing data file. If we
>>>>>>>>> immediately
>>>>>>>>> > write out the merged metadata, the total number of entries
>>>>>>>>> remains the
>>>>>>>>> > same. The main benefit is avoiding rewriting statistics, which
>>>>>>>>> can be
>>>>>>>>> > significant, but this comes at the cost of increased planning
>>>>>>>>> complexity.
>>>>>>>>> > If we choose to store the merged statistics in the
>>>>>>>>> column_families entry, I
>>>>>>>>> > don’t see much benefit in excluding the rest of the metadata,
>>>>>>>>> especially
>>>>>>>>> > since including it would simplify the planning process.
>>>>>>>>> >
>>>>>>>>> > As Anton already pointed out, we should also discuss how this
>>>>>>>>> change would
>>>>>>>>> > affect split handling, particularly how to avoid double reads
>>>>>>>>> when row
>>>>>>>>> > groups are not aligned between the original data files and the
>>>>>>>>> new column
>>>>>>>>> > files.
>>>>>>>>> >
>>>>>>>>> > Finally, I’d like to see some discussion around the Java API
>>>>>>>>> implications.
>>>>>>>>> > In particular, what API changes are required, and how SQL
>>>>>>>>> engines would
>>>>>>>>> > perform updates. Since the new column files must have the same
>>>>>>>>> number of
>>>>>>>>> > rows as the original data files, with a strict one-to-one
>>>>>>>>> relationship, SQL
>>>>>>>>> > engines would need access to the source filename, position, and
>>>>>>>>> deletion
>>>>>>>>> > status in the DataFrame in order to generate the new files. This
>>>>>>>>> is more
>>>>>>>>> > involved than a simple update and deserves some explicit
>>>>>>>>> consideration.
>>>>>>>>> >
>>>>>>>>> > Looking forward to your thoughts.
>>>>>>>>> > Best regards,
>>>>>>>>> > Peter
>>>>>>>>> >
>>>>>>>>> > On Tue, Jan 27, 2026, 03:58 Anurag Mantripragada <
>>>>>>>>> [email protected]>
>>>>>>>>> > wrote:
>>>>>>>>> >
>>>>>>>>> > > Thanks Anton and others, for providing some initial feedback.
>>>>>>>>> I will
>>>>>>>>> > > address all your comments soon.
>>>>>>>>> > >
>>>>>>>>> > > On Mon, Jan 26, 2026 at 11:10 AM Anton Okolnychyi <
>>>>>>>>> [email protected]>
>>>>>>>>> > > wrote:
>>>>>>>>> > >
>>>>>>>>> > >> I had a chance to see the proposal before it landed and I
>>>>>>>>> think it is a
>>>>>>>>> > >> cool idea and both presented approaches would likely work. I
>>>>>>>>> am looking
>>>>>>>>> > >> forward to discussing the tradeoffs and would encourage
>>>>>>>>> everyone to
>>>>>>>>> > >> push/polish each approach to see what issues can be mitigated
>>>>>>>>> and what are
>>>>>>>>> > >> fundamental.
>>>>>>>>> > >>
>>>>>>>>> > >> [1] Iceberg-native approach: better visibility into column
>>>>>>>>> files from the
>>>>>>>>> > >> metadata, potentially better concurrency for non-overlapping
>>>>>>>>> column
>>>>>>>>> > >> updates, no dep on Parquet.
>>>>>>>>> > >> [2] Parquet-native approach: almost no changes to the table
>>>>>>>>> format
>>>>>>>>> > >> metadata beyond tracking of base files.
>>>>>>>>> > >>
>>>>>>>>> > >> I think [1] sounds a bit better on paper but I am worried
>>>>>>>>> about the
>>>>>>>>> > >> complexity in writers and readers (especially around keeping
>>>>>>>>> row groups
>>>>>>>>> > >> aligned and split planning). It would be great to cover this
>>>>>>>>> in detail in
>>>>>>>>> > >> the proposal.
>>>>>>>>> > >>
>>>>>>>>> > >> On Mon, Jan 26, 2026 at 9:00 AM Anurag Mantripragada <
>>>>>>>>> > >> [email protected]> wrote:
>>>>>>>>> > >>
>>>>>>>>> > >>> Hi all,
>>>>>>>>> > >>>
>>>>>>>>> > >>> "Wide tables" with thousands of columns present significant
>>>>>>>>> challenges
>>>>>>>>> > >>> for AI/ML workloads, particularly when only a subset of
>>>>>>>>> columns needs to be
>>>>>>>>> > >>> added or updated. Current Copy-on-Write (COW) and
>>>>>>>>> Merge-on-Read (MOR)
>>>>>>>>> > >>> operations in Iceberg apply at the row level, which leads to
>>>>>>>>> substantial
>>>>>>>>> > >>> write amplification in scenarios such as:
>>>>>>>>> > >>>
>>>>>>>>> > >>>    - Feature Backfilling & Column Updates: Adding new
>>>>>>>>> feature columns
>>>>>>>>> > >>>    (e.g., model embeddings) to petabyte-scale tables.
>>>>>>>>> > >>>    - Model Score Updates: Refreshing prediction scores after
>>>>>>>>> > >>>    retraining.
>>>>>>>>> > >>>    - Embedding Refresh: Updating vector embeddings, which
>>>>>>>>> currently
>>>>>>>>> > >>>    triggers a rewrite of the entire row.
>>>>>>>>> > >>>    - Incremental Feature Computation: Daily updates to a
>>>>>>>>> small fraction
>>>>>>>>> > >>>    of features in wide tables.
>>>>>>>>> > >>>
>>>>>>>>> > >>> With the Iceberg V4 proposal introducing single-file commits
>>>>>>>>> and column
>>>>>>>>> > >>> stats improvements, this is an ideal time to address
>>>>>>>>> column-level updates
>>>>>>>>> > >>> to better support these use cases.
>>>>>>>>> > >>>
>>>>>>>>> > >>> I have drafted a proposal that explores both table-format
>>>>>>>>> enhancements
>>>>>>>>> > >>> and file-format (Parquet) changes to enable more efficient
>>>>>>>>> updates.
>>>>>>>>> > >>>
>>>>>>>>> > >>> Proposal Details:
>>>>>>>>> > >>> - GitHub Issue: #15146 <
>>>>>>>>> https://github.com/apache/iceberg/issues/15146>
>>>>>>>>> > >>> - Design Document: Efficient Column Updates in Iceberg
>>>>>>>>> > >>> <
>>>>>>>>> https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0
>>>>>>>>> >
>>>>>>>>> > >>>
>>>>>>>>> > >>> Next Steps:
>>>>>>>>> > >>> I plan to create POCs to benchmark the approaches described
>>>>>>>>> in the
>>>>>>>>> > >>> document.
>>>>>>>>> > >>>
>>>>>>>>> > >>> Please review the proposal and share your feedback.
>>>>>>>>> > >>>
>>>>>>>>> > >>> Thanks,
>>>>>>>>> > >>> Anurag
>>>>>>>>> > >>>
>>>>>>>>> > >>
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>
