Hi all,

Just got a chance to follow up on the discussion here. Making column files
additive to the existing base files seems reasonable to me, but I think it
also implies that compaction is a must, similar to how we manage delete
files today. An important difference is that updates usually occur much
more frequently than deletes.

This may be a separate concern, but have we considered whether compaction
should be more closely tied to writes? For example, triggering a rewrite
once we have X number of column files, rather than relying solely on an
independent compaction job. Minor compactions could collapse just one file
set (base file + column files) so that we don't block writers for too long.
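
To make that concrete, here is a rough sketch of what a write-side trigger
could look like. Everything below is illustrative: the property name, the
ColumnUpdateGroup type, and the helper methods are hypothetical, not
existing Iceberg APIs.

    // Hypothetical check, run after a commit adds a new column file.
    int maxColumnFiles = PropertyUtil.propertyAsInt(
        table.properties(), "write.column-update.max-files", 10);  // hypothetical property

    for (ColumnUpdateGroup group : columnUpdateGroups(table)) {    // hypothetical helper
      if (group.columnFiles().size() >= maxColumnFiles) {
        // Minor compaction: collapse just this file set (base file + column
        // files) into a new base file, leaving other file sets untouched.
        rewriteFileSet(table, group);                              // hypothetical action
      }
    }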

Best,
Shawn

On Mon, Feb 16, 2026 at 7:23 AM Gábor Kaszab <[email protected]> wrote:

> Hey All,
>
> Thanks Anurag for the summary!
>
> I regret we don't have a recording for the sync, but I had the impression
> that, even though there was a lengthy discussion about the implementation
> requirements for partial updates, there wasn't a strong consensus around
> the need and there were no strong use cases to justify partial updates
> either. Let me sum up where I think we stand now:
>
> *Scope of the updates*
>
> *1) Full column updates*
> There is a consensus and common understanding that this use case makes
> sense. If this were the only supported use case, the implementation would be
> relatively simple. We could guarantee there is no overlap in column updates
> by deduplicating the field IDs in the column update metadata. For example,
> let's say we have a column update on columns {1,2} and we write another
> column update for {2,3}: we can change the metadata for the first one to
> only cover {1} instead of {1,2}. With this, the write and the read/stitching
> process is also straightforward (if we decide not to support equality
> deletes together with column updates).
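>
> To illustrate the dedup step in code, a minimal sketch (the
> ColumnUpdateFile type and the helpers are made up for the example):
>
>     // Committing a new column update for field IDs {2, 3}: shrink the
>     // coverage of older updates so each field ID maps to exactly one file.
>     Set<Integer> newFields = Set.of(2, 3);
>     for (ColumnUpdateFile existing : existingUpdates) {
>       Set<Integer> remaining = new HashSet<>(existing.coveredFieldIds());
>       remaining.removeAll(newFields);             // {1, 2} becomes {1}
>       if (remaining.isEmpty()) {
>         dropUpdateFile(existing);                 // fully superseded
>       } else {
>         updateCoverage(existing, remaining);      // metadata-only change
>       }
>     }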
>
> Both row-matching approaches could work here (a sketch of the sparse
> variant follows the list):
>     - row-number-matching update files, where we fill the deleted rows
> with an arbitrary value (preferably null)
>     - sparse update files with some auxiliary column written into the
> column update file, like the row position in the base file
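>
> As a read-side sketch of the sparse variant (all names are made up;
> assume the update file carries a row-position column):
>
>     // Stitch one column for base rows [0, n): take the updated value where
>     // the sparse file has an entry for that position, otherwise keep the
>     // base value (or a null fill, depending on the chosen semantics).
>     Map<Long, Object> updated = readSparseUpdate(updateFile, fieldId);
>     for (long pos = 0; pos < n; pos++) {
>       Object value = updated.containsKey(pos) ? updated.get(pos) : baseValue(pos);
>       emit(pos, value);
>     }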
>
> *2) Partial column updates (row-level)*
> I see two use cases mentioned for this: bug-fixing a subset of rows and
> updating features for active users.
> My initial impression here is that whether to use column updates or not
> heavily depends on the selectivity of the partial update queries. I'm sure
> there is a threshold on the percentage of affected rows below which it's
> simply better to use the traditional row-level updates (CoW/MoR). I'm not
> entirely convinced that covering these scenarios is worth the extra
> complexity here:
>     - We can't deduplicate the column updates by field IDs on the
> metadata side.
>     - We have two options for writers:
>          - Merge the existing column update files themselves when writing
> a new one with an overlap of field IDs. No need to sort out the different
> column update files and merge them on the read side, but there is overhead
> on the write side.
>         - Don't bother merging existing column updates when writing a new
> one. This shifts the overhead to the read side.
>
> Handling of sparse update files is a must here, with the chance for
> optimisation if all the rows are covered by the update file, as Micah
> suggested.
>
> To sum up, I think that to justify this approach we need strong
> use-cases and measurements verifying that the extra complexity yields
> convincingly better results than the existing CoW/MoR approaches.
>
> *3) Partial column updates (file-level)*
> This option wasn't brought up during our conversation but might be worth
> considering. It is basically a middle ground between the above two
> approaches. Partial updates are allowed as long as they affect entire data
> files, and it's allowed to cover only a subset of the files. One use case
> would be doing column updates per partition, for instance.
>
> With this approach the metadata representation could be as simple as in
> 1), where we can deduplicate the update files by field IDs. Also, there is
> no write or read overhead on top of 1), apart from the verification step to
> ensure that the WHERE filter on the update splits on file boundaries.
> Similarly to 1), sparse update files aren't a must here; we could consider
> row-matching update files too.
>
> *Row alignment*
> Sparse update files are required for row-level partial updates, but if we
> decide to go with either of the other options we could also evaluate the
> "row count matching" approach. Even though it requires filling the missing
> rows with arbitrary values (null seems a good candidate), it would result
> in less write overhead (no need to write row positions) and less read
> overhead (no need to join rows by row position), which could be worth the
> inconvenience of having 'invalid' but inaccessible values in the files. The
> num_nulls stats being off is a good argument against this, but I think we
> could fix that too by keeping track of how many rows were deleted (and
> subtracting this value from the num_nulls counter returned by the writer).
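>
> A small sketch of that stats correction (the metrics accessor is
> hypothetical):
>
>     // Deleted rows are filled with null in the update file, so the writer
>     // over-counts nulls for the column by exactly the number of deleted rows.
>     long reportedNumNulls = writerMetrics.nullCount(fieldId);  // hypothetical accessor
>     long adjustedNumNulls = reportedNumNulls - deletedRowCount;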
>
>
> *Next steps*
> I'm actively working on a very basic PoC implementation where we would be
> able to test the different approaches and compare their pros and cons so
> that we can make a decision on the above questions. I'll sync with Anurag
> on this and will let you know once we have something.
>
> Best Regards,
> Gabor
>
>
> Micah Kornfield <[email protected]> wrote (on Sat, Feb 14, 2026,
> at 2:20):
>
>>> Given that, the sparse representation with alignment at read time (using
>>> dummy/null values) seems to provide the benefits of both efficient
>>> vectorized reads and stitching as well as support for partial column
>>> updates. Would you agree?
>>
>>
>> Thinking more about it, I think the sparse approach is actually a
>> superset of the full approach, so it is not a concern.  If writers want,
>> they can write out the fully populated columns with position indexes from 1
>> to N, and readers can take an optimized path if they detect that the number
>> of rows in the update is equal to the number of base rows.
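>>
>> A reader-side sketch of that detection (the reader factories are
>> illustrative only):
>>
>>     // If the update file has exactly one row per base row, positions are
>>     // implicit and the reader can skip the position join entirely.
>>     boolean dense = updateFile.recordCount() == baseFile.recordCount();
>>     ColumnReader reader = dense
>>         ? denseColumnReader(updateFile)                 // straight vectorized read
>>         : sparseStitchingReader(updateFile, baseFile);  // join by row position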
>>
>> I still think there is a question of what writers should do (i.e., when
>> should they duplicate data instead of writing sparse updates), but that is
>> an implementation question and not necessarily something that needs to
>> block spec work.
>>
>> Cheers,
>> Micah
>>
>> On Fri, Feb 13, 2026 at 11:29 AM Anurag Mantripragada <
>> [email protected]> wrote:
>>
>>> Hi Micah,
>>>
>>>> This seems like a classic MoR vs CoW trade-off.  But it seems like maybe
>>>> both sparse and full should be available (I understand this adds
>>>> complexity). For adding a new column or completely updating a column,
>>>> the performance would be better to prefill the data
>>>
>>>
>>> Our internal use cases are very similar to what you describe. We
>>> primarily deal with full column updates. However, the feedback on the
>>> proposal from the wider community indicated that partial updates (e.g.,
>>> bug-fixing a subset of rows, updating features for active users) are also a
>>> very common and critical use case.
>>>
>>>> Is there evidence to say that partial column updates are more common in
>>>> practice than full rewrites?
>>>
>>>
>>> Personally, I don't have hard data on which use case is more common in
>>> the wild, only that both appear to be important. I also agree that a good
>>> long-term solution should support both strategies. Given that, the sparse
>>> representation with alignment at read time (using dummy/null values) seems
>>> to provide the benefits of both efficient vectorized reads and stitching as
>>> well as support for partial column updates. Would you agree?
>>>
>>> ~ Anurag
>>>
>>> On Fri, Feb 13, 2026 at 9:33 AM Micah Kornfield <[email protected]>
>>> wrote:
>>>
>>>> Hi Anurag,
>>>>
>>>>> Data Representation: Sparse column files are preferred for compact
>>>>> representation and are better suited for partial column updates. We can
>>>>> optimize sparse representation for vectorized reads by filling in
>>>>> null or default values at read time for missing positions from the base
>>>>> file, which avoids joins during reads.
>>>>
>>>>
>>>> This seems like a classic MoR vs CoW trade-off.  But it seems like
>>>> maybe both sparse and full should be available (I understand this adds
>>>> complexity).  For adding a new column or completely updating a column,
>>>> the performance would be better to prefill the data (otherwise one ends up
>>>> duplicating the work that is already happening under the hood in Parquet).
>>>>
>>>> Is there evidence to say that partial column updates are more common in
>>>> practice than full rewrites?
>>>>
>>>> Thanks,
>>>> Micah
>>>>
>>>>
>>>> On Thu, Feb 12, 2026 at 3:32 AM Eduard Tudenhöfner <
>>>> [email protected]> wrote:
>>>>
>>>>> Hey Anurag,
>>>>>
>>>>> I wasn't able to make it to the sync but was hoping to watch the
>>>>> recording afterwards.
>>>>> I'm curious what the reasons were for discarding the Parquet-native
>>>>> approach. Could you please share a summary of what was discussed in the
>>>>> sync on that topic?
>>>>>
>>>>> On Tue, Feb 10, 2026 at 8:20 PM Anurag Mantripragada <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Thank you for attending today's sync. Please find the meeting notes
>>>>>> below. I apologize that we were unable to record the session due to
>>>>>> attendees not having record access.
>>>>>>
>>>>>> Key updates and discussion points:
>>>>>>
>>>>>> *Decisions:*
>>>>>>
>>>>>>    - Table Format vs. Parquet: There is a general consensus that
>>>>>>    column update support should reside in the table format. Consequently,
>>>>>>    we have discarded the Parquet-native approach.
>>>>>>    - Metadata Representation: To maintain clean metadata and avoid
>>>>>>    complex resolution logic for readers, the goal is to keep only one
>>>>>>    metadata file per column. However, achieving this is challenging if we
>>>>>>    support partial updates, as multiple column files may exist for the
>>>>>>    same column (See open questions).
>>>>>>    - Data Representation: Sparse column files are preferred for
>>>>>>    compact representation and are better suited for partial column
>>>>>>    updates. We can optimize sparse representation for vectorized reads by
>>>>>>    filling in null or default values at read time for missing positions
>>>>>>    from the base file, which avoids joins during reads.
>>>>>>
>>>>>>
>>>>>> *Open Questions: *
>>>>>>
>>>>>>    - We are still determining what restrictions are necessary when
>>>>>>    supporting partial updates. For instance, we need to decide whether
>>>>>>    to allow adding a new column and subsequently applying partial
>>>>>>    updates to it. This would involve managing both a base column file
>>>>>>    and subsequent update files.
>>>>>>    - We need a better understanding of the use cases for partial
>>>>>>    updates.
>>>>>>    - We need to further discuss the handling of equality deletes.
>>>>>>
>>>>>> If I missed anything, or if others took notes, please share them
>>>>>> here. Thanks!
>>>>>>
>>>>>> I will go ahead and update the doc with what we have discussed so we
>>>>>> can continue next time from where we left off.
>>>>>>
>>>>>> ~ Anurag
>>>>>>
>>>>>> On Mon, Feb 9, 2026 at 11:55 AM Anurag Mantripragada <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> This design
>>>>>>> <https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0>
>>>>>>> will be discussed tomorrow in a dedicated sync.
>>>>>>>
>>>>>>> Efficient column updates sync
>>>>>>> Tuesday, February 10 · 9:00 – 10:00am
>>>>>>> Time zone: America/Los_Angeles
>>>>>>> Google Meet joining info
>>>>>>> Video call link: https://meet.google.com/xsd-exug-tcd
>>>>>>>
>>>>>>> ~ Anurag
>>>>>>>
>>>>>>> On Fri, Feb 6, 2026 at 8:30 AM Anurag Mantripragada <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Gabor,
>>>>>>>>
>>>>>>>> Thanks for the detailed example.
>>>>>>>>
>>>>>>>> I agree with Steven that Option 2 seems reasonable. I will add a
>>>>>>>> section to the design doc regarding equality delete handling, and we
>>>>>>>> can discuss this further during our meeting on Tuesday.
>>>>>>>>
>>>>>>>> ~Anurag
>>>>>>>>
>>>>>>>> On Fri, Feb 6, 2026 at 7:08 AM Steven Wu <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> > 1) When deleting with eq-deletes: If there is a column update
>>>>>>>>> on the equality-field ID we use for the delete, reject the deletion
>>>>>>>>> > 2) When adding a column update on a column that is part of the
>>>>>>>>> equality field IDs in some delete, we reject the column update
>>>>>>>>>
>>>>>>>>> Gabor, this is a good scenario. The 2nd option makes sense to me,
>>>>>>>>> since equality IDs are like primary key fields. If we have the 2nd
>>>>>>>>> rule enforced, the first option is not applicable anymore.
>>>>>>>>>
>>>>>>>>> On Fri, Feb 6, 2026 at 3:13 AM Gábor Kaszab <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hey,
>>>>>>>>>>
>>>>>>>>>> Thank you for the proposal, Anurag! I made a pass recently and I
>>>>>>>>>> think there is some interference between column updates and equality
>>>>>>>>>> deletes. Let me describe it below:
>>>>>>>>>>
>>>>>>>>>> Steps:
>>>>>>>>>>
>>>>>>>>>> CREATE TABLE tbl (a INT, b INT);
>>>>>>>>>>
>>>>>>>>>> INSERT INTO tbl VALUES (1, 11), (2, 22);  -- creates the base data file
>>>>>>>>>>
>>>>>>>>>> DELETE FROM tbl WHERE b=11;               -- creates an equality delete file
>>>>>>>>>>
>>>>>>>>>> UPDATE tbl SET b=11;                      -- writes a column update
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> SELECT * FROM tbl;
>>>>>>>>>>
>>>>>>>>>> Expected result:
>>>>>>>>>>
>>>>>>>>>> (2, 11)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Data and metadata created after the above steps:
>>>>>>>>>>
>>>>>>>>>> Base file
>>>>>>>>>>
>>>>>>>>>> (1, 11), (2, 22),
>>>>>>>>>>
>>>>>>>>>> seqnum=1
>>>>>>>>>>
>>>>>>>>>> EQ-delete
>>>>>>>>>>
>>>>>>>>>> b=11
>>>>>>>>>>
>>>>>>>>>> seqnum=2
>>>>>>>>>>
>>>>>>>>>> Column update
>>>>>>>>>>
>>>>>>>>>> Field ids: [field_id_for_col_b]
>>>>>>>>>>
>>>>>>>>>> seqnum=3
>>>>>>>>>>
>>>>>>>>>> Data file content: (dummy_value),(11)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Read steps:
>>>>>>>>>>
>>>>>>>>>>    1. Stitch base file with column updates in reader:
>>>>>>>>>>
>>>>>>>>>> Rows: (1,dummy_value), (2,11) (Note: the dummy value can be either
>>>>>>>>>> null or 11; see the proposal for more details)
>>>>>>>>>>
>>>>>>>>>> Seqnum for base file=1
>>>>>>>>>>
>>>>>>>>>> Seqnum for column update=3
>>>>>>>>>>
>>>>>>>>>>    2. Apply eq-delete b=11, seqnum=2 on the stitched result
>>>>>>>>>>    3. The query result depends on which seqnum we carry forward to
>>>>>>>>>>    compare with the eq-delete's seqnum, but it's not correct in
>>>>>>>>>>    either case
>>>>>>>>>>       1. Use seqnum from base file: we get either an empty
>>>>>>>>>>       result if 'dummy_value' is 11 or we get (1, null) otherwise
>>>>>>>>>>       2. Use seqnum from last update file: don't delete any
>>>>>>>>>>       rows, result set is (1, dummy_value),(2,11)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Problem:
>>>>>>>>>>
>>>>>>>>>> The EQ-delete should be applied midway through applying the column
>>>>>>>>>> updates to the base file, based on sequence number, during the
>>>>>>>>>> stitching process. If I'm not mistaken, this is not feasible with
>>>>>>>>>> the way readers work.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Proposal:
>>>>>>>>>>
>>>>>>>>>> Don't allow equality deletes together with column updates.
>>>>>>>>>>
>>>>>>>>>>   1) When deleting with eq-deletes: If there is a column update
>>>>>>>>>> on the equality-field ID we use for the delete, reject the deletion
>>>>>>>>>>
>>>>>>>>>>   2) When adding a column update on a column that is part of the
>>>>>>>>>> equality field IDs in some delete, we reject the column update
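>>>>>>>>>>
>>>>>>>>>> A minimal sketch of check 2) at validation time (the helper for
>>>>>>>>>> listing live equality deletes is made up; equalityFieldIds() is the
>>>>>>>>>> existing DeleteFile accessor):
>>>>>>>>>>
>>>>>>>>>>     // Reject a column update whose field IDs overlap the equality
>>>>>>>>>>     // field IDs of any live equality delete file.
>>>>>>>>>>     for (DeleteFile delete : liveEqualityDeletes(table)) {
>>>>>>>>>>       if (!Collections.disjoint(delete.equalityFieldIds(), updateFieldIds)) {
>>>>>>>>>>         throw new ValidationException(
>>>>>>>>>>             "Column update fields %s conflict with an equality delete", updateFieldIds);
>>>>>>>>>>       }
>>>>>>>>>>     }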
>>>>>>>>>>
>>>>>>>>>> Alternatively, column updates could be controlled by an immutable
>>>>>>>>>> table property, and we could reject eq-deletes if the property
>>>>>>>>>> indicates that column updates are turned on for the table.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Let me know what you think!
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>>
>>>>>>>>>> Gabor
>>>>>>>>>>
>>>>>>>>>> Anurag Mantripragada <[email protected]> wrote
>>>>>>>>>> (on Wed, Jan 28, 2026, at 3:31):
>>>>>>>>>>
>>>>>>>>>>> Thank you everyone for the initial review comments. It is
>>>>>>>>>>> exciting to see so much interest in this proposal.
>>>>>>>>>>>
>>>>>>>>>>> I am currently reviewing and responding to each comment. The
>>>>>>>>>>> general themes of the feedback so far include:
>>>>>>>>>>> - Including partial updates (column updates on a subset of rows
>>>>>>>>>>> in a table).
>>>>>>>>>>> - Adding details on how SQL engines will write the update files.
>>>>>>>>>>> - Adding details on split planning and row alignment for update
>>>>>>>>>>> files.
>>>>>>>>>>>
>>>>>>>>>>> I will think through these points and update the design
>>>>>>>>>>> accordingly.
>>>>>>>>>>>
>>>>>>>>>>> Best
>>>>>>>>>>> Anurag
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 27, 2026 at 6:25 PM Anurag Mantripragada <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Xianjin,
>>>>>>>>>>>>
>>>>>>>>>>>> Happy to learn from your experience in supporting backfill
>>>>>>>>>>>> use cases. Please feel free to review the proposal and add your
>>>>>>>>>>>> comments. I will wait a couple more days to ensure everyone has a
>>>>>>>>>>>> chance to review the proposal.
>>>>>>>>>>>>
>>>>>>>>>>>> ~ Anurag
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 27, 2026 at 6:42 AM Xianjin Ye <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Anurag and Peter,
>>>>>>>>>>>>>
>>>>>>>>>>>>> It’s great to see that partial column updates have gained so much
>>>>>>>>>>>>> interest in the community. I internally built a BackfillColumns action to
>>>>>>>>>>>>> efficiently backfill columns (by writing only the partial columns and
>>>>>>>>>>>>> copying the binary data of the other columns into a new DataFile). The
>>>>>>>>>>>>> speedup can be 10x for wide tables, but the write amplification is still
>>>>>>>>>>>>> there. I would be happy to collaborate on the work and eliminate the
>>>>>>>>>>>>> write amplification.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2026/01/27 10:12:54 Péter Váry wrote:
>>>>>>>>>>>>> > Hi Anurag,
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > It’s great to see how much interest there is in the
>>>>>>>>>>>>> community around this
>>>>>>>>>>>>> > potential new feature. Gábor and I have actually submitted
>>>>>>>>>>>>> an Iceberg
>>>>>>>>>>>>> > Summit talk proposal on this topic, and we would be very
>>>>>>>>>>>>> happy to
>>>>>>>>>>>>> > collaborate on the work. I was mainly waiting for the File
>>>>>>>>>>>>> Format API to be
>>>>>>>>>>>>> > finalized, as I believe this feature should build on top of
>>>>>>>>>>>>> it.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > For reference, our related work includes:
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >    - *Dev list thread:*
>>>>>>>>>>>>> >
>>>>>>>>>>>>> https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9
>>>>>>>>>>>>> >    - *Proposal document:*
>>>>>>>>>>>>> >
>>>>>>>>>>>>> https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww
>>>>>>>>>>>>> >    (not shared widely yet)
>>>>>>>>>>>>> >    - *Performance testing PR for readers and writers:*
>>>>>>>>>>>>> >    https://github.com/apache/iceberg/pull/13306
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > During earlier discussions about possible metadata changes,
>>>>>>>>>>>>> another option
>>>>>>>>>>>>> > came up that hasn’t been documented yet: separating planner
>>>>>>>>>>>>> metadata from
>>>>>>>>>>>>> > reader metadata. Since the planner does not need to know
>>>>>>>>>>>>> about the actual
>>>>>>>>>>>>> > files, we could store the file composition in a separate
>>>>>>>>>>>>> file (potentially
>>>>>>>>>>>>> > a Puffin file). This file could hold the column_files
>>>>>>>>>>>>> metadata, while the
>>>>>>>>>>>>> > manifest would reference the Puffin file and blob position
>>>>>>>>>>>>> instead of the
>>>>>>>>>>>>> > data filename.
>>>>>>>>>>>>> > This approach has the advantage of keeping the existing
>>>>>>>>>>>>> metadata largely
>>>>>>>>>>>>> > intact, and it could also give us a natural place later to
>>>>>>>>>>>>> add file-level
>>>>>>>>>>>>> > indexes or Bloom filters for use during reads or secondary
>>>>>>>>>>>>> filtering. The
>>>>>>>>>>>>> > downsides are the additional files and the increased
>>>>>>>>>>>>> complexity of
>>>>>>>>>>>>> > identifying files that are no longer referenced by the
>>>>>>>>>>>>> table, so this may
>>>>>>>>>>>>> > not be an ideal solution.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > I do have some concerns about the MoR metadata proposal
>>>>>>>>>>>>> described in the
>>>>>>>>>>>>> > document. At first glance, it seems to complicate
>>>>>>>>>>>>> distributed planning, as
>>>>>>>>>>>>> > all entries for a given file would need to be collected and
>>>>>>>>>>>>> merged to
>>>>>>>>>>>>> > provide the information required by both the planner and the
>>>>>>>>>>>>> reader.
>>>>>>>>>>>>> > Additionally, when a new column is added or updated, we
>>>>>>>>>>>>> would still need to
>>>>>>>>>>>>> > add a new metadata entry for every existing data file. If we
>>>>>>>>>>>>> immediately
>>>>>>>>>>>>> > write out the merged metadata, the total number of entries
>>>>>>>>>>>>> remains the
>>>>>>>>>>>>> > same. The main benefit is avoiding rewriting statistics,
>>>>>>>>>>>>> which can be
>>>>>>>>>>>>> > significant, but this comes at the cost of increased
>>>>>>>>>>>>> planning complexity.
>>>>>>>>>>>>> > If we choose to store the merged statistics in the
>>>>>>>>>>>>> column_families entry, I
>>>>>>>>>>>>> > don’t see much benefit in excluding the rest of the
>>>>>>>>>>>>> metadata, especially
>>>>>>>>>>>>> > since including it would simplify the planning process.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > As Anton already pointed out, we should also discuss how
>>>>>>>>>>>>> this change would
>>>>>>>>>>>>> > affect split handling, particularly how to avoid double
>>>>>>>>>>>>> reads when row
>>>>>>>>>>>>> > groups are not aligned between the original data files and
>>>>>>>>>>>>> the new column
>>>>>>>>>>>>> > files.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Finally, I’d like to see some discussion around the Java API
>>>>>>>>>>>>> implications.
>>>>>>>>>>>>> > In particular, what API changes are required, and how SQL
>>>>>>>>>>>>> engines would
>>>>>>>>>>>>> > perform updates. Since the new column files must have the
>>>>>>>>>>>>> same number of
>>>>>>>>>>>>> > rows as the original data files, with a strict one-to-one
>>>>>>>>>>>>> relationship, SQL
>>>>>>>>>>>>> > engines would need access to the source filename, position,
>>>>>>>>>>>>> and deletion
>>>>>>>>>>>>> > status in the DataFrame in order to generate the new files.
>>>>>>>>>>>>> This is more
>>>>>>>>>>>>> > involved than a simple update and deserves some explicit
>>>>>>>>>>>>> consideration.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Looking forward to your thoughts.
>>>>>>>>>>>>> > Best regards,
>>>>>>>>>>>>> > Peter
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > On Tue, Jan 27, 2026, 03:58 Anurag Mantripragada <
>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > > Thanks Anton and others, for providing some initial
>>>>>>>>>>>>> feedback. I will
>>>>>>>>>>>>> > > address all your comments soon.
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > On Mon, Jan 26, 2026 at 11:10 AM Anton Okolnychyi <
>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>> > > wrote:
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > >> I had a chance to see the proposal before it landed and I
>>>>>>>>>>>>> think it is a
>>>>>>>>>>>>> > >> cool idea and both presented approaches would likely
>>>>>>>>>>>>> work. I am looking
>>>>>>>>>>>>> > >> forward to discussing the tradeoffs and would encourage
>>>>>>>>>>>>> everyone to
>>>>>>>>>>>>> > >> push/polish each approach to see what issues can be
>>>>>>>>>>>>> mitigated and what are
>>>>>>>>>>>>> > >> fundamental.
>>>>>>>>>>>>> > >>
>>>>>>>>>>>>> > >> [1] Iceberg-native approach: better visibility into
>>>>>>>>>>>>> column files from the
>>>>>>>>>>>>> > >> metadata, potentially better concurrency for
>>>>>>>>>>>>> non-overlapping column
>>>>>>>>>>>>> > >> updates, no dep on Parquet.
>>>>>>>>>>>>> > >> [2] Parquet-native approach: almost no changes to the
>>>>>>>>>>>>> table format
>>>>>>>>>>>>> > >> metadata beyond tracking of base files.
>>>>>>>>>>>>> > >>
>>>>>>>>>>>>> > >> I think [1] sounds a bit better on paper but I am worried
>>>>>>>>>>>>> about the
>>>>>>>>>>>>> > >> complexity in writers and readers (especially around
>>>>>>>>>>>>> keeping row groups
>>>>>>>>>>>>> > >> aligned and split planning). It would be great to cover
>>>>>>>>>>>>> this in detail in
>>>>>>>>>>>>> > >> the proposal.
>>>>>>>>>>>>> > >>
>>>>>>>>>>>>> > >> On Mon, Jan 26, 2026 at 09:00 Anurag Mantripragada <
>>>>>>>>>>>>> > >> [email protected]> wrote:
>>>>>>>>>>>>> > >>
>>>>>>>>>>>>> > >>> Hi all,
>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>> > >>> "Wide tables" with thousands of columns present
>>>>>>>>>>>>> significant challenges
>>>>>>>>>>>>> > >>> for AI/ML workloads, particularly when only a subset of
>>>>>>>>>>>>> columns needs to be
>>>>>>>>>>>>> > >>> added or updated. Current Copy-on-Write (COW) and
>>>>>>>>>>>>> Merge-on-Read (MOR)
>>>>>>>>>>>>> > >>> operations in Iceberg apply at the row level, which
>>>>>>>>>>>>> leads to substantial
>>>>>>>>>>>>> > >>> write amplification in scenarios such as:
>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>> > >>>    - Feature Backfilling & Column Updates: Adding new
>>>>>>>>>>>>> feature columns
>>>>>>>>>>>>> > >>>    (e.g., model embeddings) to petabyte-scale tables.
>>>>>>>>>>>>> > >>>    - Model Score Updates: Refreshing prediction scores
>>>>>>>>>>>>> after retraining.
>>>>>>>>>>>>> > >>>    - Embedding Refresh: Updating vector embeddings,
>>>>>>>>>>>>> which currently
>>>>>>>>>>>>> > >>>    triggers a rewrite of the entire row.
>>>>>>>>>>>>> > >>>    - Incremental Feature Computation: Daily updates to a
>>>>>>>>>>>>> small fraction
>>>>>>>>>>>>> > >>>    of features in wide tables.
>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>> > >>> With the Iceberg V4 proposal introducing single-file
>>>>>>>>>>>>> commits and column
>>>>>>>>>>>>> > >>> stats improvements, this is an ideal time to address
>>>>>>>>>>>>> column-level updates
>>>>>>>>>>>>> > >>> to better support these use cases.
>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>> > >>> I have drafted a proposal that explores both
>>>>>>>>>>>>> table-format enhancements
>>>>>>>>>>>>> > >>> and file-format (Parquet) changes to enable more
>>>>>>>>>>>>> efficient updates.
>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>> > >>> Proposal Details:
>>>>>>>>>>>>> > >>> - GitHub Issue: #15146 <
>>>>>>>>>>>>> https://github.com/apache/iceberg/issues/15146>
>>>>>>>>>>>>> > >>> - Design Document: Efficient Column Updates in Iceberg
>>>>>>>>>>>>> > >>> <
>>>>>>>>>>>>> https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>> > >>> Next Steps:
>>>>>>>>>>>>> > >>> I plan to create POCs to benchmark the approaches
>>>>>>>>>>>>> described in the
>>>>>>>>>>>>> > >>> document.
>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>> > >>> Please review the proposal and share your feedback.
>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>> > >>> Thanks,
>>>>>>>>>>>>> > >>> Anurag
>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>> > >>
>>>>>>>>>>>>> >
>>>>>>>>>>>>>
>>>>>>>>>>>>
