Hey All,
Thanks Anurag for the summary!
I regret we don't have a recording of the sync, but my impression was
that, even though there was a lengthy discussion about the implementation
requirements for partial updates, there wasn't a strong consensus around
the need for them, and no strong use cases were brought up to justify
partial updates either. Let me sum up where I think we stand now:
*Scope of the updates*
*1) Full column updates*
There is a consensus and common understanding that this use case makes
sense. If this were the only supported use case, the implementation would
be relatively simple. We could guarantee there is no overlap between
column updates by deduplicating the field IDs in the column update
metadata. E.g., let's say we have a column update on columns {1,2} and we
write another column update for {2,3}: we can change the metadata for the
first one to only cover {1} instead of {1,2}. With this, the write and the
read/stitching process is also straightforward (if we decide not to
support equality deletes together with column updates).
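A minimal Java sketch of that dedup step, just to make it concrete (the
ColumnUpdateFile class and the commit flow below are hypothetical, not an
existing Iceberg API):

import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ColumnUpdateFile {
  // field IDs this column update file covers, as recorded in the metadata
  final Set<Integer> fieldIds;

  ColumnUpdateFile(Set<Integer> fieldIds) {
    this.fieldIds = new LinkedHashSet<>(fieldIds);
  }
}

class FullColumnUpdateDedup {
  // When committing a new full-column update, drop its field IDs from the
  // metadata of every older update so each column is covered by exactly one
  // file, e.g. an older update on {1,2} shrinks to {1} when {2,3} arrives.
  static void commit(List<ColumnUpdateFile> existing, ColumnUpdateFile incoming) {
    for (ColumnUpdateFile older : existing) {
      older.fieldIds.removeAll(incoming.fieldIds);
    }
    existing.removeIf(file -> file.fieldIds.isEmpty()); // fully superseded files
    existing.add(incoming);
  }
}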
Both row-alignment approaches could work here:
   - row-count matching update files, where we fill the deleted rows with
   an arbitrary value (preferably null)
   - sparse update files with some auxiliary column written into the
   column update file, like the row position in the base file
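To make the difference concrete, assume a hypothetical base file with rows
(a=1, b=11) and (a=2, b=22), where row 0 has been deleted and we update
column b of row 1 to 33 (illustration only, not a spec proposal):

   base file (pos, a, b):          (0, 1, 11), (1, 2, 22)
   row-count matching update on b: (null), (33)       -- one value per base row
   sparse update on b:             (pos=1, value=33)  -- only the updated row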
*2) Partial column updates (row-level)*
I see two use cases mentioned for this: bug-fixing a subset of rows and
updating features for active users.
My initial impression here is that whether to use column updates at all
heavily depends on the selectivity of the partial update queries. I'm sure
there is a threshold for the percentage of affected rows below which it's
simply better to use the traditional row-level updates (CoW/MoR). I'm not
entirely convinced that covering these scenarios is worth the extra
complexity here:
   - We can't deduplicate the column updates by field IDs on the
   metadata side
   - We have two options for writers:
      - Merge the existing column update files themselves when writing a
      new one with overlapping field IDs. There is no need to sort out the
      different column update files and merge them on the read side, but
      there is overhead on the write side
      - Don't bother merging existing column updates when writing a new
      one. This moves the overhead to the read side (see the sketch below).
Handling of sparse update files is a must here, with the chance for
optimisation if all the rows are covered by the update file, as Micah
suggested.
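To illustrate the read-side cost of the second writer option, here is a
minimal Java sketch of how a reader might resolve a value when several
sparse update files overlap on the same field (SparseUpdate and resolve()
are hypothetical names, just for illustration):

import java.util.List;
import java.util.Map;

class SparseUpdate {
  final long sequenceNumber;
  final Map<Long, Object> valueByPos; // base-file row position -> new value

  SparseUpdate(long sequenceNumber, Map<Long, Object> valueByPos) {
    this.sequenceNumber = sequenceNumber;
    this.valueByPos = valueByPos;
  }
}

class ReadSideMerge {
  // For a given base row position, take the value from the overlapping sparse
  // update with the highest sequence number; fall back to the base file value.
  static Object resolve(long pos, Object baseValue, List<SparseUpdate> updatesForField) {
    Object result = baseValue;
    long bestSeq = Long.MIN_VALUE;
    for (SparseUpdate update : updatesForField) {
      if (update.valueByPos.containsKey(pos) && update.sequenceNumber > bestSeq) {
        bestSeq = update.sequenceNumber;
        result = update.valueByPos.get(pos);
      }
    }
    return result;
  }
}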
To sum up, I think that to justify this approach we need strong use cases
and measurements verifying that the extra complexity yields convincingly
better results than the existing CoW/MoR approaches.
*3) Partial column updates (file-level)*
This option wasn't brought up during our conversation but might be worth
considering. It is basically a middle ground between the above two
approaches: partial updates are allowed as long as they affect entire data
files, and it's allowed to cover only a subset of the files. One use case
would be to do column updates per partition, for instance.
With this approach the metadata representation could be as simple as in
1), where we can deduplicate the update files by field IDs. Also, there is
no write or read overhead on top of 1), apart from the verification step
to ensure that the WHERE filter on the update splits on file boundaries.
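A minimal sketch of that verification step, assuming the writer can
obtain, per data file, the number of rows matched by the WHERE filter and
the file's total record count (all names below are hypothetical):

import java.util.Map;

class FileLevelUpdateCheck {
  // For a file-level column update, every data file touched by the WHERE
  // filter must be matched in full; a partially matched file would make the
  // update row-level.
  static boolean splitsOnFileBoundaries(
      Map<String, Long> matchedRowsPerFile, Map<String, Long> recordCountPerFile) {
    for (Map.Entry<String, Long> entry : matchedRowsPerFile.entrySet()) {
      long matched = entry.getValue();
      long total = recordCountPerFile.getOrDefault(entry.getKey(), 0L);
      if (matched != 0 && matched != total) {
        return false; // reject, or fall back to a row-level CoW/MoR update
      }
    }
    return true;
  }
}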
Also, similarly to 1), sparse update files aren't a must here; we could
consider row-count matching update files too.
*Row alignment*
Sparse update files are required for row-level partial updates, but if we
decide to go with any of the other options we could also evaluate the
"row-count matching" approach. Even though it requires filling the missing
rows with arbitrary values (null seems a good candidate), it would result
in less write overhead (no need to write row positions) and less read
overhead (no need to join rows by row position), which could be worth the
inconvenience of having 'invalid' but inaccessible values in the files.
The num_nulls stats being off is a good argument against this, but I think
we could fix that too by keeping track of how many rows were filled for
deleted positions (and subtracting this value from the num_nulls counter
returned by the writer).
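A trivial sketch of that stats fix-up, assuming the writer reports its raw
null count and we separately track how many deleted base rows it filled
with null (both names are hypothetical):

class NullCountFixup {
  // The writer counts the nulls it physically wrote, including the fills
  // for deleted base rows; subtract those fills to get the logical count.
  static long logicalNullCount(long writerNullCount, long filledDeletedRows) {
    return writerNullCount - filledDeletedRows; // e.g. 10 written - 7 fills = 3
  }
}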
*Next steps*
I'm actively working on a very basic PoC implementation that will let us
test the different approaches and compare their pros and cons, so that we
can make a decision on the above questions. I'll sync with Anurag on this
and will let you know once we have something.
Best Regards,
Gabor
Micah Kornfield <[email protected]> wrote (on Sat, Feb 14, 2026, at 2:20):
> Given that, the sparse representation with alignment at read time (using
>> dummy/null values) seems to provide the benefits of both efficient
>> vectorized reads and stitching as well as support for partial column
>> updates. Would you agree?
>
>
> Thinking more about it, I think the sparse approach is actually a superset
> approach, so it is not a concern. If writers want they can write out
> the fully populated columns with position indexes from 1 to N, and readers
> can take an optimized path if they detect the number of rows in the update
> is equal to the number of base rows.
>
> I still think there is a question on what writers should do (i.e. when do
> they decide to duplicate data instead of trying to give sparse updates) but
> that is an implementation question and not necessarily something that needs
> to block spec work.
>
> Cheers,
> Micah
>
> On Fri, Feb 13, 2026 at 11:29 AM Anurag Mantripragada <
> [email protected]> wrote:
>
>> Hi Micah,
>>
>> This seems like a classic MoR vs CoW trade-off. But it seems like maybe
>>> both sparse and full should be available (I understand this adds
>>> complexity). For adding a new column or completely updating a new column,
>>> the performance would be better to prefill the data
>>
>>
>> Our internal use cases are very similar to what you describe. We
>> primarily deal with full column updates. However, the feedback on the
>> proposal from the wider community indicated that partial updates (e.g.,
>> bug-fixing a subset of rows, updating features for active users) are also a
>> very common and critical use case.
>>
>> Is there evidence to say that partial column updates are more common in
>>> practice than full rewrites?
>>
>>
>> Personally, I don't have hard data on which use case is more common in
>> the wild, only that both appear to be important. I also agree that a good
>> long term solution should support both strategies. Given that, the sparse
>> representation with alignment at read time (using dummy/null values) seems
>> to provide the benefits of both efficient vectorized reads and stitching as
>> well as support for partial column updates. Would you agree?
>>
>> ~ Anurag
>>
>> On Fri, Feb 13, 2026 at 9:33 AM Micah Kornfield <[email protected]>
>> wrote:
>>
>>> Hi Anurag,
>>>
>>>> Data Representation: Sparse column files are preferred for compact
>>>> representation and are better suited for partial column updates. We can
>>>> optimize sparse representation for vectorized reads by filling in null
>>>> or default values at read time for missing positions from the base file,
>>>> which avoids joins during reads.
>>>
>>>
>>> This seems like a classic MoR vs CoW trade-off. But it seems like maybe
>>> both sparse and full should be available (I understand this adds
>>> complexity). For adding a new column or completely updating a new column,
>>> the performance would be better to prefill the data (otherwise one ends up
>>> duplicating the work that is already happening under the hood in parquet).
>>>
>>> Is there evidence to say that partial column updates are more common in
>>> practice than full rewrites?
>>>
>>> Thanks,
>>> Micah
>>>
>>>
>>> On Thu, Feb 12, 2026 at 3:32 AM Eduard Tudenhöfner <
>>> [email protected]> wrote:
>>>
>>>> Hey Anurag,
>>>>
>>>> I wasn't able to make it to the sync but was hoping to watch the
>>>> recording afterwards.
>>>> I'm curious what the reasons were for discarding the Parquet-native
>>>> approach. Could you please share a summary of what was discussed in
>>>> the sync on that topic?
>>>>
>>>> On Tue, Feb 10, 2026 at 8:20 PM Anurag Mantripragada <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Thank you for attending today's sync. Please find the meeting notes
>>>>> below. I apologize that we were unable to record the session due to
>>>>> attendees not having record access.
>>>>>
>>>>> Key updates and discussion points:
>>>>>
>>>>> *Decisions:*
>>>>>
>>>>> - Table Format vs. Parquet: There is a general consensus that
>>>>> column update support should reside in the table format. Consequently,
>>>>> we
>>>>> have discarded the Parquet-native approach.
>>>>> - Metadata Representation: To maintain clean metadata and avoid
>>>>> complex resolution logic for readers, the goal is to keep only one
>>>>> metadata
>>>>> file per column. However, achieving this is challenging if we support
>>>>> partial updates, as multiple column files may exist for the same column
>>>>> (See open questions).
>>>>> - Data Representation: Sparse column files are preferred for
>>>>> compact representation and are better suited for partial column
>>>>> updates. We
>>>>> can optimize sparse representation for vectorized reads by filling in
>>>>> null
>>>>> or default values at read time for missing positions from the base
>>>>> file,
>>>>> which avoids joins during reads.
>>>>>
>>>>>
>>>>> *Open Questions: *
>>>>>
>>>>> - We are still determining what restrictions are necessary when
>>>>> supporting partial updates. For instance, we need to decide whether to
>>>>> add
>>>>> a new column and subsequently allow partial updates on it. This would
>>>>> involve managing both a base column file and subsequent update files.
>>>>> - We need a better understanding of the use cases for partial
>>>>> updates.
>>>>> - We need to further discuss the handling of equality deletes.
>>>>>
>>>>> If I missed anything, or if others took notes, please share them here.
>>>>> Thanks!
>>>>>
>>>>> I will go ahead and update the doc with what we have discussed so we
>>>>> can continue next time from where we left off.
>>>>>
>>>>> ~ Anurag
>>>>>
>>>>> On Mon, Feb 9, 2026 at 11:55 AM Anurag Mantripragada <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> This design
>>>>>> <https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0>
>>>>>> will be discussed tomorrow in a dedicated sync.
>>>>>>
>>>>>> Efficient column updates sync
>>>>>> Tuesday, February 10 · 9:00 – 10:00am
>>>>>> Time zone: America/Los_Angeles
>>>>>> Google Meet joining info
>>>>>> Video call link: https://meet.google.com/xsd-exug-tcd
>>>>>>
>>>>>> ~ Anurag
>>>>>>
>>>>>> On Fri, Feb 6, 2026 at 8:30 AM Anurag Mantripragada <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Gabor,
>>>>>>>
>>>>>>> Thanks for the detailed example.
>>>>>>>
>>>>>>> I agree with Steven that Option 2 seems reasonable. I will add a
>>>>>>> section to the design doc regarding equality delete handling, and we can
>>>>>>> discuss this further during our meeting on Tuesday.
>>>>>>>
>>>>>>> ~Anurag
>>>>>>>
>>>>>>> On Fri, Feb 6, 2026 at 7:08 AM Steven Wu <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> > 1) When deleting with eq-deletes: If there is a column update on
>>>>>>>> the equality-field ID we use for the delete, reject deletion
>>>>>>>> > 2) When adding a column update on a column that is part of the
>>>>>>>> equality field IDs in some delete, we reject the column update
>>>>>>>>
>>>>>>>> Gabor, this is a good scenario. The 2nd option makes sense to me,
>>>>>>>> since equality ids are like primary key fields. If we have the 2nd rule
>>>>>>>> enforced, the first option is not applicable anymore.
>>>>>>>>
>>>>>>>> On Fri, Feb 6, 2026 at 3:13 AM Gábor Kaszab <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hey,
>>>>>>>>>
>>>>>>>>> Thank you for the proposal, Anurag! I made a pass recently and I
>>>>>>>>> think there is some interference between column updates and equality
>>>>>>>>> deletes. Let me describe below:
>>>>>>>>>
>>>>>>>>> Steps:
>>>>>>>>>
>>>>>>>>> CREATE TABLE tbl (a int, b int);
>>>>>>>>>
>>>>>>>>> INSERT INTO tbl VALUES (1, 11), (2, 22); -- creates the base data
>>>>>>>>> file
>>>>>>>>>
>>>>>>>>> DELETE FROM tbl WHERE b=11; -- creates an equality
>>>>>>>>> delete file
>>>>>>>>>
>>>>>>>>> UPDATE tbl SET b=11; -- writes
>>>>>>>>> column update
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> SELECT * FROM tbl;
>>>>>>>>>
>>>>>>>>> Expected result:
>>>>>>>>>
>>>>>>>>> (2, 11)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Data and metadata created after the above steps:
>>>>>>>>>
>>>>>>>>> Base file
>>>>>>>>>
>>>>>>>>> (1, 11), (2, 22),
>>>>>>>>>
>>>>>>>>> seqnum=1
>>>>>>>>>
>>>>>>>>> EQ-delete
>>>>>>>>>
>>>>>>>>> b=11
>>>>>>>>>
>>>>>>>>> seqnum=2
>>>>>>>>>
>>>>>>>>> Column update
>>>>>>>>>
>>>>>>>>> Field ids: [field_id_for_col_b]
>>>>>>>>>
>>>>>>>>> seqnum=3
>>>>>>>>>
>>>>>>>>> Data file content: (dummy_value),(11)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Read steps:
>>>>>>>>>
>>>>>>>>> 1. Stitch base file with column updates in reader:
>>>>>>>>>
>>>>>>>>> Rows: (1,dummy_value), (2,11) (Note, dummy value can be either
>>>>>>>>> null, or 11, see the proposal for more details)
>>>>>>>>>
>>>>>>>>> Seqnum for base file=1
>>>>>>>>>
>>>>>>>>> Seqnum for column update=3
>>>>>>>>>
>>>>>>>>> 2. Apply eq-delete b=11, seqnum=2 on the stitched result
>>>>>>>>> 3. Query result depends on which seqnum we carry forward to
>>>>>>>>> compare with the eq-delete's seqnum, but it's not correct in any
>>>>>>>>> of the
>>>>>>>>> cases
>>>>>>>>> 1. Use seqnum from base file: we get either an empty result
>>>>>>>>> if 'dummy_value' is 11 or we get (1, null) otherwise
>>>>>>>>> 2. Use seqnum from last update file: don't delete any rows,
>>>>>>>>> result set is (1, dummy_value),(2,11)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Problem:
>>>>>>>>>
>>>>>>>>> EQ-delete should be applied midway through applying the column updates to
>>>>>>>>> the base file based on sequence number, during the stitching process.
>>>>>>>>> If
>>>>>>>>> I'm not mistaken, this is not feasible with the way readers work.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Proposal:
>>>>>>>>>
>>>>>>>>> Don't allow equality deletes together with column updates.
>>>>>>>>>
>>>>>>>>> 1) When deleting with eq-deletes: If there is a column update on
>>>>>>>>> the equality-field ID we use for the delete, reject deletion
>>>>>>>>>
>>>>>>>>> 2) When adding a column update on a column that is part of the
>>>>>>>>> equality field IDs in some delete, we reject the column update
>>>>>>>>>
>>>>>>>>> Alternatively, column updates could be controlled by a property of
>>>>>>>>> the table (immutable), and reject eq-deletes if the property indicates
>>>>>>>>> column updates are turned on for the table
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Let me know what you think!
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>>
>>>>>>>>> Gabor
>>>>>>>>>
>>>>>>>>> Anurag Mantripragada <[email protected]> wrote
>>>>>>>>> (on Wed, Jan 28, 2026, at 3:31):
>>>>>>>>>
>>>>>>>>>> Thank you everyone for the initial review comments. It is
>>>>>>>>>> exciting to see so much interest in this proposal.
>>>>>>>>>>
>>>>>>>>>> I am currently reviewing and responding to each comment. The
>>>>>>>>>> general themes of the feedback so far include:
>>>>>>>>>> - Including partial updates (column updates on a subset of rows
>>>>>>>>>> in a table).
>>>>>>>>>> - Adding details on how SQL engines will write the update files.
>>>>>>>>>> - Adding details on split planning and row alignment for update
>>>>>>>>>> files.
>>>>>>>>>>
>>>>>>>>>> I will think through these points and update the design
>>>>>>>>>> accordingly.
>>>>>>>>>>
>>>>>>>>>> Best
>>>>>>>>>> Anurag
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 27, 2026 at 6:25 PM Anurag Mantripragada <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Xianjin,
>>>>>>>>>>>
>>>>>>>>>>> Happy to learn from your experience in supporting
>>>>>>>>>>> backfill use-cases. Please feel free to review the proposal and add
>>>>>>>>>>> your
>>>>>>>>>>> comments. I will wait for a couple of days more to ensure everyone
>>>>>>>>>>> has a
>>>>>>>>>>> chance to review the proposal.
>>>>>>>>>>>
>>>>>>>>>>> ~ Anurag
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 27, 2026 at 6:42 AM Xianjin Ye <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Anurag and Peter,
>>>>>>>>>>>>
>>>>>>>>>>>> It’s great to see the partial column update has gained great
>>>>>>>>>>>> interest in the community. I internally built a BackfillColumns
>>>>>>>>>>>> action to
>>>>>>>>>>>> efficiently backfill columns (by writing the partial columns only
>>>>>>>>>>>> and copying
>>>>>>>>>>>> the binary data of other columns into a new DataFile). The speedup
>>>>>>>>>>>> could be
>>>>>>>>>>>> 10x for wide tables but the write amplification is still there. I
>>>>>>>>>>>> would be
>>>>>>>>>>>> happy to collaborate on the work and eliminate the write
>>>>>>>>>>>> amplification.
>>>>>>>>>>>>
>>>>>>>>>>>> On 2026/01/27 10:12:54 Péter Váry wrote:
>>>>>>>>>>>> > Hi Anurag,
>>>>>>>>>>>> >
>>>>>>>>>>>> > It’s great to see how much interest there is in the community
>>>>>>>>>>>> around this
>>>>>>>>>>>> > potential new feature. Gábor and I have actually submitted an
>>>>>>>>>>>> Iceberg
>>>>>>>>>>>> > Summit talk proposal on this topic, and we would be very
>>>>>>>>>>>> happy to
>>>>>>>>>>>> > collaborate on the work. I was mainly waiting for the File
>>>>>>>>>>>> Format API to be
>>>>>>>>>>>> > finalized, as I believe this feature should build on top of
>>>>>>>>>>>> it.
>>>>>>>>>>>> >
>>>>>>>>>>>> > For reference, our related work includes:
>>>>>>>>>>>> >
>>>>>>>>>>>> > - *Dev list thread:*
>>>>>>>>>>>> >
>>>>>>>>>>>> https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9
>>>>>>>>>>>> > - *Proposal document:*
>>>>>>>>>>>> >
>>>>>>>>>>>> https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww
>>>>>>>>>>>> > (not shared widely yet)
>>>>>>>>>>>> > - *Performance testing PR for readers and writers:*
>>>>>>>>>>>> > https://github.com/apache/iceberg/pull/13306
>>>>>>>>>>>> >
>>>>>>>>>>>> > During earlier discussions about possible metadata changes,
>>>>>>>>>>>> another option
>>>>>>>>>>>> > came up that hasn’t been documented yet: separating planner
>>>>>>>>>>>> metadata from
>>>>>>>>>>>> > reader metadata. Since the planner does not need to know
>>>>>>>>>>>> about the actual
>>>>>>>>>>>> > files, we could store the file composition in a separate file
>>>>>>>>>>>> (potentially
>>>>>>>>>>>> > a Puffin file). This file could hold the column_files
>>>>>>>>>>>> metadata, while the
>>>>>>>>>>>> > manifest would reference the Puffin file and blob position
>>>>>>>>>>>> instead of the
>>>>>>>>>>>> > data filename.
>>>>>>>>>>>> > This approach has the advantage of keeping the existing
>>>>>>>>>>>> metadata largely
>>>>>>>>>>>> > intact, and it could also give us a natural place later to
>>>>>>>>>>>> add file-level
>>>>>>>>>>>> > indexes or Bloom filters for use during reads or secondary
>>>>>>>>>>>> filtering. The
>>>>>>>>>>>> > downsides are the additional files and the increased
>>>>>>>>>>>> complexity of
>>>>>>>>>>>> > identifying files that are no longer referenced by the table,
>>>>>>>>>>>> so this may
>>>>>>>>>>>> > not be an ideal solution.
>>>>>>>>>>>> >
>>>>>>>>>>>> > I do have some concerns about the MoR metadata proposal
>>>>>>>>>>>> described in the
>>>>>>>>>>>> > document. At first glance, it seems to complicate distributed
>>>>>>>>>>>> planning, as
>>>>>>>>>>>> > all entries for a given file would need to be collected and
>>>>>>>>>>>> merged to
>>>>>>>>>>>> > provide the information required by both the planner and the
>>>>>>>>>>>> reader.
>>>>>>>>>>>> > Additionally, when a new column is added or updated, we would
>>>>>>>>>>>> still need to
>>>>>>>>>>>> > add a new metadata entry for every existing data file. If we
>>>>>>>>>>>> immediately
>>>>>>>>>>>> > write out the merged metadata, the total number of entries
>>>>>>>>>>>> remains the
>>>>>>>>>>>> > same. The main benefit is avoiding rewriting statistics,
>>>>>>>>>>>> which can be
>>>>>>>>>>>> > significant, but this comes at the cost of increased planning
>>>>>>>>>>>> complexity.
>>>>>>>>>>>> > If we choose to store the merged statistics in the
>>>>>>>>>>>> column_families entry, I
>>>>>>>>>>>> > don’t see much benefit in excluding the rest of the metadata,
>>>>>>>>>>>> especially
>>>>>>>>>>>> > since including it would simplify the planning process.
>>>>>>>>>>>> >
>>>>>>>>>>>> > As Anton already pointed out, we should also discuss how this
>>>>>>>>>>>> change would
>>>>>>>>>>>> > affect split handling, particularly how to avoid double reads
>>>>>>>>>>>> when row
>>>>>>>>>>>> > groups are not aligned between the original data files and
>>>>>>>>>>>> the new column
>>>>>>>>>>>> > files.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Finally, I’d like to see some discussion around the Java API
>>>>>>>>>>>> implications.
>>>>>>>>>>>> > In particular, what API changes are required, and how SQL
>>>>>>>>>>>> engines would
>>>>>>>>>>>> > perform updates. Since the new column files must have the
>>>>>>>>>>>> same number of
>>>>>>>>>>>> > rows as the original data files, with a strict one-to-one
>>>>>>>>>>>> relationship, SQL
>>>>>>>>>>>> > engines would need access to the source filename, position,
>>>>>>>>>>>> and deletion
>>>>>>>>>>>> > status in the DataFrame in order to generate the new files.
>>>>>>>>>>>> This is more
>>>>>>>>>>>> > involved than a simple update and deserves some explicit
>>>>>>>>>>>> consideration.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Looking forward to your thoughts.
>>>>>>>>>>>> > Best regards,
>>>>>>>>>>>> > Peter
>>>>>>>>>>>> >
>>>>>>>>>>>> > On Tue, Jan 27, 2026, 03:58 Anurag Mantripragada <
>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>> > wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > > Thanks Anton and others, for providing some initial
>>>>>>>>>>>> feedback. I will
>>>>>>>>>>>> > > address all your comments soon.
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > On Mon, Jan 26, 2026 at 11:10 AM Anton Okolnychyi <
>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>> > > wrote:
>>>>>>>>>>>> > >
>>>>>>>>>>>> > >> I had a chance to see the proposal before it landed and I
>>>>>>>>>>>> think it is a
>>>>>>>>>>>> > >> cool idea and both presented approaches would likely work.
>>>>>>>>>>>> I am looking
>>>>>>>>>>>> > >> forward to discussing the tradeoffs and would encourage
>>>>>>>>>>>> everyone to
>>>>>>>>>>>> > >> push/polish each approach to see what issues can be
>>>>>>>>>>>> mitigated and what are
>>>>>>>>>>>> > >> fundamental.
>>>>>>>>>>>> > >>
>>>>>>>>>>>> > >> [1] Iceberg-native approach: better visibility into column
>>>>>>>>>>>> files from the
>>>>>>>>>>>> > >> metadata, potentially better concurrency for
>>>>>>>>>>>> non-overlapping column
>>>>>>>>>>>> > >> updates, no dep on Parquet.
>>>>>>>>>>>> > >> [2] Parquet-native approach: almost no changes to the
>>>>>>>>>>>> table format
>>>>>>>>>>>> > >> metadata beyond tracking of base files.
>>>>>>>>>>>> > >>
>>>>>>>>>>>> > >> I think [1] sounds a bit better on paper but I am worried
>>>>>>>>>>>> about the
>>>>>>>>>>>> > >> complexity in writers and readers (especially around
>>>>>>>>>>>> keeping row groups
>>>>>>>>>>>> > >> aligned and split planning). It would be great to cover
>>>>>>>>>>>> this in detail in
>>>>>>>>>>>> > >> the proposal.
>>>>>>>>>>>> > >>
>>>>>>>>>>>> > >> On Mon, Jan 26, 2026 at 09:00 Anurag Mantripragada <
>>>>>>>>>>>> > >> [email protected]> wrote:
>>>>>>>>>>>> > >>
>>>>>>>>>>>> > >>> Hi all,
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> > >>> "Wide tables" with thousands of columns present
>>>>>>>>>>>> significant challenges
>>>>>>>>>>>> > >>> for AI/ML workloads, particularly when only a subset of
>>>>>>>>>>>> columns needs to be
>>>>>>>>>>>> > >>> added or updated. Current Copy-on-Write (COW) and
>>>>>>>>>>>> Merge-on-Read (MOR)
>>>>>>>>>>>> > >>> operations in Iceberg apply at the row level, which leads
>>>>>>>>>>>> to substantial
>>>>>>>>>>>> > >>> write amplification in scenarios such as:
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> > >>> - Feature Backfilling & Column Updates: Adding new
>>>>>>>>>>>> feature columns
>>>>>>>>>>>> > >>> (e.g., model embeddings) to petabyte-scale tables.
>>>>>>>>>>>> > >>> - Model Score Updates: Refresh prediction scores after
>>>>>>>>>>>> retraining.
>>>>>>>>>>>> > >>> - Embedding Refresh: Updating vector embeddings, which
>>>>>>>>>>>> currently
>>>>>>>>>>>> > >>> triggers a rewrite of the entire row.
>>>>>>>>>>>> > >>> - Incremental Feature Computation: Daily updates to a
>>>>>>>>>>>> small fraction
>>>>>>>>>>>> > >>> of features in wide tables.
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> > >>> With the Iceberg V4 proposal introducing single-file
>>>>>>>>>>>> commits and column
>>>>>>>>>>>> > >>> stats improvements, this is an ideal time to address
>>>>>>>>>>>> column-level updates
>>>>>>>>>>>> > >>> to better support these use cases.
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> > >>> I have drafted a proposal that explores both table-format
>>>>>>>>>>>> enhancements
>>>>>>>>>>>> > >>> and file-format (Parquet) changes to enable more
>>>>>>>>>>>> efficient updates.
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> > >>> Proposal Details:
>>>>>>>>>>>> > >>> - GitHub Issue: #15146 <
>>>>>>>>>>>> https://github.com/apache/iceberg/issues/15146>
>>>>>>>>>>>> > >>> - Design Document: Efficient Column Updates in Iceberg
>>>>>>>>>>>> > >>> <
>>>>>>>>>>>> https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0
>>>>>>>>>>>> >
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> > >>> Next Steps:
>>>>>>>>>>>> > >>> I plan to create POCs to benchmark the approaches
>>>>>>>>>>>> described in the
>>>>>>>>>>>> > >>> document.
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> > >>> Please review the proposal and share your feedback.
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> > >>> Thanks,
>>>>>>>>>>>> > >>> Anurag
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> > >>
>>>>>>>>>>>> >
>>>>>>>>>>>>
>>>>>>>>>>>