In some scenarios, keeping files vertically split can be
advantageous, especially for tables with many columns that have very
different characteristics. For example, a table might contain numerous
boolean or int/long feature columns alongside large binary blobs, text
fields, or even image data. Storing these groups of columns in separate
Parquet files can improve both encoding efficiency and query performance.
We could introduce a sort‑order-like mechanism that defines the desired
column layout for the table, and let compaction jobs enforce the
appropriate column‑family structure when files are compacted.
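
As a rough illustration of what such a layout declaration could look like
(the property name and column groups below are purely hypothetical; only the
Table.updateProperties() API is real):

    import org.apache.iceberg.Table;

    class ColumnFamilyLayout {
      // Hypothetical property; no such key exists in the spec today.
      static final String LAYOUT_PROP = "write.column-family.layout";

      // Declare that narrow feature columns and large blob/text columns should
      // end up in separate Parquet files; compaction jobs would enforce this
      // layout when rewriting files.
      static void declareLayout(Table table) {
        table.updateProperties()
            .set(LAYOUT_PROP, "features:(user_id,f1,f2,f3); media:(image,description)")
            .commit();
      }
    }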

Engines would remain free to merge column files or perform full
copy‑on‑write rewrites when wide updates occur. However, I would avoid
adding extra complexity by trying to support this directly in the commit or
write paths, especially since the value of compaction varies significantly
across different datasets and use cases.

Shawn Chang <[email protected]> wrote (on Tue, Feb 17, 2026, at 2:44):

> Hi all,
>
> Just got a chance to follow up on the discussion here. Making column files
> additive to the existing base files seems reasonable to me, but I think it
> also implies that compaction is a must, similar to how we manage delete
> files today. An important difference is that updates usually occur much
> more frequently than deletes.
>
> This may be a separate concern, but have we considered whether compaction
> should be more closely tied to writes? For example, triggering a rewrite
> once we accumulate X column files, rather than relying solely on an
> independent compaction job. There could be minor compactions that just
> collapse one file set (base file + column files) so we don't block writers
> too much.
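>
> A minimal sketch of that trigger, with FileSet, columnFileCount() and
> minorCompact() as hypothetical placeholders (nothing like this exists in
> Iceberg today):
>
>     class WriteTriggeredCompaction {
>       interface FileSet {
>         int columnFileCount();   // number of column files stacked on the base file
>         void minorCompact();     // collapse base file + its column files into one new base
>       }
>
>       // Called from the writer's commit path instead of an independent job.
>       static void maybeCompact(FileSet fileSet, int maxColumnFiles) {
>         if (fileSet.columnFileCount() >= maxColumnFiles) {
>           // Minor compaction touches only this one file set, so other writers
>           // and the rest of the table are not blocked.
>           fileSet.minorCompact();
>         }
>       }
>     }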
>
> Best,
> Shawn
>
> On Mon, Feb 16, 2026 at 7:23 AM Gábor Kaszab <[email protected]>
> wrote:
>
>> Hey All,
>>
>> Thanks Anurag for the summary!
>>
>> I regret we don't have a recording for the sync, but I had the impression
>> that, even though there was a lengthy discussion about the implementation
>> requirements for partial updates, there wasn't a strong consensus around
>> the need, and no strong use cases to justify partial updates either. Let me
>> sum up where I think we stand now:
>>
>> *Scope of the updates*
>>
>> *1) Full column updates*
>> There is a consensus and common understanding that this use case makes
>> sense. If this were the only supported use case, the implementation would be
>> relatively simple. We could guarantee there is no overlap in column updates
>> by deduplicating the field IDs in the column update metadata. For example,
>> say we have a column update on columns {1,2} and we write another column
>> update for {2,3}: we can change the metadata for the first one to cover
>> only {1} instead of {1,2}. With this, both the write and the read/stitching
>> process are straightforward (if we decide not to support equality deletes
>> together with column updates).
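>>
>> A minimal sketch of that deduplication, with each tracked column-update file
>> reduced to just the set of field IDs it covers (a simplification, not the
>> actual metadata layout):
>>
>>     import java.util.HashSet;
>>     import java.util.List;
>>     import java.util.Set;
>>
>>     class ColumnUpdateDedup {
>>       // existingUpdates is ordered by commit order; the newest update wins.
>>       static void addColumnUpdate(List<Set<Integer>> existingUpdates,
>>                                   Set<Integer> newUpdateFieldIds) {
>>         for (Set<Integer> older : existingUpdates) {
>>           // e.g. an older entry {1,2} shrinks to {1} when a new update
>>           // arrives for {2,3}
>>           older.removeAll(newUpdateFieldIds);
>>         }
>>         existingUpdates.removeIf(Set::isEmpty);   // fully superseded files can be dropped
>>         existingUpdates.add(new HashSet<>(newUpdateFieldIds));
>>       }
>>     }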
>>
>> Both row matching approaches could work here:
>>     - row number matching update files, where we fill the deleted rows
>> with an arbitrary value (preferably null)
>>     - sparse update files with an auxiliary column written into the
>> column update file, such as the row position in the base file
>>
>> *2) Partial column updates (row-level)*
>> I see two use cases mentioned for this: bug-fixing a subset of rows and
>> updating features for active users.
>> My initial impression is that whether to use column updates or not heavily
>> depends on the selectivity of the partial update queries. I'm sure there is
>> a threshold on the percentage of affected rows below which it's simply
>> better to use the traditional row-level updates (CoW/MoR). I'm not entirely
>> convinced that covering these scenarios is worth the extra complexity here:
>>     - We can't deduplicate the column updates by field IDs on the
>> metadata side
>>     - We have two options for writers (a read-side sketch follows below):
>>         - Merge the existing column update files themselves when writing
>> a new one with an overlap of field IDs. There is no need to sort out the
>> different column update files and merge them on the read side, but there is
>> overhead on the write side.
>>         - Don't bother merging existing column updates when writing a new
>> one. This adds overhead on the read side.
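>>
>> To make the second option concrete, a minimal read-side sketch, assuming
>> each sparse column-update file is modelled as a map from row position to
>> (field ID -> new value) and the files are sorted by ascending sequence
>> number (a simplification, not how the real reader would be structured):
>>
>>     import java.util.HashMap;
>>     import java.util.List;
>>     import java.util.Map;
>>
>>     class OverlappingUpdateReader {
>>       // Resolve one stitched row: start from the base row and let newer
>>       // update files overwrite older ones, field by field.
>>       static Map<Integer, Object> resolveRow(
>>           long rowPos,
>>           Map<Integer, Object> baseRow,                          // fieldId -> value
>>           List<Map<Long, Map<Integer, Object>>> updatesBySeq) {  // rowPos -> (fieldId -> value)
>>         Map<Integer, Object> result = new HashMap<>(baseRow);
>>         for (Map<Long, Map<Integer, Object>> update : updatesBySeq) {
>>           Map<Integer, Object> patch = update.get(rowPos);       // sparse: may be absent
>>           if (patch != null) {
>>             result.putAll(patch);
>>           }
>>         }
>>         return result;
>>       }
>>     }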
>>
>> Handling sparse update files is a must here, with a chance for
>> optimisation if all the rows are covered by the update file, as Micah
>> suggested.
>>
>> To sum up, I think that to justify this approach we need strong use cases
>> and measurements verifying that the extra complexity yields convincingly
>> better results compared to the existing CoW/MoR approaches.
>>
>> *3) Partial column updates (file-level)*
>> This option wasn't brought up during our conversation but might be worth
>> considering. It is basically a middle ground between the above two
>> approaches: partial updates are allowed as long as they affect entire data
>> files, and they may cover only a subset of the files. One use case would be
>> column updates per partition, for instance.
>>
>> With this approach the metadata representation could be as simple as in
>> 1), where we can deduplicate the update files by field IDs. There is also
>> no write or read overhead on top of 1), apart from a verification step to
>> ensure that the WHERE filter on the update splits on file boundaries.
>> Similarly to 1), sparse update files aren't a must here; we could consider
>> row-count-matching update files too.
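>>
>> A minimal sketch of that verification step, with DataFileInfo and
>> matchesAllRows() as placeholders for something like Iceberg's strict
>> metrics evaluation (none of this is the actual API):
>>
>>     import java.util.List;
>>
>>     class FileLevelUpdateCheck {
>>       interface DataFileInfo {
>>         String path();
>>         boolean matchesAllRows();   // does the update's WHERE filter cover every row?
>>       }
>>
>>       // A file-level column update is only accepted if the predicate matches
>>       // every row of each candidate file, i.e. the update splits cleanly on
>>       // file boundaries.
>>       static void validate(List<DataFileInfo> candidateFiles) {
>>         for (DataFileInfo file : candidateFiles) {
>>           if (!file.matchesAllRows()) {
>>             throw new IllegalStateException(
>>                 "Update predicate does not split on file boundaries: " + file.path());
>>           }
>>         }
>>       }
>>     }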
>>
>> *Row alignment*
>> Sparse update files are required for row-level partial updates, but if we
>> decide to go with any of the other options we could also evaluate the "row
>> count matching" approach. Even though it requires filling the missing rows
>> with arbitrary values (null seems a good candidate), it would result in
>> less write overhead (no need to write row positions) and less read overhead
>> (no need to join rows by row position), which could be worth the
>> inconvenience of having 'invalid' but inaccessible values in the files. The
>> num_nulls stats being off is a good argument against this, but I think we
>> could fix that too by keeping track of how many rows were deleted (and
>> subtracting this value from the num_nulls counter returned by the writer).
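>>
>> The stats fix-up would be trivial; a sketch, assuming we track how many
>> deleted base rows were filled with null in the update file:
>>
>>     class NullCountFixup {
>>       // The writer's num_nulls includes the null fill-ins written for rows
>>       // that are actually deleted in the base file, so subtract them before
>>       // publishing the column metrics.
>>       static long adjustedNullCount(long writerReportedNulls, long nullFillInsForDeletedRows) {
>>         return Math.max(0, writerReportedNulls - nullFillInsForDeletedRows);
>>       }
>>     }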
>>
>>
>> *Next steps*
>> I'm actively working on a very basic PoC implementation that would let us
>> test the different approaches and compare their pros and cons so that we
>> can make a decision on the above questions. I'll sync with Anurag on this
>> and will let you know once we have something.
>>
>> Best Regards,
>> Gabor
>>
>>
>> Micah Kornfield <[email protected]> wrote (on Sat, Feb 14, 2026, at
>> 2:20):
>>
>>> Given that, the sparse representation with alignment at read time (using
>>>> dummy/null values) seems to provide the benefits of both efficient
>>>> vectorized reads and stitching as well as support for partial column
>>>> updates. Would you agree?
>>>
>>>
>>> Thinking more about it, I think the sparse approach is actually a
>>> superset of the full approach, so it is not a concern.  If writers want,
>>> they can write out the fully populated columns with position indexes from 1
>>> to N, and readers can take an optimized path if they detect that the number
>>> of rows in the update is equal to the number of base rows.
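>>>
>>> A minimal reader-side sketch of that idea, with the sparse update file
>>> simplified to two parallel arrays (base-row positions and new values) and
>>> positions assumed to be 0-based and in order when the update covers every
>>> base row:
>>>
>>>     class SparseColumnStitcher {
>>>       // positions[i] is the row position in the base file that values[i] updates.
>>>       static Object[] stitch(int[] positions, Object[] values, int baseRowCount) {
>>>         if (positions.length == baseRowCount) {
>>>           // Optimized path: the update covers every base row in order, so the
>>>           // value vector can be used as-is with no per-row matching.
>>>           return values;
>>>         }
>>>         // General path: scatter sparse values into a base-aligned vector and
>>>         // leave the missing positions as null (or a default value).
>>>         Object[] column = new Object[baseRowCount];
>>>         for (int i = 0; i < positions.length; i++) {
>>>           column[positions[i]] = values[i];
>>>         }
>>>         return column;
>>>       }
>>>     }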
>>>
>>> I still think there is a question on what writers should do (i.e. when
>>> do they decide to duplicate data instead of trying to give sparse updates)
>>> but that is an implementation question and not necessarily something that
>>> needs to block spec work.
>>>
>>> Cheers,
>>> Micah
>>>
>>> On Fri, Feb 13, 2026 at 11:29 AM Anurag Mantripragada <
>>> [email protected]> wrote:
>>>
>>>> Hi Micah,
>>>>
>>>> This seems like a classic MoR vs CoW trade-off.  But it seems like
>>>>> maybe both sparse and full should be available (I understand this adds
>>>>> complexity). For adding a new column or completely updating a new column,
>>>>> the performance would be better to prefill the data
>>>>
>>>>
>>>> Our internal use cases are very similar to what you describe. We
>>>> primarily deal with full column updates. However, the feedback on the
>>>> proposal from the wider community indicated that partial updates (e.g.,
>>>> bug-fixing a subset of rows, updating features for active users) are also a
>>>> very common and critical use case.
>>>>
>>>> Is there evidence to say that partial column updates are more common in
>>>>> practice than full rewrites?
>>>>
>>>>
>>>> Personally, I don't have hard data on which use case is more common in
>>>> the wild, only that both appear to be important. I also agree that a good
>>>> long term solution should support both strategies. Given that, the sparse
>>>> representation with alignment at read time (using dummy/null values) seems
>>>> to provide the benefits of both efficient vectorized reads and stitching as
>>>> well as support for partial column updates. Would you agree?
>>>>
>>>> ~ Anurag
>>>>
>>>> On Fri, Feb 13, 2026 at 9:33 AM Micah Kornfield <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Anurag,
>>>>>
>>>>>> Data Representation: Sparse column files are preferred for compact
>>>>>> representation and are better suited for partial column updates. We can
>>>>>> optimize sparse representation for vectorized reads by filling in
>>>>>> null or default values at read time for missing positions from the base
>>>>>> file, which avoids joins during reads.
>>>>>
>>>>>
>>>>> This seems like a classic MoR vs CoW trade-off.  But it seems like
>>>>> maybe both sparse and full should be available (I understand this adds
>>>>> complexity).  For adding a new column or completely updating a new column,
>>>>> the performance would be better to prefill the data (otherwise one ends up
>>>>> duplicating the work that is already happening under the hood in parquet).
>>>>>
>>>>> Is there evidence to say that partial column updates are more common
>>>>> in practice than full rewrites?
>>>>>
>>>>> Thanks,
>>>>> Micah
>>>>>
>>>>>
>>>>> On Thu, Feb 12, 2026 at 3:32 AM Eduard Tudenhöfner <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hey Anurag,
>>>>>>
>>>>>> I wasn't able to make it to the sync but was hoping to watch the
>>>>>> recording afterwards.
>>>>>> I'm curious what the reasons were for discarding the Parquet-native
>>>>>> approach. Could you please share a summary of what was discussed on that
>>>>>> topic in the sync?
>>>>>>
>>>>>> On Tue, Feb 10, 2026 at 8:20 PM Anurag Mantripragada <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Thank you for attending today's sync. Please find the meeting notes
>>>>>>> below. I apologize that we were unable to record the session due to
>>>>>>> attendees not having record access.
>>>>>>>
>>>>>>> Key updates and discussion points:
>>>>>>>
>>>>>>> *Decisions:*
>>>>>>>
>>>>>>>    - Table Format vs. Parquet: There is a general consensus that
>>>>>>>    column update support should reside in the table format.
>>>>>>>    Consequently, we have discarded the Parquet-native approach.
>>>>>>>    - Metadata Representation: To maintain clean metadata and avoid
>>>>>>>    complex resolution logic for readers, the goal is to keep only one
>>>>>>>    metadata file per column. However, achieving this is challenging if
>>>>>>>    we support partial updates, as multiple column files may exist for
>>>>>>>    the same column (see open questions).
>>>>>>>    - Data Representation: Sparse column files are preferred for
>>>>>>>    compact representation and are better suited for partial column
>>>>>>>    updates. We can optimize sparse representation for vectorized reads
>>>>>>>    by filling in null or default values at read time for missing
>>>>>>>    positions from the base file, which avoids joins during reads.
>>>>>>>
>>>>>>>
>>>>>>> *Open Questions: *
>>>>>>>
>>>>>>>    - We are still determining what restrictions are necessary when
>>>>>>>    supporting partial updates. For instance, we need to decide whether
>>>>>>>    to allow adding a new column and subsequently applying partial
>>>>>>>    updates on it. This would involve managing both a base column file
>>>>>>>    and subsequent update files.
>>>>>>>    - We need a better understanding of the use cases for partial
>>>>>>>    updates.
>>>>>>>    - We need to further discuss the handling of equality deletes.
>>>>>>>
>>>>>>> If I missed anything, or if others took notes, please share them
>>>>>>> here. Thanks!
>>>>>>>
>>>>>>> I will go ahead and update the doc with what we have discussed so we
>>>>>>> can continue next time from where we left off.
>>>>>>>
>>>>>>> ~ Anurag
>>>>>>>
>>>>>>> On Mon, Feb 9, 2026 at 11:55 AM Anurag Mantripragada <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> This design
>>>>>>>> <https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0>
>>>>>>>> will be discussed tomorrow in a dedicated sync.
>>>>>>>>
>>>>>>>> Efficient column updates sync
>>>>>>>> Tuesday, February 10 · 9:00 – 10:00am
>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>> Google Meet joining info
>>>>>>>> Video call link: https://meet.google.com/xsd-exug-tcd
>>>>>>>>
>>>>>>>> ~ Anurag
>>>>>>>>
>>>>>>>> On Fri, Feb 6, 2026 at 8:30 AM Anurag Mantripragada <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Gabor,
>>>>>>>>>
>>>>>>>>> Thanks for the detailed example.
>>>>>>>>>
>>>>>>>>> I agree with Steven that Option 2 seems reasonable. I will add a
>>>>>>>>> section to the design doc regarding equality delete handling, and we 
>>>>>>>>> can
>>>>>>>>> discuss this further during our meeting on Tuesday.
>>>>>>>>>
>>>>>>>>> ~Anurag
>>>>>>>>>
>>>>>>>>> On Fri, Feb 6, 2026 at 7:08 AM Steven Wu <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> > 1) When deleting with eq-deletes: If there is a column update
>>>>>>>>>> on the equality-field ID we use for the delete, reject the deletion
>>>>>>>>>> > 2) When adding a column update on a column that is part of the
>>>>>>>>>> equality field IDs in some delete, we reject the column update
>>>>>>>>>>
>>>>>>>>>> Gabor, this is a good scenario. The 2nd option makes sense to me,
>>>>>>>>>> since equality IDs are like primary key fields. If we have the 2nd
>>>>>>>>>> rule enforced, the first option is not applicable anymore.
>>>>>>>>>>
>>>>>>>>>> On Fri, Feb 6, 2026 at 3:13 AM Gábor Kaszab <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey,
>>>>>>>>>>>
>>>>>>>>>>> Thank you for the proposal, Anurag! I made a pass recently and I
>>>>>>>>>>> think there is some interference between column updates and equality
>>>>>>>>>>> deletes. Let me describe below:
>>>>>>>>>>>
>>>>>>>>>>> Steps:
>>>>>>>>>>>
>>>>>>>>>>> CREATE TABLE tbl (a int, b int);
>>>>>>>>>>>
>>>>>>>>>>> INSERT INTO tbl VALUES (1, 11), (2, 22);  -- creates the base data file
>>>>>>>>>>>
>>>>>>>>>>> DELETE FROM tbl WHERE b=11;               -- creates an equality delete file
>>>>>>>>>>>
>>>>>>>>>>> UPDATE tbl SET b=11;                      -- writes a column update
>>>>>>>>>>>
>>>>>>>>>>> SELECT * FROM tbl;
>>>>>>>>>>>
>>>>>>>>>>> Expected result: (2, 11)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Data and metadata created after the above steps:
>>>>>>>>>>>
>>>>>>>>>>> Base file:      (1, 11), (2, 22)                   seqnum=1
>>>>>>>>>>> EQ-delete:      b=11                               seqnum=2
>>>>>>>>>>> Column update:  field ids: [field_id_for_col_b]    seqnum=3
>>>>>>>>>>>                 data file content: (dummy_value), (11)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Read steps:
>>>>>>>>>>>
>>>>>>>>>>>    1. Stitch base file with column updates in reader:
>>>>>>>>>>>
>>>>>>>>>>> Rows: (1, dummy_value), (2, 11) (note: the dummy value can be either
>>>>>>>>>>> null or 11; see the proposal for more details)
>>>>>>>>>>>
>>>>>>>>>>> Seqnum for base file=1
>>>>>>>>>>>
>>>>>>>>>>> Seqnum for column update=3
>>>>>>>>>>>
>>>>>>>>>>>    2. Apply eq-delete b=11, seqnum=2 on the stitched result
>>>>>>>>>>>    3. The query result depends on which seqnum we carry forward to
>>>>>>>>>>>    compare with the eq-delete's seqnum, but it's not correct in
>>>>>>>>>>>    either case:
>>>>>>>>>>>       1. Use the seqnum from the base file: we get either an empty
>>>>>>>>>>>       result if 'dummy_value' is 11, or we get (1, null) otherwise
>>>>>>>>>>>       2. Use the seqnum from the last update file: no rows are
>>>>>>>>>>>       deleted, and the result set is (1, dummy_value), (2, 11)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Problem:
>>>>>>>>>>>
>>>>>>>>>>> The EQ-delete should be applied midway through applying the column
>>>>>>>>>>> updates to the base file, based on sequence numbers, during the
>>>>>>>>>>> stitching process. If I'm not mistaken, this is not feasible with
>>>>>>>>>>> the way readers work.
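>>>>>>>>>>>
>>>>>>>>>>> To spell out why neither seqnum works, using the usual rule that a
>>>>>>>>>>> delete applies to a row when the delete's sequence number is greater
>>>>>>>>>>> than the sequence number of the file the row came from (a sketch,
>>>>>>>>>>> not actual reader code):
>>>>>>>>>>>
>>>>>>>>>>>     static boolean deleteApplies(long deleteSeq, long rowFileSeq) {
>>>>>>>>>>>       return deleteSeq > rowFileSeq;
>>>>>>>>>>>     }
>>>>>>>>>>>
>>>>>>>>>>>     // After stitching, one output row mixes columns from the base file
>>>>>>>>>>>     // (seqnum=1) and the column update (seqnum=3), so there is no single
>>>>>>>>>>>     // rowFileSeq that is correct for the eq-delete with seqnum=2:
>>>>>>>>>>>     //   deleteApplies(2, 1) == true  -> b=11 is checked against the
>>>>>>>>>>>     //     stitched values, hitting the updated row (2, 11) and missing
>>>>>>>>>>>     //     row 1, whose b is now the dummy value
>>>>>>>>>>>     //   deleteApplies(2, 3) == false -> nothing is deleted, so row 1
>>>>>>>>>>>     //     incorrectly survives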
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Proposal:
>>>>>>>>>>>
>>>>>>>>>>> Don't allow equality deletes together with column updates.
>>>>>>>>>>>
>>>>>>>>>>>   1) When deleting with eq-deletes: If there is a column update
>>>>>>>>>>> on the equality-field ID we use for the delete, reject the deletion
>>>>>>>>>>>
>>>>>>>>>>>   2) When adding a column update on a column that is part of the
>>>>>>>>>>> equality field IDs in some delete, we reject the column update
>>>>>>>>>>>
>>>>>>>>>>> Alternatively, column updates could be controlled by an (immutable)
>>>>>>>>>>> table property, and eq-deletes could be rejected if the property
>>>>>>>>>>> indicates that column updates are turned on for the table.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Let me know what you think!
>>>>>>>>>>>
>>>>>>>>>>> Best Regards,
>>>>>>>>>>>
>>>>>>>>>>> Gabor
>>>>>>>>>>>
>>>>>>>>>>> Anurag Mantripragada <[email protected]> wrote (on Wed,
>>>>>>>>>>> Jan 28, 2026, at 3:31):
>>>>>>>>>>>
>>>>>>>>>>>> Thank you everyone for the initial review comments. It is
>>>>>>>>>>>> exciting to see so much interest in this proposal.
>>>>>>>>>>>>
>>>>>>>>>>>> I am currently reviewing and responding to each comment. The
>>>>>>>>>>>> general themes of the feedback so far include:
>>>>>>>>>>>> - Including partial updates (column updates on a subset of rows
>>>>>>>>>>>> in a table).
>>>>>>>>>>>> - Adding details on how SQL engines will write the update files.
>>>>>>>>>>>> - Adding details on split planning and row alignment for update
>>>>>>>>>>>> files.
>>>>>>>>>>>>
>>>>>>>>>>>> I will think through these points and update the design
>>>>>>>>>>>> accordingly.
>>>>>>>>>>>>
>>>>>>>>>>>> Best
>>>>>>>>>>>> Anurag
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 27, 2026 at 6:25 PM Anurag Mantripragada <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Xianjin,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Happy to learn from your experience in supporting
>>>>>>>>>>>>> backfill use-cases. Please feel free to review the proposal and 
>>>>>>>>>>>>> add your
>>>>>>>>>>>>> comments. I will wait for a couple of days more to ensure 
>>>>>>>>>>>>> everyone has a
>>>>>>>>>>>>> chance to review the proposal.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~ Anurag
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 27, 2026 at 6:42 AM Xianjin Ye <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Anurag and Peter,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It’s great to see that the partial column update has gained so much
>>>>>>>>>>>>>> interest in the community. I internally built a BackfillColumns
>>>>>>>>>>>>>> action to efficiently backfill columns (by writing only the partial
>>>>>>>>>>>>>> columns and copying the binary data of the other columns into a new
>>>>>>>>>>>>>> DataFile). The speedup can be 10x for wide tables, but the write
>>>>>>>>>>>>>> amplification is still there. I would be happy to collaborate on the
>>>>>>>>>>>>>> work and eliminate the write amplification.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2026/01/27 10:12:54 Péter Váry wrote:
>>>>>>>>>>>>>> > Hi Anurag,
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > It’s great to see how much interest there is in the
>>>>>>>>>>>>>> community around this
>>>>>>>>>>>>>> > potential new feature. Gábor and I have actually submitted
>>>>>>>>>>>>>> an Iceberg
>>>>>>>>>>>>>> > Summit talk proposal on this topic, and we would be very
>>>>>>>>>>>>>> happy to
>>>>>>>>>>>>>> > collaborate on the work. I was mainly waiting for the File
>>>>>>>>>>>>>> Format API to be
>>>>>>>>>>>>>> > finalized, as I believe this feature should build on top of
>>>>>>>>>>>>>> it.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > For reference, our related work includes:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >    - *Dev list thread:*
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9
>>>>>>>>>>>>>> >    - *Proposal document:*
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww
>>>>>>>>>>>>>> >    (not shared widely yet)
>>>>>>>>>>>>>> >    - *Performance testing PR for readers and writers:*
>>>>>>>>>>>>>> >    https://github.com/apache/iceberg/pull/13306
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > During earlier discussions about possible metadata changes,
>>>>>>>>>>>>>> another option
>>>>>>>>>>>>>> > came up that hasn’t been documented yet: separating planner
>>>>>>>>>>>>>> metadata from
>>>>>>>>>>>>>> > reader metadata. Since the planner does not need to know
>>>>>>>>>>>>>> about the actual
>>>>>>>>>>>>>> > files, we could store the file composition in a separate
>>>>>>>>>>>>>> file (potentially
>>>>>>>>>>>>>> > a Puffin file). This file could hold the column_files
>>>>>>>>>>>>>> metadata, while the
>>>>>>>>>>>>>> > manifest would reference the Puffin file and blob position
>>>>>>>>>>>>>> instead of the
>>>>>>>>>>>>>> > data filename.
>>>>>>>>>>>>>> > This approach has the advantage of keeping the existing
>>>>>>>>>>>>>> metadata largely
>>>>>>>>>>>>>> > intact, and it could also give us a natural place later to
>>>>>>>>>>>>>> add file-level
>>>>>>>>>>>>>> > indexes or Bloom filters for use during reads or secondary
>>>>>>>>>>>>>> filtering. The
>>>>>>>>>>>>>> > downsides are the additional files and the increased
>>>>>>>>>>>>>> complexity of
>>>>>>>>>>>>>> > identifying files that are no longer referenced by the
>>>>>>>>>>>>>> table, so this may
>>>>>>>>>>>>>> > not be an ideal solution.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > I do have some concerns about the MoR metadata proposal
>>>>>>>>>>>>>> described in the
>>>>>>>>>>>>>> > document. At first glance, it seems to complicate
>>>>>>>>>>>>>> distributed planning, as
>>>>>>>>>>>>>> > all entries for a given file would need to be collected and
>>>>>>>>>>>>>> merged to
>>>>>>>>>>>>>> > provide the information required by both the planner and
>>>>>>>>>>>>>> the reader.
>>>>>>>>>>>>>> > Additionally, when a new column is added or updated, we
>>>>>>>>>>>>>> would still need to
>>>>>>>>>>>>>> > add a new metadata entry for every existing data file. If
>>>>>>>>>>>>>> we immediately
>>>>>>>>>>>>>> > write out the merged metadata, the total number of entries
>>>>>>>>>>>>>> remains the
>>>>>>>>>>>>>> > same. The main benefit is avoiding rewriting statistics,
>>>>>>>>>>>>>> which can be
>>>>>>>>>>>>>> > significant, but this comes at the cost of increased
>>>>>>>>>>>>>> planning complexity.
>>>>>>>>>>>>>> > If we choose to store the merged statistics in the
>>>>>>>>>>>>>> column_families entry, I
>>>>>>>>>>>>>> > don’t see much benefit in excluding the rest of the
>>>>>>>>>>>>>> metadata, especially
>>>>>>>>>>>>>> > since including it would simplify the planning process.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > As Anton already pointed out, we should also discuss how
>>>>>>>>>>>>>> this change would
>>>>>>>>>>>>>> > affect split handling, particularly how to avoid double
>>>>>>>>>>>>>> reads when row
>>>>>>>>>>>>>> > groups are not aligned between the original data files and
>>>>>>>>>>>>>> the new column
>>>>>>>>>>>>>> > files.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Finally, I’d like to see some discussion around the Java
>>>>>>>>>>>>>> API implications.
>>>>>>>>>>>>>> > In particular, what API changes are required, and how SQL
>>>>>>>>>>>>>> engines would
>>>>>>>>>>>>>> > perform updates. Since the new column files must have the
>>>>>>>>>>>>>> same number of
>>>>>>>>>>>>>> > rows as the original data files, with a strict one-to-one
>>>>>>>>>>>>>> relationship, SQL
>>>>>>>>>>>>>> > engines would need access to the source filename, position,
>>>>>>>>>>>>>> and deletion
>>>>>>>>>>>>>> > status in the DataFrame in order to generate the new files.
>>>>>>>>>>>>>> This is more
>>>>>>>>>>>>>> > involved than a simple update and deserves some explicit
>>>>>>>>>>>>>> consideration.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Looking forward to your thoughts.
>>>>>>>>>>>>>> > Best regards,
>>>>>>>>>>>>>> > Peter
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > On Tue, Jan 27, 2026, 03:58 Anurag Mantripragada <
>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > > Thanks Anton and others, for providing some initial
>>>>>>>>>>>>>> feedback. I will
>>>>>>>>>>>>>> > > address all your comments soon.
>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>> > > On Mon, Jan 26, 2026 at 11:10 AM Anton Okolnychyi <
>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>> > > wrote:
>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>> > >> I had a chance to see the proposal before it landed and
>>>>>>>>>>>>>> I think it is a
>>>>>>>>>>>>>> > >> cool idea and both presented approaches would likely
>>>>>>>>>>>>>> work. I am looking
>>>>>>>>>>>>>> > >> forward to discussing the tradeoffs and would encourage
>>>>>>>>>>>>>> everyone to
>>>>>>>>>>>>>> > >> push/polish each approach to see what issues can be
>>>>>>>>>>>>>> mitigated and what are
>>>>>>>>>>>>>> > >> fundamental.
>>>>>>>>>>>>>> > >>
>>>>>>>>>>>>>> > >> [1] Iceberg-native approach: better visibility into
>>>>>>>>>>>>>> column files from the
>>>>>>>>>>>>>> > >> metadata, potentially better concurrency for
>>>>>>>>>>>>>> non-overlapping column
>>>>>>>>>>>>>> > >> updates, no dep on Parquet.
>>>>>>>>>>>>>> > >> [2] Parquet-native approach: almost no changes to the
>>>>>>>>>>>>>> table format
>>>>>>>>>>>>>> > >> metadata beyond tracking of base files.
>>>>>>>>>>>>>> > >>
>>>>>>>>>>>>>> > >> I think [1] sounds a bit better on paper but I am
>>>>>>>>>>>>>> worried about the
>>>>>>>>>>>>>> > >> complexity in writers and readers (especially around
>>>>>>>>>>>>>> keeping row groups
>>>>>>>>>>>>>> > >> aligned and split planning). It would be great to cover
>>>>>>>>>>>>>> this in detail in
>>>>>>>>>>>>>> > >> the proposal.
>>>>>>>>>>>>>> > >>
>>>>>>>>>>>>>> > >> On Mon, Jan 26, 2026 at 09:00 Anurag Mantripragada <
>>>>>>>>>>>>>> > >> [email protected]> wrote:
>>>>>>>>>>>>>> > >>
>>>>>>>>>>>>>> > >>> Hi all,
>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>> > >>> "Wide tables" with thousands of columns present
>>>>>>>>>>>>>> significant challenges
>>>>>>>>>>>>>> > >>> for AI/ML workloads, particularly when only a subset of
>>>>>>>>>>>>>> columns needs to be
>>>>>>>>>>>>>> > >>> added or updated. Current Copy-on-Write (COW) and
>>>>>>>>>>>>>> Merge-on-Read (MOR)
>>>>>>>>>>>>>> > >>> operations in Iceberg apply at the row level, which
>>>>>>>>>>>>>> leads to substantial
>>>>>>>>>>>>>> > >>> write amplification in scenarios such as:
>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>> > >>>    - Feature Backfilling & Column Updates: Adding new
>>>>>>>>>>>>>> feature columns
>>>>>>>>>>>>>> > >>>    (e.g., model embeddings) to petabyte-scale tables.
>>>>>>>>>>>>>> > >>>    - Model Score Updates: Refreshing prediction scores
>>>>>>>>>>>>>> > >>>    after retraining.
>>>>>>>>>>>>>> > >>>    - Embedding Refresh: Updating vector embeddings,
>>>>>>>>>>>>>> which currently
>>>>>>>>>>>>>> > >>>    triggers a rewrite of the entire row.
>>>>>>>>>>>>>> > >>>    - Incremental Feature Computation: Daily updates to
>>>>>>>>>>>>>> a small fraction
>>>>>>>>>>>>>> > >>>    of features in wide tables.
>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>> > >>> With the Iceberg V4 proposal introducing single-file
>>>>>>>>>>>>>> commits and column
>>>>>>>>>>>>>> > >>> stats improvements, this is an ideal time to address
>>>>>>>>>>>>>> column-level updates
>>>>>>>>>>>>>> > >>> to better support these use cases.
>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>> > >>> I have drafted a proposal that explores both
>>>>>>>>>>>>>> table-format enhancements
>>>>>>>>>>>>>> > >>> and file-format (Parquet) changes to enable more
>>>>>>>>>>>>>> efficient updates.
>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>> > >>> Proposal Details:
>>>>>>>>>>>>>> > >>> - GitHub Issue: #15146 <
>>>>>>>>>>>>>> https://github.com/apache/iceberg/issues/15146>
>>>>>>>>>>>>>> > >>> - Design Document: Efficient Column Updates in Iceberg
>>>>>>>>>>>>>> > >>> <
>>>>>>>>>>>>>> https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>> > >>> Next Steps:
>>>>>>>>>>>>>> > >>> I plan to create POCs to benchmark the approaches
>>>>>>>>>>>>>> described in the
>>>>>>>>>>>>>> > >>> document.
>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>> > >>> Please review the proposal and share your feedback.
>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>> > >>> Thanks,
>>>>>>>>>>>>>> > >>> Anurag
>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>> > >>
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
