Hi Gabor,

Thanks for the detailed example.
I agree with Steven that Option 2 seems reasonable. I will add a section to the design doc regarding equality delete handling, and we can discuss this further during our meeting on Tuesday.
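To make the option 2 rule concrete, below is a minimal, illustrative sketch of what such a writer-side check could look like. It is not taken from the design doc; the class and method are hypothetical placeholders, and the only assumption is that the writer can enumerate the equality field IDs referenced by live equality delete files and the field IDs a pending column update rewrites.

import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only (hypothetical helper, not an Iceberg API):
// reject a column update if any field it rewrites is used as an equality
// field ID by a live equality delete in the table.
final class ColumnUpdateConflictCheck {

  static void validate(Set<Integer> updatedFieldIds, List<Set<Integer>> liveEqualityFieldIdSets) {
    Set<Integer> conflicts = new TreeSet<>();
    for (Set<Integer> equalityFieldIds : liveEqualityFieldIdSets) {
      for (Integer fieldId : equalityFieldIds) {
        if (updatedFieldIds.contains(fieldId)) {
          conflicts.add(fieldId);
        }
      }
    }
    if (!conflicts.isEmpty()) {
      throw new IllegalArgumentException(
          "Rejecting column update: field IDs " + conflicts
              + " are used as equality field IDs by existing equality deletes");
    }
  }
}

The symmetric check for option 1 (rejecting an eq-delete whose equality field IDs overlap with fields that already have column updates) would have the same shape with the two inputs swapped.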

~Anurag

On Fri, Feb 6, 2026 at 7:08 AM Steven Wu <[email protected]> wrote:

> > 1) When deleting with eq-deletes: if there is a column update on the equality field ID we use for the delete, reject the deletion.
> >
> > 2) When adding a column update on a column that is part of the equality field IDs of some delete, we reject the column update.
>
> Gabor, this is a good scenario. The 2nd option makes sense to me, since equality ids are like primary key fields. If we have the 2nd rule enforced, the first option is not applicable anymore.
>
> On Fri, Feb 6, 2026 at 3:13 AM Gábor Kaszab <[email protected]> wrote:
>
>> Hey,
>>
>> Thank you for the proposal, Anurag! I made a pass recently and I think there is some interference between column updates and equality deletes. Let me describe it below.
>>
>> Steps:
>>
>> CREATE TABLE tbl (a INT, b INT);
>>
>> INSERT INTO tbl VALUES (1, 11), (2, 22);   -- creates the base data file
>>
>> DELETE FROM tbl WHERE b=11;                -- creates an equality delete file
>>
>> UPDATE tbl SET b=11;                       -- writes a column update
>>
>> SELECT * FROM tbl;
>>
>> Expected result:
>>
>> (2, 11)
>>
>> Data and metadata created after the above steps:
>>
>> Base file: (1, 11), (2, 22), seqnum=1
>>
>> EQ-delete: b=11, seqnum=2
>>
>> Column update: field ids = [field_id_for_col_b], seqnum=3, data file content: (dummy_value), (11)
>>
>> Read steps:
>>
>> 1. Stitch the base file with the column updates in the reader:
>>    Rows: (1, dummy_value), (2, 11) (note: the dummy value can be either null or 11, see the proposal for more details)
>>    Seqnum for the base file = 1
>>    Seqnum for the column update = 3
>> 2. Apply the eq-delete b=11, seqnum=2, on the stitched result.
>> 3. The query result depends on which seqnum we carry forward to compare with the eq-delete's seqnum, but it is not correct in either case:
>>    1. Use the seqnum from the base file: we get either an empty result if 'dummy_value' is 11, or (1, null) otherwise.
>>    2. Use the seqnum from the last update file: no rows are deleted, and the result set is (1, dummy_value), (2, 11).
>>
>> Problem:
>>
>> The eq-delete should be applied midway through applying the column updates to the base file, based on sequence number, during the stitching process. If I'm not mistaken, this is not feasible with the way readers work.
>>
>> Proposal:
>>
>> Don't allow equality deletes together with column updates.
>>
>> 1) When deleting with eq-deletes: if there is a column update on the equality field ID we use for the delete, reject the deletion.
>>
>> 2) When adding a column update on a column that is part of the equality field IDs of some delete, we reject the column update.
>>
>> Alternatively, column updates could be controlled by an immutable table property, and eq-deletes could be rejected if the property indicates that column updates are turned on for the table.
>>
>> Let me know what you think!
>>
>> Best Regards,
>>
>> Gabor
>>
>> Anurag Mantripragada <[email protected]> wrote (on Wed, Jan 28, 2026 at 3:31):
>>
>>> Thank you everyone for the initial review comments. It is exciting to see so much interest in this proposal.
>>>
>>> I am currently reviewing and responding to each comment. The general themes of the feedback so far include:
>>>
>>> - Including partial updates (column updates on a subset of rows in a table).
>>> - Adding details on how SQL engines will write the update files.
>>> - Adding details on split planning and row alignment for update files.
>>>
>>> I will think through these points and update the design accordingly.
>>>
>>> Best,
>>> Anurag
>>>
>>> On Tue, Jan 27, 2026 at 6:25 PM Anurag Mantripragada <[email protected]> wrote:
>>>
>>>> Hi Xianjin,
>>>>
>>>> Happy to learn from your experience in supporting backfill use cases. Please feel free to review the proposal and add your comments. I will wait a couple more days to ensure everyone has a chance to review the proposal.
>>>>
>>>> ~ Anurag
>>>>
>>>> On Tue, Jan 27, 2026 at 6:42 AM Xianjin Ye <[email protected]> wrote:
>>>>
>>>>> Hi Anurag and Peter,
>>>>>
>>>>> It's great to see that the partial column update has gained so much interest in the community. I have internally built a BackfillColumns action to efficiently backfill columns (it writes only the partial columns and copies the binary data of the other columns into a new DataFile). The speedup can be 10x for wide tables, but the write amplification is still there. I would be happy to collaborate on the work and eliminate the write amplification.
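For readers unfamiliar with this kind of backfill, here is a rough, self-contained sketch of the idea Xianjin describes above: copy the untouched columns' encoded bytes as-is into the new file and encode only the backfilled column. The interfaces and the backfill method below are hypothetical placeholders, not parquet-mr or Iceberg APIs; the sketch only shows why such a rewrite saves decode/encode work but still writes roughly the full file size.

import java.util.List;

// Hypothetical sketch of a "copy untouched columns, encode only the new one"
// backfill. None of these interfaces exist in Iceberg or parquet-mr; they only
// illustrate the shape of the operation.
interface EncodedColumnChunk {}                  // already-encoded bytes of one column in one row group

interface RowGroupReader {
  List<String> columnNames();
  EncodedColumnChunk readEncoded(String column); // raw encoded bytes, no decode
  long rowCount();
}

interface RowGroupWriter {
  void copyEncoded(String column, EncodedColumnChunk chunk); // byte-for-byte copy
  void writeColumn(String column, List<?> values);           // encode new values
}

final class BackfillSketch {
  // Backfill a single new column across one row group.
  static void backfillRowGroup(RowGroupReader in, RowGroupWriter out,
                               String newColumn, List<?> newValues) {
    if (newValues.size() != in.rowCount()) {
      throw new IllegalArgumentException("backfilled values must align 1:1 with existing rows");
    }
    for (String column : in.columnNames()) {
      // Untouched columns: copy encoded bytes, skipping decode/re-encode work.
      out.copyEncoded(column, in.readEncoded(column));
    }
    // Only the new column is actually encoded.
    out.writeColumn(newColumn, newValues);
    // Every column still ends up in the new file, so the bytes written per
    // backfill are roughly the full file size: faster, but with the same write
    // amplification the proposal aims to remove.
  }
}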
>>>>>
>>>>> On 2026/01/27 10:12:54 Péter Váry wrote:
>>>>> > Hi Anurag,
>>>>> >
>>>>> > It's great to see how much interest there is in the community around this potential new feature. Gábor and I have actually submitted an Iceberg Summit talk proposal on this topic, and we would be very happy to collaborate on the work. I was mainly waiting for the File Format API to be finalized, as I believe this feature should build on top of it.
>>>>> >
>>>>> > For reference, our related work includes:
>>>>> >
>>>>> > - *Dev list thread:* https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9
>>>>> > - *Proposal document:* https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww (not shared widely yet)
>>>>> > - *Performance testing PR for readers and writers:* https://github.com/apache/iceberg/pull/13306
>>>>> >
>>>>> > During earlier discussions about possible metadata changes, another option came up that hasn't been documented yet: separating planner metadata from reader metadata. Since the planner does not need to know about the actual files, we could store the file composition in a separate file (potentially a Puffin file). This file could hold the column_files metadata, while the manifest would reference the Puffin file and blob position instead of the data filename. This approach has the advantage of keeping the existing metadata largely intact, and it could also give us a natural place later to add file-level indexes or Bloom filters for use during reads or secondary filtering. The downsides are the additional files and the increased complexity of identifying files that are no longer referenced by the table, so this may not be an ideal solution.
>>>>> >
>>>>> > I do have some concerns about the MoR metadata proposal described in the document. At first glance, it seems to complicate distributed planning, as all entries for a given file would need to be collected and merged to provide the information required by both the planner and the reader. Additionally, when a new column is added or updated, we would still need to add a new metadata entry for every existing data file. If we immediately write out the merged metadata, the total number of entries remains the same. The main benefit is avoiding rewriting statistics, which can be significant, but this comes at the cost of increased planning complexity. If we choose to store the merged statistics in the column_families entry, I don't see much benefit in excluding the rest of the metadata, especially since including it would simplify the planning process.
>>>>> >
>>>>> > As Anton already pointed out, we should also discuss how this change would affect split handling, particularly how to avoid double reads when row groups are not aligned between the original data files and the new column files.
>>>>> >
>>>>> > Finally, I'd like to see some discussion around the Java API implications: in particular, what API changes are required and how SQL engines would perform updates. Since the new column files must have the same number of rows as the original data files, with a strict one-to-one relationship, SQL engines would need access to the source filename, position, and deletion status in the DataFrame in order to generate the new files. This is more involved than a simple update and deserves some explicit consideration.
>>>>> >
>>>>> > Looking forward to your thoughts.
>>>>> > Best regards,
>>>>> > Peter
>>>>> >
>>>>> > On Tue, Jan 27, 2026, 03:58 Anurag Mantripragada <[email protected]> wrote:
>>>>> >
>>>>> > > Thanks Anton and others, for providing some initial feedback. I will address all your comments soon.
>>>>> > >
>>>>> > > On Mon, Jan 26, 2026 at 11:10 AM Anton Okolnychyi <[email protected]> wrote:
>>>>> > >
>>>>> > >> I had a chance to see the proposal before it landed, and I think it is a cool idea; both presented approaches would likely work. I am looking forward to discussing the tradeoffs and would encourage everyone to push/polish each approach to see which issues can be mitigated and which are fundamental.
>>>>> > >>
>>>>> > >> [1] Iceberg-native approach: better visibility into column files from the metadata, potentially better concurrency for non-overlapping column updates, no dependency on Parquet.
>>>>> > >> [2] Parquet-native approach: almost no changes to the table format metadata beyond tracking of base files.
>>>>> > >>
>>>>> > >> I think [1] sounds a bit better on paper, but I am worried about the complexity in writers and readers (especially around keeping row groups aligned and split planning). It would be great to cover this in detail in the proposal.
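To make the row-alignment requirement that Peter and Anton raise above concrete, here is a minimal, illustrative sketch of stitching a base data file with a column-update file by position. The class below is hypothetical, not an Iceberg API; it only assumes the column file carries exactly one value per base row, in the same order, which is why engines need the source filename, position, and deletion status when producing update files.

import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch (not an Iceberg API): overlay a column-update file onto a
// base data file by position. It relies on a strict one-to-one, same-order
// relationship between base rows and update rows.
final class PositionalStitcher implements Iterator<Object[]> {
  private final Iterator<Object[]> baseRows;     // rows from the base data file
  private final Iterator<Object> updatedColumn;  // one value per base row, same order
  private final int updatedOrdinal;              // position of the updated column in the row

  PositionalStitcher(Iterator<Object[]> baseRows, Iterator<Object> updatedColumn, int updatedOrdinal) {
    this.baseRows = baseRows;
    this.updatedColumn = updatedColumn;
    this.updatedOrdinal = updatedOrdinal;
  }

  @Override
  public boolean hasNext() {
    return baseRows.hasNext();
  }

  @Override
  public Object[] next() {
    if (!updatedColumn.hasNext()) {
      // Misalignment: the update file has fewer rows than the base file.
      throw new NoSuchElementException("column update file is not 1:1 aligned with the base file");
    }
    Object[] row = baseRows.next();
    row[updatedOrdinal] = updatedColumn.next(); // overlay the updated value by position
    return row;
  }
}

If row groups in the two files are not aligned, a split covering one row group of the base file may need values from the middle of a row group in the column file, which is the double-read concern mentioned above.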
>>>>> > >>
>>>>> > >> On Mon, Jan 26, 2026 at 09:00 Anurag Mantripragada <[email protected]> wrote:
>>>>> > >>
>>>>> > >>> Hi all,
>>>>> > >>>
>>>>> > >>> "Wide tables" with thousands of columns present significant challenges for AI/ML workloads, particularly when only a subset of columns needs to be added or updated. Current Copy-on-Write (COW) and Merge-on-Read (MOR) operations in Iceberg apply at the row level, which leads to substantial write amplification in scenarios such as:
>>>>> > >>>
>>>>> > >>> - Feature Backfilling & Column Updates: Adding new feature columns (e.g., model embeddings) to petabyte-scale tables.
>>>>> > >>> - Model Score Updates: Refreshing prediction scores after retraining.
>>>>> > >>> - Embedding Refresh: Updating vector embeddings, which currently triggers a rewrite of the entire row.
>>>>> > >>> - Incremental Feature Computation: Daily updates to a small fraction of features in wide tables.
>>>>> > >>>
>>>>> > >>> With the Iceberg V4 proposal introducing single-file commits and column stats improvements, this is an ideal time to address column-level updates to better support these use cases.
>>>>> > >>>
>>>>> > >>> I have drafted a proposal that explores both table-format enhancements and file-format (Parquet) changes to enable more efficient updates.
>>>>> > >>>
>>>>> > >>> Proposal Details:
>>>>> > >>> - GitHub Issue: #15146 <https://github.com/apache/iceberg/issues/15146>
>>>>> > >>> - Design Document: Efficient Column Updates in Iceberg <https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0>
>>>>> > >>>
>>>>> > >>> Next Steps:
>>>>> > >>> I plan to create POCs to benchmark the approaches described in the document.
>>>>> > >>>
>>>>> > >>> Please review the proposal and share your feedback.
>>>>> > >>>
>>>>> > >>> Thanks,
>>>>> > >>> Anurag
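As a rough back-of-the-envelope illustration of the write amplification described in the original proposal above, the sketch below compares the bytes rewritten by a row-level copy-on-write update of a single column with writing only that column. All numbers (table size, column count, uniformly sized columns) are assumed purely for the example.

// Illustrative arithmetic only; the table size, column count, and the
// assumption of uniformly sized columns are made up for this example.
final class WriteAmplificationSketch {
  public static void main(String[] args) {
    double tableBytes = 1e15;        // assume a 1 PB table
    int columnCount = 2000;          // assume 2000 roughly equal-sized columns
    int updatedColumns = 1;          // backfill or update a single column

    double perColumnBytes = tableBytes / columnCount;

    // Row-level COW rewrites every affected row in full; if the update touches
    // all rows, that is essentially the whole table.
    double rowLevelRewrite = tableBytes;

    // A column-level update would only need to write the updated column.
    double columnLevelWrite = perColumnBytes * updatedColumns;

    System.out.printf("row-level COW rewrite: %.0f TB%n", rowLevelRewrite / 1e12);
    System.out.printf("column-level write:    %.1f TB%n", columnLevelWrite / 1e12);
    System.out.printf("write amplification:   ~%.0fx%n", rowLevelRewrite / columnLevelWrite);
  }
}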
