+1 for making it explicit that an *undelete* of a row can't be done by unsetting the corresponding bit in the DV.
*Rows should only be added via new data files* sounds reasonable to me! Apart from row lineage, allowing undelete would also complicate operation type inference, as in [1], since we would then have to inspect the contents of these DVs to see whether a commit is effectively an insert.

[1] https://github.com/apache/iceberg/pull/14581#discussion_r2533057189

On Sat, Nov 22, 2025 at 4:48 AM Szehon Ho <[email protected]> wrote:

> It makes sense to me, it sounds like a minor clarification. For v2
> position deletes, code like rewrite_position_deletes may have made some
> assumptions like this and would not work well if violated, maybe other
> code as well.
>
> Thanks
> Szehon
>
> On Fri, Nov 21, 2025 at 3:03 PM Steven Wu <[email protected]> wrote:
>
>> Similar weird behavior can also happen for V2 position delete files
>> with `undelete`.
>>
>> In V2, there could be multiple position delete files (say pd1, pd2)
>> associated with the same data file (say f1). Let's say pd1 deletes
>> rows 5 and 10 and pd2 deletes row 15.
>> 1. a new snapshot is committed with pd1 (DELETED), pd2 (EXISTING), and
>> pd3 (ADDED). pd3 deletes only row 10 (undeleted row 5)
>> 2. a new snapshot is committed with pd1 (DELETED) and pd2 (EXISTING)
>>
>> In either case, essentially some rows are added (back) to the table
>> with a lower sequence number than the new snapshot's sequence number.
>>
>> Just to recap the question: should the spec (v2 and v3) spell out that
>> `undelete row` is not allowed? Rows should only be added via new data
>> files.
>>
>> On Fri, Nov 21, 2025 at 1:09 PM Steven Wu <[email protected]> wrote:
>>
>>> > Are we specifically stating somewhere that all row-ids should be
>>> > higher than or equal to the snapshot's `first-row-id`?
>>> > In my mental model the `first-row-id` is only applicable for rows
>>> > that don't have a specific row-id assigned.
>>>
>>> I meant an ADDED row should have a `row-id` higher than or equal to
>>> the snapshot's `first-row-id`. An EXISTING or UPDATED row can have a
>>> lower row id.
>>>
>>> On Fri, Nov 21, 2025 at 1:04 PM Steven Wu <[email protected]> wrote:
>>>
>>>> > Can we create a validator to prevent this from happening?
>>>>
>>>> We don't have this problem with the Java implementation.
>>>> `BaseDVFileWriter` merges the previous DV with the new delta DV. So
>>>> there is no `undelete` behavior. I am not aware of any Java API to
>>>> allow "undelete". So we probably don't need to add any validation
>>>> code in the Java impl.
>>>>
>>>> Just thought it is good to spell it out in the spec so that
>>>> clients/engines can be clear about the expected behavior.
>>>>
>>>> On Fri, Nov 21, 2025 at 12:18 PM Péter Váry <
>>>> [email protected]> wrote:
>>>>
>>>>> Are we specifically stating somewhere that all row-ids should be
>>>>> higher than or equal to the snapshot's `first-row-id`?
>>>>> In my mental model the `first-row-id` is only applicable for rows
>>>>> that don't have a specific row-id assigned.
>>>>>
>>>>> Nonetheless, I agree that the `row-id` and the
>>>>> `last-updated-seq-num` should have changed to new values, so we can
>>>>> say that undeleting a row is not allowed because of this.
>>>>>
>>>>> Can we create a validator to prevent this from happening?
>>>>>
>>>>> Steven Wu <[email protected]> wrote (on Fri, Nov 21, 2025, 21:11):
>>>>>
>>>>>> The undeleted row would have invalid `row-id` and
>>>>>> `last-updated-seq-num`. Since it is a new row (added back), it
>>>>>> should have a `row-id` higher than or equal to the snapshot's
>>>>>> `first-row-id`, and the `last-updated-seq-num` should inherit/have
>>>>>> the new snapshot's sequence number.
>>>>>>
>>>>>> On Fri, Nov 21, 2025 at 11:48 AM Steven Wu <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Should we clarify the V3 spec to explicitly forbid "*undelete*"
>>>>>>> of a row by unsetting the DV bit? Unsetting a DV bit essentially
>>>>>>> adds a row with a lower row-id than the snapshot's first-row-id,
>>>>>>> which would violate the row lineage spec. With the restriction,
>>>>>>> DV cardinality should be monotonically increasing.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Steven
>>>>>>>
>>>>>>
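
For illustration only, a minimal Python sketch of the rule discussed in this thread. It assumes the deletes for one data file (a V3 DV, or the combined V2 position delete files) can be modelled as a plain set of row positions. The names `merge_dv`, `undeleted_positions`, and `validate_no_undelete` are hypothetical and are not Iceberg APIs; `merge_dv` only imitates the merge of the previous DV with the new delta DV described above, it is not the actual `BaseDVFileWriter` code.

```python
from typing import FrozenSet, Iterable


def merge_dv(previous: Iterable[int], delta: Iterable[int]) -> FrozenSet[int]:
    """Union the previous deletes with the new delta, so no bit is ever unset."""
    return frozenset(previous) | frozenset(delta)


def undeleted_positions(
    previous: Iterable[int], replacement: Iterable[int]
) -> FrozenSet[int]:
    """Positions that were deleted before but are live again after the change."""
    return frozenset(previous) - frozenset(replacement)


def validate_no_undelete(
    previous: Iterable[int], replacement: Iterable[int]
) -> None:
    """Hypothetical validator: reject a replacement that restores any row."""
    restored = undeleted_positions(previous, replacement)
    if restored:
        raise ValueError(
            f"undelete is not allowed; restored positions: {sorted(restored)}"
        )


# The V2 example from the thread: pd1 deletes rows 5 and 10, pd2 deletes row 15.
pd1, pd2 = {5, 10}, {15}
before = merge_dv(pd1, pd2)

# Case 1: pd1 is removed, pd2 is kept, pd3 (deleting only row 10) is added.
case1 = merge_dv(pd2, {10})
# Case 2: pd1 is removed and only pd2 is kept.
case2 = frozenset(pd2)

print(sorted(undeleted_positions(before, case1)))  # [5]
print(sorted(undeleted_positions(before, case2)))  # [5, 10]
```

Under this model a valid replacement is always a superset of the previous deletes, which is why the delete cardinality for a data file never decreases.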
