Similar weird behavior can also happen for V2 position delete files with `undelete`.
In V2, there could be multiple position delete files (say pd1, pd2) associated with the same data file (say f1). Let's say pd1 deletes rows 5 and 10, and pd2 deletes row 15. Consider two cases:

1. A new snapshot is committed with pd1 (DELETED), pd2 (EXISTING), and pd3 (ADDED). pd3 deletes only row 10, so row 5 is undeleted.
2. A new snapshot is committed with pd1 (DELETED) and pd2 (EXISTING), so rows 5 and 10 are both undeleted.

In either case, some rows are essentially added (back) to the table with a lower sequence number than the new snapshot's sequence number.

Just to recap the question: should the spec (v2 and v3) spell out that `undelete row` is not allowed? Rows should only be added via new data files.

On Fri, Nov 21, 2025 at 1:09 PM Steven Wu <[email protected]> wrote:

> > Are we specifically stating somewhere that all row-ids should be higher
> > than or equal to the snapshot's `first-row-id`?
> > In my mental model the `first-row-id` is only applicable for rows that
> > don't have a specific row-id assigned.
>
> I meant an ADDED row should have a `row-id` higher than or equal to the
> snapshot's `first-row-id`. An EXISTING or UPDATED row can have a lower row id.
>
> On Fri, Nov 21, 2025 at 1:04 PM Steven Wu <[email protected]> wrote:
>
>> > Can we create a validator to prevent this from happening?
>>
>> We don't have this problem with the Java implementation.
>> `BaseDVFileWriter` merges the previous DV with the new delta DV. So there
>> is no `undelete` behavior. I am not aware of any Java API to allow
>> "undelete". So we probably don't need to add any validation code in the
>> Java impl.
>>
>> Just thought it is good to spell it out in the spec so that
>> clients/engines can be clear about the expected behavior.
>>
>> On Fri, Nov 21, 2025 at 12:18 PM Péter Váry <[email protected]>
>> wrote:
>>
>>> Are we specifically stating somewhere that all row-ids should be higher
>>> than or equal to the snapshot's `first-row-id`?
>>> In my mental model the `first-row-id` is only applicable for rows that
>>> don't have a specific row-id assigned.
>>>
>>> Nonetheless, I agree that the `row-id` and the
>>> `last-updated-seq-num` should have changed to new ones, so we can say that
>>> undeleting a row is not allowed because of this.
>>>
>>> Can we create a validator to prevent this from happening?
>>>
>>> Steven Wu <[email protected]> wrote (on Fri, Nov 21, 2025,
>>> 21:11):
>>>
>>>> The undeleted row would have invalid `row-id` and
>>>> `last-updated-seq-num`. Since it is a new row (added back), it should have
>>>> the `row-id` higher than or equal to the snapshot's `first-row-id` and the
>>>> `last-updated-seq-number` should inherit/have the new snapshot's sequence
>>>> number.
>>>>
>>>> On Fri, Nov 21, 2025 at 11:48 AM Steven Wu <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Should we clarify the V3 spec to explicitly forbid "*undelete*" of a
>>>>> row by unsetting the DV bit? Unsetting a DV bit essentially adds a row
>>>>> with a lower row-id than the snapshot's first-row-id, which would violate
>>>>> the row lineage spec. With the restriction, DV cardinality should be
>>>>> monotonically increasing.
>>>>>
>>>>> Thanks,
>>>>> Steven
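
For illustration only, here is a minimal Python sketch of the "no undelete" rule discussed in this thread, applied to the two V2 cases with pd1, pd2, and pd3. It is not Iceberg code: the function names (`deleted_positions`, `undeleted_rows`, `is_valid_commit`) and the dict-of-sets representation of delete files are hypothetical, chosen only to show that the set of deleted positions for a data file should never shrink from one snapshot to the next.

```python
from typing import Dict, Set

# Hypothetical model: for one data file, map each live delete file name
# to the set of row positions it deletes. This is an assumption made for
# illustration, not how Iceberg stores delete files.
DeleteFiles = Dict[str, Set[int]]


def deleted_positions(delete_files: DeleteFiles) -> Set[int]:
    """Union of row positions deleted by all live delete files."""
    result: Set[int] = set()
    for positions in delete_files.values():
        result |= positions
    return result


def undeleted_rows(before: DeleteFiles, after: DeleteFiles) -> Set[int]:
    """Positions deleted in the old snapshot but live again in the new one."""
    return deleted_positions(before) - deleted_positions(after)


def is_valid_commit(before: DeleteFiles, after: DeleteFiles) -> bool:
    """A commit is valid under the proposed rule if it undeletes nothing."""
    return not undeleted_rows(before, after)


# State before the commit: pd1 deletes rows 5 and 10, pd2 deletes row 15.
before = {"pd1": {5, 10}, "pd2": {15}}

# Case 1: pd1 is DELETED, pd2 is EXISTING, pd3 is ADDED and deletes only row 10.
case1 = {"pd2": {15}, "pd3": {10}}

# Case 2: pd1 is DELETED, pd2 is EXISTING, nothing is added.
case2 = {"pd2": {15}}

print(sorted(undeleted_rows(before, case1)))  # [5]
print(sorted(undeleted_rows(before, case2)))  # [5, 10]
print(is_valid_commit(before, case1))         # False
```

The same superset check would apply to a V3 DV: the new bitmap should contain every bit set in the previous one, which is what makes the DV cardinality monotonically increasing.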
