+1 for making it explicit that an *undelete* of a row can't be done by
unsetting the corresponding bit in the DV.
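
For engines that do want a guard, the rule amounts to a superset check between the previous and the new DV of the same data file. A rough sketch in Python, assuming a DV is modeled as a plain set of row positions (`merge_dv` and `clears_any_bit` are hypothetical names, not an Iceberg API):

```python
from typing import Set


def merge_dv(previous: Set[int], delta: Set[int]) -> Set[int]:
    """Build the new DV by adding newly deleted positions to the previous DV."""
    return previous | delta


def clears_any_bit(previous: Set[int], new: Set[int]) -> bool:
    """Return True if `new` unsets a position that `previous` had set (an undelete)."""
    return not previous.issubset(new)


previous_dv = {5, 10}
print(clears_any_bit(previous_dv, merge_dv(previous_dv, {15})))  # False
print(clears_any_bit(previous_dv, {10}))  # True: position 5 was undeleted
```

A writer that always merges the delta into the previous DV, as Steven describes for `BaseDVFileWriter` below, can never trip this check.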

*Rows should only be added via new data files* sounds reasonable to me!
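
To make that concrete for v2, here is Steven's pd1/pd2/pd3 example from below as a small Python sketch. It assumes each position delete file is modeled as a set of positions against a single data file f1; `deleted_positions` and `undeleted_positions` are hypothetical helpers, not existing Iceberg code:

```python
from typing import Dict, Set


def deleted_positions(delete_files: Dict[str, Set[int]]) -> Set[int]:
    """Union of the positions deleted by all live delete files of one data file."""
    result: Set[int] = set()
    for positions in delete_files.values():
        result |= positions
    return result


def undeleted_positions(before: Dict[str, Set[int]], after: Dict[str, Set[int]]) -> Set[int]:
    """Positions that were deleted before the commit but are live again after it."""
    return deleted_positions(before) - deleted_positions(after)


before = {"pd1": {5, 10}, "pd2": {15}}
case_1 = {"pd2": {15}, "pd3": {10}}  # pd1 DELETED, pd2 EXISTING, pd3 ADDED
case_2 = {"pd2": {15}}               # pd1 DELETED, pd2 EXISTING

print(sorted(undeleted_positions(before, case_1)))  # [5]
print(sorted(undeleted_positions(before, case_2)))  # [5, 10]
```

A non-empty result means rows came back without a new data file, which is exactly what the proposed wording would rule out.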

Apart from row lineage, it also complicates operation type inference,
like here [1], as we would now have to
inspect the contents of these DVs to see if it's an insert.

[1] https://github.com/apache/iceberg/pull/14581#discussion_r2533057189

On Sat, Nov 22, 2025 at 4:48 AM Szehon Ho <[email protected]> wrote:

> It makes sense to me; it sounds like a minor clarification. For v2
> position deletes, code like rewrite_position_deletes may have made
> assumptions like this and would not work well if they were violated;
> other code may have as well.
>
> Thanks
> Szehon
>
> On Fri, Nov 21, 2025 at 3:03 PM Steven Wu <[email protected]> wrote:
>
>> Similar weird behavior can also happen for V2 position delete files with
>> `undelete`.
>>
>> In V2, there could be multiple position delete files (say pd1, pd2)
>> associated with the same data file (say f1). Let's say pd1 deletes rows 5
>> and 10 and pd2 deletes row 15.
>> 1. a new snapshot is committed with pd1 (DELETED), pd2 (EXISTING), and
>> pd3 (ADDED). pd3 deletes only row 10 (so row 5 is undeleted)
>> 2. a new snapshot is committed with pd1 (DELETED) and pd2 (EXISTING)
>>
>> In either case, some rows are essentially added (back) to the table with
>> a lower sequence number than the new snapshot's sequence number.
>>
>>
>>
>> Just to recap the question: should the spec (v2 and v3) spell out that
>> `undelete row` is not allowed? Rows should only be added via new data files.
>>
>>
>>
>>
>> On Fri, Nov 21, 2025 at 1:09 PM Steven Wu <[email protected]> wrote:
>>
>>> > Are we specifically stating somewhere that all row-ids should be higher
>>> > than or equal to the snapshot's `first-row-id`?
>>> > In my mental model the `first-row-id` is only applicable for rows that
>>> > don't have a specific row-id assigned.
>>>
>>> I meant that an ADDED row should have a `row-id` higher than or equal to
>>> the snapshot's `first-row-id`. An EXISTING or UPDATED row can have a lower
>>> `row-id`.
>>>
>>> On Fri, Nov 21, 2025 at 1:04 PM Steven Wu <[email protected]> wrote:
>>>
>>>> > Can we create a validator to prevent this from happening?
>>>>
>>>> We don't have this problem with the Java implementation.
>>>> `BaseDVFileWriter` merges the previous DV with the new delta DV, so there
>>>> is no `undelete` behavior. I am not aware of any Java API that allows
>>>> "undelete", so we probably don't need to add any validation code to the
>>>> Java impl.
>>>>
>>>> Just thought it is good to spell it out in the spec so that
>>>> clients/engines can be clear about the expected behavior.
>>>>
>>>> On Fri, Nov 21, 2025 at 12:18 PM Péter Váry <
>>>> [email protected]> wrote:
>>>>
>>>>> Are we specifically stating somewhere that all row-ids should be
>>>>> higher than or equal to the snapshot's `first-row-id`?
>>>>> In my mental model the `first-row-id` is only applicable for rows that
>>>>> don't have a specific row-id assigned.
>>>>>
>>>>> Nonetheless, I agree that the `row-id` and the
>>>>> `last-updated-seq-num` should have changed to new values, so we can say
>>>>> that undeleting a row is not allowed because of this.
>>>>>
>>>>> Can we create a validator to prevent this from happening?
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Nov 21, 2025 at 9:11 PM Steven Wu <[email protected]> wrote:
>>>>>
>>>>>> The undeleted row would have an invalid `row-id` and
>>>>>> `last-updated-seq-num`. Since it is a new row (added back), it should
>>>>>> have a `row-id` higher than or equal to the snapshot's `first-row-id`,
>>>>>> and its `last-updated-seq-num` should inherit the new snapshot's
>>>>>> sequence number.
>>>>>>
>>>>>> On Fri, Nov 21, 2025 at 11:48 AM Steven Wu <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Should we clarify the V3 spec to explicitly forbid "*undelete*" of
>>>>>>> a row by unsetting the DV bit? Unsetting a DV bit essentially adds a row
>>>>>>> with a lower row-id than the snapshot's first-row-id, which would violate
>>>>>>> the row lineage spec. With the restriction, DV cardinality should be
>>>>>>> monotonically increasing.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Steven
>>>>>>>
>>>>>>
