Manu,

The spec already covers row lineage carry-over (for replace):
https://iceberg.apache.org/spec/#row-lineage

"When an existing row is moved to a different data file for any reason,
writers should write _row_id and _last_updated_sequence_number according to
the following rules:"
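
As an illustration only (not Iceberg code), the carry-over can be sketched as follows, assuming the v3 inheritance scheme where a null _row_id is materialized as the file's first_row_id plus the row's position, and a null _last_updated_sequence_number is materialized from the original file's data sequence number. All names below are hypothetical:

```python
def materialize_lineage(row_id, last_updated_seq, first_row_id, position,
                        file_seq_number):
    """Return the (_row_id, _last_updated_sequence_number) a writer should
    persist when moving an existing, unmodified row to a new data file.

    Sketch of the spec rule above; a null field means the value was being
    inherited from file metadata and must be computed before the row leaves
    the file it was inherited from.
    """
    if row_id is None:
        # Inherited row ID: first_row_id assigned to the file plus the
        # row's position within that file.
        row_id = first_row_id + position
    if last_updated_seq is None:
        # Inherited sequence number: the original data file's sequence
        # number, since the row itself was not modified.
        last_updated_seq = file_seq_number
    return row_id, last_updated_seq

# A row at position 2 of a file with first_row_id=100 and data sequence
# number 7, with both lineage fields still inherited (null):
print(materialize_lineage(None, None, 100, 2, 7))  # -> (102, 7)
```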

Thanks,
Steven


On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:

> Another update on the release:
>
> We have one open PR left for the 1.10.0 milestone
> <https://github.com/apache/iceberg/milestone/54> (with 25 closed PRs).
> Amogh is actively working on the last blocker PR.
> Spark 4.0: Preserve row lineage information on compaction
> <https://github.com/apache/iceberg/pull/13555>
>
> I will publish a release candidate after the above blocker is merged and
> backported.
>
> Thanks,
> Steven
>
> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com>
> wrote:
>
>> Hi Amogh,
>>
>> Is it defined in the table spec that the "replace" operation should carry
>> over existing lineage info instead of assigning new IDs? If not, we'd
>> better first define it in the spec, because all engines and
>> implementations need to follow it.
>>
>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com>
>> wrote:
>>
>>> One other area we need to make sure works with row lineage before the
>>> release is data file compaction. At the moment
>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>
>>> it looks like compaction reads records from the data files without
>>> projecting the lineage fields, which means that when the new compacted
>>> data files are written we lose the lineage information. There is no data
>>> change in a compaction, but we do need to make sure the lineage info from
>>> the carried-over records is materialized in the newly compacted files, so
>>> that rows don't get new IDs or inherit the new file's sequence number.
>>> I'm working on addressing this, but I'd call it out as a blocker as well.
>>>
>>
