Re: Iceberg 1.10.0 release update - July 1, 2025

Manu Zhang Mon, 14 Jul 2025 17:49:56 -0700

Thanks Steven, I missed that part, but the following sentence is a bit hard
to understand (maybe it's just me):


Engines may model operations as deleting/inserting rows or as modifications
to rows that preserve row ids.

Could you please help explain it?


On Tue, Jul 15, 2025 at 04:41, Steven Wu <[email protected]> wrote:

> Manu
>
> The spec already covers carrying over row lineage (for replace):
> https://iceberg.apache.org/spec/#row-lineage
>
> "When an existing row is moved to a different data file for any reason,
> writers should write _row_id and _last_updated_sequence_number according
> to the following rules:"
>
> Thanks,
> Steven
>
>
> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <[email protected]> wrote:
>
>> Another update on the release.
>>
>> We have one open PR left for the 1.10.0 milestone
>> <https://github.com/apache/iceberg/milestone/54> (with 25 closed PRs).
>> Amogh is actively working on the last blocker PR.
>> Spark 4.0: Preserve row lineage information on compaction
>> <https://github.com/apache/iceberg/pull/13555>
>>
>> I will publish a release candidate after the above blocker is merged and
>> backported.
>>
>> Thanks,
>> Steven
>>
>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <[email protected]>
>> wrote:
>>
>>> Hi Amogh,
>>>
>>> Is it defined in the table spec that a "replace" operation should carry
>>> over existing lineage info instead of assigning new IDs? If not, we'd
>>> better define it in the spec first, because all engines and
>>> implementations need to follow it.
>>>
>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <[email protected]>
>>> wrote:
>>>
>>>> One other area that I think we need to make sure works with row lineage
>>>> before the release is data file compaction. At the moment, it looks like
>>>> compaction
>>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>
>>>> will read the records from the data files without projecting the lineage
>>>> fields. This means that when we write the new compacted data files, we'd
>>>> lose the lineage information. A compaction doesn't change any data, but we
>>>> do need to make sure the lineage info from carried-over records is
>>>> materialized in the newly compacted files so that they don't get new IDs
>>>> or inherit the new file's sequence number. I'm working on addressing this
>>>> as well, and I'd call it out as a blocker too.
>>>>
>>>
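To make the carry-over rules and the compaction problem above concrete, here is a minimal Python sketch. It is not Iceberg's implementation: the `Row`, `DataFile`, `read_lineage`, `compact`, and `update` names are hypothetical, and it assumes the v3 row-lineage behavior as I read the spec. A null `_row_id` is inherited as the file's first row ID plus the row's position, and a null `_last_updated_sequence_number` is inherited from the file's sequence number. An unmodified row that moves to a new file keeps both values. A modified row that preserves its row ID keeps `_row_id` but writes a null sequence number. A delete/insert writes both as null. This is the distinction the spec sentence Manu asked about draws.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    data: str
    row_id: Optional[int] = None            # _row_id as stored (None = inherit)
    last_updated_seq: Optional[int] = None  # _last_updated_sequence_number (None = inherit)


@dataclass
class DataFile:
    first_row_id: int
    sequence_number: int
    rows: List[Row]


def read_lineage(f: DataFile) -> List[Row]:
    """Resolve lineage as a reader would: null values are inherited from the file."""
    resolved = []
    for pos, r in enumerate(f.rows):
        row_id = r.row_id if r.row_id is not None else f.first_row_id + pos
        seq = r.last_updated_seq if r.last_updated_seq is not None else f.sequence_number
        resolved.append(Row(r.data, row_id, seq))
    return resolved


def compact(files: List[DataFile], first_row_id: int, seq: int,
            project_lineage: bool) -> DataFile:
    """Hypothetical bin-pack rewrite of unmodified rows into one new file."""
    out = []
    for f in files:
        for r in read_lineage(f):
            if project_lineage:
                # Unmodified row moved to a new file: copy both lineage values.
                out.append(Row(r.data, r.row_id, r.last_updated_seq))
            else:
                # Lineage columns not read: written as null, so they are
                # re-inherited from the new file (new IDs, new sequence number).
                out.append(Row(r.data))
    return DataFile(first_row_id, seq, out)


def update(r: Row, new_data: str, preserve_id: bool) -> Row:
    """Hypothetical helper: the two ways an engine may model an update."""
    if preserve_id:
        # Modification: keep _row_id, null sequence number so it inherits the new one.
        return Row(new_data, r.row_id, None)
    # Delete + insert: the new row carries no lineage and gets a fresh ID.
    return Row(new_data)


def show(rows: List[Row]):
    return [(r.data, r.row_id, r.last_updated_seq) for r in rows]


# Two small files written at sequence numbers 1 and 2.
f1 = DataFile(first_row_id=0, sequence_number=1, rows=[Row("a"), Row("b")])
f2 = DataFile(first_row_id=2, sequence_number=2, rows=[Row("c")])

kept = read_lineage(compact([f1, f2], first_row_id=3, seq=3, project_lineage=True))
lost = read_lineage(compact([f1, f2], first_row_id=3, seq=3, project_lineage=False))
print(show(kept))  # [('a', 0, 1), ('b', 1, 1), ('c', 2, 2)]
print(show(lost))  # [('a', 3, 3), ('b', 4, 3), ('c', 5, 3)]

# A later write at sequence number 4: one update modeled as a modification,
# one modeled as delete + insert.
f3 = DataFile(first_row_id=6, sequence_number=4,
              rows=[update(kept[0], "a2", preserve_id=True),
                    update(kept[1], "b2", preserve_id=False)])
print(show(read_lineage(f3)))  # [('a2', 0, 4), ('b2', 7, 4)]
```

In the `project_lineage=False` case every row silently acquires a new ID and the compaction's sequence number, which is the loss Amogh describes. Projecting the lineage columns and writing them explicitly keeps the original values even though the rows now live in a different file.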
