Re: Iceberg 1.10.0 release update - July 1, 2025

Steven Wu Mon, 14 Jul 2025 13:38:30 -0700

Another update on the release.

We have one open PR left for the 1.10.0 milestone
<https://github.com/apache/iceberg/milestone/54> (with 25 closed PRs).
Amogh is actively working on the last blocker PR:
"Spark 4.0: Preserve row lineage information on compaction"
<https://github.com/apache/iceberg/pull/13555>


I will publish a release candidate after the above blocker is merged and
backported.

Thanks,
Steven

On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <[email protected]> wrote:

> Hi Amogh,
>
> Is it defined in the table spec that a "replace" operation should carry over
> existing lineage info instead of assigning new IDs? If not, we should
> define it in the spec first, because all engines and implementations need to
> follow it.
>
> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <[email protected]> wrote:
>
>> One other area I think we need to make sure works with row lineage before
>> release is data file compaction. At the moment
>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>,
>> it looks like compaction will read the records from the data files without
>> projecting the lineage fields. What this means is that on write of the new
>> compacted data files we'd be losing the lineage information. There's no
>> data change in a compaction, but we do need to make sure the lineage info
>> from carried-over records is materialized in the newly compacted files so
>> they don't get new IDs or inherit the new file sequence number. I'm working
>> on addressing this, but I'd call it out as a blocker as well.
>>
>
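
To make the compaction concern above concrete, here is a minimal, self-contained Python sketch. It is a toy model, not Iceberg code: the Row and DataFile classes, the function names, and the numbers are all hypothetical. It assumes the v3 row lineage behavior that a null row ID is read as the file's first row ID plus the row position, and that a null last-updated sequence number is read as the file's data sequence number.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    value: str
    row_id: Optional[int] = None            # stands in for _row_id
    last_updated_seq: Optional[int] = None  # stands in for _last_updated_sequence_number


@dataclass
class DataFile:
    first_row_id: int
    data_sequence_number: int
    rows: List[Row]


def read_with_lineage(f: DataFile) -> List[Row]:
    """Resolve lineage on read: null fields are inherited from the file."""
    resolved = []
    for pos, r in enumerate(f.rows):
        row_id = r.row_id if r.row_id is not None else f.first_row_id + pos
        seq = (r.last_updated_seq if r.last_updated_seq is not None
               else f.data_sequence_number)
        resolved.append(Row(r.value, row_id, seq))
    return resolved


def compact(files: List[DataFile], new_first_row_id: int, new_seq: int,
            project_lineage: bool) -> DataFile:
    """Rewrite several files into one (toy model of bin-pack compaction)."""
    rows: List[Row] = []
    for f in files:
        if project_lineage:
            # Lineage is resolved and materialized in the new file.
            rows.extend(read_with_lineage(f))
        else:
            # Only data columns are read; lineage fields are written as null.
            rows.extend(Row(r.value) for r in f.rows)
    return DataFile(new_first_row_id, new_seq, rows)


files = [
    DataFile(first_row_id=0, data_sequence_number=1, rows=[Row("a"), Row("b")]),
    DataFile(first_row_id=2, data_sequence_number=2, rows=[Row("c")]),
]

kept = read_with_lineage(compact(files, 100, 5, project_lineage=True))
lost = read_with_lineage(compact(files, 100, 5, project_lineage=False))

print([(r.row_id, r.last_updated_seq) for r in kept])
print([(r.row_id, r.last_updated_seq) for r in lost])
```

With lineage projected, the rewritten rows keep their original IDs and sequence numbers. Without it, every row picks up values from the new file, even though no data changed.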
