Thanks everyone for the review. Both PRs are merged. Looks like there's only 1 PR left in the 1.10 milestone <https://github.com/apache/iceberg/milestone/54> :)
Best,
Kevin Liu

On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <owenzhang1...@gmail.com> wrote:

> Thanks Kevin. The first change is not in the versioned doc, so it can be released anytime.
>
> Regards,
> Manu
>
> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <kevinjq...@apache.org> wrote:
>
>> The 3 PRs above are merged. Thanks everyone for the review.
>>
>> I've added 2 more PRs to the 1.10 milestone. These are both nice-to-haves.
>> - docs: add subpage for REST Catalog Spec in "Specification" #13521 <https://github.com/apache/iceberg/pull/13521>
>> - REST-Fixture: Ensure strict mode on jdbc catalog for rest fixture #13599 <https://github.com/apache/iceberg/pull/13599>
>>
>> The first one changes the link for "REST Catalog Spec" on the left nav of https://iceberg.apache.org/spec/ from the swagger.io link to a dedicated page for IRC.
>> The second one fixes the default behavior of the `iceberg-rest-fixture` image to align with the general expectation when creating a table in a catalog.
>>
>> Please take a look. I would like to have both of these as part of the 1.10 release.
>>
>> Best,
>> Kevin Liu
>>
>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>
>>> Here are the 3 PRs to add corresponding tests.
>>> https://github.com/apache/iceberg/pull/13648
>>> https://github.com/apache/iceberg/pull/13649
>>> https://github.com/apache/iceberg/pull/13650
>>>
>>> I've tagged them with the 1.10 milestone, waiting for CI to complete :)
>>>
>>> Best,
>>> Kevin Liu
>>>
>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>>> Kevin, thanks for checking that. I will take a look at your backport PRs. Can you add them to the 1.10.0 milestone?
>>>>
>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>
>>>>> Thanks again for driving this, Steven! We're very close!!
>>>>>
>>>>> As mentioned in the community sync today, I wanted to verify feature parity between Spark 3.5 and Spark 4.0 for this release, and I was able to verify that they have feature parity. More details in the other devlist thread: https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>>>>>
>>>>> Thanks,
>>>>> Kevin Liu
>>>>>
>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>
>>>>>> Another update on the release.
>>>>>>
>>>>>> The existing blocker PRs are almost done.
>>>>>>
>>>>>> During today's community sync, we identified the following issues/PRs to be included in the 1.10.0 release.
>>>>>>
>>>>>> 1. Backport of PR 13100 to the main branch. I have created a cherry-pick PR <https://github.com/apache/iceberg/pull/13647> for that. There is a one-line difference compared to the original PR, due to the removal of the deprecated RemoveSnapshot class in the main branch for the 1.10.0 target. Amogh has suggested using RemoveSnapshots with a single snapshot id, which should be supported by all REST catalog servers.
>>>>>> 2. Flink compaction doesn't support row lineage. Fail the compaction for V3 tables. I created a PR <https://github.com/apache/iceberg/pull/13646> for that. Will backport after it is merged.
>>>>>> 3. Spark: fix data frame joins based on different versions of the same table, which may lead to incorrect results. Anton is working on a fix. It requires a small behavior change (table state may be stale up to the refresh interval), so it is better to include it in the 1.10.0 release, where Spark 4.0 is first supported.
>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very close and will prioritize the review.
>>>>>>
>>>>>> Thanks,
>>>>>> Steven
>>>>>>
>>>>>> The 1.10.0 milestone can be found here: https://github.com/apache/iceberg/milestone/54
>>>>>>
>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>
>>>>>>> Ajantha/Robin, thanks for the note. We can include the PR in the 1.10.0 milestone.
>>>>>>>
>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt <ro...@confluent.io.invalid> wrote:
>>>>>>>
>>>>>>>> Thanks Ajantha. Just to confirm, from a Confluent point of view, we will not be able to publish the connector on Confluent Hub until this CVE[1] is fixed.
>>>>>>>> Since we would not publish a snapshot build, if the fix doesn't make it into 1.10, we'd have to wait for 1.11 (or a dot release of 1.10) to be able to include the connector on Confluent Hub.
>>>>>>>>
>>>>>>>> Thanks, Robin.
>>>>>>>>
>>>>>>>> [1] https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>>>>>>>
>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I have approached Confluent people <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281> to help us publish the OSS Kafka Connect Iceberg sink plugin.
>>>>>>>>> It seems we have a CVE from a dependency that blocks us from publishing the plugin.
>>>>>>>>>
>>>>>>>>> Please include the below PR, which fixes that, in the 1.10.0 release:
>>>>>>>>> https://github.com/apache/iceberg/pull/13561
>>>>>>>>>
>>>>>>>>> - Ajantha
>>>>>>>>>
>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> > Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids.
>>>>>>>>>>
>>>>>>>>>> Manu, I agree this sentence probably lacks some context.
>>>>>>>>>> The first half (as deleting/inserting rows) is probably about the row lineage handling with equality deletes, which is described in another place:
>>>>>>>>>>
>>>>>>>>>> "Row lineage does not track lineage for rows updated via Equality Deletes <https://iceberg.apache.org/spec/#equality-delete-files>, because engines using equality deletes avoid reading existing data before writing changes and can't provide the original row ID for the new rows. These updates are always treated as if the existing row was completely removed and a unique new row was added."
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Steven, I missed that part, but the following sentence is a bit hard to understand (maybe it's just me):
>>>>>>>>>>>
>>>>>>>>>>> Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids.
>>>>>>>>>>>
>>>>>>>>>>> Can you please help explain?
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 15, 2025 at 04:41, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Manu,
>>>>>>>>>>>>
>>>>>>>>>>>> The spec already covers the row lineage carry-over (for replace):
>>>>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>>>>>>>>>>
>>>>>>>>>>>> "When an existing row is moved to a different data file for any reason, writers should write _row_id and _last_updated_sequence_number according to the following rules:"
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Steven
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Another update on the release.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We have one open PR left for the 1.10.0 milestone <https://github.com/apache/iceberg/milestone/54> (with 25 closed PRs). Amogh is actively working on the last blocker PR:
>>>>>>>>>>>>> Spark 4.0: Preserve row lineage information on compaction <https://github.com/apache/iceberg/pull/13555>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I will publish a release candidate after the above blocker is merged and backported.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Amogh,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is it defined in the table spec that the "replace" operation should carry over existing lineage info instead of assigning new IDs? If not, we'd better define it in the spec first, because all engines and implementations need to follow it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One other area I think we need to make sure works with row lineage before the release is data file compaction. At the moment, <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44> it looks like compaction will read the records from the data files without projecting the lineage fields. This means that on write of the new compacted data files, we'd be losing the lineage information.
>>>>>>>>>>>>>>> There's no data change in a compaction, but we do need to make sure the lineage info from carried-over records is materialized in the newly compacted files, so they don't get new IDs or inherit the new file's sequence number. I'm working on addressing this, but I'd call it out as a blocker as well.
>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Robin Moffatt*
>>>>>>>> *Sr. Principal Advisor, Streaming Data Technologies*
>>>>>>>
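[Editor's note] The row-lineage carry-over rule quoted in the thread can be sketched as follows. This is a minimal, hypothetical Python illustration of the semantics being discussed, not Iceberg's actual implementation: the helper names `materialize_lineage` and `compact` are invented for this sketch, while the `_row_id` and `_last_updated_sequence_number` field names come from the table spec.

```python
# Hypothetical sketch of the row-lineage carry-over rule (not Iceberg code).
# Per the spec quoted above, null lineage fields are inherited at read time,
# and rewrites (like compaction) must write the resolved values explicitly.

def materialize_lineage(rows, first_row_id, data_seq_number):
    """Resolve lineage fields for rows read from one data file.

    A null _row_id is inherited as first_row_id + position in the file;
    a null _last_updated_sequence_number is inherited from the file's
    data sequence number.
    """
    out = []
    for pos, row in enumerate(rows):
        row = dict(row)
        if row.get("_row_id") is None:
            row["_row_id"] = first_row_id + pos
        if row.get("_last_updated_sequence_number") is None:
            row["_last_updated_sequence_number"] = data_seq_number
        out.append(row)
    return out


def compact(files):
    """Rewrite data files with no data change.

    Lineage values are materialized before writing, so carried-over rows
    keep their original IDs and sequence numbers instead of inheriting
    new ones from the compacted file.
    """
    compacted = []
    for f in files:
        compacted.extend(
            materialize_lineage(f["rows"], f["first_row_id"], f["seq"])
        )
    return compacted


# Two input files from commits with data sequence numbers 1 and 2; the
# writers left lineage fields null and relied on inheritance.
files = [
    {"first_row_id": 0, "seq": 1, "rows": [{"v": "a"}, {"v": "b"}]},
    {"first_row_id": 2, "seq": 2, "rows": [{"v": "c"}]},
]
rows = compact(files)
print([r["_row_id"] for r in rows])                        # [0, 1, 2]
print([r["_last_updated_sequence_number"] for r in rows])  # [1, 1, 2]
```

The point of the sketch is the bug Amogh describes: if compaction skipped `materialize_lineage` and wrote the lineage fields as null, every rewritten row would re-inherit a fresh ID and the new file's sequence number, silently losing its lineage.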