I thought that we said we wanted to get support out for v3 features in this release unless there is some reasonable blocker, like Spark not having geospatial types. To me, that means we should aim to get variant and unknown done so that we have a complete implementation with a major engine. And it should not be particularly difficult to get unknown done, so I'd opt to get it in.
On Fri, Jul 25, 2025 at 11:28 AM Steven Wu <stevenz...@gmail.com> wrote:

> I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR <https://github.com/apache/iceberg/pull/13445> for that.
>
> I thought in the community sync the consensus is that this is not a blocker, because it is a new feature implementation. If it is ready, it will be included.
>
> On Fri, Jul 25, 2025 at 9:43 AM Kevin Liu <kevinjq...@apache.org> wrote:
>
>> I think Fokko's OOO. Should we help with that PR?
>>
>> On Fri, Jul 25, 2025 at 9:38 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
>>
>>> I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR <https://github.com/apache/iceberg/pull/13445> for that.
>>>
>>> On Fri, Jul 25, 2025 at 6:13 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>>> 3. Spark: fix data frame join based on different versions of the same table that may lead to weird results. Anton is working on a fix. It requires a small behavior change (table state may be stale up to the refresh interval). Hence it is better to include it in the 1.10.0 release, where Spark 4.0 is first supported.
>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very close and will prioritize the review.
>>>>
>>>> We still have the above two issues pending. 3 doesn't have a PR yet. The PR for 4 is not associated with the milestone yet.
>>>>
>>>> On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>
>>>>> Thanks everyone for the review. The 2 PRs are both merged. Looks like there's only 1 PR left in the 1.10 milestone <https://github.com/apache/iceberg/milestone/54> :)
>>>>>
>>>>> Best,
>>>>> Kevin Liu
>>>>>
>>>>> On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>
>>>>>> Thanks Kevin. The first change is not in the versioned doc, so it can be released anytime.
>>>>>>
>>>>>> Regards,
>>>>>> Manu
>>>>>>
>>>>>> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>
>>>>>>> The 3 PRs above are merged. Thanks everyone for the review.
>>>>>>>
>>>>>>> I've added 2 more PRs to the 1.10 milestone. These are both nice-to-haves.
>>>>>>> - docs: add subpage for REST Catalog Spec in "Specification" #13521 <https://github.com/apache/iceberg/pull/13521>
>>>>>>> - REST-Fixture: Ensure strict mode on jdbc catalog for rest fixture #13599 <https://github.com/apache/iceberg/pull/13599>
>>>>>>>
>>>>>>> The first one changes the link for "REST Catalog Spec" on the left nav of https://iceberg.apache.org/spec/ from the swagger.io link to a dedicated page for IRC. The second one fixes the default behavior of the `iceberg-rest-fixture` image to align with the general expectation when creating a table in a catalog.
>>>>>>>
>>>>>>> Please take a look. I would like to have both of these as part of the 1.10 release.
>>>>>>>
>>>>>>> Best,
>>>>>>> Kevin Liu
>>>>>>>
>>>>>>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>
>>>>>>>> Here are the 3 PRs to add corresponding tests.
>>>>>>>> https://github.com/apache/iceberg/pull/13648
>>>>>>>> https://github.com/apache/iceberg/pull/13649
>>>>>>>> https://github.com/apache/iceberg/pull/13650
>>>>>>>>
>>>>>>>> I've tagged them with the 1.10 milestone, waiting for CI to complete :)
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Kevin Liu
>>>>>>>>
>>>>>>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Kevin, thanks for checking that. I will take a look at your backport PRs. Can you add them to the 1.10.0 milestone?
>>>>>>>>>
>>>>>>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks again for driving this Steven! We're very close!!
>>>>>>>>>>
>>>>>>>>>> As mentioned in the community sync today, I wanted to verify feature parity between Spark 3.5 and Spark 4.0 for this release. I was able to verify that Spark 3.5 and Spark 4.0 have feature parity for this upcoming release. More details in the other devlist thread: https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Kevin Liu
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Another update on the release.
>>>>>>>>>>>
>>>>>>>>>>> The existing blocker PRs are almost done.
>>>>>>>>>>>
>>>>>>>>>>> During today's community sync, we identified the following issues/PRs to be included in the 1.10.0 release.
>>>>>>>>>>>
>>>>>>>>>>> 1. Backport of PR 13100 to the main branch. I have created a cherry-pick PR <https://github.com/apache/iceberg/pull/13647> for that. There is a one-line difference compared to the original PR due to the removal of the deprecated RemoveSnapshot class in the main branch for the 1.10.0 target. Amogh has suggested using RemoveSnapshots with a single snapshot id, which should be supported by all REST catalog servers.
>>>>>>>>>>> 2. Flink compaction doesn't support row lineage. Fail the compaction for V3 tables. I created a PR <https://github.com/apache/iceberg/pull/13646> for that. Will backport after it is merged.
>>>>>>>>>>> 3. Spark: fix data frame join based on different versions of the same table that may lead to weird results. Anton is working on a fix. It requires a small behavior change (table state may be stale up to the refresh interval).
>>>>>>>>>>> Hence it is better to include it in the 1.10.0 release, where Spark 4.0 is first supported.
>>>>>>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very close and will prioritize the review.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Steven
>>>>>>>>>>>
>>>>>>>>>>> The 1.10.0 milestone can be found here: https://github.com/apache/iceberg/milestone/54
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Ajantha/Robin, thanks for the note. We can include the PR in the 1.10.0 milestone.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt <ro...@confluent.io.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Ajantha. Just to confirm, from a Confluent point of view, we will not be able to publish the connector on Confluent Hub until this CVE[1] is fixed. Since we would not publish a snapshot build, if the fix doesn't make it into 1.10 then we'd have to wait for 1.11 (or a dot release of 1.10) to be able to include the connector on Confluent Hub.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks, Robin.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have approached Confluent people <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281> to help us publish the OSS Kafka Connect Iceberg sink plugin. It seems we have a CVE from a dependency that blocks us from publishing the plugin.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please include the below PR in the 1.10.0 release, which fixes that:
>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/13561
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Manu, I agree this sentence probably lacks some context. The first half (as deleting/inserting rows) is probably about the row lineage handling with equality deletes, which is described in another place:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "Row lineage does not track lineage for rows updated via Equality Deletes <https://iceberg.apache.org/spec/#equality-delete-files>, because engines using equality deletes avoid reading existing data before writing changes and can't provide the original row ID for the new rows. These updates are always treated as if the existing row was completely removed and a unique new row was added."
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks Steven, I missed that part, but the following sentence is a bit hard to understand (maybe just me):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can you please help to explain?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 04:41, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Manu,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The spec already covers the row lineage carry-over (for replace): https://iceberg.apache.org/spec/#row-lineage
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> "When an existing row is moved to a different data file for any reason, writers should write _row_id and _last_updated_sequence_number according to the following rules:"
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Another update on the release.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We have one open PR left in the 1.10.0 milestone <https://github.com/apache/iceberg/milestone/54> (with 25 closed PRs). Amogh is actively working on the last blocker PR: Spark 4.0: Preserve row lineage information on compaction <https://github.com/apache/iceberg/pull/13555>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I will publish a release candidate after the above blocker is merged and backported.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Amogh,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Is it defined in the table spec that the "replace" operation should carry over existing lineage info instead of assigning new IDs?
>>>>>>>>>>>>>>>>>>> If not, we'd better first define it in the spec, because all engines and implementations need to follow it.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> One other area I think we need to make sure works with row lineage before release is data file compaction. At the moment <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>, it looks like compaction will read the records from the data files without projecting the lineage fields. What this means is that, on write of the new compacted data files, we'd be losing the lineage information. There's no data change in a compaction, but we do need to make sure the lineage info from carried-over records is materialized in the newly compacted files so they don't get new IDs or inherit the new file sequence number. I'm working on addressing this, and I'd call it out as a blocker as well.

>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> *Robin Moffatt*
>>>>>>>>>>>>> *Sr. Principal Advisor, Streaming Data Technologies*
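For readers following along: the compaction blocker discussed in this thread (PRs 13555 and 13646) comes down to the spec's carry-over rule for `_row_id` and `_last_updated_sequence_number`. A minimal sketch of that rule is below. This is hypothetical illustration code, not Iceberg's API; `SourceFile` and `materialize_lineage` are invented names. The rule itself follows the spec's inheritance semantics: a null `_row_id` is read as the source file's `first_row_id` plus the row's position, and a null `_last_updated_sequence_number` is read as the source file's data sequence number, so a rewrite must materialize both before writing new files.

```python
# Illustrative sketch only -- NOT Iceberg's API. It models the row-lineage
# carry-over rule: when rows move to a new data file (e.g. compaction),
# the writer materializes the inherited lineage values rather than letting
# rewritten rows pick up fresh IDs or the new file's sequence number.
from dataclasses import dataclass


@dataclass
class SourceFile:
    """Hypothetical stand-in for the file a row is being moved out of."""
    first_row_id: int          # assigned to the file when it was committed
    data_sequence_number: int  # the file's data sequence number


def materialize_lineage(rows, src):
    """Make lineage columns explicit before rewriting rows to a new file."""
    out = []
    for pos, row in enumerate(rows):
        r = dict(row)
        # A null _row_id is inherited as first_row_id + position in file.
        if r.get("_row_id") is None:
            r["_row_id"] = src.first_row_id + pos
        # A null _last_updated_sequence_number is inherited from the
        # source file's data sequence number.
        if r.get("_last_updated_sequence_number") is None:
            r["_last_updated_sequence_number"] = src.data_sequence_number
        out.append(r)
    return out


src = SourceFile(first_row_id=100, data_sequence_number=5)
rows = [{"id": 1},  # lineage columns never written explicitly in the file
        {"id": 2, "_row_id": 7, "_last_updated_sequence_number": 3}]
compacted = materialize_lineage(rows, src)
# compacted[0] carries _row_id=100 and _last_updated_sequence_number=5;
# compacted[1] keeps its explicit values 7 and 3.
```

Skipping this materialization step is exactly the bug Amogh describes: reading without projecting the lineage fields means every rewritten row would look "new" and inherit the compacted file's IDs and sequence number.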