Hey all,

The read path for the UnknownType needs some community discussion. I've
raised a separate thread
<https://lists.apache.org/thread/gq9lyndb574ptq7vkz83zgkp1lx7vp5x>. PTAL

Kind regards from Belgium,
Fokko

On Sat, Jul 26, 2025 at 00:58, Ryan Blue <rdb...@gmail.com> wrote:

> I thought that we said we wanted to get support out for v3 features in
> this release unless there is some reasonable blocker, like Spark not having
> geospatial types. To me, I think that means we should aim to get variant
> and unknown done so that we have a complete implementation with a major
> engine. And it should not be particularly difficult to get unknown done so
> I'd opt to get it in.
>
> On Fri, Jul 25, 2025 at 11:28 AM Steven Wu <stevenz...@gmail.com> wrote:
>
>> > I believe we also wanted to get in at least the read path for
>> UnknownType. Fokko has a WIP PR
>> <https://github.com/apache/iceberg/pull/13445> for that.
>> I thought the consensus in the community sync was that this is not a
>> blocker, since it is a new feature implementation. If it is ready, it
>> will be included.
>>
>> On Fri, Jul 25, 2025 at 9:43 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>
>>> I think Fokko's OOO. Should we help with that PR?
>>>
>>> On Fri, Jul 25, 2025 at 9:38 AM Eduard Tudenhöfner <
>>> etudenhoef...@apache.org> wrote:
>>>
>>>> I believe we also wanted to get in at least the read path for
>>>> UnknownType. Fokko has a WIP PR
>>>> <https://github.com/apache/iceberg/pull/13445> for that.
>>>>
>>>> On Fri, Jul 25, 2025 at 6:13 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>
>>>>> 3. Spark: fix DataFrame joins between different versions of the same
>>>>> table, which may produce incorrect results. Anton is working on a fix. It
>>>>> requires a small behavior change (table state may be stale up to the
>>>>> refresh interval). Hence it is better to include it in the 1.10.0 release,
>>>>> where Spark 4.0 is first supported.
>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very
>>>>> close and will prioritize the review.
>>>>>
>>>>> The two issues above are still pending. Item 3 doesn't have a PR yet,
>>>>> and the PR for item 4 is not yet associated with the milestone.
>>>>>
>>>>> On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <kevinjq...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Thanks everyone for the review. The 2 PRs are both merged.
>>>>>> Looks like there's only 1 PR left in the 1.10 milestone
>>>>>> <https://github.com/apache/iceberg/milestone/54> :)
>>>>>>
>>>>>> Best,
>>>>>> Kevin Liu
>>>>>>
>>>>>> On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <owenzhang1...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Kevin. The first change is not in the versioned doc so it can
>>>>>>> be released anytime.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Manu
>>>>>>>
>>>>>>> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <kevinjq...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The 3 PRs above are merged. Thanks everyone for the review.
>>>>>>>>
>>>>>>>> I've added 2 more PRs to the 1.10 milestone. These are both
>>>>>>>> nice-to-haves.
>>>>>>>> - docs: add subpage for REST Catalog Spec in "Specification" #13521
>>>>>>>> <https://github.com/apache/iceberg/pull/13521>
>>>>>>>> - REST-Fixture: Ensure strict mode on jdbc catalog for rest fixture
>>>>>>>> #13599 <https://github.com/apache/iceberg/pull/13599>
>>>>>>>>
>>>>>>>> The first one changes the "REST Catalog Spec" link in the left
>>>>>>>> nav of https://iceberg.apache.org/spec/ from the swagger.io link
>>>>>>>> to a dedicated page for the Iceberg REST Catalog (IRC).
>>>>>>>> The second one fixes the default behavior of the `iceberg-rest-fixture`
>>>>>>>> image to align with the general expectation when creating a table in a
>>>>>>>> catalog.
>>>>>>>>
>>>>>>>> Please take a look. I would like to have both of these as part of
>>>>>>>> the 1.10 release.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Kevin Liu
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Here are the 3 PRs to add corresponding tests.
>>>>>>>>> https://github.com/apache/iceberg/pull/13648
>>>>>>>>> https://github.com/apache/iceberg/pull/13649
>>>>>>>>> https://github.com/apache/iceberg/pull/13650
>>>>>>>>>
>>>>>>>>> I've tagged them with the 1.10 milestone, waiting for CI to
>>>>>>>>> complete :)
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Kevin Liu
>>>>>>>>>
>>>>>>>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Kevin, thanks for checking that. I will take a look at your
>>>>>>>>>> backport PRs. Can you add them to the 1.10.0 milestone?
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks again for driving this Steven! We're very close!!
>>>>>>>>>>>
>>>>>>>>>>> As mentioned in the community sync today, I wanted to verify
>>>>>>>>>>> feature parity between Spark 3.5 and Spark 4.0 for this release,
>>>>>>>>>>> and I was able to confirm it. More details in the other dev list
>>>>>>>>>>> thread:
>>>>>>>>>>> https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Kevin Liu
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Another update on the release.
>>>>>>>>>>>>
>>>>>>>>>>>> The existing blocker PRs are almost done.
>>>>>>>>>>>>
>>>>>>>>>>>> During today's community sync, we identified the following
>>>>>>>>>>>> issues/PRs to be included in the 1.10.0 release.
>>>>>>>>>>>>
>>>>>>>>>>>>    1. Backport of PR 13100 to the main branch. I have created
>>>>>>>>>>>>    a cherry-pick PR
>>>>>>>>>>>>    <https://github.com/apache/iceberg/pull/13647> for that.
>>>>>>>>>>>>    There is a one-line difference compared to the original PR, due
>>>>>>>>>>>>    to the removal of the deprecated RemoveSnapshot class in the
>>>>>>>>>>>>    main branch targeted for 1.10.0. Amogh has suggested using
>>>>>>>>>>>>    RemoveSnapshots with a single snapshot id, which should be
>>>>>>>>>>>>    supported by all REST catalog servers.
>>>>>>>>>>>>    2. Flink compaction doesn't support row lineage. Fail the
>>>>>>>>>>>>    compaction for V3 tables. I created a PR
>>>>>>>>>>>>    <https://github.com/apache/iceberg/pull/13646> for that.
>>>>>>>>>>>>    Will backport after it is merged.
>>>>>>>>>>>>    3. Spark: fix DataFrame joins between different versions
>>>>>>>>>>>>    of the same table, which may produce incorrect results. Anton
>>>>>>>>>>>>    is working on a fix. It requires a small behavior change (table
>>>>>>>>>>>>    state may be stale up to the refresh interval). Hence it is
>>>>>>>>>>>>    better to include it in the 1.10.0 release, where Spark 4.0 is
>>>>>>>>>>>>    first supported.
>>>>>>>>>>>>    4. Variant support in core and Spark 4.0. Ryan thinks this
>>>>>>>>>>>>    is very close and will prioritize the review.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Steven
>>>>>>>>>>>>
>>>>>>>>>>>> The 1.10.0 milestone can be found here.
>>>>>>>>>>>> https://github.com/apache/iceberg/milestone/54
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Ajantha/Robin, thanks for the note. We can include the PR in
>>>>>>>>>>>>> the 1.10.0 milestone.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt
>>>>>>>>>>>>> <ro...@confluent.io.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks Ajantha. Just to confirm, from a Confluent point of
>>>>>>>>>>>>>> view, we will not be able to publish the connector on Confluent 
>>>>>>>>>>>>>> Hub until
>>>>>>>>>>>>>> this CVE[1] is fixed.
>>>>>>>>>>>>>> Since we would not publish a snapshot build, if the fix
>>>>>>>>>>>>>> doesn't make it into 1.10 then we'd have to wait for 1.11 (or a 
>>>>>>>>>>>>>> dot release
>>>>>>>>>>>>>> of 1.10) to be able to include the connector on Confluent Hub.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks, Robin.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>> https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <
>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have approached Confluent people
>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281>
>>>>>>>>>>>>>>> to help us publish the OSS Kafka Connect Iceberg sink plugin.
>>>>>>>>>>>>>>> It seems we have a CVE from a dependency that blocks us from
>>>>>>>>>>>>>>> publishing the plugin.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please include the PR below in the 1.10.0 release; it fixes
>>>>>>>>>>>>>>> that CVE.
>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/13561
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <
>>>>>>>>>>>>>>> stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > Engines may model operations as deleting/inserting rows
>>>>>>>>>>>>>>>> or as modifications to rows that preserve row ids.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Manu, I agree this sentence probably lacks some context.
>>>>>>>>>>>>>>>> The first half ("as deleting/inserting rows") is probably
>>>>>>>>>>>>>>>> about row lineage handling with equality deletes, which is
>>>>>>>>>>>>>>>> described elsewhere in the spec:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "Row lineage does not track lineage for rows updated via 
>>>>>>>>>>>>>>>> Equality
>>>>>>>>>>>>>>>> Deletes
>>>>>>>>>>>>>>>> <https://iceberg.apache.org/spec/#equality-delete-files>,
>>>>>>>>>>>>>>>> because engines using equality deletes avoid reading existing 
>>>>>>>>>>>>>>>> data before
>>>>>>>>>>>>>>>> writing changes and can't provide the original row ID for the 
>>>>>>>>>>>>>>>> new rows.
>>>>>>>>>>>>>>>> These updates are always treated as if the existing row was 
>>>>>>>>>>>>>>>> completely
>>>>>>>>>>>>>>>> removed and a unique new row was added."
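To make the quoted behavior concrete, here is a toy model in plain Python (not Iceberg code; all names are made up for illustration) of why an equality-delete update cannot preserve the original row id: the engine never reads the old row, so the insert gets a fresh id.

```python
# Toy model of an equality-delete update: delete by key, then insert a
# brand-new row. Because the old row is never read, its _row_id cannot
# be carried over to the replacement row.

def update_via_equality_delete(table, key, new_value, next_row_id):
    """Update a row by key without reading it first.

    `table` is a dict of key -> {"value": ..., "_row_id": ...}.
    Returns the next unassigned row id.
    """
    # The equality delete removes any existing row with this key ...
    table.pop(key, None)
    # ... and the insert adds a new row with a fresh _row_id; the
    # original row id is lost because the old row was never read.
    table[key] = {"value": new_value, "_row_id": next_row_id}
    return next_row_id + 1

rows = {"k1": {"value": "a", "_row_id": 0}}
next_id = update_via_equality_delete(rows, "k1", "b", next_row_id=1)
assert rows["k1"]["_row_id"] == 1  # new id assigned; lineage to id 0 is lost
```

This mirrors the spec text above: the update is treated as if the existing row was completely removed and a unique new row was added.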
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <
>>>>>>>>>>>>>>>> owenzhang1...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks Steven, I missed that part but the following
>>>>>>>>>>>>>>>>> sentence is a bit hard to understand (maybe just me)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Engines may model operations as deleting/inserting rows or
>>>>>>>>>>>>>>>>> as modifications to rows that preserve row ids.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can you please help to explain?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 04:41, Steven Wu
>>>>>>>>>>>>>>>>> <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Manu
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The spec already covers the row lineage carry over (for
>>>>>>>>>>>>>>>>>> replace)
>>>>>>>>>>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> "When an existing row is moved to a different data file
>>>>>>>>>>>>>>>>>> for any reason, writers should write _row_id and
>>>>>>>>>>>>>>>>>> _last_updated_sequence_number according to the following
>>>>>>>>>>>>>>>>>> rules:"
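As a rough illustration of the carry-over rule quoted above, here is a small sketch based on my reading of the spec (plain Python with illustrative names, not the actual Iceberg API): a null `_row_id` is resolved as the data file's `first_row_id` plus the row's position, and a writer that moves a row to a new file should write the resolved value explicitly so the row keeps its id.

```python
# Toy sketch of row-id inheritance: null means "inherit from the file",
# an explicit value means "carried over from a previous file".

def resolve_row_id(explicit_row_id, first_row_id, position):
    # Null _row_id is materialized as first_row_id + position in file.
    if explicit_row_id is not None:
        return explicit_row_id
    return first_row_id + position

# Reading a file whose rows never had explicit ids:
assert resolve_row_id(None, first_row_id=100, position=3) == 103

# A replace/compaction moves that row into a new file: the writer
# persists the resolved id, so the row keeps 103 instead of
# inheriting a fresh id from the new file.
carried_over = resolve_row_id(None, first_row_id=100, position=3)
assert resolve_row_id(carried_over, first_row_id=500, position=0) == 103
```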
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <
>>>>>>>>>>>>>>>>>> stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> another update on the release.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We have one open PR left for the 1.10.0 milestone
>>>>>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/milestone/54> (with
>>>>>>>>>>>>>>>>>>> 25 closed PRs). Amogh is actively working on the last 
>>>>>>>>>>>>>>>>>>> blocker PR.
>>>>>>>>>>>>>>>>>>> Spark 4.0: Preserve row lineage information on compaction
>>>>>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/pull/13555>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I will publish a release candidate after the above
>>>>>>>>>>>>>>>>>>> blocker is merged and backported.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <
>>>>>>>>>>>>>>>>>>> owenzhang1...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Amogh,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Is it defined in the table spec that the "replace"
>>>>>>>>>>>>>>>>>>>> operation should carry over existing lineage info instead
>>>>>>>>>>>>>>>>>>>> of assigning new IDs? If not, we'd better first define it
>>>>>>>>>>>>>>>>>>>> in the spec, because all engines and implementations need
>>>>>>>>>>>>>>>>>>>> to follow it.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <
>>>>>>>>>>>>>>>>>>>> 2am...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> One other area I think we need to make sure works with
>>>>>>>>>>>>>>>>>>>>> row lineage before the release is data file compaction. At
>>>>>>>>>>>>>>>>>>>>> the moment
>>>>>>>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>,
>>>>>>>>>>>>>>>>>>>>> it looks like compaction will read the records from the
>>>>>>>>>>>>>>>>>>>>> data files without projecting the lineage fields. This
>>>>>>>>>>>>>>>>>>>>> means that when the new compacted data files are written,
>>>>>>>>>>>>>>>>>>>>> we'd be losing the lineage information. There's no data
>>>>>>>>>>>>>>>>>>>>> change in a compaction, but we do need to make sure the
>>>>>>>>>>>>>>>>>>>>> lineage info from carried-over records is materialized in
>>>>>>>>>>>>>>>>>>>>> the newly compacted files, so they don't get new IDs or
>>>>>>>>>>>>>>>>>>>>> inherit the new file sequence number. I'm working on
>>>>>>>>>>>>>>>>>>>>> addressing this, and I'd call it out as a blocker as well.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> *Robin Moffatt*
>>>>>>>>>>>>>> *Sr. Principal Advisor, Streaming Data Technologies*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>