Hey all, The read path for the UnknownType needs some community discussion. I've raised a separate thread <https://lists.apache.org/thread/gq9lyndb574ptq7vkz83zgkp1lx7vp5x>. PTAL
Kind regards from Belgium, Fokko On Sat, Jul 26, 2025 at 00:58, Ryan Blue <rdb...@gmail.com> wrote: > I thought that we said we wanted to get support out for v3 features in > this release unless there is some reasonable blocker, like Spark not having > geospatial types. To me, I think that means we should aim to get variant > and unknown done so that we have a complete implementation with a major > engine. And it should not be particularly difficult to get unknown done, so > I'd opt to get it in. > > On Fri, Jul 25, 2025 at 11:28 AM Steven Wu <stevenz...@gmail.com> wrote: > >> > I believe we also wanted to get in at least the read path for >> UnknownType. Fokko has a WIP PR >> <https://github.com/apache/iceberg/pull/13445> for that. >> I thought in the community sync the consensus was that this is not a >> blocker, because it is a new feature implementation. If it is ready, it >> will be included. >> >> On Fri, Jul 25, 2025 at 9:43 AM Kevin Liu <kevinjq...@apache.org> wrote: >> >>> I think Fokko's OOO. Should we help with that PR? >>> >>> On Fri, Jul 25, 2025 at 9:38 AM Eduard Tudenhöfner < >>> etudenhoef...@apache.org> wrote: >>> >>>> I believe we also wanted to get in at least the read path for >>>> UnknownType. Fokko has a WIP PR >>>> <https://github.com/apache/iceberg/pull/13445> for that. >>>> >>>> On Fri, Jul 25, 2025 at 6:13 PM Steven Wu <stevenz...@gmail.com> wrote: >>>> >>>>> 3. Spark: fix data frame join based on different versions of the same >>>>> table that may lead to weird results. Anton is working on a fix. It >>>>> requires a small behavior change (table state may be stale up to refresh >>>>> interval). Hence it is better to include it in the 1.10.0 release where >>>>> Spark 4.0 is first supported. >>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very >>>>> close and will prioritize the review. >>>>> >>>>> We still have the above two issues pending. 3 doesn't have a PR yet. >>>>> PR for 4 is not associated with the milestone yet. 
>>>>> >>>>> On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <kevinjq...@apache.org> >>>>> wrote: >>>>> >>>>>> Thanks everyone for the review. The 2 PRs are both merged. >>>>>> Looks like there's only 1 PR left in the 1.10 milestone >>>>>> <https://github.com/apache/iceberg/milestone/54> :) >>>>>> >>>>>> Best, >>>>>> Kevin Liu >>>>>> >>>>>> On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <owenzhang1...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Thanks Kevin. The first change is not in the versioned doc so it can >>>>>>> be released anytime. >>>>>>> >>>>>>> Regards, >>>>>>> Manu >>>>>>> >>>>>>> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <kevinjq...@apache.org> >>>>>>> wrote: >>>>>>> >>>>>>>> The 3 PRs above are merged. Thanks everyone for the review. >>>>>>>> >>>>>>>> I've added 2 more PRs to the 1.10 milestone. These are both >>>>>>>> nice-to-haves. >>>>>>>> - docs: add subpage for REST Catalog Spec in "Specification" #13521 >>>>>>>> <https://github.com/apache/iceberg/pull/13521> >>>>>>>> - REST-Fixture: Ensure strict mode on jdbc catalog for rest fixture >>>>>>>> #13599 <https://github.com/apache/iceberg/pull/13599> >>>>>>>> >>>>>>>> The first one changes the link for "REST Catalog Spec" on the left >>>>>>>> nav of https://iceberg.apache.org/spec/ from the swagger.io link >>>>>>>> to a dedicated page for IRC. >>>>>>>> The second one fixes the default behavior of `iceberg-rest-fixture` >>>>>>>> image to align with the general expectation when creating a table in a >>>>>>>> catalog. >>>>>>>> >>>>>>>> Please take a look. I would like to have both of these as part of >>>>>>>> the 1.10 release. >>>>>>>> >>>>>>>> Best, >>>>>>>> Kevin Liu >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Here are the 3 PRs to add corresponding tests. 
>>>>>>>>> https://github.com/apache/iceberg/pull/13648 >>>>>>>>> https://github.com/apache/iceberg/pull/13649 >>>>>>>>> https://github.com/apache/iceberg/pull/13650 >>>>>>>>> >>>>>>>>> I've tagged them with the 1.10 milestone, waiting for CI to >>>>>>>>> complete :) >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Kevin Liu >>>>>>>>> >>>>>>>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Kevin, thanks for checking that. I will take a look at your >>>>>>>>>> backport PRs. Can you add them to the 1.10.0 milestone? >>>>>>>>>> >>>>>>>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Thanks again for driving this Steven! We're very close!! >>>>>>>>>>> >>>>>>>>>>> As mentioned in the community sync today, I wanted to verify >>>>>>>>>>> feature parity between Spark 3.5 and Spark 4.0 for this release. >>>>>>>>>>> I was able to verify that Spark 3.5 and Spark 4.0 have feature >>>>>>>>>>> parity for this upcoming release. More details in the other devlist >>>>>>>>>>> thread >>>>>>>>>>> https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Kevin Liu >>>>>>>>>>> >>>>>>>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Another update on the release. >>>>>>>>>>>> >>>>>>>>>>>> The existing blocker PRs are almost done. >>>>>>>>>>>> >>>>>>>>>>>> During today's community sync, we identified the following >>>>>>>>>>>> issues/PRs to be included in the 1.10.0 release. >>>>>>>>>>>> >>>>>>>>>>>> 1. backport of PR 13100 to the main branch. I have created >>>>>>>>>>>> a cherry-pick PR >>>>>>>>>>>> <https://github.com/apache/iceberg/pull/13647> for that. >>>>>>>>>>>> There is a one line difference compared to the original PR due >>>>>>>>>>>> to the >>>>>>>>>>>> removal of the deprecated RemoveSnapshot class in main branch >>>>>>>>>>>> for 1.10.0 >>>>>>>>>>>> target. 
Amogh has suggested using RemoveSnapshots with a single >>>>>>>>>>>> snapshot >>>>>>>>>>>> id, which should be supported by all REST catalog servers. >>>>>>>>>>>> 2. Flink compaction doesn't support row lineage. Fail the >>>>>>>>>>>> compaction for V3 tables. I created a PR >>>>>>>>>>>> <https://github.com/apache/iceberg/pull/13646> for that. >>>>>>>>>>>> Will backport after it is merged. >>>>>>>>>>>> 3. Spark: fix data frame join based on different versions >>>>>>>>>>>> of the same table that may lead to weird results. Anton is >>>>>>>>>>>> working on a >>>>>>>>>>>> fix. It requires a small behavior change (table state may be >>>>>>>>>>>> stale up to >>>>>>>>>>>> refresh interval). Hence it is better to include it in the >>>>>>>>>>>> 1.10.0 release >>>>>>>>>>>> where Spark 4.0 is first supported. >>>>>>>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this >>>>>>>>>>>> is very close and will prioritize the review. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> steven >>>>>>>>>>>> >>>>>>>>>>>> The 1.10.0 milestone can be found here. >>>>>>>>>>>> https://github.com/apache/iceberg/milestone/54 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Ajantha/Robin, thanks for the note. We can include the PR in >>>>>>>>>>>>> the 1.10.0 milestone. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt >>>>>>>>>>>>> <ro...@confluent.io.invalid> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks Ajantha. Just to confirm, from a Confluent point of >>>>>>>>>>>>>> view, we will not be able to publish the connector on Confluent >>>>>>>>>>>>>> Hub until >>>>>>>>>>>>>> this CVE[1] is fixed. 
>>>>>>>>>>>>>> Since we would not publish a snapshot build, if the fix >>>>>>>>>>>>>> doesn't make it into 1.10 then we'd have to wait for 1.11 (or a >>>>>>>>>>>>>> dot release >>>>>>>>>>>>>> of 1.10) to be able to include the connector on Confluent Hub. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, Robin. >>>>>>>>>>>>>> >>>>>>>>>>>>>> [1] >>>>>>>>>>>>>> https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861 >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat < >>>>>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have approached Confluent people >>>>>>>>>>>>>>> <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281> >>>>>>>>>>>>>>> to help us publish the OSS Kafka Connect Iceberg sink plugin. >>>>>>>>>>>>>>> It seems we have a CVE from a dependency that blocks us from >>>>>>>>>>>>>>> publishing the plugin. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Please include the below PR in the 1.10.0 release, which fixes >>>>>>>>>>>>>>> that. >>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/13561 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Ajantha >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu < >>>>>>>>>>>>>>> stevenz...@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> > Engines may model operations as deleting/inserting rows >>>>>>>>>>>>>>>> or as modifications to rows that preserve row ids. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Manu, I agree this sentence probably lacks some context. >>>>>>>>>>>>>>>> The first half (as deleting/inserting rows) is probably >>>>>>>>>>>>>>>> about the row lineage handling with equality deletes, which is >>>>>>>>>>>>>>>> described in >>>>>>>>>>>>>>>> another place. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> "Row lineage does not track lineage for rows updated via >>>>>>>>>>>>>>>> Equality >>>>>>>>>>>>>>>> Deletes >>>>>>>>>>>>>>>> <https://iceberg.apache.org/spec/#equality-delete-files>, >>>>>>>>>>>>>>>> because engines using equality deletes avoid reading existing >>>>>>>>>>>>>>>> data before >>>>>>>>>>>>>>>> writing changes and can't provide the original row ID for the >>>>>>>>>>>>>>>> new rows. >>>>>>>>>>>>>>>> These updates are always treated as if the existing row was >>>>>>>>>>>>>>>> completely >>>>>>>>>>>>>>>> removed and a unique new row was added." >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang < >>>>>>>>>>>>>>>> owenzhang1...@gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks Steven, I missed that part but the following >>>>>>>>>>>>>>>>> sentence is a bit hard to understand (maybe just me) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Engines may model operations as deleting/inserting rows or >>>>>>>>>>>>>>>>> as modifications to rows that preserve row ids. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Can you please help to explain? 
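The equality-delete treatment quoted from the spec above (an update is always treated as if the existing row was completely removed and a unique new row was added, because the writer never reads the original row) can be sketched minimally in Python. The `update_via_equality_delete` helper and the dict-based record shape below are hypothetical illustrations, not part of any Iceberg or PyIceberg API:

```python
# Hypothetical sketch (not Iceberg code) of how an engine using equality
# deletes handles an update, per the spec passage quoted above: the old
# row is deleted by key without being read, and the replacement is
# written as a brand-new row with no lineage carried over.

def update_via_equality_delete(equality_deletes, new_row):
    """Record an equality delete for the row's key and emit the new row
    with null lineage fields (it will be treated as newly added)."""
    # Delete by key only; the existing row is never read, so its
    # _row_id cannot be recovered for the replacement.
    equality_deletes.append({"id": new_row["id"]})
    written = dict(new_row)
    # Null lineage fields: the new row gets a fresh _row_id on commit
    # (via inheritance) rather than preserving the original row's id.
    written["_row_id"] = None
    written["_last_updated_sequence_number"] = None
    return written
```

This is why, as quoted above, row lineage is not tracked across updates performed with equality deletes.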
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Steven Wu <stevenz...@gmail.com> wrote on Tue, Jul 15, 2025 at 04:41: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Manu >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The spec already covers the row lineage carry-over (for >>>>>>>>>>>>>>>>>> replace) >>>>>>>>>>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "When an existing row is moved to a different data file >>>>>>>>>>>>>>>>>> for any reason, writers should write _row_id and >>>>>>>>>>>>>>>>>> _last_updated_sequence_number according to the following >>>>>>>>>>>>>>>>>> rules:" >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> Steven >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu < >>>>>>>>>>>>>>>>>> stevenz...@gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> another update on the release. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> We have one open PR left for the 1.10.0 milestone >>>>>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/milestone/54> (with >>>>>>>>>>>>>>>>>>> 25 closed PRs). Amogh is actively working on the last >>>>>>>>>>>>>>>>>>> blocker PR. >>>>>>>>>>>>>>>>>>> Spark 4.0: Preserve row lineage information on compaction >>>>>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/pull/13555> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I will publish a release candidate after the above >>>>>>>>>>>>>>>>>>> blocker is merged and backported. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> Steven >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang < >>>>>>>>>>>>>>>>>>> owenzhang1...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi Amogh, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Is it defined in the table spec that the "replace" >>>>>>>>>>>>>>>>>>>> operation should carry over existing lineage info >>>>>>>>>>>>>>>>>>>> instead of assigning >>>>>>>>>>>>>>>>>>>> new IDs? 
If not, we'd better first define it in the spec >>>>>>>>>>>>>>>>>>>> because all engines >>>>>>>>>>>>>>>>>>>> and implementations need to follow it. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar < >>>>>>>>>>>>>>>>>>>> 2am...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> One other area I think we need to make sure works with >>>>>>>>>>>>>>>>>>>>> row lineage before release is data file compaction. At >>>>>>>>>>>>>>>>>>>>> the moment, >>>>>>>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44> >>>>>>>>>>>>>>>>>>>>> it >>>>>>>>>>>>>>>>>>>>> looks like compaction will read the records from the data >>>>>>>>>>>>>>>>>>>>> files without >>>>>>>>>>>>>>>>>>>>> projecting the lineage fields. What this means is that on >>>>>>>>>>>>>>>>>>>>> write of the new >>>>>>>>>>>>>>>>>>>>> compacted data files we'd be losing the lineage >>>>>>>>>>>>>>>>>>>>> information. There's no >>>>>>>>>>>>>>>>>>>>> data change in a compaction, but we do need to make sure >>>>>>>>>>>>>>>>>>>>> the lineage info >>>>>>>>>>>>>>>>>>>>> from carried-over records is materialized in the newly >>>>>>>>>>>>>>>>>>>>> compacted files so >>>>>>>>>>>>>>>>>>>>> they don't get new IDs or inherit the new file sequence >>>>>>>>>>>>>>>>>>>>> number. I'm working >>>>>>>>>>>>>>>>>>>>> on addressing this, and I'd call it out as a >>>>>>>>>>>>>>>>>>>>> blocker as well. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> *Robin Moffatt* >>>>>>>>>>>>>> *Sr. Principal Advisor, Streaming Data Technologies* >>>>>>>>>>>>>
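The compaction concern discussed in this thread (materializing lineage for carried-over records so they don't get new IDs or inherit the new file's sequence number) can be sketched minimally in Python. The `materialize_lineage` helper and dict-based records below are a hypothetical illustration of the carry-over rule in https://iceberg.apache.org/spec/#row-lineage, not actual Iceberg code:

```python
# Illustrative sketch (not Iceberg code) of the spec's carry-over rule:
# when an existing row is rewritten into a new data file (e.g. by
# compaction), the writer materializes _row_id and
# _last_updated_sequence_number so the row does not inherit fresh
# values from the new file.

def materialize_lineage(row, position, source_first_row_id, source_seq_number):
    """Fill in the lineage fields a row would otherwise inherit from its
    source data file, before writing it into a compacted file."""
    out = dict(row)
    if out.get("_row_id") is None:
        # A null _row_id is inherited as the source file's first row id
        # plus the row's position in that file; write it explicitly.
        out["_row_id"] = source_first_row_id + position
    if out.get("_last_updated_sequence_number") is None:
        # A null value is inherited from the source file's data sequence
        # number; write it explicitly so the compacted file's (newer)
        # sequence number is not inherited instead.
        out["_last_updated_sequence_number"] = source_seq_number
    return out
```

Rows that already carry explicit lineage values keep them unchanged; only null (inherited) fields are filled in from the source file's metadata.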