I thought that we said we wanted to get support out for v3 features in this release unless there is some reasonable blocker, like Spark not having geospatial types. To me, that means we should aim to get variant and unknown done so that we have a complete implementation with a major engine. And it should not be particularly difficult to get unknown done, so I'd opt to get it in.
On Fri, Jul 25, 2025 at 11:28 AM Steven Wu <stevenz...@gmail.com> wrote:

> I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR <https://github.com/apache/iceberg/pull/13445> for that.
>
> I thought in the community sync the consensus is that this is not a blocker, because it is a new feature implementation. If it is ready, it will be included.
>
> On Fri, Jul 25, 2025 at 9:43 AM Kevin Liu <kevinjq...@apache.org> wrote:
>
>> I think Fokko's OOO. Should we help with that PR?
>>
>> On Fri, Jul 25, 2025 at 9:38 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
>>
>>> I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR <https://github.com/apache/iceberg/pull/13445> for that.
>>>
>>> On Fri, Jul 25, 2025 at 6:13 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>>> 3. Spark: fix data frame join based on different versions of the same table that may lead to weird results. Anton is working on a fix. It requires a small behavior change (table state may be stale up to the refresh interval). Hence it is better to include it in the 1.10.0 release, where Spark 4.0 is first supported.
>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very close and will prioritize the review.
>>>>
>>>> We still have the above two issues pending. 3 doesn't have a PR yet. The PR for 4 is not associated with the milestone yet.
>>>>
>>>> On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>
>>>>> Thanks everyone for the review. The 2 PRs are both merged. Looks like there's only 1 PR left in the 1.10 milestone <https://github.com/apache/iceberg/milestone/54> :)
>>>>>
>>>>> Best,
>>>>> Kevin Liu
>>>>>
>>>>> On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>
>>>>>> Thanks Kevin. The first change is not in the versioned doc, so it can be released anytime.
>>>>>>
>>>>>> Regards,
>>>>>> Manu
>>>>>>
>>>>>> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>
>>>>>>> The 3 PRs above are merged. Thanks everyone for the review.
>>>>>>>
>>>>>>> I've added 2 more PRs to the 1.10 milestone. These are both nice-to-haves.
>>>>>>> - docs: add subpage for REST Catalog Spec in "Specification" #13521 <https://github.com/apache/iceberg/pull/13521>
>>>>>>> - REST-Fixture: Ensure strict mode on jdbc catalog for rest fixture #13599 <https://github.com/apache/iceberg/pull/13599>
>>>>>>>
>>>>>>> The first one changes the link for "REST Catalog Spec" on the left nav of https://iceberg.apache.org/spec/ from the swagger.io link to a dedicated page for IRC. The second one fixes the default behavior of the `iceberg-rest-fixture` image to align with the general expectation when creating a table in a catalog.
>>>>>>>
>>>>>>> Please take a look. I would like to have both of these as part of the 1.10 release.
>>>>>>>
>>>>>>> Best,
>>>>>>> Kevin Liu
>>>>>>>
>>>>>>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>
>>>>>>>> Here are the 3 PRs to add corresponding tests.
>>>>>>>> https://github.com/apache/iceberg/pull/13648
>>>>>>>> https://github.com/apache/iceberg/pull/13649
>>>>>>>> https://github.com/apache/iceberg/pull/13650
>>>>>>>>
>>>>>>>> I've tagged them with the 1.10 milestone, waiting for CI to complete :)
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Kevin Liu
>>>>>>>>
>>>>>>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Kevin, thanks for checking that. I will take a look at your backport PRs. Can you add them to the 1.10.0 milestone?
>>>>>>>>>
>>>>>>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks again for driving this Steven! We're very close!!
>>>>>>>>>>
>>>>>>>>>> As mentioned in the community sync today, I wanted to verify feature parity between Spark 3.5 and Spark 4.0 for this release. I was able to verify that Spark 3.5 and Spark 4.0 have feature parity for this upcoming release. More details in the other devlist thread: https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Kevin Liu
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Another update on the release.
>>>>>>>>>>>
>>>>>>>>>>> The existing blocker PRs are almost done.
>>>>>>>>>>>
>>>>>>>>>>> During today's community sync, we identified the following issues/PRs to be included in the 1.10.0 release.
>>>>>>>>>>>
>>>>>>>>>>> 1. Backport of PR 13100 to the main branch. I have created a cherry-pick PR <https://github.com/apache/iceberg/pull/13647> for that. There is a one-line difference compared to the original PR due to the removal of the deprecated RemoveSnapshot class in the main branch for the 1.10.0 target. Amogh has suggested using RemoveSnapshots with a single snapshot id, which should be supported by all REST catalog servers.
>>>>>>>>>>> 2. Flink compaction doesn't support row lineage. Fail the compaction for V3 tables. I created a PR <https://github.com/apache/iceberg/pull/13646> for that. Will backport after it is merged.
>>>>>>>>>>> 3. Spark: fix data frame join based on different versions of the same table that may lead to weird results. Anton is working on a fix. It requires a small behavior change (table state may be stale up to the refresh interval).
>>>>>>>>>>> Hence it is better to include it in the 1.10.0 release, where Spark 4.0 is first supported.
>>>>>>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very close and will prioritize the review.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Steven
>>>>>>>>>>>
>>>>>>>>>>> The 1.10.0 milestone can be found here: https://github.com/apache/iceberg/milestone/54
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Ajantha/Robin, thanks for the note. We can include the PR in the 1.10.0 milestone.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt <ro...@confluent.io.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Ajantha. Just to confirm, from a Confluent point of view, we will not be able to publish the connector on Confluent Hub until this CVE[1] is fixed. Since we would not publish a snapshot build, if the fix doesn't make it into 1.10 then we'd have to wait for 1.11 (or a dot release of 1.10) to be able to include the connector on Confluent Hub.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks, Robin.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have approached Confluent people <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281> to help us publish the OSS Kafka Connect Iceberg sink plugin. It seems we have a CVE from a dependency that blocks us from publishing the plugin.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please include the below PR in the 1.10.0 release, which fixes that:
>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/13561
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Manu, I agree this sentence probably lacks some context. The first half (as deleting/inserting rows) is probably about the row lineage handling with equality deletes, which is described in another place:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "Row lineage does not track lineage for rows updated via Equality Deletes <https://iceberg.apache.org/spec/#equality-delete-files>, because engines using equality deletes avoid reading existing data before writing changes and can't provide the original row ID for the new rows. These updates are always treated as if the existing row was completely removed and a unique new row was added."
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks Steven, I missed that part, but the following sentence is a bit hard to understand (maybe just me):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can you please help to explain?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 04:41, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Manu,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The spec already covers the row lineage carry-over (for replace): https://iceberg.apache.org/spec/#row-lineage
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> "When an existing row is moved to a different data file for any reason, writers should write _row_id and _last_updated_sequence_number according to the following rules:"
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Another update on the release.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We have one open PR left in the 1.10.0 milestone <https://github.com/apache/iceberg/milestone/54> (with 25 closed PRs). Amogh is actively working on the last blocker PR: Spark 4.0: Preserve row lineage information on compaction <https://github.com/apache/iceberg/pull/13555>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I will publish a release candidate after the above blocker is merged and backported.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Amogh,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Is it defined in the table spec that the "replace" operation should carry over existing lineage info instead of assigning new IDs?
>>>>>>>>>>>>>>>>>>> If not, we'd better first define it in the spec, because all engines and implementations need to follow it.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> One other area I think we need to make sure works with row lineage before release is data file compaction. At the moment <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>, it looks like compaction will read the records from the data files without projecting the lineage fields. What this means is that, on write of the new compacted data files, we'd be losing the lineage information. There's no data change in a compaction, but we do need to make sure the lineage info from carried-over records is materialized in the newly compacted files so they don't get new IDs or inherit the new file sequence number. I'm working on addressing this, and I'd call it out as a blocker as well.

>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> *Robin Moffatt*
>>>>>>>>>>>>> *Sr. Principal Advisor, Streaming Data Technologies*
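For readers following along: the compaction blocker discussed in this thread (PRs 13555 and 13646) comes down to the spec's carry-over rule for `_row_id` and `_last_updated_sequence_number`. A minimal sketch of that rule is below. This is hypothetical illustration code, not Iceberg's API; `SourceFile` and `materialize_lineage` are invented names. The rule itself follows the spec's inheritance semantics: a null `_row_id` is read as the source file's `first_row_id` plus the row's position, and a null `_last_updated_sequence_number` is read as the source file's data sequence number, so a rewrite must materialize both before writing new files.

```python
# Illustrative sketch only -- NOT Iceberg's API. It models the row-lineage
# carry-over rule: when rows move to a new data file (e.g. compaction),
# the writer materializes the inherited lineage values rather than letting
# rewritten rows pick up fresh IDs or the new file's sequence number.
from dataclasses import dataclass


@dataclass
class SourceFile:
    """Hypothetical stand-in for the file a row is being moved out of."""
    first_row_id: int          # assigned to the file when it was committed
    data_sequence_number: int  # the file's data sequence number


def materialize_lineage(rows, src):
    """Make lineage columns explicit before rewriting rows to a new file."""
    out = []
    for pos, row in enumerate(rows):
        r = dict(row)
        # A null _row_id is inherited as first_row_id + position in file.
        if r.get("_row_id") is None:
            r["_row_id"] = src.first_row_id + pos
        # A null _last_updated_sequence_number is inherited from the
        # source file's data sequence number.
        if r.get("_last_updated_sequence_number") is None:
            r["_last_updated_sequence_number"] = src.data_sequence_number
        out.append(r)
    return out


src = SourceFile(first_row_id=100, data_sequence_number=5)
rows = [{"id": 1},  # lineage columns never written explicitly in the file
        {"id": 2, "_row_id": 7, "_last_updated_sequence_number": 3}]
compacted = materialize_lineage(rows, src)
# compacted[0] carries _row_id=100 and _last_updated_sequence_number=5;
# compacted[1] keeps its explicit values 7 and 3.
```

Skipping this materialization step is exactly the bug Amogh describes: reading without projecting the lineage fields means every rewritten row would look "new" and inherit the compacted file's IDs and sequence number.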