Thanks again for driving this, Steven! We're very close! As mentioned in the community sync today, I wanted to verify feature parity between Spark 3.5 and Spark 4.0 for this release, and I was able to confirm that they are at parity for this upcoming release. More details in the other devlist thread: https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
Thanks,
Kevin Liu

On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:

> Another update on the release.
>
> The existing blocker PRs are almost done.
>
> During today's community sync, we identified the following issues/PRs to
> be included in the 1.10.0 release.
>
>    1. Backport of PR 13100 to the main branch. I have created a
>    cherry-pick PR <https://github.com/apache/iceberg/pull/13647> for that.
>    There is a one-line difference compared to the original PR, due to the
>    removal of the deprecated RemoveSnapshot class in the main branch for
>    the 1.10.0 target. Amogh has suggested using RemoveSnapshots with a
>    single snapshot id, which should be supported by all REST catalog
>    servers (a minimal sketch follows at the end of this thread).
>    2. Flink compaction doesn't support row lineage. Fail the compaction
>    for V3 tables. I created a PR
>    <https://github.com/apache/iceberg/pull/13646> for that, and will
>    backport it after it is merged (a guard sketch follows at the end of
>    this thread).
>    3. Spark: fix DataFrame joins based on different versions of the same
>    table, which may lead to incorrect results. Anton is working on a fix.
>    It requires a small behavior change (table state may be stale up to the
>    refresh interval), so it is better to include it in the 1.10.0 release,
>    where Spark 4.0 is first supported.
>    4. Variant support in core and Spark 4.0. Ryan thinks this is very
>    close and will prioritize the review.
>
> Thanks,
> Steven
>
> The 1.10.0 milestone can be found here:
> https://github.com/apache/iceberg/milestone/54
>
>
> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
>
>> Ajantha/Robin, thanks for the note. We can include the PR in the 1.10.0
>> milestone.
>>
>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt <ro...@confluent.io.invalid>
>> wrote:
>>
>>> Thanks Ajantha. Just to confirm, from a Confluent point of view, we
>>> will not be able to publish the connector on Confluent Hub until this
>>> CVE[1] is fixed.
>>> Since we would not publish a snapshot build, if the fix doesn't make it
>>> into 1.10, then we'd have to wait for 1.11 (or a dot release of 1.10)
>>> to be able to include the connector on Confluent Hub.
>>>
>>> Thanks, Robin.
>>>
>>> [1]
>>> https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>>
>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com>
>>> wrote:
>>>
>>>> I have approached Confluent people
>>>> <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281>
>>>> to help us publish the OSS Kafka Connect Iceberg sink plugin.
>>>> It seems we have a CVE from a dependency that blocks us from
>>>> publishing the plugin.
>>>>
>>>> Please include the below PR in the 1.10.0 release, which fixes that:
>>>> https://github.com/apache/iceberg/pull/13561
>>>>
>>>> - Ajantha
>>>>
>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com>
>>>> wrote:
>>>>
>>>>> > Engines may model operations as deleting/inserting rows or as
>>>>> > modifications to rows that preserve row ids.
>>>>>
>>>>> Manu, I agree this sentence probably lacks some context. The first
>>>>> half (as deleting/inserting rows) is probably about the row lineage
>>>>> handling with equality deletes, which is described elsewhere:
>>>>>
>>>>> "Row lineage does not track lineage for rows updated via Equality
>>>>> Deletes <https://iceberg.apache.org/spec/#equality-delete-files>,
>>>>> because engines using equality deletes avoid reading existing data
>>>>> before writing changes and can't provide the original row ID for the
>>>>> new rows.
>>>>> These updates are always treated as if the existing row was
>>>>> completely removed and a unique new row was added."
>>>>>
>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks Steven, I missed that part, but the following sentence is a
>>>>>> bit hard to understand (maybe it's just me):
>>>>>>
>>>>>> Engines may model operations as deleting/inserting rows or as
>>>>>> modifications to rows that preserve row ids.
>>>>>>
>>>>>> Can you please help explain?
>>>>>>
>>>>>> On Tue, Jul 15, 2025 at 04:41, Steven Wu <stevenz...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Manu,
>>>>>>>
>>>>>>> The spec already covers the row lineage carry-over (for replace):
>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>>>>>
>>>>>>> "When an existing row is moved to a different data file for any
>>>>>>> reason, writers should write _row_id and
>>>>>>> _last_updated_sequence_number according to the following rules:"
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Steven
>>>>>>>
>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Another update on the release.
>>>>>>>>
>>>>>>>> We have one open PR left for the 1.10.0 milestone
>>>>>>>> <https://github.com/apache/iceberg/milestone/54> (with 25 closed
>>>>>>>> PRs). Amogh is actively working on the last blocker PR:
>>>>>>>> Spark 4.0: Preserve row lineage information on compaction
>>>>>>>> <https://github.com/apache/iceberg/pull/13555>
>>>>>>>>
>>>>>>>> I will publish a release candidate after the above blocker is
>>>>>>>> merged and backported.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Steven
>>>>>>>>
>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Amogh,
>>>>>>>>>
>>>>>>>>> Is it defined in the table spec that the "replace" operation
>>>>>>>>> should carry over existing lineage info instead of assigning new
>>>>>>>>> IDs? If not, we'd better first define it in the spec, because all
>>>>>>>>> engines and implementations need to follow it.
>>>>>>>>>
>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> One other area I think we need to make sure works with row
>>>>>>>>>> lineage before the release is data file compaction. At the moment
>>>>>>>>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>,
>>>>>>>>>> it looks like compaction will read the records from the data
>>>>>>>>>> files without projecting the lineage fields, which means that on
>>>>>>>>>> write of the new compacted data files we'd lose the lineage
>>>>>>>>>> information. There's no data change in a compaction, but we do
>>>>>>>>>> need to make sure the lineage info from carried-over records is
>>>>>>>>>> materialized in the newly compacted files so they don't get new
>>>>>>>>>> IDs or inherit the new file sequence number. I'm working on
>>>>>>>>>> addressing this and would call it out as a blocker as well (a
>>>>>>>>>> lineage-check sketch follows at the end of this thread).
>>>>>>>>>
>>>
>>> --
>>> *Robin Moffatt*
>>> *Sr. Principal Advisor, Streaming Data Technologies*
>>
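
For item 1 in Steven's update above, here is a minimal sketch of what
Amogh's suggestion could look like against the public Java API
(ExpireSnapshots is backed by RemoveSnapshots). This is illustrative
rather than the actual PR code; the catalog, table name, and snapshot id
are placeholders:

    import org.apache.iceberg.Table;
    import org.apache.iceberg.catalog.Catalog;
    import org.apache.iceberg.catalog.TableIdentifier;

    public class ExpireSingleSnapshot {
      // Remove one specific snapshot via the ExpireSnapshots API instead
      // of the deprecated RemoveSnapshot class; a REST catalog server
      // only needs to support the standard expire-snapshots commit path.
      public static void expireOne(Catalog catalog, long snapshotIdToRemove) {
        Table table = catalog.loadTable(TableIdentifier.of("db", "events"));
        table.expireSnapshots()
            .expireSnapshotId(snapshotIdToRemove) // target just this snapshot
            .commit();
      }
    }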
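
For item 2, a hedged sketch of the kind of guard the Flink compaction fix
describes: detect a V3 table and fail fast, because row-lineage fields
would not be carried over. The method name and error message are
illustrative; see PR 13646 for the real change:

    import org.apache.iceberg.BaseTable;
    import org.apache.iceberg.Table;

    public class CompactionGuard {
      // Reject compaction of format v3 tables, which track row lineage.
      static void validateNoRowLineage(Table table) {
        int formatVersion =
            ((BaseTable) table).operations().current().formatVersion();
        if (formatVersion >= 3) {
          throw new UnsupportedOperationException(
              "Flink compaction does not yet support row lineage (format v"
                  + formatVersion + ")");
        }
      }
    }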
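
And for Amogh's compaction concern, a rough way to inspect lineage by
hand, assuming the Spark integration exposes _row_id and
_last_updated_sequence_number as metadata columns on V3 tables (the table
name is a placeholder): select them before and after a rewrite and
confirm they are unchanged.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class LineageCheck {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().getOrCreate();
        // Metadata columns must be selected explicitly; they are not
        // included in SELECT *.
        Dataset<Row> lineage = spark.sql(
            "SELECT _row_id, _last_updated_sequence_number, * FROM db.events");
        lineage.show(20, false);
      }
    }

If the same _row_id values come back with the same
_last_updated_sequence_number after compaction, the lineage was carried
over rather than re-assigned.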