Thanks again for driving this, Steven! We're very close!!

As mentioned in the community sync today, I wanted to verify feature parity
between Spark 3.5 and Spark 4.0 for this release, and I was able to confirm
that they are at parity for 1.10.0. More details in the other dev list
thread:
https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f

Thanks,
Kevin Liu

On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:

> Another update on the release.
>
> The existing blocker PRs are almost done.
>
> During today's community sync, we identified the following issues/PRs to
> be included in the 1.10.0 release.
>
>    1. Backport of PR 13100 to the main branch. I have created a cherry-pick
>    PR <https://github.com/apache/iceberg/pull/13647> for that. There is a
>    one-line difference compared to the original PR, due to the removal of
>    the deprecated RemoveSnapshot class in the main branch for the 1.10.0
>    target. Amogh has suggested using RemoveSnapshots with a single snapshot
>    id, which should be supported by all REST catalog servers (see the
>    sketch after this list).
>    2. Flink compaction doesn't support row lineage, so compaction should
>    fail for V3 tables. I created a PR
>    <https://github.com/apache/iceberg/pull/13646> for that and will
>    backport it after it is merged.
>    3. Spark: fix DataFrame joins between different versions of the same
>    table, which may produce incorrect results. Anton is working on a fix.
>    It requires a small behavior change (table state may be stale up to
>    the refresh interval), so it is better to include it in the 1.10.0
>    release, where Spark 4.0 is first supported.
>    4. Variant support in core and Spark 4.0. Ryan thinks this is very
>    close and will prioritize the review.
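>
> To illustrate item 1, here is a minimal sketch (the helper and its names
> are made up for illustration, not taken from the cherry-pick PR): a single
> snapshot can be dropped through the public ExpireSnapshots API, which is
> backed by RemoveSnapshots in core.
>
>     import org.apache.iceberg.Table;
>
>     class SingleSnapshotRemoval {
>       // Remove exactly one snapshot via the ExpireSnapshots API instead
>       // of the removed, deprecated RemoveSnapshot class.
>       static void removeOne(Table table, long snapshotId) {
>         table.expireSnapshots()
>             .expireSnapshotId(snapshotId) // target a single snapshot id
>             .cleanExpiredFiles(false)     // keep files; drop metadata only
>             .commit();
>       }
>     }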
>
> Thanks,
> steven
>
> The 1.10.0 milestone can be found here:
> https://github.com/apache/iceberg/milestone/54
>
>
> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
>
>> Ajantha/Robin, thanks for the note. We can include the PR in the 1.10.0
>> milestone.
>>
>>
>>
>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt <ro...@confluent.io.invalid>
>> wrote:
>>
>>> Thanks Ajantha. Just to confirm, from a Confluent point of view, we will
>>> not be able to publish the connector on Confluent Hub until this CVE[1] is
>>> fixed.
>>> Since we would not publish a snapshot build, if the fix doesn't make it
>>> into 1.10 then we'd have to wait for 1.11 (or a dot release of 1.10) to be
>>> able to include the connector on Confluent Hub.
>>>
>>> Thanks, Robin.
>>>
>>> [1]
>>> https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>>
>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com>
>>> wrote:
>>>
>>>> I have approached Confluent people
>>>> <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281>
>>>> to help us publish the OSS Kafka Connect Iceberg sink plugin.
>>>> It seems we have a CVE from a dependency that blocks us from publishing
>>>> the plugin.
>>>>
>>>> Please include the PR below, which fixes that, in the 1.10.0 release:
>>>> https://github.com/apache/iceberg/pull/13561
>>>>
>>>> - Ajantha
>>>>
>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com>
>>>> wrote:
>>>>
>>>>> > Engines may model operations as deleting/inserting rows or as
>>>>> > modifications to rows that preserve row ids.
>>>>>
>>>>> Manu, I agree this sentence probably lacks some context. The first
>>>>> half ("deleting/inserting rows") is probably about the row lineage
>>>>> handling with equality deletes, which is described in another part of
>>>>> the spec:
>>>>>
>>>>> "Row lineage does not track lineage for rows updated via Equality
>>>>> Deletes <https://iceberg.apache.org/spec/#equality-delete-files>,
>>>>> because engines using equality deletes avoid reading existing data before
>>>>> writing changes and can't provide the original row ID for the new rows.
>>>>> These updates are always treated as if the existing row was completely
>>>>> removed and a unique new row was added."
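>>>>>
>>>>> To make that concrete, here is a hedged illustration with made-up
>>>>> values. Suppose a row with _row_id = 42 is updated:
>>>>>
>>>>> - copy-on-write / position deletes: the engine reads the existing row,
>>>>>   so the rewritten row keeps _row_id = 42 and only bumps
>>>>>   _last_updated_sequence_number to the committing sequence number
>>>>> - equality deletes: the engine never reads the existing row, so row 42
>>>>>   is treated as removed and the replacement is written with a null
>>>>>   _row_id, later assigned a fresh id (e.g. 97) via inheritance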
>>>>>
>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks Steven, I missed that part, but the following sentence is a
>>>>>> bit hard to understand (maybe it's just me):
>>>>>>
>>>>>> Engines may model operations as deleting/inserting rows or as
>>>>>> modifications to rows that preserve row ids.
>>>>>>
>>>>>> Can you please help explain?
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 15, 2025 at 4:41 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>
>>>>>>> Manu
>>>>>>>
>>>>>>> The spec already covers the row lineage carry-over (for replace):
>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>>>>>
>>>>>>> "When an existing row is moved to a different data file for any
>>>>>>> reason, writers should write _row_id and
>>>>>>> _last_updated_sequence_number according to the following rules:"
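>>>>>>>
>>>>>>> As a rough sketch of that rule (the helper and the field accessors
>>>>>>> are hypothetical, not an actual writer API), a writer moving an
>>>>>>> existing row into a new data file copies the lineage fields instead
>>>>>>> of leaving them null:
>>>>>>>
>>>>>>>   import org.apache.iceberg.data.GenericRecord;
>>>>>>>   import org.apache.iceberg.data.Record;
>>>>>>>
>>>>>>>   class LineageCarryOver {
>>>>>>>     // Copy _row_id and _last_updated_sequence_number from the row
>>>>>>>     // being moved so it is not treated as a brand-new row.
>>>>>>>     static Record carryOver(Record existing, GenericRecord moved) {
>>>>>>>       moved.setField("_row_id", existing.getField("_row_id"));
>>>>>>>       moved.setField("_last_updated_sequence_number",
>>>>>>>           existing.getField("_last_updated_sequence_number"));
>>>>>>>       return moved;
>>>>>>>     }
>>>>>>>   }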
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Steven
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Another update on the release.
>>>>>>>>
>>>>>>>> We have one open PR left for the 1.10.0 milestone
>>>>>>>> <https://github.com/apache/iceberg/milestone/54> (with 25 closed
>>>>>>>> PRs). Amogh is actively working on the last blocker PR:
>>>>>>>> Spark 4.0: Preserve row lineage information on compaction
>>>>>>>> <https://github.com/apache/iceberg/pull/13555>
>>>>>>>>
>>>>>>>> I will publish a release candidate after the above blocker is
>>>>>>>> merged and backported.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Steven
>>>>>>>>
>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Amogh,
>>>>>>>>>
>>>>>>>>> Is it defined in the table spec that the "replace" operation
>>>>>>>>> should carry over existing lineage info instead of assigning new
>>>>>>>>> IDs? If not, we'd better define it in the spec first, because all
>>>>>>>>> engines and implementations need to follow it.
>>>>>>>>>
>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> One other area I think we need to make sure works with row lineage
>>>>>>>>>> before release is data file compaction. At the moment
>>>>>>>>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>,
>>>>>>>>>> it looks like compaction will read the records from the data files
>>>>>>>>>> without projecting the lineage fields, which means that when the
>>>>>>>>>> new compacted data files are written, we'd lose the lineage
>>>>>>>>>> information. There's no data change in a compaction, but we do need
>>>>>>>>>> to make sure the lineage info from carried-over records is
>>>>>>>>>> materialized in the newly compacted files so they don't get new IDs
>>>>>>>>>> or inherit the new file sequence number. I'm working on addressing
>>>>>>>>>> this, but I'd call it out as a blocker as well.
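>>>>>>>>>>
>>>>>>>>>> To sketch the idea (this is not the actual fix, and the exact
>>>>>>>>>> metadata column names exposed by Spark are an assumption here):
>>>>>>>>>> the compaction read would project the lineage columns alongside
>>>>>>>>>> the data so the rewrite can materialize them in the new files.
>>>>>>>>>>
>>>>>>>>>>   import static org.apache.spark.sql.functions.col;
>>>>>>>>>>   import org.apache.spark.sql.Dataset;
>>>>>>>>>>   import org.apache.spark.sql.Row;
>>>>>>>>>>   import org.apache.spark.sql.SparkSession;
>>>>>>>>>>
>>>>>>>>>>   // Read for compaction, keeping the lineage metadata columns so
>>>>>>>>>>   // the compacted files can carry the existing row IDs forward.
>>>>>>>>>>   Dataset<Row> rows = SparkSession.active().read()
>>>>>>>>>>       .format("iceberg")
>>>>>>>>>>       .load("db.table") // hypothetical table name
>>>>>>>>>>       .select(col("*"), col("_row_id"),
>>>>>>>>>>           col("_last_updated_sequence_number"));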
>>>>>>>>>>
>>>>>>>>>
>>>
>>> --
>>> *Robin Moffatt*
>>> *Sr. Principal Advisor, Streaming Data Technologies*
>>>
>>
