Another update on the release. The existing blocker PRs are almost done.
During today's community sync, we identified the following issues/PRs to be
included in the 1.10.0 release.

1. Backport of PR 13100 to the main branch. I have created a cherry-pick PR
<https://github.com/apache/iceberg/pull/13647> for that. There is a one-line
difference compared to the original PR, due to the removal of the deprecated
RemoveSnapshot class from the main branch for the 1.10.0 target. Amogh has
suggested using RemoveSnapshots with a single snapshot id, which should be
supported by all REST catalog servers (see the sketch below).
2. Flink compaction doesn't support row lineage, so compaction should fail
for V3 tables. I created a PR <https://github.com/apache/iceberg/pull/13646>
for that and will backport it after it is merged.
3. Spark: fix a DataFrame join whose sides are based on different versions
of the same table, which may lead to incorrect results. Anton is working on
a fix. It requires a small behavior change (table state may be stale up to
the refresh interval), so it is better to include it in the 1.10.0 release,
where Spark 4.0 is first supported.
4. Variant support in core and Spark 4.0. Ryan thinks this is very close and
will prioritize the review.

Thanks,
Steven

The 1.10.0 milestone can be found here:
https://github.com/apache/iceberg/milestone/54
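For item 1, here is a minimal sketch of the suggested change. It is
illustrative only: it assumes the MetadataUpdate.RemoveSnapshots constructor
on main accepts a set of snapshot ids, so please double-check the exact
signature:

    import java.util.Set;
    import org.apache.iceberg.MetadataUpdate;

    // Illustrative sketch: replace the deprecated single-snapshot metadata
    // update with a RemoveSnapshots update carrying exactly one snapshot id.
    public class RemoveSnapshotSketch {
      static MetadataUpdate removeSingleSnapshot(long snapshotId) {
        // Before (deprecated RemoveSnapshot, removed from main for 1.10.0):
        //   return new MetadataUpdate.RemoveSnapshot(snapshotId);

        // After: a RemoveSnapshots update with a singleton set, which all
        // REST catalog servers should support.
        return new MetadataUpdate.RemoveSnapshots(Set.of(snapshotId));
      }
    }

For the single-snapshot case, the two updates should be equivalent, which is
why the one-line difference from the original PR should be safe.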
On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:

> Ajantha/Robin, thanks for the note. We can include the PR in the 1.10.0
> milestone.
>
> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt <ro...@confluent.io.invalid>
> wrote:
>
>> Thanks Ajantha. Just to confirm, from a Confluent point of view, we will
>> not be able to publish the connector on Confluent Hub until this CVE [1]
>> is fixed.
>> Since we would not publish a snapshot build, if the fix doesn't make it
>> into 1.10, we'd have to wait for 1.11 (or a dot release of 1.10) to be
>> able to include the connector on Confluent Hub.
>>
>> Thanks, Robin.
>>
>> [1]
>> https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>
>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>
>>> I have approached Confluent people
>>> <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281>
>>> to help us publish the OSS Kafka Connect Iceberg sink plugin.
>>> It seems we have a CVE from a dependency that blocks us from publishing
>>> the plugin.
>>>
>>> Please include the PR below in the 1.10.0 release; it fixes that CVE:
>>> https://github.com/apache/iceberg/pull/13561
>>>
>>> - Ajantha
>>>
>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>>> > Engines may model operations as deleting/inserting rows or as
>>>> > modifications to rows that preserve row ids.
>>>>
>>>> Manu, I agree this sentence probably lacks some context. The first
>>>> half (deleting/inserting rows) is probably about the row lineage
>>>> handling with equality deletes, which is described in another part of
>>>> the spec:
>>>>
>>>> "Row lineage does not track lineage for rows updated via Equality
>>>> Deletes <https://iceberg.apache.org/spec/#equality-delete-files>,
>>>> because engines using equality deletes avoid reading existing data
>>>> before writing changes and can't provide the original row ID for the
>>>> new rows. These updates are always treated as if the existing row was
>>>> completely removed and a unique new row was added."
>>>>
>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks Steven, I missed that part, but the following sentence is a
>>>>> bit hard to understand (maybe it's just me):
>>>>>
>>>>> Engines may model operations as deleting/inserting rows or as
>>>>> modifications to rows that preserve row ids.
>>>>>
>>>>> Can you please help to explain?
>>>>>
>>>>> Steven Wu <stevenz...@gmail.com> wrote on Tue, Jul 15, 2025 at 04:41:
>>>>>
>>>>>> Manu,
>>>>>>
>>>>>> The spec already covers the row lineage carry-over (for replace):
>>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>>>>
>>>>>> "When an existing row is moved to a different data file for any
>>>>>> reason, writers should write _row_id and
>>>>>> _last_updated_sequence_number according to the following rules:"
>>>>>>
>>>>>> Thanks,
>>>>>> Steven
>>>>>>
>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Another update on the release.
>>>>>>>
>>>>>>> We have one open PR left for the 1.10.0 milestone
>>>>>>> <https://github.com/apache/iceberg/milestone/54> (with 25 closed
>>>>>>> PRs). Amogh is actively working on the last blocker PR:
>>>>>>> Spark 4.0: Preserve row lineage information on compaction
>>>>>>> <https://github.com/apache/iceberg/pull/13555>
>>>>>>>
>>>>>>> I will publish a release candidate after the above blocker is
>>>>>>> merged and backported.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Steven
>>>>>>>
>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Amogh,
>>>>>>>>
>>>>>>>> Is it defined in the table spec that the "replace" operation
>>>>>>>> should carry over existing lineage info instead of assigning new
>>>>>>>> IDs? If not, we'd better first define it in the spec, because all
>>>>>>>> engines and implementations need to follow it.
>>>>>>>>
>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> One other area I think we need to make sure works with row
>>>>>>>>> lineage before the release is data file compaction. At the moment
>>>>>>>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>,
>>>>>>>>> it looks like compaction will read the records from the data
>>>>>>>>> files without projecting the lineage fields. This means that on
>>>>>>>>> write of the new compacted data files, we'd lose the lineage
>>>>>>>>> information. There is no data change in a compaction, but we do
>>>>>>>>> need to make sure the lineage info from carried-over records is
>>>>>>>>> materialized in the newly compacted files so they don't get new
>>>>>>>>> IDs or inherit the new file sequence number. I'm working on
>>>>>>>>> addressing this and would call it out as a blocker as well (see
>>>>>>>>> the sketch at the bottom of this thread).
>>>>>>>>>
>>
>> --
>> *Robin Moffatt*
>> *Sr. Principal Advisor, Streaming Data Technologies*
>>
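To make Amogh's point about the missing projection concrete, here is a
minimal sketch of joining the lineage metadata columns into the compaction
read schema. It assumes the MetadataColumns.ROW_ID and
MetadataColumns.LAST_UPDATED_SEQUENCE_NUMBER constants that row lineage adds
on main; the actual fix may look different:

    import org.apache.iceberg.MetadataColumns;
    import org.apache.iceberg.Schema;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.types.TypeUtil;

    // Illustrative sketch: project the row lineage metadata columns during
    // a compaction read so their values can be materialized into the
    // rewritten data files instead of being dropped.
    public class LineageProjectionSketch {
      static Schema rewriteReadSchema(Table table) {
        // Join the data columns with _row_id and
        // _last_updated_sequence_number so carried-over rows keep their ids
        // and sequence numbers through the rewrite, rather than getting new
        // ids or inheriting the new file sequence number.
        return TypeUtil.join(
            table.schema(),
            new Schema(
                MetadataColumns.ROW_ID,
                MetadataColumns.LAST_UPDATED_SEQUENCE_NUMBER));
      }
    }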