Ajantha/Robin, thanks for the note. We can include the PR in the 1.10.0
milestone.



On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt <ro...@confluent.io.invalid>
wrote:

> Thanks Ajantha. Just to confirm, from a Confluent point of view, we will
> not be able to publish the connector on Confluent Hub until this CVE[1] is
> fixed.
> Since we would not publish a snapshot build, if the fix doesn't make it
> into 1.10 then we'd have to wait for 1.11 (or a dot release of 1.10) to be
> able to include the connector on Confluent Hub.
>
> Thanks, Robin.
>
> [1] https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>
> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>
>> I have approached Confluent people
>> <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281>
>> to help us publish the OSS Kafka Connect Iceberg sink plugin.
>> It seems a CVE from a dependency blocks us from publishing the
>> plugin.
>>
>> Please include the PR below, which fixes that CVE, in the 1.10.0 release:
>> https://github.com/apache/iceberg/pull/13561
>>
>> - Ajantha
>>
>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:
>>
>>> > Engines may model operations as deleting/inserting rows or as
>>> modifications to rows that preserve row ids.
>>>
>>> Manu, I agree this sentence could use more context. The first half ("as
>>> deleting/inserting rows") likely refers to the row lineage handling for
>>> equality deletes, which the spec describes elsewhere:
>>>
>>> "Row lineage does not track lineage for rows updated via Equality
>>> Deletes <https://iceberg.apache.org/spec/#equality-delete-files>,
>>> because engines using equality deletes avoid reading existing data before
>>> writing changes and can't provide the original row ID for the new rows.
>>> These updates are always treated as if the existing row was completely
>>> removed and a unique new row was added."
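>>>
>>> To make that concrete, here is a tiny plain-Java model (not Iceberg
>>> code; the names are mine) of why an equality-delete upsert cannot
>>> preserve the row id while a copy-on-write update can:
>>>
>>> // Toy model of the spec text above; not Iceberg API.
>>> class EqualityDeleteLineageSketch {
>>>   // rowId == null means "assign from first_row_id at commit time".
>>>   record Row(Long rowId, Long lastUpdatedSeq, String key, String val) {}
>>>
>>>   // An equality-delete upsert never reads the existing row: it only
>>>   // emits "delete where key = ?" plus an insert. With no read, there
>>>   // is no original _row_id to carry forward, so the row is new.
>>>   static Row upsertViaEqualityDelete(String key, String newVal) {
>>>     return new Row(null, null, key, newVal);
>>>   }
>>>
>>>   // A copy-on-write update reads the row first, so it can keep the
>>>   // original _row_id and only bump _last_updated_sequence_number.
>>>   static Row updateCopyOnWrite(Row existing, String newVal, long seq) {
>>>     return new Row(existing.rowId(), seq, existing.key(), newVal);
>>>   }
>>>
>>>   public static void main(String[] args) {
>>>     Row old = new Row(42L, 7L, "k1", "a");
>>>     System.out.println(upsertViaEqualityDelete("k1", "b")); // rowId=null
>>>     System.out.println(updateCopyOnWrite(old, "b", 8L));    // rowId=42
>>>   }
>>> }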
>>>
>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Steven, I missed that part, but the following sentence is a bit
>>>> hard to understand (maybe it's just me):
>>>>
>>>> Engines may model operations as deleting/inserting rows or as
>>>> modifications to rows that preserve row ids.
>>>>
>>>> Can you please help to explain?
>>>>
>>>>
>>>> Steven Wu <stevenz...@gmail.com> wrote on Tue, Jul 15, 2025 at 04:41:
>>>>
>>>>> Manu,
>>>>>
>>>>> The spec already covers the row lineage carry over (for replace)
>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>>>
>>>>> "When an existing row is moved to a different data file for any
>>>>> reason, writers should write _row_id and _last_updated_sequence_number 
>>>>> according
>>>>> to the following rules:"
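>>>>>
>>>>> As a sketch, my reading of those rules in plain Java (hypothetical
>>>>> names, not Iceberg code): a null value is inherited from the source
>>>>> file's metadata and must be materialized before the row moves:
>>>>>
>>>>> // Sketch of the carry-over rules quoted above; not Iceberg API.
>>>>> class CarryOverSketch {
>>>>>   // A null _row_id is inherited as first_row_id + position in the
>>>>>   // source data file; a non-null value is copied unchanged.
>>>>>   static long rowIdToWrite(Long stored, long firstRowId, long pos) {
>>>>>     return stored != null ? stored : firstRowId + pos;
>>>>>   }
>>>>>
>>>>>   // A null _last_updated_sequence_number is inherited as the source
>>>>>   // file's data sequence number; non-null is copied unchanged.
>>>>>   static long lastUpdatedToWrite(Long stored, long fileDataSeq) {
>>>>>     return stored != null ? stored : fileDataSeq;
>>>>>   }
>>>>>
>>>>>   public static void main(String[] args) {
>>>>>     System.out.println(rowIdToWrite(null, 1000L, 3L)); // 1003: inherited
>>>>>     System.out.println(rowIdToWrite(42L, 1000L, 3L));  // 42: preserved
>>>>>   }
>>>>> }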
>>>>>
>>>>> Thanks,
>>>>> Steven
>>>>>
>>>>>
>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> another update on the release.
>>>>>>
>>>>>> We have one open PR left for the 1.10.0 milestone
>>>>>> <https://github.com/apache/iceberg/milestone/54> (with 25 closed
>>>>>> PRs). Amogh is actively working on the last blocker PR:
>>>>>> "Spark 4.0: Preserve row lineage information on compaction"
>>>>>> <https://github.com/apache/iceberg/pull/13555>
>>>>>>
>>>>>> I will publish a release candidate after the above blocker is merged
>>>>>> and backported.
>>>>>>
>>>>>> Thanks,
>>>>>> Steven
>>>>>>
>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Amogh,
>>>>>>>
>>>>>>> Is it defined in the table spec that the "replace" operation should
>>>>>>> carry over existing lineage info instead of assigning new IDs? If
>>>>>>> not, we'd better define it in the spec first, because all engines
>>>>>>> and implementations need to follow it.
>>>>>>>
>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> One other area we need to make sure works with row lineage before
>>>>>>>> release is data file compaction. At the moment
>>>>>>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>,
>>>>>>>> it looks like compaction reads the records from the data files
>>>>>>>> without projecting the lineage fields, which means the lineage
>>>>>>>> information would be lost when the new compacted data files are
>>>>>>>> written. There's no data change in a compaction, but we do need to
>>>>>>>> make sure the lineage info from carried-over records is
>>>>>>>> materialized in the newly compacted files so the rows don't get
>>>>>>>> new IDs or inherit the new file's sequence number. I'm working on
>>>>>>>> addressing this, and I'd call it out as a blocker as well.
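>>>>>>>>
>>>>>>>> As a sketch of the failure mode (plain Java, my names, not the
>>>>>>>> actual rewriter code): if the rewrite projects only data columns,
>>>>>>>> the compacted rows carry null lineage values and look brand new;
>>>>>>>> the fix is to project the materialized lineage columns and write
>>>>>>>> them through unchanged:
>>>>>>>>
>>>>>>>> import java.util.List;
>>>>>>>>
>>>>>>>> // Toy model of compaction and lineage; not the Spark rewriter.
>>>>>>>> class CompactionLineageSketch {
>>>>>>>>   record Row(Long rowId, Long lastUpdatedSeq, String data) {}
>>>>>>>>
>>>>>>>>   // Buggy path: only data columns are projected, so rewritten
>>>>>>>>   // rows have null lineage and get fresh IDs after compaction.
>>>>>>>>   static List<Row> rewriteDroppingLineage(List<Row> scanned) {
>>>>>>>>     return scanned.stream()
>>>>>>>>         .map(r -> new Row(null, null, r.data()))
>>>>>>>>         .toList();
>>>>>>>>   }
>>>>>>>>
>>>>>>>>   // Fixed path: lineage columns are projected (and materialized
>>>>>>>>   // per the carry-over rules) and written into the new file.
>>>>>>>>   static List<Row> rewritePreservingLineage(List<Row> scanned) {
>>>>>>>>     return scanned.stream()
>>>>>>>>         .map(r -> new Row(r.rowId(), r.lastUpdatedSeq(), r.data()))
>>>>>>>>         .toList();
>>>>>>>>   }
>>>>>>>>
>>>>>>>>   public static void main(String[] args) {
>>>>>>>>     List<Row> in = List.of(new Row(42L, 7L, "a"));
>>>>>>>>     System.out.println(rewriteDroppingLineage(in));   // rowId=null
>>>>>>>>     System.out.println(rewritePreservingLineage(in)); // rowId=42
>>>>>>>>   }
>>>>>>>> }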
>>>>>>>>
>>>>>>>
>
> --
> *Robin Moffatt*
> *Sr. Principal Advisor, Streaming Data Technologies*
>
