Ajantha/Robin, thanks for the note. We can include the PR in the 1.10.0 milestone.
On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt <ro...@confluent.io.invalid> wrote:

> Thanks Ajantha. Just to confirm: from a Confluent point of view, we will
> not be able to publish the connector on Confluent Hub until this CVE [1]
> is fixed.
> Since we would not publish a snapshot build, if the fix doesn't make it
> into 1.10, we'd have to wait for 1.11 (or a dot release of 1.10) to be
> able to include the connector on Confluent Hub.
>
> Thanks, Robin.
>
> [1] https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>
> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>
>> I have approached the folks at Confluent
>> <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281>
>> to help us publish the OSS Kafka Connect Iceberg sink plugin.
>> It seems a CVE in one of our dependencies blocks us from publishing the
>> plugin.
>>
>> Please include the PR below, which fixes that, in the 1.10.0 release:
>> https://github.com/apache/iceberg/pull/13561
>>
>> - Ajantha
>>
>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:
>>
>>> > Engines may model operations as deleting/inserting rows or as
>>> > modifications to rows that preserve row ids.
>>>
>>> Manu, I agree this sentence probably lacks some context. The first half
>>> (deleting/inserting rows) is about the row lineage handling with
>>> equality deletes, which is described elsewhere in the spec:
>>>
>>> "Row lineage does not track lineage for rows updated via Equality
>>> Deletes <https://iceberg.apache.org/spec/#equality-delete-files>,
>>> because engines using equality deletes avoid reading existing data
>>> before writing changes and can't provide the original row ID for the
>>> new rows. These updates are always treated as if the existing row was
>>> completely removed and a unique new row was added."
>>>
>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>
>>>> Thanks Steven, I missed that part, but the following sentence is a bit
>>>> hard to understand (maybe it's just me):
>>>>
>>>> Engines may model operations as deleting/inserting rows or as
>>>> modifications to rows that preserve row ids.
>>>>
>>>> Could you please help explain?
>>>>
>>>> On Tue, Jul 15, 2025 at 04:41, Steven Wu <stevenz...@gmail.com> wrote:
>>>>
>>>>> Manu,
>>>>>
>>>>> The spec already covers the row lineage carry-over (for replace):
>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>>>
>>>>> "When an existing row is moved to a different data file for any
>>>>> reason, writers should write _row_id and _last_updated_sequence_number
>>>>> according to the following rules:"
>>>>>
>>>>> Thanks,
>>>>> Steven
>>>>>
>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>
>>>>>> Another update on the release.
>>>>>>
>>>>>> We have one open PR left for the 1.10.0 milestone
>>>>>> <https://github.com/apache/iceberg/milestone/54> (with 25 closed
>>>>>> PRs). Amogh is actively working on the last blocker PR:
>>>>>> Spark 4.0: Preserve row lineage information on compaction
>>>>>> <https://github.com/apache/iceberg/pull/13555>
>>>>>>
>>>>>> I will publish a release candidate after the above blocker is merged
>>>>>> and backported.
>>>>>>
>>>>>> Thanks,
>>>>>> Steven
>>>>>>
>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Amogh,
>>>>>>>
>>>>>>> Is it defined in the table spec that a "replace" operation should
>>>>>>> carry over existing lineage info instead of assigning new IDs? If
>>>>>>> not, we'd better define it in the spec first, because all engines
>>>>>>> and implementations need to follow it.
>>>>>>>
>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com> wrote:
>>>>>>>
>>>>>>>> One other area I think we need to make sure works with row lineage
>>>>>>>> before release is data file compaction. At the moment
>>>>>>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>,
>>>>>>>> it looks like compaction will read the records from the data files
>>>>>>>> without projecting the lineage fields. This means that on write of
>>>>>>>> the new compacted data files we'd lose the lineage information.
>>>>>>>> There is no data change in a compaction, but we do need to make
>>>>>>>> sure the lineage info from carried-over records is materialized in
>>>>>>>> the newly compacted files, so they don't get new IDs or inherit the
>>>>>>>> new file's sequence number. I'm working on addressing this, and I'd
>>>>>>>> call it out as a blocker as well.
>>>>>>>>
>
> --
> *Robin Moffatt*
> *Sr. Principal Advisor, Streaming Data Technologies*
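
For anyone following the spec discussion in the thread above, here is a
minimal sketch of the two carry-over rules Steven quoted: a null _row_id is
inherited as first_row_id + position in file, and a null
_last_updated_sequence_number is inherited from the data file's sequence
number. RowRecord, carryOver, and the parameter names are hypothetical
illustrations for this sketch, not Iceberg's actual writer API.

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative model only; not part of the Iceberg API.
    class RowRecord {
      final Long rowId;                      // _row_id; null = inherit on read
      final Long lastUpdatedSequenceNumber;  // _last_updated_sequence_number; null = inherit
      final Object[] data;

      RowRecord(Long rowId, Long lastUpdatedSequenceNumber, Object[] data) {
        this.rowId = rowId;
        this.lastUpdatedSequenceNumber = lastUpdatedSequenceNumber;
        this.data = data;
      }
    }

    class LineageCarryOverSketch {
      // When rewriting existing rows into a new file (e.g. compaction,
      // where there is no data change), materialize any inherited lineage
      // values so the rewritten rows keep their identity instead of being
      // assigned fresh ids under the new file.
      static List<RowRecord> carryOver(
          List<RowRecord> sourceRows,
          long sourceFirstRowId,            // first_row_id of the source data file
          long sourceDataSequenceNumber) {  // data sequence number of the source file
        List<RowRecord> rewritten = new ArrayList<>();
        for (int pos = 0; pos < sourceRows.size(); pos++) {
          RowRecord row = sourceRows.get(pos);
          // A null _row_id is inherited as first_row_id + position in file.
          long rowId = (row.rowId != null) ? row.rowId : sourceFirstRowId + pos;
          // A null _last_updated_sequence_number is inherited from the
          // data file's sequence number.
          long seq = (row.lastUpdatedSequenceNumber != null)
              ? row.lastUpdatedSequenceNumber
              : sourceDataSequenceNumber;
          rewritten.add(new RowRecord(rowId, seq, row.data));
        }
        return rewritten;
      }
    }

This is also the gap Amogh describes: if compaction reads without projecting
the lineage fields, the equivalent of sourceRows above arrives with no
rowId/lastUpdatedSequenceNumber values to materialize, and the rewritten rows
silently pick up fresh ids under the new file.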
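
For contrast, a sketch of the equality-delete behavior described in the spec
text Steven posted: the engine never reads the existing row, so it cannot
carry the original _row_id forward, and the "update" is written as a
brand-new row with null lineage fields. Again a hypothetical illustration,
not Iceberg API.

    // Hypothetical illustration of an equality-delete "update".
    class EqualityDeleteUpdateSketch {
      // The old row is deleted by predicate (e.g. id = 42) without being
      // read, and the replacement is appended with null lineage fields.
      // On read, it inherits a fresh _row_id, so lineage sees a delete
      // plus an unrelated insert rather than a modification.
      static Object[] writeReplacement(Object[] newValues) {
        Long rowId = null;                     // cannot know the old _row_id
        Long lastUpdatedSequenceNumber = null; // inherits the new file's sequence number
        return new Object[] {rowId, lastUpdatedSequenceNumber, newValues};
      }
    }

That is why the spec treats these updates as if the existing row was
completely removed and a unique new row was added, rather than as a
modification that preserves row ids.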