Thanks Steven, I missed that part, but the following sentence is a bit hard to understand (maybe it's just me):
"Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids."

Can you please help explain?

Steven Wu <stevenz...@gmail.com> wrote on Tue, Jul 15, 2025 at 04:41:

> Manu,
>
> The spec already covers the row lineage carry-over (for replace):
> https://iceberg.apache.org/spec/#row-lineage
>
> "When an existing row is moved to a different data file for any reason,
> writers should write _row_id and _last_updated_sequence_number according
> to the following rules:"
>
> Thanks,
> Steven
>
> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:
>
>> Another update on the release.
>>
>> We have one open PR left for the 1.10.0 milestone
>> <https://github.com/apache/iceberg/milestone/54> (with 25 closed PRs).
>> Amogh is actively working on the last blocker PR:
>> Spark 4.0: Preserve row lineage information on compaction
>> <https://github.com/apache/iceberg/pull/13555>
>>
>> I will publish a release candidate after the above blocker is merged
>> and backported.
>>
>> Thanks,
>> Steven
>>
>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>
>>> Hi Amogh,
>>>
>>> Is it defined in the table spec that the "replace" operation should
>>> carry over existing lineage info instead of assigning new IDs? If not,
>>> we'd better define it in the spec first, because all engines and
>>> implementations need to follow it.
>>>
>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com> wrote:
>>>
>>>> One other area we need to make sure works with row lineage before the
>>>> release is data file compaction. At the moment
>>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>,
>>>> it looks like compaction reads the records from the data files without
>>>> projecting the lineage fields. This means that on write of the new
>>>> compacted data files we'd lose the lineage information. There is no
>>>> data change in a compaction, but we do need to make sure the lineage
>>>> info from carried-over records is materialized in the newly compacted
>>>> files, so they don't get new IDs or inherit the new file's sequence
>>>> number. I'm working on addressing this, and I'd call it out as a
>>>> blocker as well.
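To make the carry-over rule concrete, here is a toy sketch (not the Iceberg implementation; the `Row`, `materialize_lineage`, and `compact` names are hypothetical). It mirrors the spec's inheritance idea: a null `_row_id` resolves to the source file's `first_row_id` plus the row's position, and a null `_last_updated_sequence_number` resolves to the source file's data sequence number. A compaction writer must materialize these resolved values into the new file; leaving them null would cause readers to inherit fresh IDs and the new file's sequence number.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Row:
    data: str
    row_id: Optional[int] = None            # _row_id (null => inherited)
    last_updated_seq: Optional[int] = None  # _last_updated_sequence_number

def materialize_lineage(file_meta, rows):
    """Resolve inherited lineage values against the ORIGINAL file's metadata
    so they can be written explicitly into the compacted file."""
    out = []
    for pos, row in enumerate(rows):
        rid = row.row_id if row.row_id is not None else file_meta["first_row_id"] + pos
        seq = (row.last_updated_seq if row.last_updated_seq is not None
               else file_meta["data_seq"])
        out.append(Row(row.data, rid, seq))
    return out

def compact(files):
    """Bin-pack rewrite: concatenate rows from the input files into one list,
    carrying materialized lineage instead of leaving the fields null."""
    new_rows = []
    for meta, rows in files:
        new_rows.extend(materialize_lineage(meta, rows))
    return new_rows
```

Dropping the `materialize_lineage` step is exactly the bug described above: the rewritten rows would have null lineage fields and silently inherit new IDs from the compacted file.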