One other area I think we need to make sure works with row lineage before
release is data file compaction. At the moment, it looks like compaction
(see
<https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>)
will read the records from the data files without projecting the lineage
fields. This means that when the new compacted data files are written,
we'd lose the lineage information. There's no
data change in a compaction, but we do need to make sure the lineage info
from carried-over records is materialized in the newly compacted files so
they don't get new IDs or inherit the new file sequence number. I'm working
on addressing this, but I'd call it out as a blocker as well.
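
To make the concern concrete, here is a toy sketch in Python. It is not the
Iceberg or Spark API: Row, DataFile, read_with_lineage and compact are all
hypothetical names, and it assumes the inheritance rule as I understand it
(a null row ID resolves to the file's first row ID plus the row position, and
a null last-updated sequence number resolves to the file's data sequence
number).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    value: str
    row_id: Optional[int] = None            # stands in for the row ID lineage field
    last_updated_seq: Optional[int] = None  # stands in for the last-updated sequence number


@dataclass
class DataFile:
    first_row_id: int          # assumed: assigned when the file is committed
    data_sequence_number: int  # assumed: sequence number of the commit that added the file
    rows: List[Row]


def read_with_lineage(f: DataFile) -> List[Row]:
    """Resolve lineage on read: stored values win, nulls are inherited from the file."""
    resolved = []
    for pos, row in enumerate(f.rows):
        resolved.append(Row(
            value=row.value,
            row_id=row.row_id if row.row_id is not None else f.first_row_id + pos,
            last_updated_seq=(row.last_updated_seq
                              if row.last_updated_seq is not None
                              else f.data_sequence_number),
        ))
    return resolved


def compact(files: List[DataFile], new_first_row_id: int,
            new_sequence_number: int, project_lineage: bool) -> DataFile:
    """Rewrite several files into one, optionally carrying lineage over."""
    rows: List[Row] = []
    for f in files:
        if project_lineage:
            # Materialize the resolved lineage values into the rewritten rows.
            rows.extend(read_with_lineage(f))
        else:
            # Lineage fields are not projected, so they are written as nulls.
            rows.extend(Row(value=r.value) for r in f.rows)
    return DataFile(new_first_row_id, new_sequence_number, rows)


a = DataFile(first_row_id=0, data_sequence_number=1, rows=[Row("a"), Row("b")])
b = DataFile(first_row_id=2, data_sequence_number=2, rows=[Row("c")])

kept = read_with_lineage(compact([a, b], 100, 5, project_lineage=True))
lost = read_with_lineage(compact([a, b], 100, 5, project_lineage=False))
print([(r.row_id, r.last_updated_seq) for r in kept])
print([(r.row_id, r.last_updated_seq) for r in lost])
```

With project_lineage=True the rows keep their original IDs and sequence
numbers after the rewrite. With project_lineage=False they pick up IDs from
the new file's first row ID and the new file's sequence number, which is the
behavior we need to avoid.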
