amogh-jahagirdar commented on code in PR #13310:
URL: https://github.com/apache/iceberg/pull/13310#discussion_r2198388161
##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java:
##########
@@ -426,17 +428,35 @@ public DeltaWriter<InternalRow> createWriter(int partitionId, long taskId) {
.writeProperties(writeProperties)
.build();
+      Function<InternalRow, InternalRow> rowLineageProjector =
+          context.dataSchema() != null
+                  && context.dataSchema().findField(MetadataColumns.ROW_ID.fieldId()) != null
+              ? new ProjectRowLineageFromMetadata()
Review Comment:
> ok. looks like context.dataSchema() can be null. when could it be null? if it is null, row lineage is not carried over. doesn't it violate the spec?
Yeah, I observed during the very initial analysis rules that context.dataSchema() won't technically be defined at certain points, but SparkPositionDeltaWrite will still be built. Ultimately, before execution it will always be non-null because there will be some output schema for the write, so there's no risk of lineage not being carried over.
I added the null check to be defensive; otherwise we'd needlessly fail with an NPE in the middle of analysis when looking up whether the lineage fields are defined.
But I do hear your bigger point: it's probably cleaner to abstract as much of this as possible behind the projection logic.
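The defensive pattern discussed above can be sketched as follows. This is a simplified illustration, not the PR's actual code: `Row`, `ROW_ID_FIELD_ID`, and `rowLineageProjector` here are hypothetical stand-ins for the real Spark/Iceberg types (`InternalRow`, `MetadataColumns.ROW_ID`, `ProjectRowLineageFromMetadata`), which aren't reproduced in this comment.

```java
import java.util.Set;
import java.util.function.Function;

public class LineageProjectorSketch {
  // Hypothetical stand-in for Spark's InternalRow, for illustration only.
  record Row(String data, Long rowId) {}

  // Illustrative field id; the real MetadataColumns.ROW_ID id differs.
  static final int ROW_ID_FIELD_ID = 100;

  // Mirrors the guarded selection in the diff: while the schema is not yet
  // resolved (null during early analysis rules), fall back to an identity
  // projection instead of dereferencing it and hitting an NPE.
  static Function<Row, Row> rowLineageProjector(Set<Integer> schemaFieldIds) {
    boolean hasRowId =
        schemaFieldIds != null && schemaFieldIds.contains(ROW_ID_FIELD_ID);
    return hasRowId
        ? row -> new Row(row.data(), row.rowId()) // placeholder for the lineage projection
        : Function.identity();
  }

  public static void main(String[] args) {
    Row row = new Row("a", 7L);
    // Schema unresolved: identity projector is chosen, so no NullPointerException.
    System.out.println(rowLineageProjector(null).apply(row) == row); // true
  }
}
```

Pushing this null handling inside the projector itself (rather than at every call site) is the kind of abstraction the reviewer's bigger point suggests.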
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]