szehon-ho commented on code in PR #11555:
URL: https://github.com/apache/iceberg/pull/11555#discussion_r1903293655
##########
core/src/main/java/org/apache/iceberg/io/DeleteSchemaUtil.java:
##########
@@ -43,4 +43,15 @@ public static Schema pathPosSchema() {
public static Schema posDeleteSchema(Schema rowSchema) {
return rowSchema == null ? pathPosSchema() : pathPosSchema(rowSchema);
}
+
+ public static Schema posDeleteReadSchema(Schema rowSchema) {
Review Comment:
Somehow, after the rebase, this is needed for the position delete rewrite (there
must be some intervening change related to delete readers). Previously this
used the method above, `pathPosSchema(rowSchema)`, as the read schema, which marks
'row' as required. Reading then failed with an error saying 'row' is required but
not found in the delete file, because 'row' is usually not set.
Note that Spark and the other readers now appear to omit the 'row' field from
the read schema entirely:
https://github.com/apache/iceberg/blob/main/data/src/main/java/org/apache/iceberg/data/BaseDeleteLoader.java#L70.
But here, I do want to read the 'row' field and preserve it if it is set by
some engine.
So I am following the approach of RewritePositionDelete: read this field, but
declare it optional to avoid the assertion error when it is not found.
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/PositionDeletesTable.java#L118.
(The reader there is derived from the schema of the PositionDeletesTable
metadata table.)
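To make the failure and the fix concrete, here is a simplified, self-contained Java model. It is not the Iceberg API: `Field`, `project`, `requiredRowSchema`, and `optionalRowSchema` are hypothetical names used only for illustration, and projection is reduced to "a missing required column is an error, a missing optional column is skipped". The column names `file_path`, `pos`, and `row` follow the position delete file layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a position delete read schema.
// NOT the Iceberg API; it only illustrates why a required 'row' field
// breaks reads of delete files whose writer never set 'row'.
public class PosDeleteReadSchemaSketch {

  /** A top-level column in a read schema (hypothetical type). */
  record Field(String name, boolean required) {}

  /** Read schema with 'row' required, analogous to pathPosSchema(rowSchema). */
  static List<Field> requiredRowSchema() {
    return List.of(new Field("file_path", true), new Field("pos", true), new Field("row", true));
  }

  /** Read schema with 'row' optional, analogous to the proposed posDeleteReadSchema(rowSchema). */
  static List<Field> optionalRowSchema() {
    return List.of(new Field("file_path", true), new Field("pos", true), new Field("row", false));
  }

  /**
   * Projects a read schema onto the columns physically present in a delete file.
   * A missing required column throws; a missing optional column is skipped
   * (a real reader would typically fill it with null instead).
   */
  static List<String> project(List<Field> readSchema, Set<String> fileColumns) {
    List<String> projected = new ArrayList<>();
    for (Field field : readSchema) {
      if (fileColumns.contains(field.name())) {
        projected.add(field.name());
      } else if (field.required()) {
        throw new IllegalArgumentException("Missing required field: " + field.name());
      }
    }
    return projected;
  }

  public static void main(String[] args) {
    Set<String> withoutRow = Set.of("file_path", "pos");       // typical delete file
    Set<String> withRow = Set.of("file_path", "pos", "row");   // engine that sets 'row'

    // Optional 'row': works either way, and 'row' is preserved when present.
    System.out.println(project(optionalRowSchema(), withoutRow)); // [file_path, pos]
    System.out.println(project(optionalRowSchema(), withRow));    // [file_path, pos, row]

    // Required 'row': fails on the typical delete file.
    try {
      project(requiredRowSchema(), withoutRow);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // Missing required field: row
    }
  }
}
```

Declaring 'row' optional rather than dropping it keeps the value when some engine did write it, which is the behavior the rewrite needs.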