aokolnychyi commented on a change in pull request #3069:
URL: https://github.com/apache/iceberg/pull/3069#discussion_r717035292



##########
File path: core/src/main/java/org/apache/iceberg/BaseOverwriteFiles.java
##########
@@ -127,8 +140,20 @@ protected void validate(TableMetadata base) {
       }
     }
 
-    if (conflictDetectionFilter != null && base.currentSnapshot() != null) {
-      validateAddedDataFiles(base, startingSnapshotId, 
conflictDetectionFilter, caseSensitive);
+    if (appendConflictDetectionFilter != null && base.currentSnapshot() != 
null) {
+      validateAddedDataFiles(base, startingSnapshotId, 
appendConflictDetectionFilter, caseSensitive);
+    }
+
+    boolean validateNewDeletes = deleteConflictDetectionFilter != null && 
base.currentSnapshot() != null;

Review comment:
       > Basing `validateNewDeletes` on whether the conflict detection filter 
was set doesn't seem correct to me.
   
   If I got you correctly, you are proposing that `validateFromSnapshot` will 
now indicate whether we should validate delete files. I think that is different 
compared to how `RowDelta` and `OverwriteFiles` work right now. I'd actually 
say calling `validateFromSnapshot` is an optimization that tells us from which 
snapshot to start looking. We never validate new appends if the append conflict 
detection filter is null. Moreover, it is not always possible to set the 
starting snapshot. If we start on an empty table, we must validate all 
snapshots. Here is our copy-on-write commit logic.
   
   ```
   Long scanSnapshotId = scan.snapshotId();
   if (scanSnapshotId != null) {
     overwriteFiles.validateFromSnapshot(scanSnapshotId);
   }
   
   Expression conflictDetectionFilter = conflictDetectionFilter();
   overwriteFiles.validateNoConflictingAppends(conflictDetectionFilter);
   ``` 
   
   Also, in your snippet, why call `validateNoNewDeletesForDataFiles` if we 
already know the overwrite filter is set? I think 
`validateNoNewDeletesForDataFiles` is simply a more efficient version of 
`validateNoNewDeletes` that can open delete files that match the filter to 
check their content for conflicts. The problem is that we can use 
`validateNoNewDeletesForDataFiles` only if we overwrite specific files.  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to