rdblue commented on a change in pull request #3069:
URL: https://github.com/apache/iceberg/pull/3069#discussion_r705787905



##########
File path: core/src/main/java/org/apache/iceberg/BaseRowDelta.java
##########
@@ -81,23 +82,32 @@ public RowDelta validateDataFilesExist(Iterable<? extends CharSequence> referenc
   }
 
   @Override
-  public RowDelta validateNoConflictingAppends(Expression newConflictDetectionFilter) {
+  public RowDelta validateNoConflictingOperations(Expression newConflictDetectionFilter) {
     Preconditions.checkArgument(newConflictDetectionFilter != null, "Conflict detection filter cannot be null");
     this.conflictDetectionFilter = newConflictDetectionFilter;
     return this;
   }
 
+  @Override
+  public RowDelta validateNoConflictingDeleteFiles() {
+    this.validateNoConflictingDeleteFiles = true;
+    return this;
+  }
+
   @Override
   protected void validate(TableMetadata base) {
     if (base.currentSnapshot() != null) {
       if (!referencedDataFiles.isEmpty()) {
         validateDataFilesExist(base, startingSnapshotId, referencedDataFiles, !validateDeletes);
       }
 
-      // TODO: does this need to check new delete files?
       if (conflictDetectionFilter != null) {
         validateAddedDataFiles(base, startingSnapshotId, conflictDetectionFilter, caseSensitive);
       }
+
+      if (conflictDetectionFilter != null && validateNoConflictingDeleteFiles) {
+        validateAddedDeleteFiles(base, startingSnapshotId, conflictDetectionFilter, caseSensitive);

Review comment:
   If I understand correctly, the motivation for updating `RowDelta` is the case where two delta commits run concurrently? For example, an UPDATE and a MERGE running at the same time might both rewrite the same row, which could produce a duplicate:
   
   ```sql
   INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c');
   
   -- running these concurrently causes a problem
   UPDATE t SET data = 'x' WHERE id = 1;
   UPDATE t SET data = 'y' WHERE id = 1;
   ```
   
   If I ran the updates concurrently, both would delete id=1 and each would add a new file, one with `(1, 'x')` and the other with `(1, 'y')`, right?
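
To make the race concrete, here is a minimal sketch in Java (16+, for records). It uses a hypothetical toy model, not Iceberg's API: `Row`, `DataFile`, `PositionDelete`, `TableState`, and `Delta` are illustrative names, and each commit simply appends its position deletes and new data files with no conflict check. Both updates plan against the same starting state, so both delete position 0 of `file-1` and each adds its own rewritten row.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy merge-on-read table: immutable data files plus position deletes.
// All types here are hypothetical stand-ins, not Iceberg's API.
class ConcurrentUpdateDemo {
  record Row(int id, String data) {}

  record DataFile(String path, List<Row> rows) {}

  // Marks the row at `pos` in the data file at `path` as deleted.
  record PositionDelete(String path, int pos) {}

  static final class TableState {
    final List<DataFile> dataFiles = new ArrayList<>();
    final Set<PositionDelete> deletes = new HashSet<>();

    // Reads every data file row that no position delete covers.
    List<Row> liveRows() {
      List<Row> live = new ArrayList<>();
      for (DataFile file : dataFiles) {
        for (int pos = 0; pos < file.rows().size(); pos++) {
          if (!deletes.contains(new PositionDelete(file.path(), pos))) {
            live.add(file.rows().get(pos));
          }
        }
      }
      return live;
    }
  }

  // A row-level change committed with no conflict validation at all.
  record Delta(List<PositionDelete> deletes, List<DataFile> added) {
    void commit(TableState table) {
      table.deletes.addAll(deletes);
      table.dataFiles.addAll(added);
    }
  }

  static long countId(TableState table, int id) {
    return table.liveRows().stream().filter(r -> r.id() == id).count();
  }

  static TableState run() {
    TableState table = new TableState();
    // INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c');
    table.dataFiles.add(new DataFile("file-1",
        List.of(new Row(1, "a"), new Row(2, "b"), new Row(3, "c"))));

    // Both UPDATEs plan against the same state: id = 1 is position 0 of file-1.
    Delta updateX = new Delta(
        List.of(new PositionDelete("file-1", 0)),
        List.of(new DataFile("file-2", List.of(new Row(1, "x")))));
    Delta updateY = new Delta(
        List.of(new PositionDelete("file-1", 0)),
        List.of(new DataFile("file-3", List.of(new Row(1, "y")))));

    // With no validation, both commits succeed.
    updateX.commit(table);
    updateY.commit(table);
    return table;
  }

  public static void main(String[] args) {
    TableState table = run();
    System.out.println(table.liveRows());
    System.out.println(countId(table, 1)); // prints 2: (1, x) and (1, y) are both live
  }
}
```

The two identical position deletes collapse into a single entry in the set, so nothing in this model notices that both commits rewrote the same row; the table ends up with two live rows for `id = 1`.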
   
   The validation needed here is that the data file created by the initial insert has no new delete files written against it. It seems like we just want to call `validateNoNewDeletesForDataFiles` and pass `referencedDataFiles` in, right? Maybe I'm missing something?
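
A minimal sketch of that check, again in Java 16+ under a simplified, hypothetical model: each snapshot lists the delete files it added and the data file paths each delete applies to (position deletes only). The method name mirrors the one mentioned above, but this is not how Iceberg's `validateNoNewDeletesForDataFiles` is implemented; it ignores equality deletes, sequence numbers, and snapshot ancestry, and the `DeleteFile`, `Snapshot`, and `ValidationException` types are made up for illustration. It rejects the commit if any snapshot committed after the starting snapshot added a delete against one of the referenced data files.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified sketch of a "no new deletes for data files" check.
// Only position deletes are modeled; types and names are illustrative.
class NoNewDeletesValidation {
  // A delete file and the data file paths its position deletes point at.
  record DeleteFile(String path, Set<String> dataFilePaths) {}

  // A committed snapshot and the delete files it added.
  record Snapshot(long id, List<DeleteFile> addedDeletes) {}

  static final class ValidationException extends RuntimeException {
    ValidationException(String message) {
      super(message);
    }
  }

  // history is ordered oldest to newest; only snapshots after startingSnapshotId are checked.
  static void validateNoNewDeletesForDataFiles(
      List<Snapshot> history, long startingSnapshotId, Set<String> referencedDataFiles) {
    int start = -1;
    for (int i = 0; i < history.size(); i++) {
      if (history.get(i).id() == startingSnapshotId) {
        start = i;
        break;
      }
    }
    if (start < 0) {
      throw new IllegalArgumentException("Unknown starting snapshot: " + startingSnapshotId);
    }

    for (Snapshot snapshot : history.subList(start + 1, history.size())) {
      for (DeleteFile delete : snapshot.addedDeletes()) {
        for (String dataFile : delete.dataFilePaths()) {
          if (referencedDataFiles.contains(dataFile)) {
            throw new ValidationException(String.format(
                "Found new delete %s for data file %s in snapshot %d",
                delete.path(), dataFile, snapshot.id()));
          }
        }
      }
    }
  }

  // Snapshot 1: the INSERT writes file-1. Snapshot 2: the first UPDATE deletes a row in file-1.
  static List<Snapshot> exampleHistory() {
    return List.of(
        new Snapshot(1L, List.of()),
        new Snapshot(2L, List.of(new DeleteFile("delete-x", Set.of("file-1")))));
  }

  public static void main(String[] args) {
    try {
      // The second UPDATE started from snapshot 1 and also rewrote a row in file-1.
      validateNoNewDeletesForDataFiles(exampleHistory(), 1L, Set.of("file-1"));
      System.out.println("no conflict");
    } catch (ValidationException e) {
      System.out.println("conflict: " + e.getMessage());
    }
  }
}
```

With the example history, the second UPDATE (starting from snapshot 1 and referencing `file-1`) fails because snapshot 2 already added `delete-x` against `file-1`. A delta that references only other data files, or one that starts from snapshot 2, passes.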
   
   We might want to make this a separate issue to keep changes smaller and 
reviews easier.




-- 
This is an automated message from the Apache Git Service.