aokolnychyi commented on a change in pull request #3069:
URL: https://github.com/apache/iceberg/pull/3069#discussion_r707604903
##########
File path: core/src/main/java/org/apache/iceberg/BaseRowDelta.java
##########
@@ -81,23 +82,32 @@ public RowDelta validateDataFilesExist(Iterable<? extends CharSequence> referenc
   }
   @Override
-  public RowDelta validateNoConflictingAppends(Expression newConflictDetectionFilter) {
+  public RowDelta validateNoConflictingOperations(Expression newConflictDetectionFilter) {
     Preconditions.checkArgument(newConflictDetectionFilter != null, "Conflict detection filter cannot be null");
     this.conflictDetectionFilter = newConflictDetectionFilter;
     return this;
   }
+  @Override
+  public RowDelta validateNoConflictingDeleteFiles() {
+    this.validateNoConflictingDeleteFiles = true;
+    return this;
+  }
+
   @Override
   protected void validate(TableMetadata base) {
     if (base.currentSnapshot() != null) {
       if (!referencedDataFiles.isEmpty()) {
         validateDataFilesExist(base, startingSnapshotId, referencedDataFiles, !validateDeletes);
       }
-      // TODO: does this need to check new delete files?
       if (conflictDetectionFilter != null) {
         validateAddedDataFiles(base, startingSnapshotId, conflictDetectionFilter, caseSensitive);
       }
+
+      if (conflictDetectionFilter != null && validateNoConflictingDeleteFiles) {
+        validateAddedDeleteFiles(base, startingSnapshotId, conflictDetectionFilter, caseSensitive);
Review comment:
I had a chance to think more about the required validation. While using
`DataFile`s instead of locations would give us min/max filtering, we can
probably do better than that if all delete files are position-based.
We can do something like this if we are working with position-based delete
files:
- Read all position deletes we are trying to commit and build a map from
  file location to the list of deleted positions.
- Iterate through concurrently added delete files, applying min/max filtering
  (and secondary indexes in the future), to find those that may conflict.
- Read the files that potentially conflict and verify that the (file, pos)
  pairs don't overlap.
We may need to add some limit on the overall size of deletes we can scan, but
we can figure that out.
Overall, this will allow us to resolve conflicts within the same partition.
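
A rough sketch of the three steps above, to make the idea concrete. Everything here is hypothetical: `PositionDeleteConflictSketch`, `PositionDelete`, `AddedDeleteFile`, and the method names are illustrative stand-ins, not Iceberg classes or APIs. It also assumes the min/max bounds on the referenced file location are exact and that the rows of each delete file are already in memory; a real implementation would read them from storage and would have to cope with truncated bounds.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only; none of these types exist in Iceberg.
final class PositionDeleteConflictSketch {

  // One (file location, row position) pair taken from a position delete file.
  record PositionDelete(String fileLocation, long position) {}

  // Stand-in for a concurrently added position delete file: assumed-exact
  // min/max bounds of the file locations it references, plus its rows.
  record AddedDeleteFile(String minLocation, String maxLocation, List<PositionDelete> rows) {}

  // Step 1: index the deletes we are trying to commit by file location.
  static Map<String, Set<Long>> indexByLocation(List<PositionDelete> deletes) {
    Map<String, Set<Long>> index = new HashMap<>();
    for (PositionDelete delete : deletes) {
      index.computeIfAbsent(delete.fileLocation(), key -> new HashSet<>()).add(delete.position());
    }
    return index;
  }

  // Step 2: min/max filtering. A delete file can only conflict if at least one
  // of our file locations falls inside its [minLocation, maxLocation] range.
  static boolean mayConflict(AddedDeleteFile file, Set<String> ourLocations) {
    for (String location : ourLocations) {
      if (location.compareTo(file.minLocation()) >= 0
          && location.compareTo(file.maxLocation()) <= 0) {
        return true;
      }
    }
    return false;
  }

  // Step 3: scan only the files that survived filtering and collect every
  // (file, pos) pair that is also present in the deletes being committed.
  static List<PositionDelete> findConflicts(
      List<PositionDelete> ours, List<AddedDeleteFile> concurrentlyAdded) {
    Map<String, Set<Long>> index = indexByLocation(ours);
    List<PositionDelete> conflicts = new ArrayList<>();
    for (AddedDeleteFile file : concurrentlyAdded) {
      if (!mayConflict(file, index.keySet())) {
        continue; // pruned by min/max bounds, never read
      }
      for (PositionDelete row : file.rows()) {
        Set<Long> positions = index.get(row.fileLocation());
        if (positions != null && positions.contains(row.position())) {
          conflicts.add(row);
        }
      }
    }
    return conflicts;
  }
}
```

With this shape, the validation would fail the commit only when `findConflicts` returns a non-empty list, so two writers deleting different rows of the same data file would not be treated as conflicting.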