voonhous commented on code in PR #7669:
URL: https://github.com/apache/hudi/pull/7669#discussion_r1090237751


##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/action/commit/FlinkDeletePartitionCommitActionExecutor.java:
##########
@@ -98,4 +103,42 @@ private List<String> getAllExistingFileIds(String partitionPath) {
     // because new commit is not complete. it is safe to mark all existing file Ids as old files
     return table.getSliceView().getLatestFileSlices(partitionPath).map(FileSlice::getFileId).distinct().collect(Collectors.toList());
   }
+
+  /**
+   * Check if there are any pending table service actions (requested + inflight) on a table affecting the partitions to
+   * be dropped.
+   * <p>
+   * This check is to prevent a drop-partition from proceeding should a partition have a table service action in
+   * the pending stage. If this is allowed to happen, the filegroup that is an input for a table service action might
+   * also be a candidate for being replaced. As such, when the table service action and drop-partition commits are
+   * committed, there will be two commits replacing a single filegroup.
+   * <p>
+   * For example, a timeline might have an execution order as such:
+   * 000.replacecommit.requested (clustering filegroup_1 + filegroup_2 -> filegroup_3)
+   * 001.replacecommit.requested, 001.replacecommit.inflight, 001.replacecommit (drop_partition to replace filegroup_1)
+   * 000.replacecommit.inflight (clustering is executed now)
+   * 000.replacecommit (clustering completed)
+   * For an execution order as shown above, 000.replacecommit and 001.replacecommit will both flag filegroup_1 to be replaced.
+   * This will cause downstream duplicate-key errors when a map is being constructed.
+   */
+  private void checkPreconditions() {
+    List<String> instantsOfOffendingPendingTableServiceAction = new ArrayList<>();
+    // ensure that there are no pending inflight clustering/compaction operations involving this partition
+    SyncableFileSystemView fileSystemView = (SyncableFileSystemView) table.getSliceView();
+
+    Stream.concat(fileSystemView.getPendingCompactionOperations(), fileSystemView.getPendingLogCompactionOperations())
+        .filter(op -> partitions.contains(op.getRight().getPartitionPath()))
+        .forEach(op -> instantsOfOffendingPendingTableServiceAction.add(op.getLeft()));
+
+    fileSystemView.getFileGroupsInPendingClustering()
+        .filter(fgIdInstantPair -> partitions.contains(fgIdInstantPair.getLeft().getPartitionPath()))
+        .forEach(x -> instantsOfOffendingPendingTableServiceAction.add(x.getRight().getTimestamp()));
+
+    if (instantsOfOffendingPendingTableServiceAction.size() > 0) {
+      throw new HoodieDeletePartitionException("Failed to drop partitions. "

Review Comment:
   I agree. I have given this some thought before. There are a few ways around this issue.
   
   The list below is ranked in terms of difficulty to implement (easiest to 
hardest).
   
   1. Fail fast: do not allow users to drop a partition if there is pending clustering/compaction on it.
   2. The Spark-SQL console/session will wait for the pending clustering/compaction to finish before issuing the drop-partition. (This may lead to resource starvation unless we also add logic to block new table service plans from being created while a drop-partition is pending; see the sketch after this list.)
   3. We allow the drop-partition to complete; for any pending clustering/compaction jobs, we flag the affected files as replaced at commit time so that they will not be read.
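
   To make the resource-starvation concern in option 2 concrete, here is a minimal, self-contained sketch of the wait-and-retry approach. `TableServiceInspector`, `hasPendingTableService`, and the retry budget are illustrative stand-ins I made up for this sketch, not existing Hudi APIs:

   ```java
   import java.time.Duration;
   import java.util.List;

   // Hypothetical facade over the table's timeline/file-system view.
   interface TableServiceInspector {
     // True if any pending clustering/compaction touches the given partitions.
     boolean hasPendingTableService(List<String> partitions);
   }

   final class DropPartitionWithWait {
     private static final Duration POLL_INTERVAL = Duration.ofSeconds(30);
     private static final int MAX_ATTEMPTS = 20; // bound the wait instead of hanging forever

     static void dropPartitions(TableServiceInspector inspector,
                                Runnable issueDropPartition,
                                List<String> partitions) throws InterruptedException {
       for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
         if (!inspector.hasPendingTableService(partitions)) {
           issueDropPartition.run();
           return;
         }
         // Without also blocking *new* table service plans on these partitions,
         // this loop can starve: a fresh plan may be scheduled between polls every time.
         Thread.sleep(POLL_INTERVAL.toMillis());
       }
       throw new IllegalStateException(
           "Timed out waiting for pending table services on partitions: " + partitions);
     }
   }
   ```

   The bound on attempts is the key design point: without it, and without blocking new plans, the drop-partition could wait indefinitely.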
   
   As such, I chose to implement the first option (fail fast) for now.
   A "full" fix is definitely required, this PR is to address the immediate 
problem of correctness through adding and making limitations known. 
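
   For clarity on ordering, a rough sketch of where the fail-fast check sits in the executor flow; everything here other than the `checkPreconditions()` name is a simplified stand-in, not the actual Hudi executor code:

   ```java
   import java.util.Collections;
   import java.util.List;

   // Simplified stand-in for FlinkDeletePartitionCommitActionExecutor's flow.
   final class DeletePartitionFlowSketch {
     private final List<String> partitions;

     DeletePartitionFlowSketch(List<String> partitions) {
       this.partitions = partitions;
     }

     List<String> execute() {
       // Guard first, while nothing has been written yet: abort if a pending
       // clustering/compaction touches any of the target partitions.
       checkPreconditions();
       // Only then mark every existing file id in the partitions as replaced.
       return replaceAllExistingFileIds();
     }

     private void checkPreconditions() {
       // Stand-in for the real check in the diff above: scan pending
       // compaction/log-compaction/clustering instants for overlap with
       // `partitions` and throw if any are found.
       boolean pendingTableServiceOnPartitions = false; // from the file system view in practice
       if (pendingTableServiceOnPartitions) {
         throw new IllegalStateException(
             "Failed to drop partitions, pending table service on: " + partitions);
       }
     }

     private List<String> replaceAllExistingFileIds() {
       return Collections.emptyList(); // placeholder for the real replace logic
     }
   }
   ```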


