zhouyejoe commented on code in PR #35906:
URL: https://github.com/apache/spark/pull/35906#discussion_r896274975


##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##########
@@ -317,22 +353,24 @@ public void applicationRemoved(String appId, boolean cleanupLocalDirs) {
     logger.info("Application {} removed, cleanupLocalDirs = {}", appId, cleanupLocalDirs);
     AppShuffleInfo appShuffleInfo = appsShuffleInfo.remove(appId);
     if (null != appShuffleInfo) {
-      mergedShuffleCleaner.execute(
-        () -> closeAndDeletePartitionFilesIfNeeded(appShuffleInfo, cleanupLocalDirs));
+      submitCleanupTask(
+        () -> closeAndDeletePartitions(appShuffleInfo, cleanupLocalDirs, true));
     }
+    removeAppAttemptPathInfoFromDB(
+        new AppAttemptId(appShuffleInfo.appId, appShuffleInfo.attemptId));
   }
 
-
   /**
    * Clean up the AppShufflePartitionInfo for a specific AppShuffleInfo.
    * If cleanupLocalDirs is true, the merged shuffle files will also be deleted.
    * The cleanup will be executed in a separate thread.
    */
   @SuppressWarnings("SynchronizationOnLocalVariableOrMethodParameter")
   @VisibleForTesting
-  void closeAndDeletePartitionFilesIfNeeded(
+  void closeAndDeletePartitions(
       AppShuffleInfo appShuffleInfo,
-      boolean cleanupLocalDirs) {
+      boolean cleanupLocalDirs,
+      boolean removeFromDb) {

Review Comment:
   This handles the case you mentioned earlier: the merged shuffle data has been removed from disk through some API (TBD in a separate ticket for cleaning up merged shuffle data during job runtime), but its information in the DB should be kept. That API does not exist yet, so all current callers set removeFromDb to true.
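A minimal sketch of what the removeFromDb flag decouples, assuming a hypothetical in-memory model: the class MergedShuffleCleanupSketch, its two maps, and registerApp are illustrative only and are not Spark's RemoteBlockPushResolver. Only the method name closeAndDeletePartitions and its two boolean parameters come from the diff; here it takes an appId instead of an AppShuffleInfo, and the "runtime cleanup" call stands in for the not-yet-existing API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of the removeFromDb flag (not Spark code).
 * An in-memory map stands in for the recovery DB, and a set of file names
 * stands in for the merged shuffle files on local disk.
 */
class MergedShuffleCleanupSketch {
  // Stand-in for the recovery DB: appId -> attempt path info.
  final Map<String, String> db = new HashMap<>();
  // Stand-in for merged shuffle files on local disk, keyed by appId.
  final Map<String, Set<String>> localFiles = new HashMap<>();

  void registerApp(String appId, String attemptPathInfo, Set<String> files) {
    db.put(appId, attemptPathInfo);
    localFiles.put(appId, new HashSet<>(files));
  }

  /**
   * Deletes the app's merged files when cleanupLocalDirs is true, and drops
   * its DB entry only when removeFromDb is true.
   */
  void closeAndDeletePartitions(String appId, boolean cleanupLocalDirs, boolean removeFromDb) {
    if (cleanupLocalDirs) {
      localFiles.remove(appId);
    }
    if (removeFromDb) {
      db.remove(appId);
    }
  }

  public static void main(String[] args) {
    MergedShuffleCleanupSketch s = new MergedShuffleCleanupSketch();
    s.registerApp("app-1", "attempt-1 dirs", Set.of("shuffle_0_merged.data"));

    // Hypothetical runtime cleanup (the future API): files go, DB info stays.
    s.closeAndDeletePartitions("app-1", true, false);
    System.out.println(s.localFiles.containsKey("app-1") + " " + s.db.containsKey("app-1"));

    // Application removal, as all current callers do: removeFromDb = true.
    s.closeAndDeletePartitions("app-1", true, true);
    System.out.println(s.db.containsKey("app-1"));
  }
}
```

Running main prints false true and then false: the runtime-style call deletes the files but keeps the DB entry, and the application-removal call then drops the entry.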



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-h...@spark.apache.org