zhouyejoe commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r896274975
##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##########
@@ -317,22 +353,24 @@ public void applicationRemoved(String appId, boolean cleanupLocalDirs) {
     logger.info("Application {} removed, cleanupLocalDirs = {}", appId, cleanupLocalDirs);
     AppShuffleInfo appShuffleInfo = appsShuffleInfo.remove(appId);
     if (null != appShuffleInfo) {
-      mergedShuffleCleaner.execute(
-        () -> closeAndDeletePartitionFilesIfNeeded(appShuffleInfo, cleanupLocalDirs));
+      submitCleanupTask(
+        () -> closeAndDeletePartitions(appShuffleInfo, cleanupLocalDirs, true));
     }
+    removeAppAttemptPathInfoFromDB(
+      new AppAttemptId(appShuffleInfo.appId, appShuffleInfo.attemptId));
   }
 
-
   /**
    * Clean up the AppShufflePartitionInfo for a specific AppShuffleInfo.
    * If cleanupLocalDirs is true, the merged shuffle files will also be deleted.
    * The cleanup will be executed in a separate thread.
    */
   @SuppressWarnings("SynchronizationOnLocalVariableOrMethodParameter")
   @VisibleForTesting
-  void closeAndDeletePartitionFilesIfNeeded(
+  void closeAndDeletePartitions(
       AppShuffleInfo appShuffleInfo,
-      boolean cleanupLocalDirs) {
+      boolean cleanupLocalDirs,
+      boolean removeFromDb) {

Review Comment:
   This handles the case you mentioned earlier: the merged shuffle data has been removed from disk through some API (TBD in another ticket, for cleaning up merged shuffle data during job runtime), but the information in the DB should be kept. We don't have that API in place yet, so all callers currently set this flag to true.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org