Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4984#discussion_r33726149

    --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala ---
    @@ -124,10 +124,16 @@ private[spark] class DiskBlockManager(blockManager: BlockManager, conf: SparkCon
           (blockId, getFile(blockId))
         }
    +  /**
    +   * Create local directories for storing block data. These directories are
    +   * located inside configured local directories and won't
    +   * be deleted on JVM exit when using the external shuffle service.
    --- End diff --

    I just read your comment again. I still don't see how the directory layout is related to cleaning up shuffle files. The reason we don't clean up shuffle files in Mesos (and standalone mode) is simply that the shuffle service doesn't know when an application exits. When the shuffle service is enabled, [executors no longer clean up the shuffle files on exit](https://github.com/apache/spark/blob/1ce6428907b4ddcf52dbf0c86196d82ab7392442/core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala#L162), so no one cleans these files up anymore.

    All we need to do then is add this missing code path. Since the external shuffle service already [knows](https://github.com/apache/spark/blob/1ce6428907b4ddcf52dbf0c86196d82ab7392442/network/shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java#L147) about the `localDirs` on each executor, it can just go ahead and delete these directories (which contain the shuffle files written).

    Could you explain why the directory structure needs to change? Why is it not sufficient to just remove the shuffle directories?
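A rough sketch of the missing code path described above, with no change to the directory layout. Everything in it is hypothetical: `AppLocalDirCleaner`, `registerExecutor`, and `applicationRemoved` are illustrative names, not existing Spark classes or methods. It also assumes something notifies the service when an application exits, which is exactly the piece that is missing today.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.Stream;

// Hypothetical helper, not part of Spark: remembers the localDirs each
// executor registers and deletes them once the application has exited.
class AppLocalDirCleaner {
    // appId -> local directories registered by that application's executors
    private final Map<String, List<Path>> dirsByApp = new ConcurrentHashMap<>();

    // Assumed to be called when an executor registers its localDirs.
    void registerExecutor(String appId, List<Path> localDirs) {
        dirsByApp.computeIfAbsent(appId, k -> new CopyOnWriteArrayList<>())
                 .addAll(localDirs);
    }

    // Assumed to be called once the service learns that the application exited.
    // Returns the number of registered directories that were deleted.
    int applicationRemoved(String appId) {
        List<Path> dirs = dirsByApp.remove(appId);
        if (dirs == null) {
            return 0; // unknown application, nothing to clean up
        }
        int deleted = 0;
        for (Path dir : dirs) {
            if (Files.exists(dir)) {
                deleteRecursively(dir);
                deleted++;
            }
        }
        return deleted;
    }

    private static void deleteRecursively(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order so that children are deleted before their parents.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The cleanup only needs the `localDirs` the service already tracks per executor, so removing the shuffle directories would not depend on how they are laid out on disk.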