GitHub user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4984#discussion_r33726149
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala ---
    @@ -124,10 +124,16 @@ private[spark] class DiskBlockManager(blockManager: BlockManager, conf: SparkCon
         (blockId, getFile(blockId))
       }
     
    +  /**
    +   * Create local directories for storing block data. These directories are
    +   * located inside configured local directories and won't
    +   * be deleted on JVM exit when using the external shuffle service.
    --- End diff --
    
    I just read your comment again. I still don't see how the directory layout
is related to cleaning up shuffle files. We don't clean up shuffle files in
Mesos (and standalone mode) simply because the shuffle service doesn't know
when an application exits. When the shuffle service is enabled, [executors no
longer clean up the shuffle files on
exit](https://github.com/apache/spark/blob/1ce6428907b4ddcf52dbf0c86196d82ab7392442/core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala#L162),
so no one cleans these files up anymore. All we need to do then is to add this
missing code path.
    
    Since the external shuffle service already 
[knows](https://github.com/apache/spark/blob/1ce6428907b4ddcf52dbf0c86196d82ab7392442/network/shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java#L147)
 about the `localDirs` on each executor, it can just go ahead and delete these 
directories (which contain the shuffle files written). Could you explain why 
the directory structure needs to change? Why is it not sufficient to just 
remove the shuffle directories?
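    
    To make the question concrete, here is a minimal sketch of the missing
code path. It assumes the service is somehow told when an application exits,
which is exactly the part that is missing today. The class
`AppLocalDirCleaner` and its methods are hypothetical names for illustration,
not the actual shuffle service API.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical helper for illustration only; not part of Spark. */
final class AppLocalDirCleaner {
  // appId -> every localDir reported by that application's executors.
  private final Map<String, List<Path>> localDirsByApp = new ConcurrentHashMap<>();

  /** Remember the localDirs an executor reports when it registers. */
  void registerExecutor(String appId, String[] localDirs) {
    List<Path> dirs =
        localDirsByApp.computeIfAbsent(appId, k -> new CopyOnWriteArrayList<>());
    for (String dir : localDirs) {
      dirs.add(Paths.get(dir));
    }
  }

  /**
   * Assumed to be called once the service learns that the application exited.
   * Deletes every registered localDir that still exists and returns how many
   * were deleted.
   */
  int applicationRemoved(String appId) throws IOException {
    List<Path> dirs = localDirsByApp.remove(appId);
    if (dirs == null) {
      return 0; // Unknown or already cleaned-up application.
    }
    int deleted = 0;
    for (Path dir : dirs) {
      if (Files.exists(dir)) {
        deleteRecursively(dir);
        deleted++;
      }
    }
    return deleted;
  }

  /** Delete a directory tree bottom-up: files first, then each directory. */
  private static void deleteRecursively(Path root) throws IOException {
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
          throws IOException {
        Files.delete(file);
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult postVisitDirectory(Path dir, IOException exc)
          throws IOException {
        if (exc != null) {
          throw exc;
        }
        Files.delete(dir);
        return FileVisitResult.CONTINUE;
      }
    });
  }
}
```

    Nothing in this sketch depends on how the directories are laid out. It
only needs the `localDirs` the service already has, plus a signal that the
application is gone.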


