Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18316727
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -202,9 +205,17 @@ private[spark] class Worker(
           // Spin up a separate thread (in a future) to do the dir cleanup; don't tie up worker actor
           val cleanupFuture = concurrent.future {
             logInfo("Cleaning up oldest application directories in " + workDir + " ...")
    -        Utils.findOldFiles(workDir, APP_DATA_RETENTION_SECS)
    -          .foreach(Utils.deleteRecursively)
    +        val appDirs = workDir.listFiles()
    +        if (appDirs == null) {
    +          throw new IOException("ERROR: Failed to list files in " + appDirs)
    +        }
    +        appDirs.filter {
    +          dir =>
    +            dir.isDirectory && executors.keys.filter(key => dir.getPath.contains(key)).isEmpty &&
    --- End diff --
    
    I think the key to the `executors` map is appId/execId, which is not the same as the dir's path, which contains only the appId. Additionally, I would clarify this logic, either by extracting it out to a named variable or adding a comment, or both. Something like:
    
    ```scala
    val appId = dir.getName
    // Attempt to filter out running applications based on the existence of executors for that appId.
    val isAppStillRunning = executors.values.exists(_.appId == appId)
    ```
    
    I would also add a higher-level comment that describes the overall logic (i.e., only delete directories for apps that are not running anymore and whose files are sufficiently old).
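    
    To make the intent concrete, here is a minimal, self-contained sketch of how the cleanup filter could read with both suggestions applied. It is only an illustration: `oldAppDirs`, `ExecutorInfo`, and the plain `lastModified`-based age check are hypothetical stand-ins for the Worker's actual `executors` map and the `Utils` helpers used in the patch.
    
    ```scala
    import java.io.{File, IOException}
    
    object CleanupSketch {
      // Hypothetical stand-in for ExecutorRunner; only the appId matters here.
      case class ExecutorInfo(appId: String)
    
      // Return the app directories that are safe to delete: their app has no
      // running executor and the directory has not been modified within the
      // retention period.
      def oldAppDirs(workDir: File,
                     executors: Map[String, ExecutorInfo],  // keyed by "appId/execId"
                     retentionMs: Long): Seq[File] = {
        val appDirs = Option(workDir.listFiles()).getOrElse(
          throw new IOException("Failed to list files in " + workDir))
        appDirs.filter { dir =>
          // Only delete directories for apps that are not running anymore and
          // whose files are sufficiently old.
          val appId = dir.getName
          val isAppStillRunning = executors.values.exists(_.appId == appId)
          dir.isDirectory && !isAppStillRunning &&
            System.currentTimeMillis() - dir.lastModified() > retentionMs
        }.toSeq
      }
    }
    ```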

