Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/2609#discussion_r18316727

```diff
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
@@ -202,9 +205,17 @@ private[spark] class Worker(
     // Spin up a separate thread (in a future) to do the dir cleanup; don't tie up worker actor
     val cleanupFuture = concurrent.future {
       logInfo("Cleaning up oldest application directories in " + workDir + " ...")
-      Utils.findOldFiles(workDir, APP_DATA_RETENTION_SECS)
-        .foreach(Utils.deleteRecursively)
+      val appDirs = workDir.listFiles()
+      if (appDirs == null) {
+        throw new IOException("ERROR: Failed to list files in " + appDirs)
+      }
+      appDirs.filter {
+        dir =>
+          dir.isDirectory && executors.keys.filter(key => dir.getPath.contains(key)).isEmpty &&
--- End diff --
```

I think the key in `executors` is appId/execId, which is not the same as the dir's path, which contains only the appId. Additionally, I would clarify this logic, either by extracting it into a variable, adding a comment, or both. Something like:

```scala
val appId = dir.getName
// Attempt to filter out running applications based on the existence of executors for that appId.
val isAppStillRunning = executors.values.map(_.appId).contains(appId)
```

I would also add a higher-level comment that describes the overall logic (i.e., only delete directories for apps that are no longer running and whose files are sufficiently old).
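For context, a minimal sketch (not the actual patch) of the cleanup pass this comment is suggesting might look like the following. It assumes the Worker's existing members `workDir`, `executors`, `APP_DATA_RETENTION_SECS`, and `Utils.deleteRecursively`, and the `lastModified`-based age check is an assumption about how "sufficiently old" would be determined:

```scala
// Sketch of the directory cleanup described in the review comment; runs inside
// Worker, so workDir, executors, APP_DATA_RETENTION_SECS, and Utils.deleteRecursively
// are the Worker's existing members. The lastModified age check is an assumption.
val appDirs = workDir.listFiles()
if (appDirs == null) {
  throw new IOException("ERROR: Failed to list files in " + workDir)
}
appDirs.filter { dir =>
  // Only delete directories whose application no longer has a running executor
  // on this worker and whose files are older than the retention period.
  val appId = dir.getName
  val isAppStillRunning = executors.values.map(_.appId).contains(appId)
  dir.isDirectory && !isAppStillRunning &&
    (System.currentTimeMillis() - dir.lastModified()) > APP_DATA_RETENTION_SECS * 1000
}.foreach(Utils.deleteRecursively)
```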