[ https://issues.apache.org/jira/browse/SPARK-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334775#comment-14334775 ]

Sean Owen commented on SPARK-5970:
----------------------------------

I believe that's right. Would you like to open a PR?

> Temporary directories are not removed (but their content is)
> ------------------------------------------------------------
>
>                 Key: SPARK-5970
>                 URL: https://issues.apache.org/jira/browse/SPARK-5970
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.1
>         Environment: Linux, 64bit
> spark-1.2.1-bin-hadoop2.4.tgz
>            Reporter: Milan Straka
>
> How to reproduce: 
> - extract spark-1.2.1-bin-hadoop2.4.tgz
> - without any further configuration, run {{bin/pyspark}}
> - run {{sc.stop()}} and close the Python shell
> Expected results:
> - no temporary directories are left in /tmp
> Actual results:
> - four empty temporary directories remain in /tmp, as shown for example by {{ls -d /tmp/spark*}}:
> {code}
> /tmp/spark-1577b13d-4b9a-4e35-bac2-6e84e5605f53
> /tmp/spark-96084e69-77fd-42fb-ab10-e1fc74296fe3
> /tmp/spark-ab2ea237-d875-485e-b16c-5b0ac31bd753
> /tmp/spark-ddeb0363-4760-48a4-a189-81321898b146
> {code}
> The issue is caused by changes in {{util/Utils.scala}}. Consider {{createDirectory}}:
> {code}
>   /**
>    * Create a directory inside the given parent directory. The directory is
>    * guaranteed to be newly created, and is not marked for automatic deletion.
>    */
>   def createDirectory(root: String, namePrefix: String = "spark"): File = ...
> {code}
> {{createDirectory}} is used in two places. The first is {{createTempDir}}, where the created directory is registered for automatic deletion:
> {code}
>   def createTempDir(
>       root: String = System.getProperty("java.io.tmpdir"),
>       namePrefix: String = "spark"): File = {
>     val dir = createDirectory(root, namePrefix)
>     registerShutdownDeleteDir(dir)
>     dir
>   }
> {code}
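> For context, {{registerShutdownDeleteDir}} only records the path; the actual cleanup is done by a JVM shutdown hook that deletes every registered path recursively. A minimal sketch of the mechanism (paraphrased from {{util/Utils.scala}}; the exact code differs slightly):
> {code}
>   private val shutdownDeletePaths = new scala.collection.mutable.HashSet[String]()
>
>   def registerShutdownDeleteDir(file: File) {
>     shutdownDeletePaths.synchronized {
>       shutdownDeletePaths += file.getAbsolutePath()
>     }
>   }
>
>   // Shutdown hook that deletes all registered paths when the JVM exits.
>   Runtime.getRuntime.addShutdownHook(new Thread("delete Spark temp dirs") {
>     override def run() {
>       shutdownDeletePaths.synchronized {
>         shutdownDeletePaths.foreach(path => deleteRecursively(new File(path)))
>       }
>     }
>   })
> {code}
> So any directory created through {{createDirectory}} but never passed to {{registerShutdownDeleteDir}} survives the JVM exit, which is exactly what happens below.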
> However, it is also used in {{getOrCreateLocalRootDirs}}, where the created directory is _not_ registered for automatic deletion:
> {code}
>   private[spark] def getOrCreateLocalRootDirs(conf: SparkConf): Array[String] = {
>     if (isRunningInYarnContainer(conf)) {
>       // If we are in yarn mode, systems can have different disk layouts so we must set it
>       // to what Yarn on this system said was available. Note this assumes that Yarn has
>       // created the directories already, and that they are secured so that only the
>       // user has access to them.
>       getYarnLocalDirs(conf).split(",")
>     } else {
>       // In non-Yarn mode (or for the driver in yarn-client mode), we cannot trust the user
>       // configuration to point to a secure directory. So create a subdirectory with restricted
>       // permissions under each listed directory.
>       Option(conf.getenv("SPARK_LOCAL_DIRS"))
>         .getOrElse(conf.get("spark.local.dir", System.getProperty("java.io.tmpdir")))
>         .split(",")
>         .flatMap { root =>
>           try {
>             val rootDir = new File(root)
>             if (rootDir.exists || rootDir.mkdirs()) {
>               Some(createDirectory(root).getAbsolutePath())
>             } else {
>               logError(s"Failed to create dir in $root. Ignoring this directory.")
>               None
>             }
>           } catch {
>             case e: IOException =>
>               logError(s"Failed to create local root dir in $root. Ignoring this directory.")
>               None
>           }
>         }
>         .toArray
>     }
>   }
> {code}
> Therefore I think that
> {code}
> Some(createDirectory(root).getAbsolutePath())
> {code}
> should be replaced by something like (I am not an experienced Scala programmer):
> {code}
> val dir = createDirectory(root)
> registerShutdownDeleteDir(dir)
> Some(dir.getAbsolutePath())
> {code}
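> Since registering the directory for shutdown deletion is exactly what {{createTempDir}} already does (see above), an equivalent and perhaps tidier fix, assuming the default {{namePrefix}} is acceptable, would be:
> {code}
> Some(createTempDir(root).getAbsolutePath())
> {code}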


