[ https://issues.apache.org/jira/browse/SPARK-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334775#comment-14334775 ]
Sean Owen commented on SPARK-5970:
----------------------------------

I believe that's right. Would you like to open a PR?

> Temporary directories are not removed (but their content is)
> ------------------------------------------------------------
>
> Key: SPARK-5970
> URL: https://issues.apache.org/jira/browse/SPARK-5970
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.2.1
> Environment: Linux, 64bit
> spark-1.2.1-bin-hadoop2.4.tgz
> Reporter: Milan Straka
>
> How to reproduce:
> - extract spark-1.2.1-bin-hadoop2.4.tgz
> - without any further configuration, run bin/pyspark
> - run sc.stop() and close the python shell
>
> Expected results:
> - no temporary directories are left in /tmp
>
> Actual results:
> - four empty temporary directories are created in /tmp, for example after {{ls -d /tmp/spark*}}:
> {code}
> /tmp/spark-1577b13d-4b9a-4e35-bac2-6e84e5605f53
> /tmp/spark-96084e69-77fd-42fb-ab10-e1fc74296fe3
> /tmp/spark-ab2ea237-d875-485e-b16c-5b0ac31bd753
> /tmp/spark-ddeb0363-4760-48a4-a189-81321898b146
> {code}
>
> The issue is caused by changes in {{util/Utils.scala}}. Consider {{createDirectory}}:
> {code}
> /**
>  * Create a directory inside the given parent directory. The directory is guaranteed to be
>  * newly created, and is not marked for automatic deletion.
>  */
> def createDirectory(root: String, namePrefix: String = "spark"): File = ...
> {code}
> {{createDirectory}} is used in two places.
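For reference, the quoted contract can be reproduced standalone. This is only a minimal sketch, not Spark's actual implementation: the real {{Utils.createDirectory}} has retrying and more error handling, and {{java.util.UUID}} is used here merely to obtain a unique name.

```scala
import java.io.{File, IOException}
import java.util.UUID

// Minimal standalone sketch of a createDirectory-style helper: it creates
// a uniquely named subdirectory under `root` and, as the quoted Javadoc
// states, does NOT mark the directory for automatic deletion.
def createDirectory(root: String, namePrefix: String = "spark"): File = {
  val dir = new File(root, s"$namePrefix-${UUID.randomUUID()}")
  if (!dir.mkdirs()) {
    throw new IOException(s"Failed to create directory $dir")
  }
  dir
}
```

With such a contract, any caller that wants the directory cleaned up on JVM exit must arrange for that itself, which is exactly what distinguishes the two call sites discussed next.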
> The first is in {{createTempDir}}, where the directory is marked for automatic deletion:
> {code}
> def createTempDir(
>     root: String = System.getProperty("java.io.tmpdir"),
>     namePrefix: String = "spark"): File = {
>   val dir = createDirectory(root, namePrefix)
>   registerShutdownDeleteDir(dir)
>   dir
> }
> {code}
> Nevertheless, it is also used in {{getOrCreateLocalRootDirs}}, where the directory is _not_ marked for automatic deletion:
> {code}
> private[spark] def getOrCreateLocalRootDirs(conf: SparkConf): Array[String] = {
>   if (isRunningInYarnContainer(conf)) {
>     // If we are in yarn mode, systems can have different disk layouts so we must set it
>     // to what Yarn on this system said was available. Note this assumes that Yarn has
>     // created the directories already, and that they are secured so that only the
>     // user has access to them.
>     getYarnLocalDirs(conf).split(",")
>   } else {
>     // In non-Yarn mode (or for the driver in yarn-client mode), we cannot trust the user
>     // configuration to point to a secure directory. So create a subdirectory with restricted
>     // permissions under each listed directory.
>     Option(conf.getenv("SPARK_LOCAL_DIRS"))
>       .getOrElse(conf.get("spark.local.dir", System.getProperty("java.io.tmpdir")))
>       .split(",")
>       .flatMap { root =>
>         try {
>           val rootDir = new File(root)
>           if (rootDir.exists || rootDir.mkdirs()) {
>             Some(createDirectory(root).getAbsolutePath())
>           } else {
>             logError(s"Failed to create dir in $root. Ignoring this directory.")
>             None
>           }
>         } catch {
>           case e: IOException =>
>             logError(s"Failed to create local root dir in $root. Ignoring this directory.")
>             None
>         }
>       }
>       .toArray
>   }
> }
> {code}
> Therefore I think
> {code}
> Some(createDirectory(root).getAbsolutePath())
> {code}
> should be replaced by something like (I am not an experienced Scala programmer):
> {code}
> val dir = createDirectory(root)
> registerShutdownDeleteDir(dir)
> Some(dir.getAbsolutePath())
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
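The proposed fix can be sketched end to end outside of Spark. In the snippet below, {{registerShutdownDeleteDir}} is a stand-in stub that only records paths in a set (an assumption: in Spark the registered paths are additionally deleted by a shutdown hook), {{createLocalRootDir}} is a hypothetical name for the body of the {{flatMap}}, and {{createDirectory}} is simplified to a single unique-name attempt.

```scala
import java.io.{File, IOException}
import java.util.UUID
import scala.collection.mutable

// Stand-in stub for Spark's internal shutdown-delete registry. Assumption:
// the real registerShutdownDeleteDir records the path like this and a JVM
// shutdown hook later deletes everything recorded.
val shutdownDeletePaths = mutable.HashSet[String]()
def registerShutdownDeleteDir(dir: File): Unit =
  shutdownDeletePaths.synchronized { shutdownDeletePaths += dir.getAbsolutePath }

// Simplified createDirectory, as in the issue: creates a unique directory
// that is NOT automatically registered for deletion.
def createDirectory(root: String, namePrefix: String = "spark"): File = {
  val dir = new File(root, s"$namePrefix-${UUID.randomUUID()}")
  if (!dir.mkdirs()) throw new IOException(s"Failed to create directory $dir")
  dir
}

// The suggested fix applied to one root directory: register the freshly
// created directory for shutdown deletion before returning its path.
def createLocalRootDir(root: String): Option[String] =
  try {
    val rootDir = new File(root)
    if (rootDir.exists || rootDir.mkdirs()) {
      val dir = createDirectory(root)
      registerShutdownDeleteDir(dir) // the call missing in 1.2.1
      Some(dir.getAbsolutePath)
    } else {
      None
    }
  } catch {
    case _: IOException => None
  }
```

With this change, every directory handed out by {{getOrCreateLocalRootDirs}} ends up in the shutdown-delete registry, so the empty {{/tmp/spark-*}} directories would be removed on exit along with their contents.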