[
https://issues.apache.org/jira/browse/SPARK-20741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lior Regev updated SPARK-20741:
-------------------------------
Description:
Running spark-submit requires distributing Spark's JARs to a distributed cache
so that the executors can access them.
When neither spark.yarn.jars nor spark.yarn.archive is provided, SparkSubmit
creates a ZIP of all the JARs in $SPARK_HOME/jars and uploads it to the
distributed cache.
After uploading the ZIP file, SparkSubmit does not delete the local copy.
This, in turn, can fill up the disk on the local machine (200MB per
submission) until no more submissions are possible.
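
For illustration only, here is a minimal Python sketch of the
create/upload/delete sequence described above, including the cleanup step that
is reported missing. It is not Spark's actual code: the function name
upload_spark_libs, the `upload` callback, and the archive name are all
assumptions made for this example, and the "distributed cache" is simulated
with a local directory.

```python
import os
import shutil
import tempfile
import zipfile


def upload_spark_libs(jars_dir, cache_dir, upload=shutil.copy):
    """Zip the JARs in jars_dir, upload the archive, then delete the local copy.

    Hypothetical helper: `upload` stands in for the real copy to the
    distributed cache and defaults to a plain local file copy.
    Returns (uploaded_path, local_zip_path).
    """
    # Create the temporary archive on the local disk.
    fd, local_zip = tempfile.mkstemp(prefix="spark_libs", suffix=".zip")
    os.close(fd)
    try:
        with zipfile.ZipFile(local_zip, "w", zipfile.ZIP_STORED) as archive:
            for name in sorted(os.listdir(jars_dir)):
                if name.endswith(".jar"):
                    archive.write(os.path.join(jars_dir, name), arcname=name)
        dest = os.path.join(cache_dir, "spark_libs.zip")
        upload(local_zip, dest)
        return dest, local_zip
    finally:
        # The step the report says is skipped: remove the local ZIP once the
        # upload has finished (or failed), so repeated submissions do not
        # accumulate archives on the submitting machine.
        if os.path.exists(local_zip):
            os.remove(local_zip)
```

Putting the delete in a `finally` block means the temporary archive is removed
even if the upload raises, which is one plausible way to avoid the disk
filling up; the actual fix in Spark may be structured differently.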
> SparkSubmit does not clean up after uploading spark_libs to the distributed
> cache
> ---------------------------------------------------------------------------------
>
> Key: SPARK-20741
> URL: https://issues.apache.org/jira/browse/SPARK-20741
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 2.1.1
> Reporter: Lior Regev
> Priority: Minor