[ 
https://issues.apache.org/jira/browse/SPARK-42170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santosh Pingale updated SPARK-42170:
------------------------------------
    Description: 
Files added to the spark-submit command with master K8s and deploy mode 
cluster end up in a non-deterministic location inside the driver.

For example:

{{spark-submit --files myfile --master k8s.. --deploy-mode cluster}} will upload 
the file to {{/tmp/spark-uuid/myfile}}

The issue happens because 
[Utils.createTempDir()|https://github.com/apache/spark/blob/v3.3.1/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L344]
 creates a directory with a UUID in its name. The {{--archives}} option is not 
affected, because the archives are unarchived into a destination directory 
that is relative to the working directory. This bug affects file access both 
before and after the app is created. For example, to distribute Python 
dependencies with pex, we need to attach the pex file with {{--files}} and 
point {{spark.pyspark.python}} to it, but the file's location cannot be 
determined before the app is submitted. After the app is created, referencing 
the files without {{SparkFiles.get}} does not work either.
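
The behaviour described above can be illustrated with a minimal Python sketch 
that uses only the standard library and no Spark APIs. The helpers 
{{create_temp_dir}} and {{stage_file}} are hypothetical stand-ins, not Spark 
code, and the {{spark-<uuid>}} directory naming is assumed from the example 
path above. The sketch only shows why a path chosen before submission, or a 
lookup relative to the working directory, cannot find the file, while a lookup 
through the directory returned at runtime can.

```python
import os
import tempfile
import uuid


def create_temp_dir(root: str, name_prefix: str = "spark") -> str:
    """Hypothetical stand-in for a temp-dir helper: creates <root>/<prefix>-<uuid>."""
    path = os.path.join(root, f"{name_prefix}-{uuid.uuid4()}")
    os.makedirs(path)
    return path


def stage_file(root: str, name: str, content: str) -> str:
    """Write one file into a fresh UUID-named directory and return that directory."""
    staging_dir = create_temp_dir(root)
    with open(os.path.join(staging_dir, name), "w") as handle:
        handle.write(content)
    return staging_dir


with tempfile.TemporaryDirectory() as tmp_root, \
        tempfile.TemporaryDirectory() as work_dir:
    # Simulate two submissions that attach the same file.
    first = stage_file(tmp_root, "myfile", "payload")
    second = stage_file(tmp_root, "myfile", "payload")

    # The staging directory differs per submission, so it cannot be known upfront.
    paths_differ = first != second

    # Resolving the name against the working directory misses the file.
    found_relative = os.path.exists(os.path.join(work_dir, "myfile"))

    # Resolving it against the directory reported at runtime finds it.
    found_via_root = os.path.exists(os.path.join(first, "myfile"))

    print(paths_differ, found_relative, found_via_root)
```

In this sketch the runtime-reported directory plays the role that 
{{SparkFiles.get}} plays in a real application: the caller asks for the 
location instead of assuming it.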

  was:
Files added to the spark-submit command with master K8s and deploy mode 
cluster, end up in a non deterministic location inside the driver.

eg:

{{spark-submit --files myfile --master k8s.. --deploy-mode cluster` will upload 
the files to /tmp/spark-uuid/myfile}}

The issue happens because 
[Utils.createTempDir()|https://github.com/apache/spark/blob/v3.3.1/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L344]
 creates a directory with a uuid in the directory name. This issue does not 
affect the `--archives` option, because we `unarchive` the archives into the 
destination directory which is relative to the working dir. This bug affects 
file access pre & post app creation. For example if we distribute python 
dependencies with pex, we need to use `--files` to attach the pex file and 
change the spark.pyspark.python to point to this file. But the file location 
can not be determined before submitting the app. On the other hand, after the 
app is created, referencing the files without using `SparkFiles.get` also does 
not work


> Files added to the spark-submit command with master K8s and deploy mode 
> cluster, end up in a non deterministic location inside the driver.
> ------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-42170
>                 URL: https://issues.apache.org/jira/browse/SPARK-42170
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, Spark Submit
>    Affects Versions: 3.3.0, 3.2.2
>            Reporter: Santosh Pingale
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
