[ https://issues.apache.org/jira/browse/SPARK-31726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480756#comment-17480756 ]

koert kuipers edited comment on SPARK-31726 at 1/23/22, 11:22 PM:
------------------------------------------------------------------

[~beregon87] about --jars, are you seeing that the jars are also not available 
on driver, or not added to classpath, or both?

i ran a simple test where i added a jar from s3, e.g. --jars 
s3a://some/jar.jar, and was surprised to find the driver could not find a class 
in that jar (on kubernetes with cluster deploy mode). this would be a more 
serious bug given the description of --jars clearly says it should:
--jars JARS    Comma-separated list of jars to include on the driver and 
executor classpaths.

now with --files it's too bad the drivers don't get it, but at least it does 
what it says on the tin (which does not include a promise to get the files to 
the driver):
--files FILES    Comma-separated list of files to be placed in the working 
directory of each executor.
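The test described above could look roughly like this (a sketch only; the k8s master URL, container image, and main class are placeholder assumptions, while the s3a jar path is the one from the comment):

```shell
# Hypothetical repro sketch: submit in cluster deploy mode on kubernetes
# with an extra jar pulled from s3. Per the --jars help text, classes in
# jar.jar should end up on the driver classpath; the observation in this
# comment is that the driver fails to find them instead.
spark-submit \
  --master k8s://https://my-k8s-apiserver:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=my-spark-image \
  --jars s3a://some/jar.jar \
  --class com.example.Main \
  local:///opt/app/main.jar
```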



> Make spark.files available in driver with cluster deploy mode on kubernetes
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-31726
>                 URL: https://issues.apache.org/jira/browse/SPARK-31726
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.0.0
>            Reporter: koert kuipers
>            Priority: Minor
>
> currently on yarn with cluster deploy mode --files makes the files available 
> to the driver and executors and also puts them on the classpath for both.
> on k8s with cluster deploy mode --files makes the files available on 
> executors but does not put them on the classpath; it does not make the files 
> available on the driver at all, nor on the driver classpath.
> it would be nice if the k8s behavior were consistent with yarn, or at least 
> made the files available on the driver. once the files are available there is 
> a simple workaround to get them on the classpath using 
> spark.driver.extraClassPath="./"
> background:
> we recently started testing kubernetes for spark. our main platform is yarn, 
> on which we use client deploy mode. our first experience was that client 
> deploy mode was difficult to use on k8s (we don't launch from inside a pod), 
> so we switched to cluster deploy mode, which seems to behave well on k8s. but 
> then we realized that our programs rely on reading files on the classpath 
> (application.conf, log4j.properties, etc.) that are on the client but are now 
> no longer on the driver (since the driver is no longer on the client). an 
> easy fix seems to be to ship the files using --files to make them available 
> on the driver, but we could not get this to work.
>  
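The workaround described in the issue could be sketched as follows (placeholder file and class names; shown against yarn, where --files does ship the files to the driver's working directory):

```shell
# Sketch of the --files + extraClassPath workaround from the description.
# On yarn cluster mode, --files places the listed files in the driver's
# working directory, and extraClassPath="./" adds that directory to the
# driver classpath so application.conf and log4j.properties are found.
# Per this issue, on k8s cluster mode the files never reach the driver,
# so the same invocation does not work there.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files application.conf,log4j.properties \
  --conf spark.driver.extraClassPath="./" \
  --class com.example.Main \
  app.jar
```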



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
