https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188222#comment-17188222

Xuzhou Yin edited comment on SPARK-23153 at 9/1/20, 7:37 AM:
-------------------------------------------------------------

Hi guys,

I have looked through the pull request for this change, and there is one part 
which I don't quite understand; it would be awesome if someone could explain 
it a little bit.

At this line: 
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161
Spark filters out all paths which are not local (i.e. those with no scheme or 
a file:// scheme). Does this mean it will ignore all other paths? For example, 
when starting a Spark job with 
spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar,
it seems like this logic will upload file:///local/path/3.jar to S3 and reset 
spark.jars to only s3://upload/path/3.jar, while completely ignoring 
local:///local/path/1.jar and s3://s3/path/2.jar.
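
As a sanity check on my reading, here is a simplified sketch of what that 
filter appears to do. The helper names are mine, not the actual Spark 
internals (the real logic lives in BasicDriverFeatureStep and 
KubernetesUtils):

{code:scala}
import java.net.URI

// Illustrative sketch only; a path with no scheme is treated as file:// here.
def isClientLocal(path: String): Boolean =
  Option(URI.create(path).getScheme).getOrElse("file") == "file"

// Only file:// entries survive the filter and get rewritten to the upload
// location; local:// and s3:// entries seem to drop out of the final value.
def resolveJars(jars: Seq[String], uploadBase: String): Seq[String] =
  jars.filter(isClientLocal).map(p => s"$uploadBase/${p.split('/').last}")
{code}

With my example above, resolveJars(Seq("local:///local/path/1.jar", 
"s3://s3/path/2.jar", "file:///local/path/3.jar"), "s3://upload/path") would 
return only s3://upload/path/3.jar, which matches the behavior I described.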

Is this expected behavior? If so, what should we do if we want to specify 
dependencies which live in an HCFS such as S3, or on the driver's local 
filesystem (i.e. local://), instead of file://? If this is a bug, is there a 
Jira issue for it?
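
For reference, my (possibly wrong) understanding of the three schemes in the 
example, written out as a config sketch:

{code:scala}
import org.apache.spark.SparkConf

// My understanding of the scheme semantics; please correct me if wrong.
val conf = new SparkConf().set("spark.jars", Seq(
  "local:///local/path/1.jar", // already inside the driver/executor image
  "s3://s3/path/2.jar",        // remote HCFS location, fetched by the pods
  "file:///local/path/3.jar"   // on the submission client; gets uploaded
).mkString(","))
{code}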

Thanks a lot!

@[~skonto]



> Support application dependencies in submission client's local file system
> -------------------------------------------------------------------------
>
>                 Key: SPARK-23153
>                 URL: https://issues.apache.org/jira/browse/SPARK-23153
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes, Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Yinan Li
>            Assignee: Stavros Kontopoulos
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Currently, local dependencies are not supported with Spark on K8S, i.e. if 
> the user has code or dependencies only on the client where they run 
> {{spark-submit}}, then the current implementation has no way to make those 
> visible to the Spark application running inside the K8S pods that get 
> launched. This limits users to running only applications where the code and 
> dependencies are either baked into the Docker images used, or where those 
> are available via some external and globally accessible file system, e.g. 
> HDFS, which are not viable options for many users and environments.
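
As context for the description above (Fix For: 3.0.0), a minimal sketch of 
how the shipped feature is typically configured, assuming an S3A bucket 
reachable from both the submission client and the pods; the bucket name and 
jar path are placeholders:

{code:scala}
import org.apache.spark.SparkConf

// file:// dependencies on the submission client are uploaded under the
// configured HCFS path, and the driver pod fetches them from there.
val conf = new SparkConf()
  .set("spark.kubernetes.file.upload.path", "s3a://my-bucket/spark-uploads")
  .set("spark.jars", "file:///local/path/app.jar")
{code}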


