[jira] [Commented] (SPARK-23153) Support application dependencies in submission client's local file system

2021-09-17 Thread Stavros Kontopoulos (Jira)


[ https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416846#comment-17416846 ]

Stavros Kontopoulos commented on SPARK-23153:
-

[~xuzhoyin] Sorry for the late reply. In the past the local scheme meant local 
to the container, i.e. it had a different meaning 
(https://github.com/apache/spark/pull/21378). I am not sure of the current status.
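
Roughly, the old distinction looked like this (illustrative command lines with 
hypothetical paths):

{code}
# local:// pointed at a path already present inside the container image
spark-submit ... --jars local:///opt/spark/jars/dep.jar

# file:// (or no scheme) pointed at a path on the submission client's machine
spark-submit ... --jars file:///home/user/dep.jar
{code}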

> Support application dependencies in submission client's local file system
> -
>
> Key: SPARK-23153
> URL: https://issues.apache.org/jira/browse/SPARK-23153
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Assignee: Stavros Kontopoulos
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, local dependencies are not supported with Spark on K8S, i.e. if the 
> user has code or dependencies only on the client where they run 
> {{spark-submit}}, the current implementation has no way to make those visible 
> to the Spark application running inside the K8S pods that get launched. This 
> limits users to running only applications where the code and dependencies are 
> either baked into the Docker images used or available via some external, 
> globally accessible file system such as HDFS; neither is a viable option for 
> many users and environments.






[jira] [Commented] (SPARK-23153) Support application dependencies in submission client's local file system

2020-09-01 Thread Xuzhou Yin (Jira)


[ https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188222#comment-17188222 ]

Xuzhou Yin commented on SPARK-23153:


Hi guys,

I have looked through the pull request for this change and do not quite 
understand one part; it would be awesome if someone could explain it a bit.

At this line: 
[https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161]
it keeps only the paths that are local to the client (i.e. no scheme or the 
file:// scheme) and filters out everything else. Does it simply ignore all 
other paths? For example, when starting a Spark job with 
spark.jars=local:///local/path/1.jar,s3:///s3/path/2.jar,file:///local/path/3.jar,
it seems this logic will upload file:///local/path/3.jar to S3 and reset 
spark.jars to only s3://upload/path/3.jar, completely dropping 
local:///local/path/1.jar and s3:///s3/path/2.jar.
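
To make the behavior I am describing concrete, here is a minimal sketch of 
that filter (an illustration only, not the actual BasicDriverFeatureStep code):

{code:scala}
import java.net.URI

// Keep only URIs that point at the submission client's file system,
// i.e. paths with no scheme or with the "file" scheme.
def isClientLocal(path: String): Boolean =
  Option(new URI(path).getScheme).getOrElse("file") == "file"

val jars = Seq(
  "local:///local/path/1.jar", // scheme "local" -> not selected for upload
  "s3:///s3/path/2.jar",       // scheme "s3"    -> not selected for upload
  "file:///local/path/3.jar")  // scheme "file"  -> selected for upload

jars.filter(isClientLocal)     // Seq("file:///local/path/3.jar")
{code}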

Is this the expected behavior? If so, what should we do if we want to specify 
dependencies that live in an HCFS such as S3 rather than locally? If this is a 
bug, is there a Jira issue for it?

Thanks a lot!

> Support application dependencies in submission client's local file system
> -
>
> Key: SPARK-23153
> URL: https://issues.apache.org/jira/browse/SPARK-23153
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Assignee: Stavros Kontopoulos
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, local dependencies are not supported with Spark on K8S, i.e. if the 
> user has code or dependencies only on the client where they run 
> {{spark-submit}}, the current implementation has no way to make those visible 
> to the Spark application running inside the K8S pods that get launched. This 
> limits users to running only applications where the code and dependencies are 
> either baked into the Docker images used or available via some external, 
> globally accessible file system such as HDFS; neither is a viable option for 
> many users and environments.






[jira] [Commented] (SPARK-23153) Support application dependencies in submission client's local file system

2019-07-01 Thread Stavros Kontopoulos (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876192#comment-16876192 ]

Stavros Kontopoulos commented on SPARK-23153:
-

[~cloud_fan] Is there going to be another 2.4.x release? Does it make sense to 
backport this?

> Support application dependencies in submission client's local file system
> -
>
> Key: SPARK-23153
> URL: https://issues.apache.org/jira/browse/SPARK-23153
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Assignee: Stavros Kontopoulos
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, local dependencies are not supported with Spark on K8S, i.e. if the 
> user has code or dependencies only on the client where they run 
> {{spark-submit}}, the current implementation has no way to make those visible 
> to the Spark application running inside the K8S pods that get launched. This 
> limits users to running only applications where the code and dependencies are 
> either baked into the Docker images used or available via some external, 
> globally accessible file system such as HDFS; neither is a viable option for 
> many users and environments.






[jira] [Commented] (SPARK-23153) Support application dependencies in submission client's local file system

2019-06-30 Thread Eric Joel Blanco-Hermida (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875778#comment-16875778 ]

Eric Joel Blanco-Hermida commented on SPARK-23153:
--

Has this been fixed for Spark 2.4.X too? 

> Support application dependencies in submission client's local file system
> -
>
> Key: SPARK-23153
> URL: https://issues.apache.org/jira/browse/SPARK-23153
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Assignee: Stavros Kontopoulos
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, local dependencies are not supported with Spark on K8S, i.e. if the 
> user has code or dependencies only on the client where they run 
> {{spark-submit}}, the current implementation has no way to make those visible 
> to the Spark application running inside the K8S pods that get launched. This 
> limits users to running only applications where the code and dependencies are 
> either baked into the Docker images used or available via some external, 
> globally accessible file system such as HDFS; neither is a viable option for 
> many users and environments.






[jira] [Commented] (SPARK-23153) Support application dependencies in submission client's local file system

2019-01-14 Thread Stavros Kontopoulos (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742096#comment-16742096 ]

Stavros Kontopoulos commented on SPARK-23153:
-

I will have a PR up shortly regarding Hadoop-compatible file systems.

> Support application dependencies in submission client's local file system
> -
>
> Key: SPARK-23153
> URL: https://issues.apache.org/jira/browse/SPARK-23153
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> Currently, local dependencies are not supported with Spark on K8S, i.e. if the 
> user has code or dependencies only on the client where they run 
> {{spark-submit}}, the current implementation has no way to make those visible 
> to the Spark application running inside the K8S pods that get launched. This 
> limits users to running only applications where the code and dependencies are 
> either baked into the Docker images used or available via some external, 
> globally accessible file system such as HDFS; neither is a viable option for 
> many users and environments.






[jira] [Commented] (SPARK-23153) Support application dependencies in submission client's local file system

2018-11-27 Thread Stavros Kontopoulos (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700356#comment-16700356 ]

Stavros Kontopoulos commented on SPARK-23153:
-

[~eje] [~rvesse] [~liyinan926] I am working on a document to capture the 
options here: 
https://docs.google.com/document/d/1peg_qVhLaAl4weo5C51jQicPwLclApBsdR1To2fgc48

> Support application dependencies in submission client's local file system
> -
>
> Key: SPARK-23153
> URL: https://issues.apache.org/jira/browse/SPARK-23153
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> Currently, local dependencies are not supported with Spark on K8S, i.e. if the 
> user has code or dependencies only on the client where they run 
> {{spark-submit}}, the current implementation has no way to make those visible 
> to the Spark application running inside the K8S pods that get launched. This 
> limits users to running only applications where the code and dependencies are 
> either baked into the Docker images used or available via some external, 
> globally accessible file system such as HDFS; neither is a viable option for 
> many users and environments.






[jira] [Commented] (SPARK-23153) Support application dependencies in submission client's local file system

2018-10-05 Thread Stavros Kontopoulos (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640086#comment-16640086 ]

Stavros Kontopoulos commented on SPARK-23153:
-

The question is what you can do when you don't have a distributed cache, as in 
the YARN case. Do we need to upload artifacts in the first place, or fetch them 
remotely (e.g. in cluster mode)? Mesos has the same issue, AFAIK. Having 
pre-populated PVs is, as a mechanism, no different to me from images, since no 
uploading takes place from the submission side to the driver via spark-submit. 
Someone has to approve the PVs' contents as well when it comes to security. If 
we can do this in Spark without going down the path of K8s constructs like init 
containers, and without performance issues, then we should be OK. Even now, if 
I am not mistaken, executors on K8s fetch jars from the driver when they update 
their dependencies, and that contradicts the third point. But what do you do 
when you need driver HA? Then you need checkpointing, and you need to store 
artifacts in some storage like PVs, custom images, or HDFS (distributed storage 
in general). If we omit the last two, the only option I see is PVs.

  

> Support application dependencies in submission client's local file system
> -
>
> Key: SPARK-23153
> URL: https://issues.apache.org/jira/browse/SPARK-23153
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> Currently, local dependencies are not supported with Spark on K8S, i.e. if the 
> user has code or dependencies only on the client where they run 
> {{spark-submit}}, the current implementation has no way to make those visible 
> to the Spark application running inside the K8S pods that get launched. This 
> limits users to running only applications where the code and dependencies are 
> either baked into the Docker images used or available via some external, 
> globally accessible file system such as HDFS; neither is a viable option for 
> many users and environments.






[jira] [Commented] (SPARK-23153) Support application dependencies in submission client's local file system

2018-10-02 Thread Rob Vesse (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635534#comment-16635534 ]

Rob Vesse commented on SPARK-23153:
---

[~cloud_fan][~liyinan926][~mcheah][~eje] Has there been any discussion of how 
to go about addressing this limitation?

In the original downstream fork there was the Resource Staging Server, but that 
was removed to simplify upstreaming and because the Spark core folks had 
objections to that approach. Also, in our usage of it we encountered a number 
of performance, scalability, and security issues that made it a not 
particularly stable approach.

There was a long dev list thread on this - 
https://lists.apache.org/thread.html/82b4ae9a2eb5ddeb3f7240ebf154f06f19b830f8b3120038e5d687a1@%3Cdev.spark.apache.org%3E
 - but no real conclusion seemed to be reached.

There are a few workarounds open to users that I can think of:

* Use the PVC support to mount a pre-created PVC that has somehow been 
populated with the user code
* Use the incoming pod template feature to mount arbitrary volumes that have 
somehow been populated with the user code
* Build custom images

All these options put the onus on users to do prep work prior to launch; I 
think option 3 is currently the "recommended" workaround. Unfortunately for us 
that is not a viable option, as our customers tend to be very security 
conscious and often allow only a pre-approved list of images to be run. 
(Ignoring the obvious fallacy of disallowing custom images while permitting 
images that allow custom user code to execute...)
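
As a concrete sketch of option 1, assuming a PVC named app-deps that has 
already been populated with the application jar (all names and paths below are 
hypothetical):

{code}
spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --class com.example.Main \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.deps.options.claimName=app-deps \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.deps.mount.path=/opt/deps \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.deps.options.claimName=app-deps \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.deps.mount.path=/opt/deps \
  local:///opt/deps/my-app.jar
{code}

(The PVC needs an access mode that lets the driver and all executors mount it 
simultaneously, e.g. ReadOnlyMany.)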

This is currently a blocker for me, and I would like to contribute here, but I 
don't want to reinvent the wheel or waste effort on approaches that have 
already been discussed and discounted.

> Support application dependencies in submission client's local file system
> -
>
> Key: SPARK-23153
> URL: https://issues.apache.org/jira/browse/SPARK-23153
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>







[jira] [Commented] (SPARK-23153) Support application dependencies in submission client's local file system

2018-09-10 Thread Wenchen Fan (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609283#comment-16609283 ]

Wenchen Fan commented on SPARK-23153:
-

I'm removing the target version, since no one is working on it.

> Support application dependencies in submission client's local file system
> -
>
> Key: SPARK-23153
> URL: https://issues.apache.org/jira/browse/SPARK-23153
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>



