[jira] [Created] (SPARK-33761) [k8s] Support fetching driver and executor pod templates from HCFS

2020-12-11 Thread Xuzhou Yin (Jira)
Xuzhou Yin created SPARK-33761:
--

 Summary: [k8s] Support fetching driver and executor pod templates 
from HCFS
 Key: SPARK-33761
 URL: https://issues.apache.org/jira/browse/SPARK-33761
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.0.1
Reporter: Xuzhou Yin


Currently, Spark 3 on Kubernetes supports loading driver and executor pod 
templates only from the local file system: 
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala#L87. 
This is not very convenient, as the user must either bake the pod templates 
into the client pod image or manually mount the file as a ConfigMap. It would 
be nice if Spark supported loading pod templates from Hadoop Compatible File 
Systems (HCFS, such as S3A), so that the user could update the pod template 
files directly in S3 without changing the underlying Kubernetes job definition 
(e.g. updating the Docker image or the ConfigMap).
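
One possible direction, sketched below under the assumption that Hadoop's 
FileSystem API is available on the submission client: resolve the template URI 
through the scheme's registered FileSystem and copy it to a temporary local 
file, so the existing local-file parsing path can be reused. The helper name 
fetchPodTemplate is hypothetical, not part of Spark's API.

{code:scala}
import java.io.File
import java.nio.file.Files

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hedged sketch, not the actual Spark implementation: fetch a pod template
// from any Hadoop Compatible File System (file://, hdfs://, s3a://, ...).
def fetchPodTemplate(templateUri: String, hadoopConf: Configuration): File = {
  val src = new Path(templateUri)
  // getFileSystem picks the FileSystem implementation registered for the
  // URI's scheme in the Hadoop configuration.
  val fs: FileSystem = src.getFileSystem(hadoopConf)
  val localCopy = Files.createTempFile("pod-template-", ".yaml").toFile
  localCopy.deleteOnExit()
  // Copy the template to the submission client; the existing local-file
  // template parsing could then run on localCopy unchanged.
  fs.copyToLocalFile(false, src, new Path(localCopy.getAbsolutePath))
  localCopy
}
{code}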






[jira] [Updated] (SPARK-32775) [k8s] Spark client dependency support ignores non-local paths

2020-09-01 Thread Xuzhou Yin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuzhou Yin updated SPARK-32775:
---
Description: 
According to the logic at this line: 
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, 
Spark filters out all paths that are not local (i.e. that have no scheme or a 
file:// scheme). This can cause non-local dependencies to never be loaded by 
the driver.

For example, when starting a Spark job with 
spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, 
it seems this logic will upload file:///local/path/3.jar to S3 and reset 
spark.jars to only s3://transformed/path/3.jar, completely ignoring 
local:///local/path/1.jar and s3://s3/path/2.jar.

We need to fix this logic so that Spark uploads local files to S3 and 
transforms their paths, while keeping all other paths as they are.
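
To make the reported behavior concrete, here is a simplified illustration of 
the filtering described above (a sketch of the described logic, not a copy of 
the Spark source):

{code:scala}
import java.net.URI

// Only URIs with no scheme or a file:// scheme are treated as
// submission-client local and kept for upload.
def isSubmitterLocal(uri: String): Boolean = {
  val scheme = URI.create(uri).getScheme
  scheme == null || scheme == "file"
}

val jars = Seq(
  "local:///local/path/1.jar", // inside the driver image: dropped by the filter
  "s3://s3/path/2.jar",        // remote HCFS path: dropped by the filter
  "file:///local/path/3.jar"   // submission-client local: kept and uploaded
)

// If spark.jars is then overwritten with only the uploaded results, the
// local:// and s3:// entries silently disappear.
val kept = jars.filter(isSubmitterLocal) // Seq("file:///local/path/3.jar")
{code}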

  was:
According to the logic at this line: 
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, 
Spark filters out all paths that are not local (i.e. that have no scheme or a 
file:// scheme). This can cause non-local dependencies to never be loaded by 
the driver.

For example, when starting a Spark job with 
spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, 
it seems this logic will upload file:///local/path/3.jar to S3 and reset 
spark.jars to only s3://transformed/path/3.jar, completely ignoring 
local:///local/path/1.jar and s3://s3/path/2.jar.

We need to fix this logic so that Spark uploads local files to S3 and 
transforms their paths, while keeping all other paths as they are.


> [k8s] Spark client dependency support ignores non-local paths
> -
>
> Key: SPARK-32775
> URL: https://issues.apache.org/jira/browse/SPARK-32775
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Xuzhou Yin
>Priority: Major
>
> According to the logic at this line: 
> https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, 
> Spark filters out all paths that are not local (i.e. that have no scheme or 
> a file:// scheme). This can cause non-local dependencies to never be loaded 
> by the driver.
> For example, when starting a Spark job with 
> spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, 
> it seems this logic will upload file:///local/path/3.jar to S3 and reset 
> spark.jars to only s3://transformed/path/3.jar, completely ignoring 
> local:///local/path/1.jar and s3://s3/path/2.jar.
> We need to fix this logic so that Spark uploads local files to S3 and 
> transforms their paths, while keeping all other paths as they are.






[jira] [Updated] (SPARK-32775) [k8s] Spark client dependency support ignores non-local paths

2020-09-01 Thread Xuzhou Yin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuzhou Yin updated SPARK-32775:
---
Description: 
According to the logic at this line: 
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, 
Spark filters out all paths that are not local (i.e. that have no scheme or a 
file:// scheme). This can cause non-local dependencies to never be loaded by 
the driver.

For example, when starting a Spark job with 
spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, 
it seems this logic will upload file:///local/path/3.jar to S3 and reset 
spark.jars to only s3://transformed/path/3.jar, completely ignoring 
local:///local/path/1.jar and s3://s3/path/2.jar.

We need to fix this logic so that Spark uploads local files to S3 and 
transforms their paths, while keeping all other paths as they are.

  was:
According to the logic at this line: 
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, 
Spark filters out all paths that are not local (i.e. that have no scheme or a 
file:// scheme). This can cause non-local dependencies to never be loaded by 
the driver.

For example, when starting a Spark job with 
spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, 
it seems this logic will upload file:///local/path/3.jar to S3 and reset 
spark.jars to only s3://upload/path/3.jar, completely ignoring 
local:///local/path/1.jar and s3://s3/path/2.jar.

We need to fix this logic so that Spark uploads local files to S3 and 
transforms their paths, while keeping all other paths as they are.


> [k8s] Spark client dependency support ignores non-local paths
> -
>
> Key: SPARK-32775
> URL: https://issues.apache.org/jira/browse/SPARK-32775
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Xuzhou Yin
>Priority: Major
>
> According to the logic at this line: 
> https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, 
> Spark filters out all paths that are not local (i.e. that have no scheme or 
> a file:// scheme). This can cause non-local dependencies to never be loaded 
> by the driver.
> For example, when starting a Spark job with 
> spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, 
> it seems this logic will upload file:///local/path/3.jar to S3 and reset 
> spark.jars to only s3://transformed/path/3.jar, completely ignoring 
> local:///local/path/1.jar and s3://s3/path/2.jar.
> We need to fix this logic so that Spark uploads local files to S3 and 
> transforms their paths, while keeping all other paths as they are.






[jira] [Created] (SPARK-32775) [k8s] Spark client dependency support ignores non-local paths

2020-09-01 Thread Xuzhou Yin (Jira)
Xuzhou Yin created SPARK-32775:
--

 Summary: [k8s] Spark client dependency support ignores non-local 
paths
 Key: SPARK-32775
 URL: https://issues.apache.org/jira/browse/SPARK-32775
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.0.0
Reporter: Xuzhou Yin


According to the logic at this line: 
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, 
Spark filters out all paths that are not local (i.e. that have no scheme or a 
file:// scheme). This can cause non-local dependencies to never be loaded by 
the driver.

For example, when starting a Spark job with 
spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, 
it seems this logic will upload file:///local/path/3.jar to S3 and reset 
spark.jars to only s3://upload/path/3.jar, completely ignoring 
local:///local/path/1.jar and s3://s3/path/2.jar.

We need to fix this logic so that Spark uploads local files to S3 and 
transforms their paths, while keeping all other paths as they are.
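
For illustration, the fix could look roughly like the sketch below: upload 
only the scheme-less and file:// jars and rewrite their paths, passing every 
other URI through untouched. uploadToStaging is a hypothetical helper standing 
in for the actual upload step, not an existing Spark function.

{code:scala}
import java.net.URI

// Hedged sketch of the requested behavior, not a patch against Spark.
def resolveJars(jars: Seq[String])(uploadToStaging: String => String): Seq[String] =
  jars.map { uri =>
    val scheme = URI.create(uri).getScheme
    if (scheme == null || scheme == "file") uploadToStaging(uri) // upload and rewrite
    else uri // keep local://, s3://, hdfs://, ... exactly as given
  }
{code}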






[jira] [Comment Edited] (SPARK-23153) Support application dependencies in submission client's local file system

2020-09-01 Thread Xuzhou Yin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188222#comment-17188222
 ] 

Xuzhou Yin edited comment on SPARK-23153 at 9/1/20, 7:37 AM:
-

Hi guys,

I have looked through the pull request for this change, and there is one part 
that I don't quite understand; it would be awesome if someone could explain it 
a little.

At this line: 
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, 
Spark filters out all paths that are not local (i.e. that have no scheme or a 
file:// scheme). Does this mean it ignores all other paths that are not local? 
For example, when starting a Spark job with 
spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, 
it seems this logic will upload file:///local/path/3.jar to S3 and reset 
spark.jars to only s3://upload/path/3.jar, completely ignoring 
local:///local/path/1.jar and s3://s3/path/2.jar.

Is this expected behavior? If so, what should we do if we want to specify 
dependencies that live on an HCFS such as S3, or on the driver's local file 
system (i.e. local://), instead of file://? If this is a bug, is there a Jira 
issue for it?

Thanks a lot!

@[~skonto]
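
For reference, the three scheme families discussed in this question could be 
set side by side like this (a sketch with placeholder bucket and file names, 
not a recommendation of actual paths):

{code:scala}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.jars", Seq(
    "local:///opt/app/inside-image.jar", // already inside the driver/executor image
    "s3a://my-bucket/remote-dep.jar",    // lives on an HCFS; the cluster fetches it
    "file:///home/me/client-dep.jar"     // on the submission client; must be uploaded
  ).mkString(","))
{code}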


was (Author: xuzhoyin):
Hi guys,

I have looked through the pull request for this change, and there is one part 
that I don't quite understand; it would be awesome if someone could explain it 
a little.

At this line: 
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, 
Spark filters out all paths that are not local (i.e. that have no scheme or a 
file:// scheme). Does this mean it ignores all other paths that are not local? 
For example, when starting a Spark job with 
spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, 
it seems this logic will upload file:///local/path/3.jar to S3 and reset 
spark.jars to only s3://upload/path/3.jar, completely ignoring 
local:///local/path/1.jar and s3://s3/path/2.jar.

Is this expected behavior? If so, what should we do if we want to specify 
dependencies that live on an HCFS such as S3, or on the driver's local file 
system (i.e. local://), instead of file://? If this is a bug, is there a Jira 
issue for it?

Thanks a lot!

> Support application dependencies in submission client's local file system
> -
>
> Key: SPARK-23153
> URL: https://issues.apache.org/jira/browse/SPARK-23153
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Assignee: Stavros Kontopoulos
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, local dependencies are not supported with Spark on K8S: if the 
> user has code or dependencies only on the client where they run 
> {{spark-submit}}, then the current implementation has no way to make those 
> visible to the Spark application running inside the K8S pods that get 
> launched. This limits users to running only applications whose code and 
> dependencies are either baked into the Docker images used or available via 
> some external, globally accessible file system (e.g. HDFS), which are not 
> viable options for many users and environments.






[jira] [Comment Edited] (SPARK-23153) Support application dependencies in submission client's local file system

2020-09-01 Thread Xuzhou Yin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188222#comment-17188222
 ] 

Xuzhou Yin edited comment on SPARK-23153 at 9/1/20, 7:33 AM:
-

Hi guys,

I have looked through the pull request for this change, and there is one part 
that I don't quite understand; it would be awesome if someone could explain it 
a little.

At this line: 
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, 
Spark filters out all paths that are not local (i.e. that have no scheme or a 
file:// scheme). Does this mean it ignores all other paths that are not local? 
For example, when starting a Spark job with 
spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, 
it seems this logic will upload file:///local/path/3.jar to S3 and reset 
spark.jars to only s3://upload/path/3.jar, completely ignoring 
local:///local/path/1.jar and s3://s3/path/2.jar.

Is this expected behavior? If so, what should we do if we want to specify 
dependencies that live on an HCFS such as S3, or on the driver's local file 
system (i.e. local://), instead of file://? If this is a bug, is there a Jira 
issue for it?

Thanks a lot!


was (Author: xuzhoyin):
Hi guys,

I have looked through the pull request for this change and do not quite 
understand one part; it would be awesome if someone could explain it a little.

At this line: 
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, 
it filters out all paths that are not local (i.e. that have no scheme or a 
file:// scheme). Does it ignore all other paths that are not local? For 
example, when starting a Spark job with 
spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, 
it seems this logic will upload file:///local/path/3.jar to S3 and reset 
spark.jars to only s3://upload/path/3.jar, completely ignoring 
local:///local/path/1.jar and s3://s3/path/2.jar.

Is this expected behavior? If so, what should we do if we want to specify 
dependencies that live on an HCFS such as S3 instead of local ones? If this is 
a bug, is there a Jira issue for it?

Thanks a lot!

> Support application dependencies in submission client's local file system
> -
>
> Key: SPARK-23153
> URL: https://issues.apache.org/jira/browse/SPARK-23153
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Assignee: Stavros Kontopoulos
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, local dependencies are not supported with Spark on K8S: if the 
> user has code or dependencies only on the client where they run 
> {{spark-submit}}, then the current implementation has no way to make those 
> visible to the Spark application running inside the K8S pods that get 
> launched. This limits users to running only applications whose code and 
> dependencies are either baked into the Docker images used or available via 
> some external, globally accessible file system (e.g. HDFS), which are not 
> viable options for many users and environments.






[jira] [Commented] (SPARK-23153) Support application dependencies in submission client's local file system

2020-09-01 Thread Xuzhou Yin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188222#comment-17188222
 ] 

Xuzhou Yin commented on SPARK-23153:


Hi guys,

I have looked through the pull request for this change and do not quite 
understand one part; it would be awesome if someone could explain it a little.

At this line: 
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, 
it filters out all paths that are not local (i.e. that have no scheme or a 
file:// scheme). Does it ignore all other paths that are not local? For 
example, when starting a Spark job with 
spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, 
it seems this logic will upload file:///local/path/3.jar to S3 and reset 
spark.jars to only s3://upload/path/3.jar, completely ignoring 
local:///local/path/1.jar and s3://s3/path/2.jar.

Is this expected behavior? If so, what should we do if we want to specify 
dependencies that live on an HCFS such as S3 instead of local ones? If this is 
a bug, is there a Jira issue for it?

Thanks a lot!

> Support application dependencies in submission client's local file system
> -
>
> Key: SPARK-23153
> URL: https://issues.apache.org/jira/browse/SPARK-23153
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Assignee: Stavros Kontopoulos
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, local dependencies are not supported with Spark on K8S: if the 
> user has code or dependencies only on the client where they run 
> {{spark-submit}}, then the current implementation has no way to make those 
> visible to the Spark application running inside the K8S pods that get 
> launched. This limits users to running only applications whose code and 
> dependencies are either baked into the Docker images used or available via 
> some external, globally accessible file system (e.g. HDFS), which are not 
> viable options for many users and environments.


