[jira] [Created] (SPARK-33761) [k8s] Support fetching driver and executor pod templates from HCFS
Xuzhou Yin created SPARK-33761: -- Summary: [k8s] Support fetching driver and executor pod templates from HCFS Key: SPARK-33761 URL: https://issues.apache.org/jira/browse/SPARK-33761 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.0.1 Reporter: Xuzhou Yin
Currently Spark 3 on Kubernetes supports loading driver and executor pod templates only from the local file system: https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala#L87. However, this is not very convenient, as the user needs to either bake the pod templates into the client pod image or manually mount the file as a ConfigMap. It would be nice if Spark supported loading pod templates from Hadoop-compatible file systems (such as S3A), so that users could update the pod template files directly in S3 without changing the underlying Kubernetes job definition (e.g. updating the Docker image or the ConfigMap).
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
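The requested behavior can be sketched roughly as follows. This is an illustrative Python sketch, not Spark's actual Scala implementation; the function name `resolve_pod_template` is hypothetical. Today only the local branch exists (mirroring KubernetesUtils.loadPodFromTemplate); the improvement would replace the raised error with a fetch through the Hadoop FileSystem API.

```python
# Hypothetical sketch of the requested improvement: resolve a pod template
# URI, reading local paths directly and flagging remote (HCFS) schemes that
# would need to be fetched to the submission client first.
from urllib.parse import urlparse


def resolve_pod_template(uri: str) -> str:
    """Return the contents of a pod template file.

    file:// and scheme-less URIs are read from the local file system,
    which is what Spark supports today. Any other scheme (s3a://,
    hdfs://, ...) would have to be downloaded via the Hadoop
    FileSystem API before use -- the improvement this issue asks for.
    """
    parsed = urlparse(uri)
    if parsed.scheme in ("", "file"):
        with open(parsed.path or uri) as f:
            return f.read()
    # Placeholder for something like: FileSystem.get(uri, hadoopConf).open(path)
    raise NotImplementedError(f"HCFS scheme not yet supported: {parsed.scheme}")
```

With such a hook in place, updating the template in S3 would not require rebuilding the client image or re-creating a ConfigMap.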
[jira] [Updated] (SPARK-32775) [k8s] Spark client dependency support ignores non-local paths
[ https://issues.apache.org/jira/browse/SPARK-32775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuzhou Yin updated SPARK-32775: ---
Description: According to the logic of this line: https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, Spark filters out all paths which are not local (i.e. no scheme or a file:// scheme). It may cause non-local dependencies not to be loaded by the driver. For example, when starting a Spark job with spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, it seems like this logic will upload file:///local/path/3.jar to S3 and reset spark.jars to only s3://transformed/path/3.jar, while completely ignoring local:///local/path/1.jar and s3://s3/path/2.jar. We need to fix this logic such that Spark uploads local files to S3, and transforms the paths while keeping all other paths as they are.
was: According to the logic of this line: https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, Spark filters out all paths which are not local (i.e. no scheme or a file:// scheme). It may cause non-local dependencies not to be loaded by the driver. For example, when starting a Spark job with spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, it seems like this logic will upload file:///local/path/3.jar to S3 and reset spark.jars to only s3://transformed/path/3.jar, while completely ignoring local:///local/path/1.jar and s3://s3/path/2.jar. We need to fix this logic such that Spark upload local files to S3, and transform the paths while keeping all other paths as they are.
> [k8s] Spark client dependency support ignores non-local paths > - > > Key: SPARK-32775 > URL: https://issues.apache.org/jira/browse/SPARK-32775 > Project: Spark > Issue Type: Bug > Components: Kubernetes > Affects Versions: 3.0.0 > Reporter: Xuzhou Yin > Priority: Major
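The fix this issue asks for amounts to partitioning spark.jars by scheme instead of discarding every non-client-local entry. A minimal Python sketch of that intent (illustrative only, not Spark's actual code; `upload` stands in for the S3 staging step, and the function name is hypothetical):

```python
# Sketch of the proposed behavior: only submission-client files (file:// or
# no scheme) are uploaded and rewritten; local:// and remote paths such as
# s3:// pass through untouched, preserving list order.
from urllib.parse import urlparse


def rewrite_spark_jars(jars: str, upload) -> str:
    """Rewrite a comma-separated spark.jars value.

    `upload` takes a client-local path and returns the URI of the
    uploaded copy (standing in for the S3 upload step).
    """
    out = []
    for jar in jars.split(","):
        scheme = urlparse(jar).scheme
        if scheme in ("", "file"):
            out.append(upload(jar))  # client-local jar: stage it, keep its slot
        else:
            out.append(jar)          # local://, s3://, hdfs://, ...: keep as-is
    return ",".join(out)
```

Applied to the example in the description, only file:///local/path/3.jar would be rewritten, while the local:// and s3:// entries would survive unchanged.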
[jira] [Updated] (SPARK-32775) [k8s] Spark client dependency support ignores non-local paths
[ https://issues.apache.org/jira/browse/SPARK-32775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuzhou Yin updated SPARK-32775: ---
Description: According to the logic of this line: https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, Spark filters out all paths which are not local (i.e. no scheme or a file:// scheme). It may cause non-local dependencies not to be loaded by the driver. For example, when starting a Spark job with spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, it seems like this logic will upload file:///local/path/3.jar to S3 and reset spark.jars to only s3://transformed/path/3.jar, while completely ignoring local:///local/path/1.jar and s3://s3/path/2.jar. We need to fix this logic such that Spark upload local files to S3, and transform the paths while keeping all other paths as they are.
was: According to the logic of this line: https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, Spark filters out all paths which are not local (i.e. no scheme or a file:// scheme). It may cause non-local dependencies not to be loaded by the driver. For example, when starting a Spark job with spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, it seems like this logic will upload file:///local/path/3.jar to S3 and reset spark.jars to only s3://upload/path/3.jar, while completely ignoring local:///local/path/1.jar and s3://s3/path/2.jar. We need to fix this logic such that Spark upload local files to S3, and transform the paths while keeping all other paths as they are.
> [k8s] Spark client dependency support ignores non-local paths > - > > Key: SPARK-32775 > URL: https://issues.apache.org/jira/browse/SPARK-32775 > Project: Spark > Issue Type: Bug > Components: Kubernetes > Affects Versions: 3.0.0 > Reporter: Xuzhou Yin > Priority: Major
[jira] [Created] (SPARK-32775) [k8s] Spark client dependency support ignores non-local paths
Xuzhou Yin created SPARK-32775: -- Summary: [k8s] Spark client dependency support ignores non-local paths Key: SPARK-32775 URL: https://issues.apache.org/jira/browse/SPARK-32775 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 3.0.0 Reporter: Xuzhou Yin
According to the logic of this line: https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, Spark filters out all paths which are not local (i.e. no scheme or a file:// scheme). It may cause non-local dependencies not to be loaded by the driver. For example, when starting a Spark job with spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, it seems like this logic will upload file:///local/path/3.jar to S3 and reset spark.jars to only s3://upload/path/3.jar, while completely ignoring local:///local/path/1.jar and s3://s3/path/2.jar. We need to fix this logic such that Spark uploads local files to S3 and transforms the paths while keeping all other paths as they are.
[jira] [Comment Edited] (SPARK-23153) Support application dependencies in submission client's local file system
[ https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188222#comment-17188222 ] Xuzhou Yin edited comment on SPARK-23153 at 9/1/20, 7:37 AM: -
Hi guys, I have looked through the pull request of this change, and there is one part which I don't quite understand; it would be awesome if someone could explain it a little bit. At this line: https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, Spark filters out all paths which are not local (i.e. no scheme or a file:// scheme). Does it mean it will ignore all other paths which are not local? For example, when starting a Spark job with spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, it seems like this logic will upload file:///local/path/3.jar to S3 and reset spark.jars to only s3://upload/path/3.jar, while completely ignoring local:///local/path/1.jar and s3://s3/path/2.jar. Is this expected behavior? If so, what should we do if we want to specify dependencies which are in an HCFS such as S3, or on the driver's local file system (i.e. local://), instead of file://? If this is a bug, is there a Jira issue for it? Thanks a lot! @[~skonto]
was (Author: xuzhoyin): Hi guys, I have looked through the pull request of this change, and there is one part which I don't quite understand; it would be awesome if someone could explain it a little bit. At this line: https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, Spark filters out all paths which are not local (i.e. no scheme or a file:// scheme). Does it mean it will ignore all other paths which are not local? For example, when starting a Spark job with spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, it seems like this logic will upload file:///local/path/3.jar to S3 and reset spark.jars to only s3://upload/path/3.jar, while completely ignoring local:///local/path/1.jar and s3://s3/path/2.jar. Is this expected behavior? If so, what should we do if we want to specify dependencies which are in an HCFS such as S3, or on the driver's local file system (i.e. local://), instead of file://? If this is a bug, is there a Jira issue for it? Thanks a lot!
> Support application dependencies in submission client's local file system > - > > Key: SPARK-23153 > URL: https://issues.apache.org/jira/browse/SPARK-23153 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Core > Affects Versions: 2.4.0 > Reporter: Yinan Li > Assignee: Stavros Kontopoulos > Priority: Major > Fix For: 3.0.0 > > Currently local dependencies are not supported with Spark on K8S, i.e. if the user has code or dependencies only on the client where they run {{spark-submit}}, then the current implementation has no way to make those visible to the Spark application running inside the K8S pods that get launched. This limits users to only running applications where the code and dependencies are either baked into the Docker images used or available via some external and globally accessible file system, e.g. HDFS, which are not viable options for many users and environments.
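The three schemes in the question resolve in different places, which is the crux of the confusion in this thread. A small illustrative sketch of that classification (the mapping reflects the URI conventions as described in this discussion; the function name is hypothetical):

```python
# Illustrative classification of where each spark.jars URI scheme is
# resolved in Spark on Kubernetes (assumed from this thread's discussion,
# not quoted from Spark's source).
from urllib.parse import urlparse


def dependency_location(uri: str) -> str:
    scheme = urlparse(uri).scheme
    if scheme == "local":
        return "driver/executor image"  # already present inside the container
    if scheme in ("", "file"):
        return "submission client"      # must be uploaded to become visible in-cluster
    return "remote file system"         # fetched at runtime, e.g. s3://, hdfs://
```

Under this reading, only "submission client" entries need the upload step, and dropping the other two categories from spark.jars, as the comment describes, would be a bug rather than intended behavior.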
[jira] [Comment Edited] (SPARK-23153) Support application dependencies in submission client's local file system
[ https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188222#comment-17188222 ] Xuzhou Yin edited comment on SPARK-23153 at 9/1/20, 7:33 AM: -
Hi guys, I have looked through the pull request of this change, and there is one part which I don't quite understand; it would be awesome if someone could explain it a little bit. At this line: https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, Spark filters out all paths which are not local (i.e. no scheme or a file:// scheme). Does it mean it will ignore all other paths which are not local? For example, when starting a Spark job with spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, it seems like this logic will upload file:///local/path/3.jar to S3 and reset spark.jars to only s3://upload/path/3.jar, while completely ignoring local:///local/path/1.jar and s3://s3/path/2.jar. Is this expected behavior? If so, what should we do if we want to specify dependencies which are in an HCFS such as S3, or on the driver's local file system (i.e. local://), instead of file://? If this is a bug, is there a Jira issue for it? Thanks a lot!
was (Author: xuzhoyin): Hi guys, I have looked through the pull request of this change and do not quite understand one part; it would be awesome if someone could explain it a little bit. At this line: https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, it filters out all paths which are not local (i.e. no scheme or a file:// scheme). Does it ignore all other paths which are not local? For example, when starting a Spark job with spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, it seems like this logic will upload file:///local/path/3.jar to S3 and reset spark.jars to only s3://upload/path/3.jar, while completely ignoring local:///local/path/1.jar and s3://s3/path/2.jar. Is this expected behavior? If so, what should we do if we want to specify dependencies which are in an HCFS such as S3 instead of local? If this is a bug, is there a Jira issue for it? Thanks a lot!
[jira] [Commented] (SPARK-23153) Support application dependencies in submission client's local file system
[ https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188222#comment-17188222 ] Xuzhou Yin commented on SPARK-23153:
Hi guys, I have looked through the pull request of this change and do not quite understand one part; it would be awesome if someone could explain it a little bit. At this line: https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L161, it filters out all paths which are not local (i.e. no scheme or a file:// scheme). Does it ignore all other paths which are not local? For example, when starting a Spark job with spark.jars=local:///local/path/1.jar,s3://s3/path/2.jar,file:///local/path/3.jar, it seems like this logic will upload file:///local/path/3.jar to S3 and reset spark.jars to only s3://upload/path/3.jar, while completely ignoring local:///local/path/1.jar and s3://s3/path/2.jar. Is this expected behavior? If so, what should we do if we want to specify dependencies which are in an HCFS such as S3 instead of local? If this is a bug, is there a Jira issue for it? Thanks a lot!