[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461405#comment-17461405 ]

jingxiong zhong commented on SPARK-36088:
-----------------------------------------

@hyukjin.kwon I opened an issue at https://issues.apache.org/jira/browse/SPARK-37677; I think I can fix it.

> 'spark.archives' does not extract the archive file into the driver under client mode
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-36088
>                 URL: https://issues.apache.org/jira/browse/SPARK-36088
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes, Spark Submit
>    Affects Versions: 3.1.2
>            Reporter: rickcheng
>            Priority: Major
>
> When running Spark on a k8s cluster, there are two deploy modes: cluster and
> client. In my tests, in cluster mode *spark.archives* extracts the archive
> file into the working directory of both the executors and the driver. In
> client mode, however, *spark.archives* extracts the archive file only into
> the working directory of the executors.
>
> However, I need *spark.archives* to send the virtual environment tar file
> packaged by conda to both the driver and the executors under client mode (so
> that the executors and the driver have the same Python environment).
>
> Why does *spark.archives* not extract the archive file into the working
> directory of the driver under client mode?

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461305#comment-17461305 ]

Hyukjin Kwon commented on SPARK-36088:
--------------------------------------

[~rickcheng] Can you try a fix? We could try downloading the files locally when we're in a pod, mimicking the behaviour of spark.files on the driver side.
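The driver-side behaviour suggested in this comment (get the archive onto the local disk, then unpack it under the working directory) can be sketched roughly as follows. This is a minimal Python illustration and not Spark's implementation: `unpack_archive_spec` and its parameters are hypothetical names, the archive is assumed to be on the local filesystem already (the download step is left out), and falling back to the archive's file name when no `#alias` is given is an assumption made for this sketch.

```python
import os
import shutil
import tarfile
import tempfile


def unpack_archive_spec(spec, work_dir):
    """Unpack 'archive_path#alias' into work_dir/alias and return that directory.

    Hypothetical helper: when no '#alias' is given, the archive's file name is
    used as the directory name (an assumption made for this sketch).
    """
    archive_path, _, alias = spec.partition("#")
    target = os.path.join(work_dir, alias or os.path.basename(archive_path))
    os.makedirs(target, exist_ok=True)
    # shutil chooses the unpacker (.tar.gz, .zip, ...) from the file extension.
    shutil.unpack_archive(archive_path, target)
    return target


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    # Build a tiny stand-in for a packed conda environment.
    marker = os.path.join(scratch, "marker.txt")
    with open(marker, "w") as f:
        f.write("hello\n")
    archive = os.path.join(scratch, "env.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(marker, arcname="bin/marker.txt")

    driver_dir = os.path.join(scratch, "driver_workdir")
    print(unpack_archive_spec(archive + "#environment", driver_dir))
```

The `path#alias` form is the same one used with `--archives` elsewhere in this thread; the alias becomes the directory name that interpreter paths such as `spark.pyspark.python` can refer to.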
[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461298#comment-17461298 ]

Hyukjin Kwon commented on SPARK-36088:
--------------------------------------

[~zhongjingxiong] Can you try with tar.gz? zip doesn't keep the permissions by default.
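The difference between the two formats can be reproduced outside Spark. The sketch below uses only Python's standard `tarfile` and `zipfile` modules as a stand-in, so it shows the general behaviour of the formats and of these two libraries, not the unpacking code Spark itself runs; `roundtrip_exec_bit` and the file names are hypothetical, and a POSIX filesystem is assumed.

```python
import os
import stat
import tarfile
import tempfile
import zipfile


def is_executable(path):
    """Return True if the owner-execute bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


def roundtrip_exec_bit(workdir):
    """Pack one executable file as tar.gz and as zip, unpack both,
    and report whether the execute bit survived each round trip."""
    src = os.path.join(workdir, "python3")
    with open(src, "w") as f:
        f.write("#!/bin/sh\necho stand-in interpreter\n")
    os.chmod(src, 0o755)

    tar_path = os.path.join(workdir, "env.tar.gz")
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(src, arcname="bin/python3")
    zip_path = os.path.join(workdir, "env.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.write(src, arcname="bin/python3")

    tar_out = os.path.join(workdir, "from_tar")
    zip_out = os.path.join(workdir, "from_zip")
    os.makedirs(tar_out)
    os.makedirs(zip_out)

    # Newer Python versions accept an extraction filter; older ones do not.
    extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(tar_out, **extract_kwargs)
    # ZipFile.extractall writes the file contents but does not restore mode bits.
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(zip_out)

    return {
        "tar.gz": is_executable(os.path.join(tar_out, "bin", "python3")),
        "zip": is_executable(os.path.join(zip_out, "bin", "python3")),
    }


if __name__ == "__main__":
    print(roundtrip_exec_bit(tempfile.mkdtemp()))
```

If the interpreter inside an unpacked environment is not executable, repacking the environment as tar.gz is the simplest thing to try before looking for other causes.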
[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461230#comment-17461230 ]

jingxiong zhong commented on SPARK-36088:
-----------------------------------------

I have another question about cluster mode: python3.6.6.zip is unzipped in the pod, but the extracted files have no execute permission. My submit command is as follows:
{code:shell}
spark-submit \
  --archives ./python3.6.6.zip#python3.6.6 \
  --conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \
  --conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \
  --conf spark.kubernetes.container.image.pullPolicy=Always \
  ./examples/src/main/python/pi.py 100
{code}
[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384576#comment-17384576 ]

rickcheng commented on SPARK-36088:
-----------------------------------

Hi, [~srowen]

Thanks for the comment. I agree that in client mode the user can access the file in some cases. However, I raised this question because I want to distribute a conda-packed environment (a tar.gz file) through *spark.archives* and extract it on both the driver and the executors, so that they all have the same Python environment. On K8s, the driver may run in a pod while the tar.gz file lives in a remote location (e.g., HDFS), so I think it is also necessary to extract the archive file on the driver through spark.archives.

Or maybe there is a better way to achieve this goal?
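For reference, the submission described in this comment could be assembled along the following lines. This is only a sketch under stated assumptions: `build_submit_args` is a hypothetical helper, the archive, alias, and application names are placeholders, the `bin/python` location assumes a conda-packed layout, and the `--master` URL and Kubernetes-specific settings are left out. Only the `--archives`, `--deploy-mode`, and `--conf spark.pyspark.python=...` options are used.

```python
import shlex


def build_submit_args(archive, alias, app, python_rel="bin/python", deploy_mode="cluster"):
    """Return a spark-submit argument list that ships `archive` under `alias`.

    Hypothetical helper for illustration; `python_rel` is the interpreter path
    inside the unpacked environment (assumed to be bin/python).
    """
    interpreter = f"./{alias}/{python_rel}"
    return [
        "spark-submit",
        "--deploy-mode", deploy_mode,
        # 'archive#alias' asks for the archive to be unpacked into a directory named alias.
        "--archives", f"{archive}#{alias}",
        "--conf", f"spark.pyspark.python={interpreter}",
        app,
    ]


if __name__ == "__main__":
    args = build_submit_args("pyspark_conda_env.tar.gz", "environment", "app.py")
    print(shlex.join(args))
```

As reported in this issue, under client mode the archive is not unpacked next to the driver, so a relative interpreter path like this one would only resolve on the executors; the driver's interpreter has to be provided some other way until that changes.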
[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384569#comment-17384569 ]

rickcheng commented on SPARK-36088:
-----------------------------------

Hi, [~hyukjin.kwon]

Thanks for the comment. In my tests, under client mode the archive file is not extracted into the driver's working directory, whether or not the driver runs in a pod.

Thanks for pointing out the code; I will consider making a PR.
[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384325#comment-17384325 ]

Sean R. Owen commented on SPARK-36088:
--------------------------------------

I think the idea is that in client mode you already have access to the file, presumably?
[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383131#comment-17383131 ]

Hyukjin Kwon commented on SPARK-36088:
--------------------------------------

cc [~dongjoon] and [~holdenkarau] FYI
[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383130#comment-17383130 ]

Hyukjin Kwon commented on SPARK-36088:
--------------------------------------

You might have to call the logic at https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L393-L419 when {{isKubernetesClient}} is on. Are you interested in submitting a PR?
[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383128#comment-17383128 ]

Hyukjin Kwon commented on SPARK-36088:
--------------------------------------

Does your driver run inside a pod or on a physical host?