[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode

2021-12-17 Thread jingxiong zhong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461405#comment-17461405
 ] 

jingxiong zhong commented on SPARK-36088:
-

@hyukjin.kwon I opened an issue at 
https://issues.apache.org/jira/browse/SPARK-37677. I think I can fix it.

> 'spark.archives' does not extract the archive file into the driver under 
> client mode
>
>                 Key: SPARK-36088
>                 URL: https://issues.apache.org/jira/browse/SPARK-36088
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes, Spark Submit
>    Affects Versions: 3.1.2
>            Reporter: rickcheng
>            Priority: Major
>
> When running Spark on a k8s cluster, there are two deploy modes: cluster and 
> client. In my tests, in cluster mode *spark.archives* extracts the archive 
> file to the working directory of both the executors and the driver. But in 
> client mode, *spark.archives* extracts the archive file only to the 
> working directory of the executors.
>  
> However, I need *spark.archives* to send the virtual environment tar file 
> packaged by conda to both the driver and the executors under client mode (so 
> that the executors and the driver have the same Python environment).
>  
> Why does *spark.archives* not extract the archive file into the working 
> directory of the driver under client mode?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode

2021-12-17 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461305#comment-17461305
 ] 

Hyukjin Kwon commented on SPARK-36088:
--

[~rickcheng] Can you try a fix? We could download the files locally when the 
driver runs in a pod, mimicking the behaviour of spark.files on the driver side.
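
To make the idea concrete, here is a minimal Python sketch of "localize and unpack on the driver". It is not Spark code: `unpack_archive_locally` is a hypothetical helper, it only handles archives already on the local filesystem, and it merely illustrates the `path#alias` convention used with --archives. The real fix would live in SparkSubmit.scala.

```python
import os
import shutil
from urllib.parse import urlparse


def unpack_archive_locally(archive_uri: str, work_dir: str) -> str:
    """Unpack `path#alias` into work_dir/alias and return that directory.

    Hypothetical helper for illustration only; remote schemes such as
    hdfs:// would first need a download step that is not shown here.
    """
    parsed = urlparse(archive_uri)
    source = parsed.path
    # With no '#alias' fragment, this sketch falls back to the file name
    # (its own choice, not a statement about Spark's behaviour).
    alias = parsed.fragment or os.path.basename(source)
    dest = os.path.join(work_dir, alias)
    os.makedirs(dest, exist_ok=True)
    # The archive format is inferred from the extension (.zip, .tar.gz, ...).
    shutil.unpack_archive(source, dest)
    return dest
```

Calling `unpack_archive_locally("/tmp/env.tar.gz#environment", os.getcwd())` would leave the contents under `./environment`, which is the layout the executors already see.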




[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode

2021-12-17 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461298#comment-17461298
 ] 

Hyukjin Kwon commented on SPARK-36088:
--

[~zhongjingxiong] Can you try with a tar.gz archive? Zip doesn't keep file 
permissions by default.
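
The difference can be seen with a small experiment. The sketch below uses Python's standard `zipfile` and `tarfile` modules as a stand-in; Spark unpacks archives with its own JVM code, so this is only an analogy. It assumes a POSIX filesystem. As far as I know, `ZipFile.extractall` does not reapply stored mode bits, whereas `tarfile` restores them.

```python
import os
import stat
import tarfile
import zipfile


def exec_bit_after_roundtrip(work_dir: str) -> dict:
    """Pack one executable file as zip and as tar.gz, extract both, and
    report whether the owner-execute bit survived each round trip."""
    script = os.path.join(work_dir, "python3")
    with open(script, "w") as f:
        f.write("#!/bin/sh\necho hi\n")
    os.chmod(script, 0o755)

    zip_path = os.path.join(work_dir, "env.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.write(script, arcname="python3")
    tar_path = os.path.join(work_dir, "env.tar.gz")
    with tarfile.open(tar_path, "w:gz") as tf:
        tf.add(script, arcname="python3")

    zip_out = os.path.join(work_dir, "from_zip")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(zip_out)
    tar_out = os.path.join(work_dir, "from_tar")
    with tarfile.open(tar_path) as tf:
        tf.extractall(tar_out)

    def is_executable(path: str) -> bool:
        return bool(os.stat(path).st_mode & stat.S_IXUSR)

    return {
        "zip": is_executable(os.path.join(zip_out, "python3")),
        "tar.gz": is_executable(os.path.join(tar_out, "python3")),
    }
```

If the `zip` entry comes back false while `tar.gz` is true, the interpreter inside a zipped environment would fail with a permission error, which matches the symptom reported here.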




[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode

2021-12-16 Thread jingxiong zhong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461230#comment-17461230
 ] 

jingxiong zhong commented on SPARK-36088:
-

I have another question about cluster mode: when python3.6.6.zip is unpacked in 
the pod, the extracted files have no execute permission. The command I ran is as follows:

{code:shell}
spark-submit \
--archives ./python3.6.6.zip#python3.6.6 \
--conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \
--conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \
--conf spark.kubernetes.container.image.pullPolicy=Always \
./examples/src/main/python/pi.py 100
{code}





[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode

2021-07-20 Thread rickcheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384576#comment-17384576
 ] 

rickcheng commented on SPARK-36088:
---

Hi, [~srowen]

Thanks for the comment. I agree that in client mode the user can access the file 
in some cases. However, I raised this question because I want to distribute a 
conda-packaged environment (a tar.gz file) through *spark.archives* and extract 
it on both the driver and the executors, so that they have the same Python 
environment. In K8s, the driver may run in a pod while the tar.gz file sits in a 
remote location (e.g., HDFS). So I think it is also necessary to extract the 
archive file on the driver through *spark.archives*. Or is there a better way to 
achieve this goal?
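
For reference, the intended setup can be sketched as a small Python helper that assembles the spark-submit command line. The helper name `build_submit_args`, the archive file name, and the alias are hypothetical. The sketch also assumes the archive has `bin/python` at its top level. It sets the driver interpreter only in cluster mode, because in client mode the archive is not extracted on the driver, as reported in this issue.

```python
from typing import List


def build_submit_args(archive: str, alias: str, app: str,
                      deploy_mode: str = "client") -> List[str]:
    """Assemble a spark-submit command that ships a packed Python
    environment and points PySpark at the interpreter inside it."""
    # Assumption: the interpreter sits at <alias>/bin/python after unpacking.
    python_path = f"./{alias}/bin/python"
    args = [
        "spark-submit",
        "--deploy-mode", deploy_mode,
        "--archives", f"{archive}#{alias}",
        "--conf", f"spark.pyspark.python={python_path}",
    ]
    if deploy_mode == "cluster":
        # Only in cluster mode is the archive reported to be unpacked next
        # to the driver, so only then can the driver use this interpreter.
        args += ["--conf", f"spark.pyspark.driver.python={python_path}"]
    args.append(app)
    return args
```

For example, `build_submit_args("hdfs:///envs/pyspark_conda_env.tar.gz", "environment", "app.py")` yields a client-mode command in which only the executors use the shipped interpreter.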




[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode

2021-07-20 Thread rickcheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384569#comment-17384569
 ] 

rickcheng commented on SPARK-36088:
---

Hi, [~hyukjin.kwon]

Thanks for the comment. In my tests, under client mode the archive file is not 
extracted to the driver's working directory, whether or not the driver runs in a 
pod. Thanks for pointing out the code; I may consider making a PR.




[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode

2021-07-20 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384325#comment-17384325
 ] 

Sean R. Owen commented on SPARK-36088:
--

I think the idea is that in client mode you already have access to the file, 
presumably?




[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode

2021-07-19 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383131#comment-17383131
 ] 

Hyukjin Kwon commented on SPARK-36088:
--

cc [~dongjoon] and [~holdenkarau] FYI




[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode

2021-07-19 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383130#comment-17383130
 ] 

Hyukjin Kwon commented on SPARK-36088:
--

You might have to invoke the logic at 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L393-L419
 when {{isKubernetesClient}} is on. Are you interested in submitting a PR?




[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode

2021-07-19 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383128#comment-17383128
 ] 

Hyukjin Kwon commented on SPARK-36088:
--

Does your driver run inside a pod or on a physical host?
