[ 
https://issues.apache.org/jira/browse/SPARK-32187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17178728#comment-17178728
 ] 

Hyukjin Kwon edited comment on SPARK-32187 at 8/17/20, 5:39 AM:
----------------------------------------------------------------

The draft looks good as a start. A couple of comments from my cursory look:

- Let's make sure we have copy-and-pastable examples, and let's try to write the 
de facto standard, given that multiple other sites already cover this, such as 
[http://alkaline-ml.com/2018-07-02-conda-spark/] and 
[https://jcristharif.com/venv-pack/spark.html].
- Let's place the section about shipping zip, egg and .py files at the top, 
and place pex and virtual environment at the bottom. Arguably it is more common 
to simply use the {{--py-files}} option or the {{spark.submit.pyFiles}} 
configuration to ship Python packages.
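
As a rough illustration of the {{--py-files}} route, the sketch below zips a small package and assembles a matching {{spark-submit}} command line. The package name {{mydeps}}, the application file {{app.py}}, and both helper functions are hypothetical names made up for this example. The final import only checks locally that the archive layout is importable from a zip; it does not start Spark.

```python
import os
import sys
import tempfile
import zipfile


def zip_package(package_dir, zip_path):
    """Zip a package directory so the package name is the top-level archive entry."""
    parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".py"):
                    full = os.path.join(root, name)
                    # Store paths relative to the package's parent directory,
                    # e.g. "mydeps/__init__.py", so "import mydeps" works.
                    zf.write(full, os.path.relpath(full, parent))
    return zip_path


def spark_submit_command(app, py_files):
    """Build (but do not run) a spark-submit command; --py-files is comma-separated."""
    return ["spark-submit", "--py-files", ",".join(py_files), app]


# Create a tiny throwaway package (hypothetical name: mydeps).
workdir = tempfile.mkdtemp()
pkg = os.path.join(workdir, "mydeps")
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("def double(x):\n    return 2 * x\n")

archive = zip_package(pkg, os.path.join(workdir, "mydeps.zip"))
cmd = spark_submit_command("app.py", [archive])
print(" ".join(cmd))

# Local sanity check: Python can import a package straight from a zip on sys.path.
sys.path.insert(0, archive)
import mydeps

print(mydeps.double(21))
```

The same comma-separated list could presumably be passed as {{--conf spark.submit.pyFiles=...}} instead of {{--py-files}}; the doc should state which form it recommends.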

Let's open a PR and loop in other committers to get more reviews. Shipping 
packages is a bit of a hairy area, and there are many other committers who have 
better insight than me, in particular about other cluster managers such as 
Mesos, Kubernetes, etc.

As for referencing your own stuff, it looks fine. It's okay to mention things 
as an FYI reference.

{quote}
there is no way to set the archives as a config param when not running on YARN. 
I checked the doc and the spark code. So it seems inconsistent. Can you check 
or confirm ?
{quote}

Yes, I think that's correct to the best of my knowledge. We can just say it's 
supported on YARN only for now.
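
For reference, a minimal sketch of what the YARN-only configuration route might look like, using the {{spark.yarn.dist.archives}} property. The archive name {{environment.tar.gz}}, the alias {{environment}}, the file {{app.py}}, and the helper function are assumptions for illustration only. The sketch just builds the argument list and does not run anything.

```python
def yarn_archive_args(archive, alias):
    """Build spark-submit arguments that ship an archive on YARN.

    The "#alias" suffix names the directory the archive is unpacked into
    in each container's working directory.
    """
    return [
        "--master", "yarn",
        "--conf", "spark.yarn.dist.archives={}#{}".format(archive, alias),
    ]


# Hypothetical archive and application names.
args = yarn_archive_args("environment.tar.gz", "environment")
cmd = ["spark-submit"] + args + ["app.py"]
print(" ".join(cmd))
```

Pointing PySpark at the interpreter inside the unpacked archive is a separate step that this sketch leaves out.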

SPARK-13587 was not merged, so PySpark does not support it yet. Yes, it would 
not be in the doc, at least for now.




> User Guide - Shipping Python Package
> ------------------------------------
>
>                 Key: SPARK-32187
>                 URL: https://issues.apache.org/jira/browse/SPARK-32187
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, PySpark
>    Affects Versions: 3.1.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> - Zipped file
> - Python files
> - PEX (?) (see also SPARK-25433)


