[ 
https://issues.apache.org/jira/browse/SPARK-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173228#comment-15173228
 ] 

Jeff Zhang edited comment on SPARK-13587 at 3/1/16 5:04 AM:
------------------------------------------------------------

I have implemented a POC for this feature. Here's one simple command showing how to 
use virtualenv in pyspark:

{code}
bin/spark-submit --master yarn --deploy-mode client --conf 
"spark.pyspark.virtualenv.enabled=true" --conf 
"spark.pyspark.virtualenv.type=conda" --conf 
"spark.pyspark.virtualenv.requirements=/Users/jzhang/work/virtualenv/conda.txt" 
--conf "spark.pyspark.virtualenv.path=/Users/jzhang/anaconda/bin/conda"  
~/work/virtualenv/spark.py
{code}

There are 4 properties that need to be set:
* spark.pyspark.virtualenv.enabled    (enable virtualenv)
* spark.pyspark.virtualenv.type  (default/conda are supported; "default" means the 
native virtualenv)
* spark.pyspark.virtualenv.requirements  (requirements file for the dependencies)
* spark.pyspark.virtualenv.path  (path to the executable of virtualenv/conda)
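
For context, the requirements file referenced above is just a plain list of the 
packages to install; a hypothetical conda.txt could look like this (the package 
names and versions below are made up for illustration):

{code}
numpy=1.10.4
pandas=0.17.1
{code}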

Comments and feedback are welcome on how to improve it and whether it's 
valuable for users. 
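
The idea behind these properties can be sketched roughly as follows: each executor 
creates the environment from the requirements file before launching its Python 
worker. A minimal, hypothetical illustration (the helper name and exact command 
layout are assumptions for illustration, not taken from the actual POC patch):

{code}
import os

def build_setup_commands(venv_type, venv_path, requirements, env_dir):
    """Build the commands an executor could run to materialize the env.

    venv_type:    value of spark.pyspark.virtualenv.type ("conda" or native)
    venv_path:    value of spark.pyspark.virtualenv.path (the executable)
    requirements: value of spark.pyspark.virtualenv.requirements
    env_dir:      a scratch directory on the executor for the new env
    """
    if venv_type == "conda":
        # conda creates the environment and installs the listed
        # dependencies in a single step
        return [[venv_path, "create", "--prefix", env_dir,
                 "--file", requirements, "--yes"]]
    else:
        # native virtualenv: create the env first, then pip-install
        # the requirements into it as a second step
        pip = os.path.join(env_dir, "bin", "pip")
        return [[venv_path, env_dir],
                [pip, "install", "-r", requirements]]
{code}

The executor would then run these commands, and point the Python worker at the new 
environment's interpreter, before executing any user code.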



> Support virtualenv in PySpark
> -----------------------------
>
>                 Key: SPARK-13587
>                 URL: https://issues.apache.org/jira/browse/SPARK-13587
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>            Reporter: Jeff Zhang
>
> Currently, it's not easy for users to add third-party Python packages in 
> pyspark.
> * One way is to use --py-files (suitable for simple dependencies, but not 
> for complicated ones, especially with transitive dependencies)
> * Another way is to install packages manually on each node (time-consuming, and 
> not easy when switching between different environments)
> Python now has 2 different virtualenv implementations: one is the native 
> virtualenv, the other is through conda. This jira is trying to bring these 2 
> tools to the distributed environment.



