[ https://issues.apache.org/jira/browse/SPARK-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173228#comment-15173228 ]
Jeff Zhang edited comment on SPARK-13587 at 3/1/16 5:04 AM:
------------------------------------------------------------
I have implemented a POC for this feature. Here's one simple command showing how to use virtualenv in pyspark:
{code}
bin/spark-submit --master yarn --deploy-mode client \
  --conf "spark.pyspark.virtualenv.enabled=true" \
  --conf "spark.pyspark.virtualenv.type=conda" \
  --conf "spark.pyspark.virtualenv.requirements=/Users/jzhang/work/virtualenv/conda.txt" \
  --conf "spark.pyspark.virtualenv.path=/Users/jzhang/anaconda/bin/conda" \
  ~/work/virtualenv/spark.py
{code}
There are 4 properties that need to be set:
* spark.pyspark.virtualenv.enabled (enable virtualenv)
* spark.pyspark.virtualenv.type (native/conda are supported, default is native)
* spark.pyspark.virtualenv.requirements (requirements file for the dependencies)
* spark.pyspark.virtualenv.path (path to the virtualenv/conda executable)
Comments and feedback are welcome about how to improve it and whether it's valuable for users.
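As a sketch of what the pieces above might look like in practice: assuming the requirements file uses the usual one-package-spec-per-line format that pip and conda both accept, the conda.txt referenced in the command could contain something like this (the package names and versions here are illustrative assumptions, not taken from the POC):
{code}
# hypothetical contents of conda.txt -- one dependency per line
numpy==1.10.4
pandas==0.17.1
{code}
And going by the four properties listed above, a native-virtualenv variant of the same submit command would presumably just swap the type and point the path at the virtualenv executable (again a sketch, not a tested command; the paths and script name are placeholders):
{code}
bin/spark-submit --master yarn --deploy-mode client \
  --conf "spark.pyspark.virtualenv.enabled=true" \
  --conf "spark.pyspark.virtualenv.type=native" \
  --conf "spark.pyspark.virtualenv.requirements=/path/to/requirements.txt" \
  --conf "spark.pyspark.virtualenv.path=/usr/local/bin/virtualenv" \
  your_script.py
{code}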
> Support virtualenv in PySpark
> -----------------------------
>
>                 Key: SPARK-13587
>                 URL: https://issues.apache.org/jira/browse/SPARK-13587
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>            Reporter: Jeff Zhang
>
> Currently, it's not easy for users to add third-party python packages in pyspark.
> * One way is to use --py-files (suitable for simple dependencies, but not for complicated dependencies, especially those with transitive dependencies)
> * Another way is to install packages manually on each node (time-wasting, and not easy to switch between different environments)
> Python now has 2 different virtualenv implementations. One is native virtualenv, the other is through conda. This jira is trying to bring these 2 tools to the distributed environment.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)