[ https://issues.apache.org/jira/browse/SPARK-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512433#comment-16512433 ]
Matt Mould commented on SPARK-13587:
------------------------------------

What is the current status of this ticket, please? This [article|https://community.hortonworks.com/articles/104947/using-virtualenv-with-pyspark.html] suggests that it's done, but it doesn't work for me with the following command:

{code:bash}
spark-submit \
  --deploy-mode cluster \
  --master yarn \
  --py-files parallelisation_hack-0.1-py2.7.egg \
  --conf spark.pyspark.virtualenv.enabled=true \
  --conf spark.pyspark.virtualenv.type=native \
  --conf spark.pyspark.virtualenv.requirements=requirements.txt \
  --conf spark.pyspark.virtualenv.bin.path=virtualenv \
  --conf spark.pyspark.python=python3 \
  pyspark_poc_runner.py
{code}

> Support virtualenv in PySpark
> -----------------------------
>
>                 Key: SPARK-13587
>                 URL: https://issues.apache.org/jira/browse/SPARK-13587
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>    Affects Versions: 1.6.3, 2.0.2, 2.1.2, 2.2.1, 2.3.0
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>            Priority: Major
>
> Currently, it's not easy for users to add third-party Python packages in
> PySpark.
> * One way is to use --py-files (suitable for simple dependencies, but not
> for complicated ones, especially those with transitive dependencies).
> * Another way is to install packages manually on each node (time-consuming,
> and not easy to switch between different environments).
> Python now has 2 different virtualenv implementations: one is the native
> virtualenv, the other is through conda. This JIRA is trying to bring these
> 2 tools to the distributed environment.
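[Editor's note] For anyone hitting the same failure: the spark.pyspark.virtualenv.* settings in the command above appear to come from the patch proposed on this ticket rather than from a released Apache Spark, which would explain why stock builds ignore them. A workaround that needs no patch is to pack the Python environment yourself and ship it to YARN with --archives. Below is a minimal sketch, assuming conda and conda-pack are available locally and reusing the requirements.txt and pyspark_poc_runner.py names from the command above; the environment and archive names are placeholders.

{code:bash}
# Build the environment locally from the same requirements file,
# then pack it into a relocatable archive with conda-pack.
conda create -y -n pyspark_env python=3.6
conda activate pyspark_env
pip install -r requirements.txt conda-pack
conda pack -n pyspark_env -f -o pyspark_env.tar.gz

# YARN unpacks the archive into a directory named "environment"
# (the #environment suffix) in each container's working directory,
# so the driver and executors can point at its Python binary.
spark-submit \
  --deploy-mode cluster \
  --master yarn \
  --archives pyspark_env.tar.gz#environment \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=./environment/bin/python \
  pyspark_poc_runner.py
{code}

With the whole environment shipped this way, the --py-files egg becomes unnecessary if parallelisation_hack can be pip-installed into the packed environment. Separately, the original command ships a py2.7 egg while requesting python3; that mismatch is worth checking independently of the virtualenv settings.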