[ 
https://issues.apache.org/jira/browse/SPARK-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714622#comment-15714622
 ] 

Semet commented on SPARK-13587:
-------------------------------

Personally, I share an NFS folder across all the executors. This works because 
they all have the same architecture and distribution.

Frankly, I am starting to be a bit disappointed that there is no enthusiasm, no 
real will to close this huge gap in PySpark. Dependency management was solved 
years ago in Python, with virtualenv in general and with Anaconda in data 
science, but PySpark still plays with the PYTHONPATH, and no Spark core 
developer is actively involved in helping us integrate such a patch. Dependency 
management for JARs is handled in a modern way by {{--packages}}, which 
automatically downloads the files from a remote repository; why not do the same 
for Python, and maybe for R as well, if available? I even proposed a way to 
package everything in a single zip archive, called a "wheelhouse", so that 
executors might not have to download anything.
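
To make the wheelhouse idea concrete, here is a rough sketch of how such an 
archive could be assembled. It is only an illustration: the helper name 
{{build_wheelhouse}} and the directory layout are assumptions made for this 
example and are not taken from any Spark patch. It assumes the wheels were 
built beforehand, for instance with {{pip wheel -r requirements.txt -w <dir>}}.

```python
import zipfile
from pathlib import Path


def build_wheelhouse(wheel_dir, archive_path):
    """Pack every *.whl file found in wheel_dir into one zip archive.

    Hypothetical helper for illustration only; it is not part of Spark.
    Returns the sorted list of wheel file names that were archived.
    """
    wheels = sorted(Path(wheel_dir).glob("*.whl"))
    # Wheels are already compressed zip files, so store them uncompressed.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_STORED) as archive:
        for wheel in wheels:
            # Keep only the file name so the archive has a flat layout.
            archive.write(wheel, arcname=wheel.name)
    return [wheel.name for wheel in wheels]
```

Once such an archive is unpacked on an executor, the packages could in 
principle be installed offline with 
{{pip install --no-index --find-links=<unpacked dir> -r requirements.txt}}, 
provided the wheels match the executor's platform.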

So please help us raise this concern with the core developers, to show them 
that several people are interested in solving this issue.

> Support virtualenv in PySpark
> -----------------------------
>
>                 Key: SPARK-13587
>                 URL: https://issues.apache.org/jira/browse/SPARK-13587
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>            Reporter: Jeff Zhang
>
> Currently, it's not easy for users to add third-party Python packages in 
> PySpark.
> * One way is to use --py-files (suitable for simple dependencies, but not 
> for complicated ones, especially those with transitive dependencies)
> * Another way is to install packages manually on each node (time-consuming, 
> and not easy to switch between environments)
> Python now has 2 different virtualenv implementations. One is native 
> virtualenv; the other is through conda. This JIRA is trying to bring these 2 
> tools to a distributed environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
