[ 
https://issues.apache.org/jira/browse/SPARK-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15475593#comment-15475593
 ] 

Sun Rui commented on SPARK-17428:
---------------------------------

I don't quite understand what is meant by exact version control. I think a user can 
either point to already-downloaded R packages, or specify a package name and 
version and have SparkR download it from CRAN.
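
For illustration, pinning a package version from CRAN could look like the following 
on the driver. This is a minimal sketch assuming the remotes package is available; 
the package name, version and library path are placeholders only:

  # Create a private library and make it the default install target.
  privateLib <- file.path(tempdir(), "sparkr-private-lib")
  dir.create(privateLib, recursive = TRUE, showWarnings = FALSE)
  .libPaths(c(privateLib, .libPaths()))

  # remotes::install_version fetches the requested version from the CRAN
  # archive; plain install.packages() would install the latest release.
  remotes::install_version("data.table", version = "1.9.6",
                           repos = "https://cloud.r-project.org")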

PySpark does not have this compilation issue, as Python code needs no 
compilation. The Python interpreter abstracts away the underlying architecture 
differences, just as the JVM does.

For the R package compilation issue, maybe we can have the following policies (a 
rough driver-side installation sketch follows the list):
1. For binary R packages, just deliver them to the worker nodes;
2. For source R packages:
  2.1 if only R code is contained, compilation on the driver node is fine;
  2.2 if C/C++ code is contained, compile it on the driver node by default. But we 
could add an option, --compile-on-workers, allowing users to choose to compile on 
the worker nodes. If the option is specified, users must ensure the compilation 
tool chain is available on the worker nodes.
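
The sketch below illustrates 2.1/2.2 on the driver side (the tarball name and 
library path are placeholders, and --compile-on-workers above is only a proposed 
option, not an existing flag):

  # Install a local source package on the driver into a private library.
  # For an R-only package the result is portable; for a package with
  # C/C++ code the compiled artifacts are tied to the driver's platform.
  privateLib <- "/tmp/sparkr-deps"
  dir.create(privateLib, recursive = TRUE, showWarnings = FALSE)

  install.packages("mypkg_0.1.0.tar.gz", repos = NULL,
                   type = "source", lib = privateLib)

  # The populated library directory could then be archived, shipped to the
  # worker nodes, and prepended to .libPaths() there before the package is
  # loaded.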

> SparkR executors/workers support virtualenv
> -------------------------------------------
>
>                 Key: SPARK-17428
>                 URL: https://issues.apache.org/jira/browse/SPARK-17428
>             Project: Spark
>          Issue Type: New Feature
>          Components: SparkR
>            Reporter: Yanbo Liang
>
> Many users need to use third-party R packages in executors/workers, but SparkR 
> cannot satisfy this requirement elegantly. For example, you have to ask the 
> IT/administrators of the cluster to deploy these R packages on each 
> executor/worker node, which is very inflexible.
> I think we should support third-party R packages for SparkR users, as we do for 
> jar packages, in the following two scenarios:
> 1. Users can install R packages from CRAN or a custom CRAN-like repository on 
> each executor.
> 2. Users can load their local R packages and install them on each executor.
> To achieve this goal, the first thing is to make SparkR executors support a 
> virtualenv-like mechanism, similar to Python's conda. I have investigated and 
> found that packrat (http://rstudio.github.io/packrat/) is one of the candidates 
> for supporting virtualenv for R. Packrat is a dependency management system for R 
> and can isolate the dependent R packages in its own private package space (a 
> brief packrat sketch follows this description). Then SparkR users can install 
> third-party packages at the application scope (destroyed after the application 
> exits) and don't need to bother IT/administrators to install these packages 
> manually.
> I would like to know whether this makes sense.
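
As a rough illustration of the packrat-based isolation described above (the project 
path and package name are placeholders; this only sketches packrat's own API, not 
any SparkR integration):

  # Initialize a packrat project: packrat creates a private library under the
  # project directory and redirects installation and loading to it.
  packrat::init("/tmp/sparkr-app-project")

  # Packages installed from here on go into the private library, not the
  # system-wide one.
  install.packages("stringdist")

  # Record the exact set of packages and versions used by the project;
  # packrat::restore() can later rebuild the same private library elsewhere.
  packrat::snapshot()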



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
