[ https://issues.apache.org/jira/browse/SPARK-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15473643#comment-15473643 ]
Yanbo Liang edited comment on SPARK-17428 at 9/8/16 11:46 AM:
--------------------------------------------------------------

[~sunrui] [~shivaram] [~felixcheung] Thanks for your reply. Yes, we can compile packages on the driver and send them to the executors, but that approach has several problems:

* The Spark job usually does not run as root, yet installing R packages into the system library on the executors requires root privileges, so the installation is not permitted.
* Running a SparkR job pollutes the executors' R libraries, and a later job on the same executor may fail because of conflicting packages.
* The driver and the executors may have different architectures, so a package compiled on the driver may not work on the executors if it depends on architecture-specific code.

SparkR cannot solve these issues today. I investigated and found that packrat can help in this direction, but more experiments and study are needed to verify it. If this proposal makes sense, I can work on this feature. Please feel free to let me know any concerns. Thanks!

> SparkR executors/workers support virtualenv
> -------------------------------------------
>
>                 Key: SPARK-17428
>                 URL: https://issues.apache.org/jira/browse/SPARK-17428
>             Project: Spark
>          Issue Type: New Feature
>          Components: SparkR
>            Reporter: Yanbo Liang
>
> Many users need to use third-party R packages in the executors/workers, but SparkR cannot satisfy this requirement elegantly. For example, you have to ask the IT staff/administrators of the cluster to deploy these R packages on every executor/worker node, which is very inflexible.
> I think we should support third-party R packages for SparkR users the way we support jar packages, in the following two scenarios:
> 1. Users can install R packages from CRAN or a custom CRAN-like repository on each executor.
> 2. Users can upload their local R packages and install them on each executor.
> To achieve this goal, the first step is to give SparkR executors virtualenv-like support, similar to Python's conda. I have investigated and found that packrat (http://rstudio.github.io/packrat/) is one candidate for providing a virtualenv for R. Packrat is a dependency management system for R that can isolate the dependent R packages in its own private package space. SparkR users could then install third-party packages in the application scope (destroyed after the application exits) and would not need to bother IT/administrators to install these packages manually.
> I would like to know whether this makes sense.
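To make the packrat idea above concrete, here is a minimal sketch of how an application-scoped private library could be bootstrapped on a worker. This is not SparkR API: bootstrap_private_lib, app_dir and pkgs are hypothetical names used only for illustration, and the sketch assumes packrat is already installed on the workers.

{code:r}
# A minimal sketch (not SparkR API): isolate third-party R packages in an
# application-scoped packrat library on a worker. bootstrap_private_lib,
# app_dir and pkgs are hypothetical names used only for illustration.
bootstrap_private_lib <- function(app_dir, pkgs) {
  # Create the project directory inside the application's scratch space;
  # it can be removed when the application exits, so the worker's
  # system-wide R library is never touched and no root access is needed.
  dir.create(app_dir, recursive = TRUE, showWarnings = FALSE)

  # packrat::init() builds a private library under app_dir/packrat/lib
  # and points .libPaths() at it for the current session.
  packrat::init(project = app_dir, enter = TRUE, restart = FALSE)

  # Packages are downloaded and compiled locally on the worker, so the
  # binaries always match the executor's architecture, even when it
  # differs from the driver's.
  install.packages(pkgs, repos = "https://cran.r-project.org")
}

bootstrap_private_lib(file.path(tempdir(), "sparkr-app-lib"), c("data.table"))
{code}

One could approximate this today by calling such a function from inside spark.lapply tasks, but the proposal in this ticket is to make the isolation a built-in, per-application feature rather than user-managed code.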