I think this makes sense -- making it easier to use additional R packages would be a good feature. I am not sure we need Packrat for this use case though. Let's continue the discussion on the JIRA at https://issues.apache.org/jira/browse/SPARK-17428

Thanks
Shivaram

On Tue, Sep 6, 2016 at 11:36 PM, Yanbo Liang <yblia...@gmail.com> wrote:
> Hi All,
>
> Many users need to use third-party R packages on executors/workers, but
> SparkR cannot satisfy this requirement elegantly. For example, you have to
> ask the IT/administrators of the cluster to deploy these R packages on
> each executor/worker node, which is very inflexible.
>
> I think we should support third-party R packages for SparkR users, as we
> do for jar packages, in the following two scenarios:
> 1. Users can install R packages from CRAN or a custom CRAN-like repository
> on each executor.
> 2. Users can load their local R packages and install them on each executor.
>
> To achieve this goal, the first step is to make SparkR executors support
> virtualenv-style isolation, as Python does with conda. I have investigated
> and found that packrat (http://rstudio.github.io/packrat/) is one of the
> candidates to support virtualenv for R. Packrat is a dependency management
> system for R and can isolate the dependent R packages in its own private
> package space. SparkR users could then install third-party packages in the
> application scope (destroyed after the application exits) and would not
> need to bother IT/administrators to install these packages manually.
>
> I would like to know whether this makes sense.
>
> Thanks
> Yanbo