Requiring users to download the entire Spark distribution just to connect to a remote cluster (which is already running Spark) seems like overkill. Even for most Spark users who download the Spark source, it is very unintuitive that they need to run a script named "install-dev.sh" before they can run SparkR.
--Hossein

On Wed, Sep 23, 2015 at 7:28 PM, Sun, Rui <rui....@intel.com> wrote:

> The SparkR package is not a standalone R package; it is actually the R API of
> Spark and needs to co-operate with a matching version of Spark, so exposing
> it on CRAN does not ease use for R users, as they would still need to download
> a matching Spark distribution -- unless we expose a bundled SparkR package to
> CRAN (packaging it with Spark); is this desirable? Actually, normal users who
> are not developers are not required to download the Spark source, build,
> and install the SparkR package. They just need to download a Spark
> distribution and then use SparkR.
>
> For using SparkR in RStudio, there is documentation at
> https://github.com/apache/spark/tree/master/R
>
> *From:* Hossein [mailto:fal...@gmail.com]
> *Sent:* Thursday, September 24, 2015 1:42 AM
> *To:* shiva...@eecs.berkeley.edu
> *Cc:* Sun, Rui; dev@spark.apache.org
> *Subject:* Re: SparkR package path
>
> Yes, I think exposing SparkR on CRAN can significantly expand the reach of
> both SparkR and Spark itself to a larger community of data scientists (and
> statisticians).
>
> I have been getting questions on how to use SparkR in RStudio. Most of
> these folks have a Spark cluster and wish to talk to it from RStudio. While
> that is a bigger task, for now the first step could be to not require them
> to download the Spark source and run a script named install-dev.sh. I filed
> SPARK-10776 to track this.
>
> --Hossein
>
> On Tue, Sep 22, 2015 at 7:21 PM, Shivaram Venkataraman
> <shiva...@eecs.berkeley.edu> wrote:
>
> As Rui says, it would be good to understand the use case we want to
> support (supporting CRAN installs could be one, for example). I don't
> think it should be very hard to do, as the RBackend itself doesn't use
> the R source files. The RRDD does use them, and the value comes from
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RUtils.scala#L29
> AFAIK -- so we could introduce a new config flag that can be used for
> this new mode.
>
> Thanks
> Shivaram
>
> On Mon, Sep 21, 2015 at 8:15 PM, Sun, Rui <rui....@intel.com> wrote:
> > Hossein,
> >
> > Is there any strong reason to download and install the SparkR source
> > package separately from the Spark distribution?
> >
> > An R user can simply download the Spark distribution, which contains the
> > SparkR source and binary package, and directly use SparkR. There is no
> > need to install the SparkR package at all.
> >
> > From: Hossein [mailto:fal...@gmail.com]
> > Sent: Tuesday, September 22, 2015 9:19 AM
> > To: dev@spark.apache.org
> > Subject: SparkR package path
> >
> > Hi dev list,
> >
> > The SparkR backend assumes that SparkR source files are located under
> > "SPARK_HOME/R/lib/". This directory is created by running R/install-dev.sh.
> > This setting makes sense for Spark developers, but if an R user downloads
> > and installs the SparkR source package, the source files are going to be
> > placed in different locations.
> >
> > In the R runtime it is easy to find the location of package files using
> > path.package("SparkR"). But we need to make some changes to the R backend
> > and/or spark-submit so that the JVM process learns the location of
> > worker.R, daemon.R, and shell.R from the R runtime.
> >
> > Do you think this change is feasible?
> >
> > Thanks,
> >
> > --Hossein
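
For illustration, here is a minimal sketch of the R-side piece described in the original message: discovering the installed package location from the R runtime and handing it to the JVM backend. The SPARKR_PACKAGE_DIR environment variable is a hypothetical name used only to show the idea, not an existing Spark setting.

# Minimal sketch (hypothetical): let the JVM backend learn the SparkR package
# location from the R runtime instead of assuming SPARK_HOME/R/lib.
library(SparkR)

# Path of the installed SparkR package in the R library, wherever it was
# installed (e.g. from a CRAN-style source package).
pkgDir <- path.package("SparkR")

# Expose the path to the backend before it is launched; the JVM could then
# resolve worker.R, daemon.R, and shell.R relative to this directory.
# SPARKR_PACKAGE_DIR is an assumed name, not an existing Spark setting.
Sys.setenv(SPARKR_PACKAGE_DIR = pkgDir)

system.file(package = "SparkR") would give the same directory without attaching the package first.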