Re: Using R code as part of a Spark Application

2016-06-30 Thread Sun Rui
I would guess that the technology behind Azure R Server is Revolution Enterprise DistributedR/ScaleR. I don’t know the details, but see the statement in the “Step 6. Install R packages” section of the given documentation page. However, if you need to install R packages on the worker nodes

Re: Using R code as part of a Spark Application

2016-06-30 Thread sujeet jog
Thanks for the link, Sun. I believe running external scripts like R code on DataFrames is a much-needed facility, for example for algorithms that are not available in MLlib; invoking them from an R script would definitely be a powerful feature when your app is Scala/Python based, you don't

Re: Using R code as part of a Spark Application

2016-06-29 Thread Sun Rui
Hi Gilad, You can try the dapply() and gapply() functions in SparkR in Spark 2.0. Yes, it is required that R be installed on each worker node. However, if your Spark application is Scala/Java based, running R code on DataFrames is not supported for now. There is a closed JIRA
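As a rough illustration of gapply() (a dapply() sketch appears further down the thread, next to the Parquet example); the toy data, column names, and grouping key here are assumptions, not from the thread:

  library(SparkR)
  sparkR.session()   # Spark 2.0 entry point

  # Toy SparkDataFrame for illustration
  df <- createDataFrame(data.frame(id = c(1L, 1L, 2L), value = c(10, 20, 30)))

  # gapply(): group by "id", run an R function on each group's data.frame,
  # and declare the schema of the returned columns
  totals <- gapply(df, "id",
                   function(key, pdf) data.frame(id = key[[1]], total = sum(pdf$value)),
                   structType(structField("id", "integer"), structField("total", "double")))
  head(totals)

Because the R function runs inside the executors, this is also why R has to be present on every worker node.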

Re: Using R code as part of a Spark Application

2016-06-29 Thread Xinh Huynh
It looks like it. "DataFrame UDFs in R" is resolved in Spark 2.0: https://issues.apache.org/jira/browse/SPARK-6817 Here's some of the code: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/r/MapPartitionsRWrapper.scala /** * A function wrapper

Re: Using R code as part of a Spark Application

2016-06-29 Thread Sean Owen
Here we are (or certainly I am) not talking about R Server, but plain vanilla R, as used with Spark and SparkR. Currently, SparkR doesn't distribute R code at all (it used to, sort of), so I'm wondering if that is changing back. On Wed, Jun 29, 2016 at 10:53 PM, John Aherne

Re: Using R code as part of a Spark Application

2016-06-29 Thread John Aherne
I don't think R Server requires R on the executor nodes. I originally set up a SparkR cluster for our data scientist on Azure, which required that I install R on each node, but for the R Server setup there is an extra edge node with R Server that they connect to. From what little research I was

Re: Using R code as part of a Spark Application

2016-06-29 Thread Sean Owen
Oh, interesting: does this really mean the return of distributing R code from the driver to executors and running it remotely, or do I misunderstand? This would require having R on the executor nodes as it used to? On Wed, Jun 29, 2016 at 5:53 PM, Xinh Huynh wrote: > There is

Re: Using R code as part of a Spark Application

2016-06-29 Thread Jörn Franke
Still, you need SparkR. > On 29 Jun 2016, at 19:14, John Aherne wrote: > > Microsoft Azure has an option to create a Spark cluster with R Server. MS > bought RevoScale (I think that was the name) and just recently deployed it. > >> On Wed, Jun 29, 2016 at 10:53 AM,

Re: Using R code as part of a Spark Application

2016-06-29 Thread John Aherne
Microsoft Azure has an option to create a Spark cluster with R Server. MS bought RevoScale (I think that was the name) and just recently deployed it. On Wed, Jun 29, 2016 at 10:53 AM, Xinh Huynh wrote: > There is some new SparkR functionality coming in Spark 2.0, such as >

Re: Using R code as part of a Spark Application

2016-06-29 Thread Xinh Huynh
There is some new SparkR functionality coming in Spark 2.0, such as "dapply". You could use SparkR to load a Parquet file and then run "dapply" to apply a function to each partition of a DataFrame. Info about loading a Parquet file:
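A hedged sketch of that flow in SparkR; the Parquet path, column name, and transformation are made-up placeholders, not from the thread:

  library(SparkR)
  sparkR.session()

  # Load a Parquet file into a SparkDataFrame (path is hypothetical)
  df <- read.df("/data/events.parquet", source = "parquet")

  # dapply(): run an R function over each partition, which arrives as a plain
  # R data.frame; the schema of the returned columns must be declared
  out <- dapply(df,
                function(pdf) data.frame(value = pdf$value, log_value = log(pdf$value)),
                structType(structField("value", "double"), structField("log_value", "double")))
  head(out)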

Re: Using R code as part of a Spark Application

2016-06-29 Thread sujeet jog
Try Spark's RDD pipe(): you can invoke the R script via pipe(), pushing the data you want processed to the Rscript's stdin, p On Wed, Jun 29, 2016 at 7:10 PM, Gilad Landau wrote: > Hello, > > > > I want to use R code as part of spark application (the same way I would do >
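For the worker-side half of that pipe() approach, the R script only has to read records from stdin and write results to stdout; a minimal hypothetical script (the one-value-per-line format and the sqrt() computation are invented for illustration):

  #!/usr/bin/env Rscript
  # Invoked per partition via something like rdd.pipe("Rscript transform.R");
  # each RDD element arrives as one line on stdin, and each line written to
  # stdout becomes one element of the resulting RDD.
  con <- file("stdin", open = "r")
  while (length(line <- readLines(con, n = 1)) > 0) {
    x <- as.numeric(line)
    cat(sqrt(x), "\n", sep = "")
  }
  close(con)

The executors launch the script themselves, so, as noted elsewhere in the thread, R (Rscript) still has to be installed on every worker node.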

Using R code as part of a Spark Application

2016-06-29 Thread Gilad Landau
Hello, I want to use R code as part of a Spark application (the same way I would do with Scala/Python). I want to be able to run R syntax as a map function on a big Spark DataFrame loaded from a Parquet file. Is this even possible, or is the only way to use R as part of RStudio orchestration