Re: Using R code as part of a Spark Application

2016-06-30 Thread Sun Rui
I would guess that the technology behind Azure R Server is Revolution Enterprise DistributedR/ScaleR. I don't know the details, but note the statement in the “Step 6. Install R packages” section of the given documentation page:

"However, if you need to install R packages on the worker nodes of the cluster, you must use a Script Action."

That implies that R should be installed on each worker node.


Re: Using R code as part of a Spark Application

2016-06-30 Thread sujeet jog
Thanks for the link, Sun. I believe that running external scripts such as R code on DataFrames is a much-needed facility, for example for algorithms that are not available in MLlib. Invoking them from an R script would definitely be a powerful feature when your app is Scala/Python based; you don't have to use SparkR just for this when much of your application code is in Scala/Python.


Re: Using R code as part of a Spark Application

2016-06-29 Thread Sun Rui
Hi Gilad,

You can try the dapply() and gapply() functions in SparkR in Spark 2.0. Yes, it is required that R be installed on each worker node.

However, if your Spark application is Scala/Java based, running R code on DataFrames is not supported for now. There is a closed JIRA, https://issues.apache.org/jira/browse/SPARK-14746, which remains for discussion purposes. You have to convert DataFrames to RDDs and use pipe() on RDDs to launch external R processes and run R code.
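For anyone trying the SparkR route, here is a minimal sketch of a gapply() call in Spark 2.0. It is only an illustration: the sample data, column names, and aggregation are made up, and it has not been run against a real cluster.

library(SparkR)
sparkR.session()   # Spark 2.0 entry point

# Hypothetical input: a tiny DataFrame with a grouping column and a numeric column
df <- createDataFrame(data.frame(group = c("a", "a", "b"),
                                 value = c(1, 2, 3)))

# Apply an arbitrary R function once per group; the schema describes the
# R data.frame the function returns for each group
result <- gapply(
  df,
  "group",
  function(key, x) {
    data.frame(group = x$group[1], total = sum(x$value),
               stringsAsFactors = FALSE)
  },
  structType(structField("group", "string"),
             structField("total", "double")))

head(collect(result))

dapply() works the same way, except that the function is applied to each partition of the DataFrame rather than to each group.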


Re: Using R code as part of a Spark Application

2016-06-29 Thread Xinh Huynh
It looks like it. "DataFrame UDFs in R" is resolved in Spark 2.0:
https://issues.apache.org/jira/browse/SPARK-6817

Here's some of the code:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/r/MapPartitionsRWrapper.scala

/**
 * A function wrapper that applies the given R function to each partition.
 */
private[sql] case class MapPartitionsRWrapper(
    func: Array[Byte],
    packageNames: Array[Byte],
    broadcastVars: Array[Broadcast[Object]],
    inputSchema: StructType,
    outputSchema: StructType) extends (Iterator[Any] => Iterator[Any])

Xinh


Re: Using R code as part of a Spark Application

2016-06-29 Thread Sean Owen
Here we (or certainly I) am not talking about R Server, but plain vanilla
R, as used with Spark and SparkR. Currently, SparkR doesn't distribute R
code at all (it used to, sort of), so I'm wondering if that is changing
back.


Re: Using R code as part of a Spark Application

2016-06-29 Thread John Aherne
I don't think R Server requires R on the executor nodes. I originally set up a SparkR cluster for our Data Scientist on Azure, which required that I install R on each node, but for the R Server setup there is an extra edge node with R Server that they connect to. From what little research I was able to do, it seems that there are some special functions in R Server that can distribute the work to the cluster.

Documentation is light and hard to find, but I found this helpful:
https://blogs.msdn.microsoft.com/uk_faculty_connection/2016/05/10/r-server-for-hdinsight-running-on-microsoft-azure-cloud-data-science-challenges/




Re: Using R code as part of a Spark Application

2016-06-29 Thread Sean Owen
Oh, interesting: does this really mean the return of distributing R
code from driver to executors and running it remotely, or do I
misunderstand? This would require having R on the executor nodes like
it used to?



Re: Using R code as part of a Spark Application

2016-06-29 Thread Jörn Franke
Still, you need SparkR.


Re: Using R code as part of a Spark Application

2016-06-29 Thread John Aherne
Microsoft Azure has an option to create a Spark cluster with R Server. MS
bought RevoScale (I think that was the name) and just recently deployed it.



Re: Using R code as part of a Spark Application

2016-06-29 Thread Xinh Huynh
There is some new SparkR functionality coming in Spark 2.0, such as
"dapply". You could use SparkR to load a Parquet file and then run "dapply"
to apply a function to each partition of a DataFrame.

Info about loading Parquet file:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-docs/sparkr.html#from-data-sources

API doc for "dapply":
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-docs/api/R/index.html

Xinh
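
To make that concrete, here is a minimal SparkR 2.0 sketch of loading a Parquet file and applying an R function to each partition with dapply(). The Parquet path is a placeholder, and the example assumes the file has a single numeric column named "value":

library(SparkR)
sparkR.session()

# Placeholder path; point this at your own Parquet data
df <- read.df("/path/to/data.parquet", source = "parquet")

# The function receives each partition as an R data.frame and must return a
# data.frame matching the declared output schema
out <- dapply(
  df,
  function(x) {
    x$value_doubled <- x$value * 2
    x
  },
  structType(structField("value", "double"),
             structField("value_doubled", "double")))

head(collect(out))

There is also dapplyCollect(), which applies the function and returns the combined result as a local R data.frame, so no output schema is needed.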



Re: Using R code as part of a Spark Application

2016-06-29 Thread sujeet jog
Try Spark pipeRDDs: you can invoke the R script via pipe() on an RDD and push the data you want processed to the R script's stdin.
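
As a rough sketch of the R side of that approach (assuming the Spark job pipes one comma-separated record per line to the script, and the script writes one result per line back to stdout for Spark to collect):

#!/usr/bin/env Rscript
# Read records piped in by Spark (one per line) from stdin,
# run some R logic on each, and write one output line per record to stdout.
con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1)) > 0) {
  fields <- strsplit(line, ",")[[1]]
  value  <- as.numeric(fields[1])
  result <- value * 2   # placeholder for the real R computation
  cat(result, "\n", sep = "")
}
close(con)

On the Spark side you would then call something like rdd.pipe("Rscript /path/to/script.R") on an RDD of strings; the script path and record format here are just assumptions for the example.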




Using R code as part of a Spark Application

2016-06-29 Thread Gilad Landau
Hello,

I want to use R code as part of a Spark application (the same way I would do with
Scala/Python). I want to be able to run R syntax as a map function on a big
Spark DataFrame loaded from a Parquet file.
Is this even possible, or is the only way to use R as part of RStudio
orchestration of our Spark cluster?

Thanks for the help!

Gilad