Re: ROSE: Spark + R on the JVM.
Hi Richard,

Thanks for providing the background on your application.

> the user types or copy-pastes his R code,
> the system should then send this R code (using ROSE) to R

Unfortunately this type of ad hoc R analysis is not supported. ROSE supports the execution of any R function or script within an existing R package on CRAN, Bioconductor, or GitHub. It does not support the direct execution of arbitrary blocks of R code as you described.

You may want to look at [DeployR](http://deployr.revolutionanalytics.com/), an open source R integration server that provides APIs in Java, JavaScript, and .NET and could easily support your use case. The outputs of your DeployR integration could then become inputs to your data processing system.

David

"All that is gold does not glitter, Not all those who wander are lost."

-------- Original Message --------
Subject: Re: ROSE: Spark + R on the JVM.
Local Time: January 13 2016 3:18 am
UTC Time: January 13 2016 8:18 am
From: rsiebel...@gmail.com
To: themarchoffo...@protonmail.com
CC: m...@vijaykiran.com, cjno...@gmail.com, user@spark.apache.org, d...@spark.apache.org

Hi David,

the use case is that we're building a data processing system with an intuitive user interface where Spark is used as the data processing framework. We would like to provide an HTML user interface to R where the user types or copy-pastes his R code; the system should then send this R code (using ROSE) to R, process it, and give the results back to the user. The RDD would be used so that the data can be further processed by the system, but we would also like to be able to show the messages printed to STDOUT and the images (plots) that are generated by R. The plots seem to be available in the OpenCPU API, see below:

[image: Inline image 1]

So the case is not that we're trying to process millions of images, but rather that we would like to show the plots generated in R (like a regression plot) to the user.
There could be several plots generated by the code, but certainly not thousands or even hundreds, only a few.

Hope that this would be possible using ROSE because it seems a really good fit.

thanks in advance,
Richard

On Wed, Jan 13, 2016 at 3:39 AM, David Russell wrote:

Hi Richard,

> Would it be possible to access the session API from within ROSE,
> to get for example the images that are generated by R / openCPU

Technically it would be possible, although there would be some potentially significant runtime costs per task in doing so, primarily those related to extracting image data from the R session, serializing it, and then moving that data across the cluster for each and every image.

From a design perspective, ROSE was intended to be used within Spark-scale applications where R object data is the primary task output: an output in a format that can be rapidly serialized and easily processed. Are there real-world use cases where Spark-scale applications capable of generating 10k, 100k, or even millions of image files would actually need to capture and store images? If so, how, practically speaking, would these images ever be used? I'm just not sure. Maybe you could describe your own use case to provide some insights?

> and the logging to stdout that is logged by R?

If you are referring to the R console output (generated within the R session during the execution of an OCPUTask) then this data could certainly (optionally) be captured and returned on an OCPUResult. Again, can you provide any details for how you might use this console output in a real-world application?

As an aside, for simple standalone Spark applications that will only ever run on a single host (no cluster), you could consider using an alternative library called fluent-r. This library is also available under my GitHub repo, [see here](https://github.com/onetapbeyond/fluent-r).
The fluent-r library already has support for the retrieval of R objects, R console output, and R graphics device images/plots. However, it is not as lightweight as ROSE and it is not designed to work in a clustered environment. ROSE, on the other hand, is designed for scale.

David

"All that is gold does not glitter, Not all those who wander are lost."

-------- Original Message --------
Subject: Re: ROSE: Spark + R on the JVM.
Local Time: January 12 2016 6:56 pm
UTC Time: January 12 2016 11:56 pm
From: rsiebel...@gmail.com
To: m...@vijaykiran.com
CC: cjno...@gmail.com, themarchoffo...@protonmail.com, user@spark.apache.org, d...@spark.apache.org

Hi,

this looks great and seems to be very usable. Would it be possible to access the session API from within ROSE, to get for example the images that are generated by R / openCPU and the logging to stdout that is logged by R?

thanks in advance,
Richard

On Tue, Jan 12, 2016 at 10:16 PM, Vijay Kiran wrote:

I think it would be this: https://github.com/onetapbeyond/opencpu-spark-executor
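The execution model David describes above (a named function within a published R package runs as an OCPUTask and returns an OCPUResult, with console output as an optional extra) can be sketched in plain Scala. This is a schematic mock, not the real ROSE API: the class names come from the thread, but every field, parameter, and method signature here is an assumption made for illustration; a real task would dispatch the package function to R via OpenCPU rather than echo its input.

```scala
// Schematic mock of the task/result model discussed in this thread.
// OCPUTask/OCPUResult names come from the messages above; the shapes
// below are illustrative assumptions, not ROSE's actual API.

// Result carries R object data (the primary, cheaply serialized output)
// plus console output only when it was explicitly requested.
final case class OCPUResult(data: Map[String, Double], console: Option[String])

// A task targets an existing package function such as stats::rnorm --
// never an arbitrary block of user-typed R code.
final case class OCPUTask(pkg: String, fn: String, input: Map[String, Double]) {
  def execute(captureConsole: Boolean = false): OCPUResult = {
    // Mock "execution": record what would have been sent to R.
    val console =
      if (captureConsole) Some(s"R> $pkg::$fn(${input.keys.mkString(", ")})")
      else None
    OCPUResult(input, console)
  }
}
```

Usage would look like `OCPUTask("stats", "rnorm", Map("n" -> 10.0)).execute()`; keeping console capture opt-in mirrors David's point that per-task extras (console text, images) carry a serialization cost at scale.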
Re: ROSE: Spark + R on the JVM.
Hi David,

the use case is that we're building a data processing system with an intuitive user interface where Spark is used as the data processing framework. We would like to provide an HTML user interface to R where the user types or copy-pastes his R code; the system should then send this R code (using ROSE) to R, process it, and give the results back to the user. The RDD would be used so that the data can be further processed by the system, but we would also like to be able to show the messages printed to STDOUT and the images (plots) that are generated by R. The plots seem to be available in the OpenCPU API, see below:

[image: Inline image 1]

So the case is not that we're trying to process millions of images, but rather that we would like to show the plots generated in R (like a regression plot) to the user. There could be several plots generated by the code, but certainly not thousands or even hundreds, only a few.

Hope that this would be possible using ROSE because it seems a really good fit.

thanks in advance,
Richard

On Wed, Jan 13, 2016 at 3:39 AM, David Russell <themarchoffo...@protonmail.com> wrote:

> Hi Richard,
>
> > Would it be possible to access the session API from within ROSE,
> > to get for example the images that are generated by R / openCPU
>
> Technically it would be possible, although there would be some potentially
> significant runtime costs per task in doing so, primarily those related to
> extracting image data from the R session, serializing it, and then moving
> that data across the cluster for each and every image.
>
> From a design perspective, ROSE was intended to be used within Spark-scale
> applications where R object data is the primary task output: an output in
> a format that can be rapidly serialized and easily processed.
> Are there real-world use cases where Spark-scale applications capable of
> generating 10k, 100k, or even millions of image files would actually need
> to capture and store images? If so, how, practically speaking, would these
> images ever be used? I'm just not sure. Maybe you could describe your own
> use case to provide some insights?
>
> > and the logging to stdout that is logged by R?
>
> If you are referring to the R console output (generated within the R
> session during the execution of an OCPUTask) then this data could
> certainly (optionally) be captured and returned on an OCPUResult. Again,
> can you provide any details for how you might use this console output in a
> real-world application?
>
> As an aside, for simple standalone Spark applications that will only ever
> run on a single host (no cluster), you could consider using an alternative
> library called *fluent-r*. This library is also available under my GitHub
> repo, see here <https://github.com/onetapbeyond/fluent-r>. The fluent-r
> library already has support for the retrieval of R objects, R console
> output, and R graphics device images/plots. However, it is not as
> lightweight as ROSE and it is not designed to work in a clustered
> environment. ROSE, on the other hand, is designed for scale.
>
> David
>
> "All that is gold does not glitter, Not all those who wander are lost."
>
> -------- Original Message --------
> Subject: Re: ROSE: Spark + R on the JVM.
> Local Time: January 12 2016 6:56 pm
> UTC Time: January 12 2016 11:56 pm
> From: rsiebel...@gmail.com
> To: m...@vijaykiran.com
> CC: cjno...@gmail.com, themarchoffo...@protonmail.com, user@spark.apache.org, d...@spark.apache.org
>
> Hi,
>
> this looks great and seems to be very usable. Would it be possible to
> access the session API from within ROSE, to get for example the images
> that are generated by R / openCPU and the logging to stdout that is
> logged by R?
> thanks in advance,
> Richard
>
> On Tue, Jan 12, 2016 at 10:16 PM, Vijay Kiran wrote:
>
>> I think it would be this:
>> https://github.com/onetapbeyond/opencpu-spark-executor
>>
>> > On 12 Jan 2016, at 18:32, Corey Nolet wrote:
>> >
>> > David,
>> >
>> > Thank you very much for announcing this! It looks like it could be very
>> > useful. Would you mind providing a link to the github?
>> >
>> > On Tue, Jan 12, 2016 at 10:03 AM, David wrote:
>> > Hi all,
>> >
>> > I'd like to share news of the recent release of a new Spark package,
>> > ROSE.
>> >
>> > ROSE is a Scala library offering access to the full scientific
>> > computing power of the R programming language to Apache Spark batch and
>> > streaming applications on the JVM. Where Apache SparkR lets data
>> > scientists use Spark from R, ROSE is designed to let Scala and Java
>> > developers use R from Spark.
Re: ROSE: Spark + R on the JVM.
Hi Richard,

> Would it be possible to access the session API from within ROSE,
> to get for example the images that are generated by R / openCPU

Technically it would be possible, although there would be some potentially significant runtime costs per task in doing so, primarily those related to extracting image data from the R session, serializing it, and then moving that data across the cluster for each and every image.

From a design perspective, ROSE was intended to be used within Spark-scale applications where R object data is the primary task output: an output in a format that can be rapidly serialized and easily processed. Are there real-world use cases where Spark-scale applications capable of generating 10k, 100k, or even millions of image files would actually need to capture and store images? If so, how, practically speaking, would these images ever be used? I'm just not sure. Maybe you could describe your own use case to provide some insights?

> and the logging to stdout that is logged by R?

If you are referring to the R console output (generated within the R session during the execution of an OCPUTask) then this data could certainly (optionally) be captured and returned on an OCPUResult. Again, can you provide any details for how you might use this console output in a real-world application?

As an aside, for simple standalone Spark applications that will only ever run on a single host (no cluster), you could consider using an alternative library called fluent-r. This library is also available under my GitHub repo, [see here](https://github.com/onetapbeyond/fluent-r). The fluent-r library already has support for the retrieval of R objects, R console output, and R graphics device images/plots. However, it is not as lightweight as ROSE and it is not designed to work in a clustered environment. ROSE, on the other hand, is designed for scale.

David

"All that is gold does not glitter, Not all those who wander are lost."
-------- Original Message --------
Subject: Re: ROSE: Spark + R on the JVM.
Local Time: January 12 2016 6:56 pm
UTC Time: January 12 2016 11:56 pm
From: rsiebel...@gmail.com
To: m...@vijaykiran.com
CC: cjno...@gmail.com, themarchoffo...@protonmail.com, user@spark.apache.org, d...@spark.apache.org

Hi,

this looks great and seems to be very usable. Would it be possible to access the session API from within ROSE, to get for example the images that are generated by R / openCPU and the logging to stdout that is logged by R?

thanks in advance,
Richard

On Tue, Jan 12, 2016 at 10:16 PM, Vijay Kiran wrote:

I think it would be this: https://github.com/onetapbeyond/opencpu-spark-executor

> On 12 Jan 2016, at 18:32, Corey Nolet wrote:
>
> David,
>
> Thank you very much for announcing this! It looks like it could be very
> useful. Would you mind providing a link to the github?
>
> On Tue, Jan 12, 2016 at 10:03 AM, David wrote:
> Hi all,
>
> I'd like to share news of the recent release of a new Spark package, ROSE.
>
> ROSE is a Scala library offering access to the full scientific computing
> power of the R programming language to Apache Spark batch and streaming
> applications on the JVM. Where Apache SparkR lets data scientists use
> Spark from R, ROSE is designed to let Scala and Java developers use R
> from Spark.
>
> The project is available and documented on GitHub and I would encourage
> you to take a look. Any feedback, questions etc. very welcome.
>
> David
>
> "All that is gold does not glitter, Not all those who wander are lost."

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
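David's "designed for scale" point in the message above (many small R tasks mapped over a distributed dataset, each returning compact, serializable R object data rather than images) can be sketched in plain Scala. This is a mock under stated assumptions, not ROSE code: `runRTask` and `RTaskResult` are hypothetical names standing in for a per-record call to an R package function via OpenCPU, and a local `Seq` stands in for the Spark RDD (the `.map` call would look the same against an RDD).

```scala
// Mock of the map-an-R-task-per-record pattern described above.
// Nothing here touches R or Spark; the point is the data flow shape.
object RoseScaleSketch {

  // Compact per-task output: only R object data, cheap to serialize
  // across a cluster (no images, no console text).
  final case class RTaskResult(values: Map[String, Double])

  // Hypothetical dispatcher: in ROSE this work would happen inside R
  // via OpenCPU; here the "R computation" is mocked as x * 2.
  def runRTask(pkg: String, fn: String, x: Double): RTaskResult =
    RTaskResult(Map("input" -> x, "output" -> x * 2))

  // In Spark this would be rdd.map(x => runRTask("stats", "median", x));
  // a local collection shows the same shape.
  def processAll(xs: Seq[Double]): Seq[Double] =
    xs.map(x => runRTask("stats", "median", x).values("output"))
}
```

The design choice the sketch illustrates is that each task's result stays small and self-contained, which is why capturing session images per task, as discussed in this thread, cuts against ROSE's intended shape.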
Re: ROSE: Spark + R on the JVM.
Hi,

this looks great and seems to be very usable. Would it be possible to access the session API from within ROSE, to get for example the images that are generated by R / openCPU and the logging to stdout that is logged by R?

thanks in advance,
Richard

On Tue, Jan 12, 2016 at 10:16 PM, Vijay Kiran wrote:

> I think it would be this:
> https://github.com/onetapbeyond/opencpu-spark-executor
>
> > On 12 Jan 2016, at 18:32, Corey Nolet wrote:
> >
> > David,
> >
> > Thank you very much for announcing this! It looks like it could be very
> > useful. Would you mind providing a link to the github?
> >
> > On Tue, Jan 12, 2016 at 10:03 AM, David wrote:
> > Hi all,
> >
> > I'd like to share news of the recent release of a new Spark package,
> > ROSE.
> >
> > ROSE is a Scala library offering access to the full scientific computing
> > power of the R programming language to Apache Spark batch and streaming
> > applications on the JVM. Where Apache SparkR lets data scientists use
> > Spark from R, ROSE is designed to let Scala and Java developers use R
> > from Spark.
> >
> > The project is available and documented on GitHub and I would encourage
> > you to take a look. Any feedback, questions etc. very welcome.
> >
> > David
> >
> > "All that is gold does not glitter, Not all those who wander are lost."
Re: ROSE: Spark + R on the JVM.
I think it would be this: https://github.com/onetapbeyond/opencpu-spark-executor

> On 12 Jan 2016, at 18:32, Corey Nolet wrote:
>
> David,
>
> Thank you very much for announcing this! It looks like it could be very
> useful. Would you mind providing a link to the github?
>
> On Tue, Jan 12, 2016 at 10:03 AM, David wrote:
> Hi all,
>
> I'd like to share news of the recent release of a new Spark package, ROSE.
>
> ROSE is a Scala library offering access to the full scientific computing
> power of the R programming language to Apache Spark batch and streaming
> applications on the JVM. Where Apache SparkR lets data scientists use
> Spark from R, ROSE is designed to let Scala and Java developers use R
> from Spark.
>
> The project is available and documented on GitHub and I would encourage
> you to take a look. Any feedback, questions etc. very welcome.
>
> David
>
> "All that is gold does not glitter, Not all those who wander are lost."
RE: ROSE: Spark + R on the JVM.
Definitely great news for all the R and Spark folks over here.

From: Corey Nolet [mailto:cjno...@gmail.com]
Sent: Tuesday, January 12, 2016 11:02 PM
To: David
Cc: user@spark.apache.org; d...@spark.apache.org
Subject: Re: ROSE: Spark + R on the JVM.

David,

Thank you very much for announcing this! It looks like it could be very useful. Would you mind providing a link to the github?

On Tue, Jan 12, 2016 at 10:03 AM, David wrote:

Hi all,

I'd like to share news of the recent release of a new Spark package, ROSE.

ROSE is a Scala library offering access to the full scientific computing power of the R programming language to Apache Spark batch and streaming applications on the JVM. Where Apache SparkR lets data scientists use Spark from R, ROSE is designed to let Scala and Java developers use R from Spark.

The project is available and documented on GitHub and I would encourage you to take a look. Any feedback, questions etc. very welcome.

David

"All that is gold does not glitter, Not all those who wander are lost."
Re: ROSE: Spark + R on the JVM.
Hi Corey,

> Would you mind providing a link to the github?

Sure, here is the GitHub link you're looking for:

https://github.com/onetapbeyond/opencpu-spark-executor

David

"All that is gold does not glitter, Not all those who wander are lost."

-------- Original Message --------
Subject: Re: ROSE: Spark + R on the JVM.
Local Time: January 12 2016 12:32 pm
UTC Time: January 12 2016 5:32 pm
From: cjno...@gmail.com
To: themarchoffo...@protonmail.com
CC: user@spark.apache.org, d...@spark.apache.org

David,

Thank you very much for announcing this! It looks like it could be very useful. Would you mind providing a link to the github?

On Tue, Jan 12, 2016 at 10:03 AM, David wrote:

Hi all,

I'd like to share news of the recent release of a new Spark package, ROSE.

ROSE is a Scala library offering access to the full scientific computing power of the R programming language to Apache Spark batch and streaming applications on the JVM. Where Apache SparkR lets data scientists use Spark from R, ROSE is designed to let Scala and Java developers use R from Spark.

The project is available and documented on GitHub and I would encourage you to take a look. Any feedback, questions etc. very welcome.

David

"All that is gold does not glitter, Not all those who wander are lost."
Re: ROSE: Spark + R on the JVM.
David,

Thank you very much for announcing this! It looks like it could be very useful. Would you mind providing a link to the github?

On Tue, Jan 12, 2016 at 10:03 AM, David wrote:

> Hi all,
>
> I'd like to share news of the recent release of a new Spark package, ROSE.
>
> ROSE is a Scala library offering access to the full scientific computing
> power of the R programming language to Apache Spark batch and streaming
> applications on the JVM. Where Apache SparkR lets data scientists use
> Spark from R, ROSE is designed to let Scala and Java developers use R
> from Spark.
>
> The project is available and documented on GitHub and I would encourage
> you to take a look. Any feedback, questions etc. very welcome.
>
> David
>
> "All that is gold does not glitter, Not all those who wander are lost."