Hi David,

The use case is that we're building a data processing system with an
intuitive user interface, where Spark is used as the data processing
framework.
We would like to provide an HTML user interface to R where the user types
or copy-pastes R code. The system should then send this R code (using
ROSE) to R, process it and return the results to the user. The RDD would
be used so that the data can be further processed by the system, but we
would also like to be able to show the messages printed to STDOUT and the
images (plots) that are generated by R. The plots seem to be available in
the OpenCPU API, see below:

[image: Inline image 1]

So the case is not that we're trying to process millions of images, but
rather that we would like to show the user the plots (such as a regression
plot) that are generated in R. The code could generate several plots, but
certainly not thousands or even hundreds, only a few.
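
To make this a bit more concrete, here is a rough sketch of the kind of
per-run result we have in mind on the UI side. It assumes the console text
and a handful of PNG plots have already been retrieved somehow; the class
RRunResult and all of its methods are hypothetical names made up for this
example and are not part of ROSE or OpenCPU.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Collections;
import java.util.List;

// Hypothetical container for one run of user-supplied R code.
// This is NOT a ROSE or OpenCPU type, only an illustration.
final class RRunResult {
    private final String consoleOutput;   // text R printed to STDOUT
    private final List<byte[]> plots;     // a few PNG images, one per plot

    RRunResult(String consoleOutput, List<byte[]> plots) {
        this.consoleOutput = consoleOutput;
        // Copy the list so later changes by the caller do not affect us.
        this.plots = Collections.unmodifiableList(new ArrayList<>(plots));
    }

    String consoleOutput() { return consoleOutput; }

    int plotCount() { return plots.size(); }

    // Encode one plot as a data URI usable in an <img src="..."> attribute.
    String plotAsDataUri(int index) {
        String base64 = Base64.getEncoder().encodeToString(plots.get(index));
        return "data:image/png;base64," + base64;
    }

    // Render a minimal HTML fragment: console text first, then each plot.
    String toHtml() {
        StringBuilder html = new StringBuilder();
        html.append("<pre>").append(escape(consoleOutput)).append("</pre>\n");
        for (int i = 0; i < plots.size(); i++) {
            html.append("<img alt=\"plot ").append(i + 1)
                .append("\" src=\"").append(plotAsDataUri(i)).append("\">\n");
        }
        return html.toString();
    }

    // Escape the characters that would otherwise break the <pre> block.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // Placeholder bytes stand in for a real PNG produced by R.
        byte[] placeholder = "not-a-real-png".getBytes(StandardCharsets.UTF_8);
        RRunResult result =
            new RRunResult("[1] 42", Collections.singletonList(placeholder));
        System.out.println(result.toHtml());
    }
}
```

Since only a few plots are expected per run, embedding them inline as data
URIs seems acceptable here; for larger volumes one would presumably serve
the images separately.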

I hope this is possible using ROSE, because it seems like a really good
fit.
Thanks in advance,
Richard

On Wed, Jan 13, 2016 at 3:39 AM, David Russell <
themarchoffo...@protonmail.com> wrote:

> Hi Richard,
>
> > Would it be possible to access the session API from within ROSE,
> > to get for example the images that are generated by R / openCPU
>
> Technically it would be possible although there would be some potentially
> significant runtime costs per task in doing so, primarily those related to
> extracting image data from the R session, serializing and then moving that
> data across the cluster for each and every image.
>
> From a design perspective ROSE was intended to be used within Spark scale
> applications where R object data was seen as the primary task output: an
> output in a format that could be rapidly serialized and easily processed.
> Are there real world use cases where Spark scale applications capable of
> generating 10k, 100k, or even millions of image files would actually
> need to capture and store images? If so, how, practically speaking, would
> these images ever be used? I'm just not sure. Maybe you could describe your
> own use case to provide some insights?
>
> > and the logging to stdout that is logged by R?
>
> If you are referring to the R console output (generated within the R
> session during the execution of an OCPUTask) then this data could certainly
> (optionally) be captured and returned on an OCPUResult. Again, can you
> provide any details for how you might use this console output in a real
> world application?
>
> As an aside, for simple standalone Spark applications that will only ever
> run on a single host (no cluster) you could consider using an alternative
> library called *fluent-r*. This library is also available under my GitHub
> repo, see here <https://github.com/onetapbeyond/fluent-r>. The fluent-r
> library already has support for the retrieval of R objects, R console
> output and R graphics device image/plots. However it is not as lightweight
> as ROSE and it is not designed to work in a clustered environment. ROSE on the
> other hand is designed for scale.
>
> David
>
> "All that is gold does not glitter, Not all those who wander are lost."
>
>
> -------- Original Message --------
> Subject: Re: ROSE: Spark + R on the JVM.
> Local Time: January 12 2016 6:56 pm
> UTC Time: January 12 2016 11:56 pm
> From: rsiebel...@gmail.com
> To: m...@vijaykiran.com
> CC: cjno...@gmail.com,themarchoffo...@protonmail.com,user@spark.apache.org
> ,d...@spark.apache.org
>
> Hi,
>
> this looks great and seems to be very usable.
> Would it be possible to access the session API from within ROSE, to get
> for example the images that are generated by R / openCPU and the logging to
> stdout that is logged by R?
>
> thanks in advance,
> Richard
>
> On Tue, Jan 12, 2016 at 10:16 PM, Vijay Kiran <m...@vijaykiran.com> wrote:
>
>> I think it would be this:
>> https://github.com/onetapbeyond/opencpu-spark-executor
>>
>> > On 12 Jan 2016, at 18:32, Corey Nolet <cjno...@gmail.com> wrote:
>> >
>> > David,
>> >
>> > Thank you very much for announcing this! It looks like it could be very
>> useful. Would you mind providing a link to the github?
>> >
>> > On Tue, Jan 12, 2016 at 10:03 AM, David <themarchoffo...@protonmail.com>
>> wrote:
>> > Hi all,
>> >
>> > I'd like to share news of the recent release of a new Spark package,
>> ROSE.
>> >
>> > ROSE is a Scala library offering access to the full scientific
>> computing power of the R programming language to Apache Spark batch and
>> streaming applications on the JVM. Where Apache SparkR lets data scientists
>> use Spark from R, ROSE is designed to let Scala and Java developers use R
>> from Spark.
>> >
>> > The project is available and documented on GitHub and I would encourage
>> you to take a look. Any feedback, questions etc very welcome.
>> >
>> > David
>> >
>> > "All that is gold does not glitter, Not all those who wander are lost."
>> >
