[ 
https://issues.apache.org/jira/browse/SPARK-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488315#comment-14488315
 ] 

Davies Liu commented on SPARK-6803:
-----------------------------------

After a quick look at the prototype, the callback server sits in a separate 
process from the driver, because R does not support multithreading. This 
approach has some limitations; for example, callback functions may not be 
able to access some shared variables.
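
To make the shared-variable limitation concrete, here is a minimal Python 
sketch. It is not taken from the prototype: the names CALLBACK_SOURCE, 
run_callback and driver_state are hypothetical, and a plain child process 
stands in for the callback server. The callback only sees a serialized copy 
of the driver's state, so its changes never reach the driver unless they are 
sent back explicitly.

```python
import json
import subprocess
import sys

# Code executed by the stand-in "callback server" process (an assumption for
# illustration): it reads a JSON snapshot of the driver's state on stdin,
# mutates its own copy, and prints the result.
CALLBACK_SOURCE = """
import json, sys
state = json.load(sys.stdin)
state["batches"] += 1
print(json.dumps(state))
"""


def run_callback(state):
    """Run the callback in a separate process on a copy of `state`."""
    result = subprocess.run(
        [sys.executable, "-c", CALLBACK_SOURCE],
        input=json.dumps(state),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


driver_state = {"batches": 0}
seen_by_callback = run_callback(driver_state)

# The callback incremented its copy; the driver's dictionary is untouched.
print("callback saw:", seen_by_callback)
print("driver still has:", driver_state)
```

Any state the callback needs to change has to travel back over the process 
boundary, as the return value does here.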

Also, we should have a way to collect logs from the callback server. This is 
needed when you run a streaming job as a daemon process that uses 
dstream.pprint().
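
One possible way to collect that output, sketched in Python under the 
assumption that the driver launches the callback server itself: redirect the 
child's stdout and stderr to a log file. SERVER_SOURCE, run_with_log and the 
log file name are hypothetical, and the child process only imitates the 
per-batch printing that pprint() would produce.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for the callback server (an assumption for illustration): prints
# one line per batch and one diagnostic line on stderr.
SERVER_SOURCE = """
import sys
for batch in range(3):
    print("batch", batch)
print("warning: example diagnostic", file=sys.stderr)
"""


def run_with_log(log_path):
    """Run the child with stdout and stderr both appended to `log_path`."""
    with open(log_path, "a") as log_file:
        subprocess.run(
            [sys.executable, "-c", SERVER_SOURCE],
            stdout=log_file,
            stderr=subprocess.STDOUT,  # merge stderr into the same file
            check=True,
        )


log_path = Path(tempfile.mkdtemp()) / "callback-server.log"
run_with_log(log_path)
log_lines = log_path.read_text().splitlines()
print(len(log_lines), "lines collected in", log_path)
```

With the output in a file, a daemonized job leaves something to inspect even 
though no terminal is attached.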

This prototype is pretty cool; it shows that a Streaming API in R is doable, 
even with some limitations.

But the question is: how many users want to run streaming jobs in R? It will 
take a lot of effort to make this production ready. Even the Python API still 
needs a lot of work, for example, support for checkpointing and recovery with 
HDFS.

> [SparkR] Support SparkR Streaming
> ---------------------------------
>
>                 Key: SPARK-6803
>                 URL: https://issues.apache.org/jira/browse/SPARK-6803
>             Project: Spark
>          Issue Type: New Feature
>          Components: SparkR, Streaming
>            Reporter: Hao
>             Fix For: 1.4.0
>
>
> Adds an R API for Spark Streaming.
> An experimental version is presented in repo [1], which follows the PySpark 
> streaming design. This PR can also be broken down further into sub-task 
> issues.
> [1] https://github.com/hlin09/spark/tree/SparkR-streaming/ 



