[ https://issues.apache.org/jira/browse/SPARK-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488315#comment-14488315 ]
Davies Liu commented on SPARK-6803:
-----------------------------------

After a quick look at the prototype: the callback server sits in a separate process from the driver, because R does not support multithreading. This approach has some limitations, for example when callback functions need to access shared variables. We also need a way to collect the logs from the callback server; this is needed when you run the streaming job as a daemon process with dstream.pprint().

This prototype is pretty cool. It shows that a Streaming API in R is doable, even with some limitations. But the question is how many users want to run streaming jobs in R. It will take a lot of effort to make it production ready. Even the Python API still needs a lot of work, for example support for checkpointing and recovery with HDFS.

> [SparkR] Support SparkR Streaming
> ---------------------------------
>
>                 Key: SPARK-6803
>                 URL: https://issues.apache.org/jira/browse/SPARK-6803
>             Project: Spark
>          Issue Type: New Feature
>          Components: SparkR, Streaming
>            Reporter: Hao
>             Fix For: 1.4.0
>
>
> Adds R API for Spark Streaming.
> An experimental version is presented in repo [1], which follows the PySpark
> streaming design. Also, this PR can be further broken down into sub-task
> issues.
> [1] https://github.com/hlin09/spark/tree/SparkR-streaming/

--
This message was sent by Atlassian JIRA (v6.3.4#6332)