[ https://issues.apache.org/jira/browse/SPARK-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469863#comment-16469863 ]

Jose Torres commented on SPARK-24036:
-------------------------------------

The way I was envisioning it, there would be four kinds of tasks when we're done:
 * reader-only, which has a ContinuousDataReader at the bottom and one of the new queue writers at the top
 * intermediate, which has one of the new queue readers at the bottom and one of the new queue writers at the top
 * writer-only, which has one of the new queue readers at the bottom and a DataWriter (to the remote data sink) at the top
 * reader-writer, which has a ContinuousDataReader at the bottom and a DataWriter at the top
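The four kinds fall out of pairing two possible "bottoms" with two possible "tops". A minimal Scala sketch of that pairing (all type and object names here are illustrative, not actual Spark classes):

```scala
// Illustrative model of the four continuous-processing task shapes.
// None of these names are real Spark APIs; they just encode the
// bottom/top combinations described above.

// The two possible "bottom" ends of a task: where rows come from.
sealed trait Bottom
case object ContinuousReaderEnd extends Bottom // reads from the remote data source
case object QueueReaderEnd      extends Bottom // reads from an in-cluster shuffle queue

// The two possible "top" ends of a task: where rows go.
sealed trait Top
case object QueueWriterEnd extends Top // writes to an in-cluster shuffle queue
case object DataWriterEnd  extends Top // writes to the remote data sink

// A task shape is just a (bottom, top) pairing; the four combinations
// are exactly the four task kinds listed above.
final case class TaskShape(bottom: Bottom, top: Top) {
  def kind: String = (bottom, top) match {
    case (ContinuousReaderEnd, QueueWriterEnd) => "reader-only"
    case (QueueReaderEnd, QueueWriterEnd)      => "intermediate"
    case (QueueReaderEnd, DataWriterEnd)       => "writer-only"
    case (ContinuousReaderEnd, DataWriterEnd)  => "reader-writer"
  }
}

object TaskShapes {
  // Enumerating both ends yields all four kinds.
  val all: Seq[TaskShape] = for {
    bottom <- Seq(ContinuousReaderEnd, QueueReaderEnd)
    top    <- Seq(QueueWriterEnd, DataWriterEnd)
  } yield TaskShape(bottom, top)
}
```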

But each of these would be implemented as partitions of the ContinuousWriteRDD, 
allowing all of this to be opaque to the scheduler. Changing DAGScheduler to 
accommodate continuous processing would create significant additional 
complexity I don't think we can really justify.
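To illustrate that opacity (hypothetical names, plain Scala, no actual Spark APIs): the per-task shape can live in the partition object itself, so the scheduler only ever sees "run partition i" while the shape-specific wiring is resolved inside compute():

```scala
// Hypothetical sketch: an RDD-like container whose partitions each carry
// their own task shape. The scheduler treats every partition uniformly;
// only compute() knows which bottom/top wiring a given partition uses.
// ShapedPartition and ContinuousWriteLikeRDD are illustrative names.
final case class ShapedPartition(index: Int, shape: String)

final class ContinuousWriteLikeRDD(val partitions: Seq[ShapedPartition]) {
  // Shape-specific dispatch happens here, not in the DAGScheduler.
  def compute(split: ShapedPartition): String = split.shape match {
    case "reader-only"   => "ContinuousDataReader -> queue writer"
    case "intermediate"  => "queue reader -> queue writer"
    case "writer-only"   => "queue reader -> DataWriter"
    case "reader-writer" => "ContinuousDataReader -> DataWriter"
    case other           => s"unknown shape: $other"
  }
}
```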

Whether we need to write an explicit shuffle RDD class would, I think, come down to an implementation detail of SPARK-24236; it depends on the cleanest way to unfold the SparkPlan tree.

> Stateful operators in continuous processing
> -------------------------------------------
>
>                 Key: SPARK-24036
>                 URL: https://issues.apache.org/jira/browse/SPARK-24036
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: Jose Torres
>            Priority: Major
>
> The first iteration of continuous processing in Spark 2.3 does not work with 
> stateful operators.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
