[jira] [Commented] (SPARK-24036) Stateful operators in continuous processing

Arun Mahadevan (JIRA) Wed, 25 Apr 2018 13:37:54 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453058#comment-16453058
 ]


Arun Mahadevan commented on SPARK-24036:
----------------------------------------

Hi [~joseph.torres], I am also interested in contributing to this effort if you 
are open to it.

> Supporting single partition aggregates. I have a substantially complete 
> prototype of this in [https://github.com/jose-torres/spark/pull/13] - it 
> doesn't really involve design as much as removing a very silly hack I put in 
> earlier.

Does it require saving the aggregate state by injecting epoch markers into the 
stream, or does it just work using the iterator approach since it involves only 
a single partition?

To extend this to support multiple partitions and shuffles, shouldn't the epoch 
markers be injected into the stream, with the state save happening once the 
markers have been received from all the parent tasks?
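
To make the question concrete, here is a minimal sketch of marker alignment 
in a task with several parents. It is plain Python rather than Spark code, 
and every name in it (EpochMarker, AlignedSumOperator, receive) is 
hypothetical; it only illustrates the idea of saving state once all parents 
have delivered the marker, not how the prototype or Spark implements it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EpochMarker:
    """Hypothetical in-band marker that closes an epoch on one parent's stream."""
    epoch: int


@dataclass
class AlignedSumOperator:
    """Toy stateful operator: keeps a running sum of rows from several parents."""
    num_parents: int
    total: int = 0
    epoch: int = 0
    committed: dict = field(default_factory=dict)  # epoch -> state saved at its end
    _done: set = field(default_factory=set)        # parents whose marker has arrived
    _held: dict = field(default_factory=dict)      # parent -> input for a later epoch

    def receive(self, parent, item):
        if parent in self._done:
            # This parent is already past the current epoch, so hold its input
            # until the remaining parents catch up.
            self._held.setdefault(parent, []).append(item)
        elif isinstance(item, EpochMarker):
            self._done.add(parent)
            if len(self._done) == self.num_parents:
                self._commit()
        else:
            self.total += item

    def _commit(self):
        # All parents delivered the marker: save state, then start the next epoch.
        self.committed[self.epoch] = self.total
        self.epoch += 1
        self._done = set()
        held, self._held = self._held, {}
        for parent, items in held.items():
            for item in items:
                self.receive(parent, item)  # replay held input in arrival order


op = AlignedSumOperator(num_parents=2)
op.receive("a", 1)
op.receive("a", EpochMarker(0))
op.receive("a", 10)              # belongs to epoch 1, held until "b" catches up
op.receive("b", 2)
op.receive("b", EpochMarker(0))  # last marker arrives: state for epoch 0 is saved
```

In this sketch the state saved for epoch 0 covers only the rows that arrived 
ahead of each parent's marker; the held row is applied afterwards as part of 
epoch 1.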

> Just write RPC endpoints on both ends tossing rows around, optimizing for 
> throughput later if needed. (I'm leaning towards this one.)

So buffering of rows between the stages and handling back-pressure need to be 
considered here? Would the existing shuffle infrastructure make this easier to 
handle?
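
As a rough illustration of the buffering concern, the sketch below uses a 
bounded in-memory queue between a writer and a reader thread, so a slow 
reader blocks the writer instead of letting rows pile up. It is plain Python 
with hypothetical names (run_pipeline, capacity) and stands in for neither 
the proposed RPC endpoints nor Spark's shuffle.

```python
import queue
import threading


def run_pipeline(rows, capacity=4):
    """Pass rows from a writer thread to a reader thread via a bounded buffer."""
    buffer = queue.Queue(maxsize=capacity)  # at most `capacity` rows in flight
    end_of_stream = object()                # sentinel marking the end of input
    received = []

    def writer():
        for row in rows:
            buffer.put(row)                 # blocks while full: back-pressure
        buffer.put(end_of_stream)

    def reader():
        while True:
            row = buffer.get()
            if row is end_of_stream:
                return
            received.append(row)

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


result = run_pipeline(list(range(100)), capacity=4)
```

A blocking put is the simplest form of back-pressure; an RPC-based version 
would need an equivalent signal from the receiver to the sender.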


> Stateful operators in continuous processing
> -------------------------------------------
>
>                 Key: SPARK-24036
>                 URL: https://issues.apache.org/jira/browse/SPARK-24036
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: Jose Torres
>            Priority: Major
>
> The first iteration of continuous processing in Spark 2.3 does not work with 
> stateful operators.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

