[ https://issues.apache.org/jira/browse/SPARK-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453058#comment-16453058 ]
Arun Mahadevan commented on SPARK-24036:
----------------------------------------

Hi [~joseph.torres], I am also interested in contributing to this effort if you are open to it.

> Supporting single partition aggregates. I have a substantially complete
> prototype of this in [https://github.com/jose-torres/spark/pull/13] - it
> doesn't really involve design as much as removing a very silly hack I put in
> earlier.

Does it require saving the aggregate state by injecting epoch markers into the stream, or does it just work using the iterator approach because it involves only a single partition?

To extend this to support multiple partitions and shuffles, shouldn't epoch markers be injected into the stream, with the state saved only after the markers have been received from all the parent tasks?

> Just write RPC endpoints on both ends tossing rows around, optimizing for throughput later if needed. (I'm leaning towards this one.)

So buffering rows between the stages and handling back-pressure need to be considered here? Would the existing shuffle infrastructure make this easier to handle?

> Stateful operators in continuous processing
> -------------------------------------------
>
>                 Key: SPARK-24036
>                 URL: https://issues.apache.org/jira/browse/SPARK-24036
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: Jose Torres
>            Priority: Major
>
> The first iteration of continuous processing in Spark 2.3 does not work with
> stateful operators.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
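The comment asks whether, once shuffles are involved, state should be saved only after epoch markers have arrived from every parent task. Below is a minimal single-threaded Python sketch of that idea, assuming each parent delivers its messages to the task in order. It follows the barrier-alignment approach of Chandy-Lamport-style snapshots (as used, for example, by Apache Flink's checkpoint barriers) rather than anything Spark is confirmed to implement. `Row`, `EpochMarker`, `AligningSumTask`, `receive`, and `_commit_and_release` are hypothetical names, and the per-key sum stands in for an arbitrary aggregate.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Set, Union


@dataclass(frozen=True)
class Row:
    key: str
    value: int


@dataclass(frozen=True)
class EpochMarker:
    # Hypothetical marker a parent emits after sending all rows of an epoch.
    epoch: int


Message = Union[Row, EpochMarker]


class AligningSumTask:
    """Hypothetical downstream task (NOT Spark's API) that sums values per key
    and snapshots its state only after every parent has sent the marker for
    the current epoch (barrier alignment)."""

    def __init__(self, num_parents: int) -> None:
        self.num_parents = num_parents
        self.current_epoch = 0
        self.state: Dict[str, int] = defaultdict(int)
        self.checkpoints: Dict[int, Dict[str, int]] = {}  # epoch -> saved state
        self.marked: Set[int] = set()  # parents already done with current_epoch
        # Messages from a marked parent belong to a later epoch; hold them back.
        self.held: Dict[int, Deque[Message]] = defaultdict(deque)

    def receive(self, parent: int, msg: Message) -> None:
        if parent in self.marked:
            self.held[parent].append(msg)
        elif isinstance(msg, EpochMarker):
            if msg.epoch != self.current_epoch:
                raise ValueError(f"unexpected marker {msg.epoch} from parent {parent}")
            self.marked.add(parent)
            if len(self.marked) == self.num_parents:
                self._commit_and_release()
        else:
            self.state[msg.key] += msg.value

    def _commit_and_release(self) -> None:
        # Every parent has finished this epoch, so the state is consistent.
        self.checkpoints[self.current_epoch] = dict(self.state)
        self.current_epoch += 1
        self.marked.clear()
        # Replay held-back messages, in per-parent order, into the new epoch.
        held, self.held = self.held, defaultdict(deque)
        for parent, queue in held.items():
            while queue:
                self.receive(parent, queue.popleft())
```

In this sketch the `held` queues grow without bound while the task waits for a slow parent's marker; bounding them (and pausing the faster parents) is where the buffering and back-pressure question raised in the comment comes in.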