[ https://issues.apache.org/jira/browse/SPARK-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532614#comment-17532614 ]
Huw commented on SPARK-26655:
-----------------------------

I hit the guards in UnsupportedOperationChecker recently, and figured that it would be sound if I were using append mode. Glad to see it's being looked into.

I think this also applies to flatMapGroupsWithState, specifically the error "flatMapGroupsWithState in append mode is not supported with $outputMode output mode on a streaming DataFrame/Dataset".

> Support multiple aggregates in Structured Streaming append mode
> ---------------------------------------------------------------
>
> Key: SPARK-26655
> URL: https://issues.apache.org/jira/browse/SPARK-26655
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.1.0
> Reporter: Arun Mahadevan
> Priority: Major
> Attachments: Watermarks and multiple aggregates in Spark strucutred streaming_v1.pdf
>
>
> Right now multiple aggregates are not supported in structured streaming.
> However, in append mode, the aggregates are emitted only after the watermark
> passes the threshold (e.g. the window boundary), and the emitted value is not
> affected by further late data. So it is possible to chain multiple aggregates
> in 'Append' output mode without worrying about retractions.
> However, the current event-time watermarks in structured streaming are tracked
> at a global level, and this does not work when aggregates are chained.
> We need to track the watermarks at the individual operator level so that each
> operator can make progress independently and not rely on a global min or max
> value.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)