[ https://issues.apache.org/jira/browse/SPARK-15428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442292#comment-17442292 ]
Hongbo edited comment on SPARK-15428 at 11/12/21, 3:56 AM:
-----------------------------------------------------------

Is there any plan to enable it? It's quite a big limitation. For example, we cannot do analysis between a 1-minute average and a 5-minute average. It was disabled many years ago, when Structured Streaming was new; is it still blocked by any major technical difficulties after years of evolution? There are also other related tickets, such as https://issues.apache.org/jira/browse/SPARK-26692 .

After reading more of the Spark and Flink documentation, I think it's caused by the fundamental limitation that Spark uses a single "global watermark", whereas Flink has watermark generators, so the watermarks can flow together with the data. Because of this limitation, there is no single reliable "global watermark" for the aggregated stream after an aggregation. So joining an aggregated stream will produce unreliable data, as the documentation states:

[https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#limitation-of-global-watermark]

Any of the stateful operation(s) after any of below stateful operations can have this issue:
* *streaming aggregation in Append mode*
* stream-stream outer join
* {{mapGroupsWithState}} and {{flatMapGroupsWithState}} in Append mode (depending on the implementation of the state function)

Is this understanding correct?

was (Author: liuhb86):
Is there any plan to enable it? It's quite a big limitation. For example, we can not do analysis between 1-minute average and 5 minute-average. As it was disabled many years ago when structured streaming was new, is is still blocked by any major technical difficulities after years of evolution? There are also other related tickets, such as https://issues.apache.org/jira/browse/SPARK-26692 .
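
The watermark concern raised in the comment above can be illustrated with a toy model. The sketch below is plain Python, not Spark code, and does not reproduce Spark's actual implementation. It assumes one event per micro-batch, a watermark equal to the maximum event time seen minus a delay, and tumbling windows that are emitted in Append mode once the watermark reaches the window end. The names {{run}}, {{ONE_MIN}} and {{FIVE_MIN}} are hypothetical. Under these assumptions, a 5-minute average chained after a 1-minute average shares the same global watermark, so the last 1-minute result of each 5-minute window arrives only after that 5-minute window has already closed.

```python
from collections import defaultdict

ONE_MIN, FIVE_MIN = 60, 300  # window sizes in seconds (illustrative)


def run(events, delay=0):
    """Toy model of two chained tumbling-window averages sharing one watermark.

    events: list of (event_time_seconds, value) in arrival order.
    Returns (results, dropped):
      results maps 5-min window start -> (average, number of 1-min inputs used)
      dropped lists the 1-min window starts rejected as late by stage 2.
    """
    watermark = 0
    stage1 = defaultdict(list)  # 1-min window start -> raw values
    stage2 = defaultdict(list)  # 5-min window start -> 1-min averages
    results, dropped = {}, []

    for t, v in events:
        stage1[t // ONE_MIN * ONE_MIN].append(v)
        # Assumption: a single global watermark driven only by the raw input.
        watermark = max(watermark, t - delay)

        # Stage 1 (Append mode): emit 1-min windows the watermark has passed.
        for start in sorted(w for w in stage1 if w + ONE_MIN <= watermark):
            vals = stage1.pop(start)
            avg = sum(vals) / len(vals)
            w5 = start // FIVE_MIN * FIVE_MIN
            if w5 + FIVE_MIN <= watermark:
                # The same watermark has already closed the 5-min window.
                dropped.append(start)
            else:
                stage2[w5].append(avg)

        # Stage 2 (Append mode): emit 5-min windows the watermark has passed.
        for start in sorted(w for w in stage2 if w + FIVE_MIN <= watermark):
            vals = stage2.pop(start)
            results[start] = (sum(vals) / len(vals), len(vals))

    return results, dropped


# One event per minute, with values 1..6.
events = [(30, 1), (90, 2), (150, 3), (210, 4), (270, 5), (330, 6)]
results, dropped = run(events)
print(results)  # {0: (2.5, 4)}: only 4 of the 5 one-minute averages were used
print(dropped)  # [240]: the [240, 300) result arrived after [0, 300) closed
```

In this model the correct 5-minute value would be 3.0, computed from all five 1-minute averages. A per-operator watermark, as in the Flink design mentioned in the comment, would let stage 2 wait for stage 1's output instead of following the raw input.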
> Disable support for multiple streaming aggregations
> ---------------------------------------------------
>
>                 Key: SPARK-15428
>                 URL: https://issues.apache.org/jira/browse/SPARK-15428
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Structured Streaming
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Major
>             Fix For: 2.0.0
>
>
> Incrementalizing plans with multiple streaming aggregations is tricky, and
> we don't have the necessary support for "delta" to implement it correctly. So
> we are disabling support for multiple streaming aggregations.