[jira] [Comment Edited] (SPARK-15428) Disable support for multiple streaming aggregations

2018-07-19 Thread Joost Verdoorn (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-15428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549174#comment-16549174
 ] 

Joost Verdoorn edited comment on SPARK-15428 at 7/19/18 11:56 AM:
--

I was wondering the same. Being able to do only one aggregation within 
structured streaming is extremely limiting. Any idea on when (if ever) multiple 
aggregations could be supported? [~tdas]


was (Author: joostverdoorn):
I was wondering the same. Being able to do only one aggregation within 
structured streaming is extremely limiting. Any idea on when (if ever) multiple 
aggregations could be supported?

> Disable support for multiple streaming aggregations
> ---
>
> Key: SPARK-15428
> URL: https://issues.apache.org/jira/browse/SPARK-15428
> Project: Spark
> Issue Type: Sub-task
> Components: Structured Streaming
> Reporter: Tathagata Das
> Assignee: Tathagata Das
> Priority: Major
> Fix For: 2.0.0
>
>
> Incrementalizing plans with multiple streaming aggregations is tricky, and 
> we don't have the necessary support for "delta" to implement it correctly. So 
> we are disabling support for multiple streaming aggregations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15428) Disable support for multiple streaming aggregations

2021-11-11 Thread Hongbo (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-15428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442292#comment-17442292
 ] 

Hongbo edited comment on SPARK-15428 at 11/12/21, 3:56 AM:
---

Is there any plan to enable it? It's quite a big limitation. For example, we 
cannot compare a 1-minute average with a 5-minute average. Since it was 
disabled many years ago, when Structured Streaming was new, is it still blocked 
by any major technical difficulties after years of evolution?
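
To make the use case concrete, here is a minimal plain-Python sketch of the computation being asked for: a 1-minute average and a 5-minute average over the same events, compared window by window. It does not use any Spark API; the event data, the helper name `window_average`, and the window sizes are illustrative assumptions.

```python
from collections import defaultdict


def window_average(events, window_seconds):
    """Average (timestamp_seconds, value) pairs per tumbling window.

    Returns a dict mapping each window's start time to its average value.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for ts, value in events:
        start = ts - ts % window_seconds  # start of the tumbling window
        sums[start][0] += value
        sums[start][1] += 1
    return {start: total / count for start, (total, count) in sums.items()}


# Hypothetical events: (event time in seconds, value)
events = [(0, 10.0), (30, 20.0), (70, 30.0), (200, 40.0), (290, 50.0)]

one_min = window_average(events, 60)
five_min = window_average(events, 300)

# Difference between each 1-minute average and the 5-minute average
# of the window that encloses it.
deviation = {
    start: avg - five_min[start - start % 300]
    for start, avg in one_min.items()
}
```

In a streaming query this comparison would likely need either an aggregation on top of an aggregation or a join of two aggregated streams, which is the pattern this ticket disabled.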

There are also other related tickets, such as 
https://issues.apache.org/jira/browse/SPARK-26692 .

 

After reading more of the Spark and Flink documentation, I think it's caused 
by a fundamental limitation: Spark uses a single "global watermark", while 
Flink has watermark generators, so watermarks can flow together with the data. 
Because of this limitation, we cannot have a single reliable "global 
watermark" for the stream that comes out of an aggregation, so joining 
aggregated streams will produce unreliable data, as the documentation states:

[https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#limitation-of-global-watermark]

Any of the stateful operation(s) after any of below stateful operations can 
have this issue:
 * *streaming aggregation in Append mode*
 * stream-stream outer join
 * {{mapGroupsWithState}} and {{flatMapGroupsWithState}} in Append mode 
(depending on the implementation of the state function)
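
The failure mode in the list above can be sketched with a toy model in plain Python. This is not Spark's implementation: `append_mode_aggregate`, `downstream_accepts`, the use of the window end as the emitted row's event time, and all timings are hypothetical simplifications. The point it illustrates is that an Append-mode window is emitted only once the watermark has passed its end, so a downstream stateful operator that checks the same global watermark treats every emitted row as late.

```python
def append_mode_aggregate(events, window_seconds, watermark):
    """Toy Append-mode windowed sum.

    A window is considered final, and is emitted as (window_end, total),
    only when its end is at or before the watermark.
    """
    totals = {}
    for ts, value in events:
        end = ts - ts % window_seconds + window_seconds
        totals[end] = totals.get(end, 0) + value
    return [(end, total) for end, total in sorted(totals.items()) if end <= watermark]


def downstream_accepts(row_event_time, watermark):
    """Toy late-data check: rows at or before the watermark are dropped."""
    return row_event_time > watermark


# Hypothetical (event_time_seconds, value) pairs.
events = [(5, 1), (20, 2), (65, 3)]
# Assumed state: the single global watermark has just reached the first window's end.
global_watermark = 60

emitted = append_mode_aggregate(events, 60, global_watermark)

# The downstream operator checks emitted rows against the same global watermark.
results = [
    (end, total, downstream_accepts(end, global_watermark))
    for end, total in emitted
]

# If the downstream operator tracked its own, earlier watermark
# (the value 0 is illustrative), the same row would be accepted.
accepted_with_own_watermark = downstream_accepts(emitted[0][0], 0)
```

In this model the emission condition (`end <= watermark`) and the lateness condition are the same test, which is why no emitted row can pass the downstream check under one shared watermark.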

Is my understanding correct?


was (Author: liuhb86):
Is there any plan to enable it? It's quite a big limitation. For example, we 
cannot compare a 1-minute average with a 5-minute average. Since it was 
disabled many years ago, when Structured Streaming was new, is it still blocked 
by any major technical difficulties after years of evolution?

There are also other related tickets, such as 
https://issues.apache.org/jira/browse/SPARK-26692 .



