[jira] [Commented] (SPARK-42376) Introduce watermark propagation among operators

2023-02-07 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685315#comment-17685315
 ] 

Jungtaek Lim commented on SPARK-42376:
--

Will submit a PR sooner.

> Introduce watermark propagation among operators
> ---
>
> Key: SPARK-42376
> URL: https://issues.apache.org/jira/browse/SPARK-42376
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> With introduction of SPARK-40925, we enabled workloads containing multiple 
> stateful operators in a single streaming query.
> The JIRA ticket clearly described out-of-scope, "Here we propose fixing the 
> late record filtering in stateful operators to allow chaining of stateful 
> operators {*}which do not produce delayed records (like time-interval join or 
> potentially flatMapGroupsWithState){*}".
> We identified production use case for stream-stream time-interval join 
> followed by stateful operator (e.g. window aggregation), and propose to 
> address such use case via this ticket.
> The design will be described in the PR, but the sketched idea is introducing 
> simulation of watermark propagation among operators. As of now, Spark 
> considers all stateful operators to have same input watermark and output 
> watermark, which introduced the limitation. With this ticket, we construct 
> the logic to simulate watermark propagation so that each operator can have 
> its own (input watermark, output watermark). Operators introducing delayed 
> records will produce delayed output watermark, and downstream operator can 
> take the delay into account as input watermark will be adjusted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42376) Introduce watermark propagation among operators

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685350#comment-17685350
 ] 

Apache Spark commented on SPARK-42376:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/39931

> Introduce watermark propagation among operators
> ---
>
> Key: SPARK-42376
> URL: https://issues.apache.org/jira/browse/SPARK-42376
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> With introduction of SPARK-40925, we enabled workloads containing multiple 
> stateful operators in a single streaming query.
> The JIRA ticket clearly described out-of-scope, "Here we propose fixing the 
> late record filtering in stateful operators to allow chaining of stateful 
> operators {*}which do not produce delayed records (like time-interval join or 
> potentially flatMapGroupsWithState){*}".
> We identified production use case for stream-stream time-interval join 
> followed by stateful operator (e.g. window aggregation), and propose to 
> address such use case via this ticket.
> The design will be described in the PR, but the sketched idea is introducing 
> simulation of watermark propagation among operators. As of now, Spark 
> considers all stateful operators to have same input watermark and output 
> watermark, which introduced the limitation. With this ticket, we construct 
> the logic to simulate watermark propagation so that each operator can have 
> its own (input watermark, output watermark). Operators introducing delayed 
> records will produce delayed output watermark, and downstream operator can 
> take the delay into account as input watermark will be adjusted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42376) Introduce watermark propagation among operators

2023-02-10 Thread Alex Balikov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687246#comment-17687246
 ] 

Alex Balikov commented on SPARK-42376:
--

Yay! Congrats on getting this out!

> Introduce watermark propagation among operators
> ---
>
> Key: SPARK-42376
> URL: https://issues.apache.org/jira/browse/SPARK-42376
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> With introduction of SPARK-40925, we enabled workloads containing multiple 
> stateful operators in a single streaming query.
> The JIRA ticket clearly described out-of-scope, "Here we propose fixing the 
> late record filtering in stateful operators to allow chaining of stateful 
> operators {*}which do not produce delayed records (like time-interval join or 
> potentially flatMapGroupsWithState){*}".
> We identified production use case for stream-stream time-interval join 
> followed by stateful operator (e.g. window aggregation), and propose to 
> address such use case via this ticket.
> The design will be described in the PR, but the sketched idea is introducing 
> simulation of watermark propagation among operators. As of now, Spark 
> considers all stateful operators to have same input watermark and output 
> watermark, which introduced the limitation. With this ticket, we construct 
> the logic to simulate watermark propagation so that each operator can have 
> its own (input watermark, output watermark). Operators introducing delayed 
> records will produce delayed output watermark, and downstream operator can 
> take the delay into account as input watermark will be adjusted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42376) Introduce watermark propagation among operators

2023-02-15 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688956#comment-17688956
 ] 

Jungtaek Lim commented on SPARK-42376:
--

[~alex-balikov] Thanks! We finally got here and I'd like to say "thank you" for 
your help. 

> Introduce watermark propagation among operators
> ---
>
> Key: SPARK-42376
> URL: https://issues.apache.org/jira/browse/SPARK-42376
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> With introduction of SPARK-40925, we enabled workloads containing multiple 
> stateful operators in a single streaming query.
> The JIRA ticket clearly described out-of-scope, "Here we propose fixing the 
> late record filtering in stateful operators to allow chaining of stateful 
> operators {*}which do not produce delayed records (like time-interval join or 
> potentially flatMapGroupsWithState){*}".
> We identified production use case for stream-stream time-interval join 
> followed by stateful operator (e.g. window aggregation), and propose to 
> address such use case via this ticket.
> The design will be described in the PR, but the sketched idea is introducing 
> simulation of watermark propagation among operators. As of now, Spark 
> considers all stateful operators to have same input watermark and output 
> watermark, which introduced the limitation. With this ticket, we construct 
> the logic to simulate watermark propagation so that each operator can have 
> its own (input watermark, output watermark). Operators introducing delayed 
> records will produce delayed output watermark, and downstream operator can 
> take the delay into account as input watermark will be adjusted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42376) Introduce watermark propagation among operators

2023-02-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694401#comment-17694401
 ] 

Apache Spark commented on SPARK-42376:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/40215

> Introduce watermark propagation among operators
> ---
>
> Key: SPARK-42376
> URL: https://issues.apache.org/jira/browse/SPARK-42376
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> With introduction of SPARK-40925, we enabled workloads containing multiple 
> stateful operators in a single streaming query.
> The JIRA ticket clearly described out-of-scope, "Here we propose fixing the 
> late record filtering in stateful operators to allow chaining of stateful 
> operators {*}which do not produce delayed records (like time-interval join or 
> potentially flatMapGroupsWithState){*}".
> We identified production use case for stream-stream time-interval join 
> followed by stateful operator (e.g. window aggregation), and propose to 
> address such use case via this ticket.
> The design will be described in the PR, but the sketched idea is introducing 
> simulation of watermark propagation among operators. As of now, Spark 
> considers all stateful operators to have same input watermark and output 
> watermark, which introduced the limitation. With this ticket, we construct 
> the logic to simulate watermark propagation so that each operator can have 
> its own (input watermark, output watermark). Operators introducing delayed 
> records will produce delayed output watermark, and downstream operator can 
> take the delay into account as input watermark will be adjusted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org