[jira] [Commented] (SPARK-42376) Introduce watermark propagation among operators
[ https://issues.apache.org/jira/browse/SPARK-42376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685315#comment-17685315 ] Jungtaek Lim commented on SPARK-42376: -- Will submit a PR sooner. > Introduce watermark propagation among operators > --- > > Key: SPARK-42376 > URL: https://issues.apache.org/jira/browse/SPARK-42376 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Priority: Major > > With introduction of SPARK-40925, we enabled workloads containing multiple > stateful operators in a single streaming query. > The JIRA ticket clearly described out-of-scope, "Here we propose fixing the > late record filtering in stateful operators to allow chaining of stateful > operators {*}which do not produce delayed records (like time-interval join or > potentially flatMapGroupsWithState){*}". > We identified production use case for stream-stream time-interval join > followed by stateful operator (e.g. window aggregation), and propose to > address such use case via this ticket. > The design will be described in the PR, but the sketched idea is introducing > simulation of watermark propagation among operators. As of now, Spark > considers all stateful operators to have same input watermark and output > watermark, which introduced the limitation. With this ticket, we construct > the logic to simulate watermark propagation so that each operator can have > its own (input watermark, output watermark). Operators introducing delayed > records will produce delayed output watermark, and downstream operator can > take the delay into account as input watermark will be adjusted. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42376) Introduce watermark propagation among operators
[ https://issues.apache.org/jira/browse/SPARK-42376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685350#comment-17685350 ] Apache Spark commented on SPARK-42376: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/39931 > Introduce watermark propagation among operators > --- > > Key: SPARK-42376 > URL: https://issues.apache.org/jira/browse/SPARK-42376 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Priority: Major > > With introduction of SPARK-40925, we enabled workloads containing multiple > stateful operators in a single streaming query. > The JIRA ticket clearly described out-of-scope, "Here we propose fixing the > late record filtering in stateful operators to allow chaining of stateful > operators {*}which do not produce delayed records (like time-interval join or > potentially flatMapGroupsWithState){*}". > We identified production use case for stream-stream time-interval join > followed by stateful operator (e.g. window aggregation), and propose to > address such use case via this ticket. > The design will be described in the PR, but the sketched idea is introducing > simulation of watermark propagation among operators. As of now, Spark > considers all stateful operators to have same input watermark and output > watermark, which introduced the limitation. With this ticket, we construct > the logic to simulate watermark propagation so that each operator can have > its own (input watermark, output watermark). Operators introducing delayed > records will produce delayed output watermark, and downstream operator can > take the delay into account as input watermark will be adjusted. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42376) Introduce watermark propagation among operators
[ https://issues.apache.org/jira/browse/SPARK-42376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687246#comment-17687246 ] Alex Balikov commented on SPARK-42376: -- Yay! Congrats on getting this out! > Introduce watermark propagation among operators > --- > > Key: SPARK-42376 > URL: https://issues.apache.org/jira/browse/SPARK-42376 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Priority: Major > > With introduction of SPARK-40925, we enabled workloads containing multiple > stateful operators in a single streaming query. > The JIRA ticket clearly described out-of-scope, "Here we propose fixing the > late record filtering in stateful operators to allow chaining of stateful > operators {*}which do not produce delayed records (like time-interval join or > potentially flatMapGroupsWithState){*}". > We identified production use case for stream-stream time-interval join > followed by stateful operator (e.g. window aggregation), and propose to > address such use case via this ticket. > The design will be described in the PR, but the sketched idea is introducing > simulation of watermark propagation among operators. As of now, Spark > considers all stateful operators to have same input watermark and output > watermark, which introduced the limitation. With this ticket, we construct > the logic to simulate watermark propagation so that each operator can have > its own (input watermark, output watermark). Operators introducing delayed > records will produce delayed output watermark, and downstream operator can > take the delay into account as input watermark will be adjusted. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42376) Introduce watermark propagation among operators
[ https://issues.apache.org/jira/browse/SPARK-42376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688956#comment-17688956 ] Jungtaek Lim commented on SPARK-42376: -- [~alex-balikov] Thanks! We finally got here and I'd like to say "thank you" for your help. > Introduce watermark propagation among operators > --- > > Key: SPARK-42376 > URL: https://issues.apache.org/jira/browse/SPARK-42376 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Priority: Major > > With introduction of SPARK-40925, we enabled workloads containing multiple > stateful operators in a single streaming query. > The JIRA ticket clearly described out-of-scope, "Here we propose fixing the > late record filtering in stateful operators to allow chaining of stateful > operators {*}which do not produce delayed records (like time-interval join or > potentially flatMapGroupsWithState){*}". > We identified production use case for stream-stream time-interval join > followed by stateful operator (e.g. window aggregation), and propose to > address such use case via this ticket. > The design will be described in the PR, but the sketched idea is introducing > simulation of watermark propagation among operators. As of now, Spark > considers all stateful operators to have same input watermark and output > watermark, which introduced the limitation. With this ticket, we construct > the logic to simulate watermark propagation so that each operator can have > its own (input watermark, output watermark). Operators introducing delayed > records will produce delayed output watermark, and downstream operator can > take the delay into account as input watermark will be adjusted. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42376) Introduce watermark propagation among operators
[ https://issues.apache.org/jira/browse/SPARK-42376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694401#comment-17694401 ] Apache Spark commented on SPARK-42376: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/40215 > Introduce watermark propagation among operators > --- > > Key: SPARK-42376 > URL: https://issues.apache.org/jira/browse/SPARK-42376 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Priority: Major > > With introduction of SPARK-40925, we enabled workloads containing multiple > stateful operators in a single streaming query. > The JIRA ticket clearly described out-of-scope, "Here we propose fixing the > late record filtering in stateful operators to allow chaining of stateful > operators {*}which do not produce delayed records (like time-interval join or > potentially flatMapGroupsWithState){*}". > We identified production use case for stream-stream time-interval join > followed by stateful operator (e.g. window aggregation), and propose to > address such use case via this ticket. > The design will be described in the PR, but the sketched idea is introducing > simulation of watermark propagation among operators. As of now, Spark > considers all stateful operators to have same input watermark and output > watermark, which introduced the limitation. With this ticket, we construct > the logic to simulate watermark propagation so that each operator can have > its own (input watermark, output watermark). Operators introducing delayed > records will produce delayed output watermark, and downstream operator can > take the delay into account as input watermark will be adjusted. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org