[ https://issues.apache.org/jira/browse/SPARK-39805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567866#comment-17567866 ]
Apache Spark commented on SPARK-39805:
--------------------------------------

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/37213

> Deprecate Trigger.Once and Promote Trigger.AvailableNow
> -------------------------------------------------------
>
>                 Key: SPARK-39805
>                 URL: https://issues.apache.org/jira/browse/SPARK-39805
>             Project: Spark
>          Issue Type: Task
>          Components: Structured Streaming
>    Affects Versions: 3.4.0
>            Reporter: Jungtaek Lim
>            Priority: Major
>
> Quoting the discussion in spark dev@:
> [link|https://lists.apache.org/thread/2xnxlxhw245cmspd8nd17cq5doj2c7hc]
> Rationale:
> The expected behavior of Trigger.Once is to read all data that has become
> available since the last trigger and process it. This holds when the last
> run terminated gracefully, but there are cases where streaming queries are
> not terminated gracefully. The last run may have written the offset for a
> new batch before it terminated; in that case, a new run of Trigger.Once
> processes only the data already planned for that unfinished batch and does
> not process newer data.
> The behavior is not deterministic from the users' point of view, as end
> users cannot know whether the last run wrote the offset or not unless they
> inspect the query's checkpoint themselves.
> While Trigger.AvailableNow was introduced to solve the scalability issue
> of Trigger.Once, it also ensures that the query tries to process all data
> available at the point in time it is triggered, which consistently matches
> the expected behavior of Trigger.Once.
> Another issue with Trigger.Once is that it does not trigger a no-data
> batch immediately. When the watermark is calculated in batch N, it takes
> effect in batch N + 1. If the query is scheduled to run once per day, the
> output produced by the new watermark only appears in the next day's run.
> Trigger.AvailableNow, by contrast, also runs the no-data batch before the
> query terminates.
> There was no strong feedback in the discussion thread, but given that only
> a very small number of contributors (including committers/PMC members) are
> active in the SS area, we have to go with lazy consensus.
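
The ungraceful-termination case described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's implementation: the `Checkpoint` class, its `offset_log` and `commit_log` fields, and the functions `run_once` and `run_available_now` are hypothetical names made up for this example, offsets are plain integers, and the remaining data is processed as a single batch, whereas a real Trigger.AvailableNow run may split it into several batches.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Toy stand-in for a query checkpoint (hypothetical, not Spark's format)."""
    offset_log: list = field(default_factory=list)  # planned end offset per batch
    commit_log: list = field(default_factory=list)  # ids of batches that finished


def committed_offset(cp):
    """End offset of the last batch that actually finished (0 if none did)."""
    return cp.offset_log[len(cp.commit_log) - 1] if cp.commit_log else 0


def pending_end(cp):
    """End offset of a batch that was planned but never committed, else None."""
    if len(cp.offset_log) > len(cp.commit_log):
        return cp.offset_log[-1]
    return None


def run_once(cp, available):
    """Model of a Trigger.Once run: execute at most one batch, then stop."""
    start = committed_offset(cp)
    end = pending_end(cp)
    if end is None:
        if available == start:
            return []  # nothing new to process
        end = available
        cp.offset_log.append(end)  # plan a new batch up to the latest data
    # If a batch was already planned, it is replayed as-is, ignoring newer data.
    cp.commit_log.append(len(cp.offset_log) - 1)
    return [(start, end)]


def run_available_now(cp, available):
    """Model of a Trigger.AvailableNow run: keep running batches until all
    data available at trigger time has been processed."""
    processed = []
    while True:
        batch = run_once(cp, available)
        if not batch:
            return processed
        processed += batch


# The previous run wrote the offset for batch 0 (up to offset 10) and then
# died before committing it. Since then, data up to offset 25 has arrived.
crashed = Checkpoint(offset_log=[10], commit_log=[])
print(run_once(crashed, available=25))

crashed_again = Checkpoint(offset_log=[10], commit_log=[])
print(run_available_now(crashed_again, available=25))
```

In this model the Trigger.Once-style run stops after replaying the range (0, 10), leaving offsets 10 to 25 for some later run, while the Trigger.AvailableNow-style run replays the unfinished batch and then continues up to offset 25.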