Bumping this thread to expose the proposal to a wider audience. Given that there are not many active contributors/maintainers in the Structured Streaming area, I'd treat this discussion as "lazy consensus" to avoid getting stuck. I'll send a final reminder early next week and move forward if there are no outstanding objections.

On Wed, Jul 6, 2022 at 8:46 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

> Hi dev,
>
> I would like to hear opinions on deprecating Trigger.Once and promoting
> Trigger.AvailableNow as its replacement [1] in Structured Streaming.
> (This doesn't mean we remove Trigger.Once now or in the near future. That
> would probably require a separate discussion at some point.)
>
> Rationale:
>
> The expected behavior of Trigger.Once is to read all data that has become
> available since the last trigger and process it. This holds true when the
> last run terminated gracefully, but there are cases where streaming queries
> are not terminated gracefully. The last run may have written the offset for
> the new batch before termination, in which case a new run of Trigger.Once
> only processes the data that was included in the latest unfinished batch
> and doesn't process new data.
>
> The behavior is not deterministic from the users' point of view, as end
> users wouldn't know whether the last run wrote the offset or not, unless
> they look into the query's checkpoint themselves.
>
> While Trigger.AvailableNow was introduced to solve the scalability issue
> of Trigger.Once, it also ensures that it tries to process all data
> available at the point in time it is triggered, which consistently matches
> the expected behavior of Trigger.Once.
>
> Another issue with Trigger.Once is that it does not trigger a no-data
> batch immediately. When the watermark is calculated in batch N, it takes
> effect in batch N + 1. If the query is scheduled to run once per day, you
> only see the output from the new watermark in the next day's run.
> Trigger.AvailableNow, by contrast, also runs the no-data batch before the
> query terminates.
>
> Please review and let us know if you have any feedback or concerns on the
> proposal.
>
> Thanks!
> Jungtaek Lim
>
> 1. https://issues.apache.org/jira/browse/SPARK-36533
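
To make the non-determinism described above concrete, here is a toy Python model of the "offset written, batch not finished" scenario. It is only a sketch and not Spark code: ToyCheckpoint, run_batch, trigger_once and trigger_available_now are hypothetical names, offsets are plain integers, and the no-data batch is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ToyCheckpoint:
    """Toy stand-in for a query checkpoint; not Spark's real on-disk format."""
    committed_end: int = 0             # end offset of the last committed batch
    planned_end: Optional[int] = None  # end offset logged for a batch that never committed


def run_batch(cp: ToyCheckpoint, latest: int) -> Tuple[int, int]:
    """Run one batch and return the (start, end) offset range it processed."""
    # An end offset already logged for an unfinished batch wins over newer data.
    end = cp.planned_end if cp.planned_end is not None else latest
    processed = (cp.committed_end, end)
    cp.committed_end, cp.planned_end = end, None
    return processed


def trigger_once(cp: ToyCheckpoint, latest: int) -> List[Tuple[int, int]]:
    """Toy Trigger.Once: run exactly one batch, then stop."""
    return [run_batch(cp, latest)]


def trigger_available_now(cp: ToyCheckpoint, latest: int) -> List[Tuple[int, int]]:
    """Toy Trigger.AvailableNow: keep batching up to the offset seen at start."""
    batches = []
    while cp.committed_end < latest:
        batches.append(run_batch(cp, latest))
    return batches


# The last run logged end offset 5 for a new batch, then died before committing.
crashed = dict(committed_end=0, planned_end=5)
print(trigger_once(ToyCheckpoint(**crashed), latest=10))           # [(0, 5)]
print(trigger_available_now(ToyCheckpoint(**crashed), latest=10))  # [(0, 5), (5, 10)]
```

In this model the Trigger.Once run stops at offset 5 even though data up to offset 10 is available, while the AvailableNow run finishes the leftover batch and then catches up. In a real query the switch should be Trigger.AvailableNow() in Scala or trigger(availableNow=True) in PySpark, both available from Spark 3.3.0 as far as I know.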