Bumping this thread to expose the proposal to a wider audience.

Given that there are not many active contributors/maintainers in the
Structured Streaming area, I'd treat this discussion as "lazy consensus" to
avoid getting stuck. I'll send a final reminder early next week, and move
forward if there are no outstanding objections.

On Wed, Jul 6, 2022 at 8:46 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
wrote:

> Hi dev,
>
> I would like to hear opinions on deprecating Trigger.Once and promoting
> Trigger.AvailableNow as its replacement [1] in Structured Streaming.
> (This doesn't mean we remove Trigger.Once now or in the near future. That
> would probably require a separate discussion at some point.)
>
> Rationale:
>
> The expected behavior of Trigger.Once is to read all data that has become
> available since the last trigger and process it. This holds true when the
> last run terminated gracefully, but there are cases where streaming queries
> are not terminated gracefully. The last run may have written the offset for
> a new batch before terminating, in which case a new run of Trigger.Once
> only processes the data planned for that latest unfinished batch and
> doesn't process any newer data.
>
> The behavior is not deterministic from the users' point of view, as end
> users wouldn't know whether the last run wrote the offset or not, unless
> they look into the query's checkpoint by themselves.
>
> While Trigger.AvailableNow was introduced to solve the scalability issue of
> Trigger.Once, it also ensures that the query tries to process all data
> available at the point in time it is triggered, which consistently matches
> the expected behavior of Trigger.Once.
>
> Another issue with Trigger.Once is that it does not trigger a no-data batch
> immediately. When the watermark is calculated in batch N, it takes effect
> in batch N + 1. If the query is scheduled to run once per day, you only see
> the output from the new watermark in the next day's run.
> Trigger.AvailableNow, by contrast, also runs the no-data batch before the
> query terminates.
>
> Please review and let us know if you have any feedback or concerns on the
> proposal.
>
> Thanks!
> Jungtaek Lim
>
> 1. https://issues.apache.org/jira/browse/SPARK-36533
>
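
For readers following the thread, the unfinished-batch case described in the
quoted rationale can be illustrated with a toy model. This is only a sketch
under simplifying assumptions: Checkpoint, run_once and run_available_now are
hypothetical names made up for illustration, not Spark APIs, and the model
ignores rate limits, multiple sources and the real checkpoint layout.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Toy stand-in for a query checkpoint (NOT Spark's real format)."""
    offsets: list = field(default_factory=list)  # planned end offset per batch
    commits: int = 0                             # batches that fully completed


def _has_unfinished_batch(cp):
    # An offset was written for a batch that never got committed.
    return len(cp.offsets) > cp.commits


def run_once(cp, latest):
    """Toy model of Trigger.Once: run exactly one batch, then stop."""
    if _has_unfinished_batch(cp):
        # Re-run the planned batch up to its recorded end offset only.
        cp.commits = len(cp.offsets)
    elif not cp.offsets or latest > cp.offsets[-1]:
        cp.offsets.append(latest)
        cp.commits = len(cp.offsets)
    return cp.offsets[-1]


def run_available_now(cp, latest):
    """Toy model of Trigger.AvailableNow: run batches until `latest`."""
    if _has_unfinished_batch(cp):
        cp.commits = len(cp.offsets)
    if not cp.offsets or latest > cp.offsets[-1]:
        # Keep going until everything available at start-up is processed.
        cp.offsets.append(latest)
        cp.commits = len(cp.offsets)
    return cp.offsets[-1]


# Previous run planned a batch ending at offset 100 but died before commit;
# the source has since grown to offset 250.
once_result = run_once(Checkpoint(offsets=[100], commits=0), latest=250)
now_result = run_available_now(Checkpoint(offsets=[100], commits=0), latest=250)
print("Trigger.Once processed up to:", once_result)
print("Trigger.AvailableNow processed up to:", now_result)
```

In this model the Trigger.Once run stops at the offset recorded by the
crashed run, while the Trigger.AvailableNow run continues to the latest
offset seen at start-up, which is the difference the proposal relies on.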
