Re: [Structured Streaming] Avoid one microbatch delay with multiple stateful operations

2024-01-24 Thread Andrzej Zera
Hi, I'm sorry but I got confused about the inner workings of late events watermark. You're completely right. Thanks for clarifying. Regards, Andrzej czw., 11 sty 2024 o 13:02 Jungtaek Lim napisaƂ(a): > Hi, > > The time window is closed and evicted as long as "eviction watermark" > passes the

Re: [Structured Streaming] Avoid one microbatch delay with multiple stateful operations

2024-01-11 Thread Jungtaek Lim
Hi, The time window is closed and evicted as long as "eviction watermark" passes the end of the window. Late events watermark only deals with discarding late events from "inputs". We did not introduce additional delay on the work of multiple stateful operators. We just allowed more late events to

Re: [Structured Streaming] Avoid one microbatch delay with multiple stateful operations

2024-01-10 Thread Ant Kutschera
Hi *Do we have any option to make streaming queries with multiple stateful operations output data without waiting this extra iteration? One of my ideas was to force an empty microbatch to run and propagate late events watermark without any new data. While this conceptually works, I didn't find a

[Structured Streaming] Avoid one microbatch delay with multiple stateful operations

2024-01-10 Thread Andrzej Zera
I'm struggling with the following issue in Spark >=3.4, related to multiple stateful operations. When spark.sql.streaming.statefulOperator.allowMultiple is enabled, Spark keeps track of two types of watermarks: eventTimeWatermarkForEviction and eventTimeWatermarkForLateEvents. Introducing them