Re: Detecting latecomer events in Spark structured streaming

2021-03-11 Thread Jungtaek Lim
Hi, If I remember correctly, I don't think Spark provides watermark value itself for the current batch to the public API. That said, if you're dealing with "event time" (and I guess you belong to this case as you worry about late events), unless you employ a new logical/physical plan to expose

Detecting latecomer events in Spark structured streaming

2021-03-08 Thread Sergey Oboguev
I have a Spark structured streaming based application that performs window(...) construct followed by aggregation. This construct discards latecomer events that arrive past the watermark. I need to be able to detect these late events to handle them out-of-band. The application maintains a raw