Re: Watermark on late data only

2023-10-10 Thread Raghu Angadi
I like some way to expose watermarks to the user. It does affect the processing of the records, so it is relevant for the users. `current_watermark()` is a good option. The implementation of this might be engine specific. But it is a very relevant concept for authors of streaming pipelines.

Re: Watermark on late data only

2023-10-10 Thread Jungtaek Lim
slight correction/clarification: We now take the "previous" watermark to determine the late record, because they are valid inputs for non-first stateful operators dropping records based on the same criteria would drop valid records from previous (upstream) stateful operators. Please look back

Re: Watermark on late data only

2023-10-10 Thread Jungtaek Lim
We wouldn't like to expose the internal mechanism to the public. As you are a very detail oriented engineer tracking major changes, you might notice that we "changed" the definition of late record while fixing late records. Previously the late record is defined as a record having event time

Re: Watermark on late data only

2023-10-10 Thread Bartosz Konieczny
Thank you for the clarification, Jungtaek  Indeed, it doesn't sound like a highly demanded feature from the end users, haven't seen that a lot on StackOverflow or mailing lists. I was just curious about the reasons. Using the arbitrary stateful processing could be indeed a workaround! But IMHO

Re: Watermark on late data only

2023-10-09 Thread Jungtaek Lim
Technically speaking, "late data" represents the data which cannot be processed due to the fact the engine threw out the state associated with the data already. That said, the only reason watermark does exist for streaming is to handle stateful operators. From the engine's point of view, there is

Watermark on late data only

2023-10-08 Thread Bartosz Konieczny
Hi, I've been analyzing the watermark propagation added in the 3.5.0 recently and had to return to the basics of watermarks. One question is still unanswered in my head. Why are the watermarks reserved to stateful queries? Can't they apply to the filtering late date out only? The reason is only