Hi Community,

In the Beam programming guide [1], there is a sentence: "Data that arrives with
a timestamp after the watermark is considered *late data*."

It seems that people get confused by it; see, for example, this Stack Overflow
comment [2]. Basically, it makes people think that an event whose timestamp is
greater than the watermark is considered late (because of the word "after").

Although there is an example right after this sentence that explains late
data, it seems to me that the sentence itself is incomplete. A more complete
version could be: "The watermark consistently advances from -inf to +inf. Data
that arrives with a timestamp after the watermark is considered late data."
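To make the ordering concrete, here is a minimal sketch in plain Python. It is a toy model, not Beam's API, and every name in it is hypothetical. It assumes 5-minute fixed windows and a heuristic watermark that trails processing time by 30 seconds; these values are illustrative and only loosely follow the example in [1]. A record is treated as late when, at the moment it arrives, the watermark has already moved past the end of the record's window. In other words, the late record's timestamp is behind (smaller than) the watermark, not ahead of it.

```python
from datetime import timedelta

# Toy model, NOT the Beam API. All times are offsets from 0:00.
WINDOW_SIZE = timedelta(minutes=5)      # assumed fixed window length
WATERMARK_LAG = timedelta(seconds=30)   # assumed heuristic watermark lag


def t(minutes, seconds=0):
    """Hypothetical helper: build an offset such as 3:38 as t(3, 38)."""
    return timedelta(minutes=minutes, seconds=seconds)


def watermark_at(processing_time):
    """Heuristic watermark: processing time minus a fixed lag."""
    return processing_time - WATERMARK_LAG


def window_end(event_time):
    """Exclusive end of the fixed window that contains event_time."""
    index = event_time // WINDOW_SIZE  # timedelta // timedelta -> int
    return (index + 1) * WINDOW_SIZE


def is_late(event_time, processing_time):
    """True if, on arrival, the watermark has already reached the end of
    the record's window, i.e. the record's timestamp is behind the watermark."""
    return watermark_at(processing_time) >= window_end(event_time)


# Event time 3:38 (window 0:00-4:59) arriving at 5:34: watermark is 5:04.
print(is_late(t(3, 38), t(5, 34)))  # True
# Same event arriving at 5:10: watermark is only 4:40, so it is on time.
print(is_late(t(3, 38), t(5, 10)))  # False
# Event time 5:20 arriving at 5:34: its window ends at 10:00, so on time.
print(is_late(t(5, 20), t(5, 34)))  # False
```

Note that the second record arrives after 5:00 in processing time but is still on time, because lateness is decided by where the watermark is, not by the wall-clock arrival time alone.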

Do I understand this correctly? Is there a better way to describe the order of
late data and the watermark? I would be happy to send a PR to update the Beam
documentation.

-Rui

[1]: https://beam.apache.org/documentation/programming-guide/#windowing
[2]: https://stackoverflow.com/questions/54141352/dataflow-to-process-late-and-out-of-order-data-for-batch-and-stream-messages/54188971?noredirect=1#comment95302476_54188971
