Reuven - I don't think I realized it was possible to have late data with the global window, so I'm definitely learning things through this discussion.
New suggested wording, then:

"Elements that arrive with a smaller timestamp than the current watermark are considered late data."

That says basically the same thing as the wording currently in the guide, but uses "smaller" (which implies a less-than-watermark comparison) rather than "later" (which folks have interpreted as a greater-than-watermark comparison).

On Thu, Jan 17, 2019 at 3:40 PM Reuven Lax <re...@google.com> wrote:

> Though it's not tied to window. You could be in the global window, so the
> watermark never advances past the end of the window, yet still get late
> data.
>
> On Thu, Jan 17, 2019, 11:14 AM Jeff Klukas <jklu...@mozilla.com wrote:
>
>> How about: "Once the watermark progresses past the end of a window, any
>> further elements that arrive with a timestamp in that window are considered
>> late data."
>>
>> On Thu, Jan 17, 2019 at 1:43 PM Rui Wang <ruw...@google.com> wrote:
>>
>>> Hi Community,
>>>
>>> In the Beam programming guide [1], there is a sentence: "Data that arrives
>>> with a timestamp after the watermark is considered *late data*"
>>>
>>> It seems that people get confused by it. For example, see the Stack Overflow
>>> comment [2]. Basically it makes people think that an event timestamp that is
>>> bigger than the watermark is considered late (due to that "after").
>>>
>>> Although there is an example right after this sentence to explain late
>>> data, it seems to me that this sentence is incomplete. The complete sentence
>>> to me can be: "The watermark consistently advances from -inf to +inf. Data
>>> that arrives with a timestamp after the watermark is considered late data."
>>>
>>> Am I understanding this correctly? Is there a better description for the
>>> order of late data and watermark? I would be happy to send a PR to update
>>> the Beam documentation.
>>>
>>> -Rui
>>>
>>> [1]: https://beam.apache.org/documentation/programming-guide/#windowing
>>> [2]:
>>> https://stackoverflow.com/questions/54141352/dataflow-to-process-late-and-out-of-order-data-for-batch-and-stream-messages/54188971?noredirect=1#comment95302476_54188971
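
P.S. To make the direction of the comparison concrete, here is a minimal sketch in plain Python. It is not the Beam API: `is_late` and the `arrivals` sample values are hypothetical names and data I made up for illustration. It only encodes the "smaller timestamp than the current watermark" wording, and it deliberately takes no window argument, in line with Reuven's point that lateness is not tied to a window.

```python
def is_late(element_timestamp: float, watermark: float) -> bool:
    """Return True when the element's event time is behind the watermark.

    Hypothetical helper for illustration only; not part of Beam.
    """
    return element_timestamp < watermark


# Made-up arrivals: (element timestamp, watermark at the moment of arrival).
arrivals = [(10, 5), (7, 12), (12, 12), (3, 12)]

late_flags = [is_late(ts, wm) for ts, wm in arrivals]
print(late_flags)  # [False, True, False, True]
```

The first element has a timestamp greater than the watermark and is on time, which is exactly the case the "after the watermark" wording led people to misread as late.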