It might be better to keep something like "the watermark usually moves consistently forward". That said, "Elements that arrive with a smaller timestamp than the current watermark are considered late data." already gives the ordering between a late element's timestamp and the watermark.
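A minimal sketch of that comparison, in plain Python rather than the Beam SDK. The `Element` class and the `is_late` and `classify` helpers are hypothetical names used only for illustration. Treating a timestamp equal to the watermark as on time is an assumption that follows the "smaller than" wording literally.

```python
from dataclasses import dataclass


@dataclass
class Element:
    value: str
    timestamp: int  # event time, in arbitrary integer units (hypothetical)


def is_late(element: Element, watermark: int) -> bool:
    """Late means the event timestamp is smaller than the current watermark."""
    return element.timestamp < watermark


def classify(arrivals):
    """arrivals: iterable of (element, watermark_at_arrival) pairs."""
    return [(e.value, "late" if is_late(e, wm) else "on time") for e, wm in arrivals]


arrivals = [
    (Element("a", 10), 5),   # timestamp ahead of the watermark
    (Element("b", 12), 12),  # timestamp equal to the watermark (assumed on time)
    (Element("c", 8), 12),   # timestamp behind the watermark
]
print(classify(arrivals))
# prints [('a', 'on time'), ('b', 'on time'), ('c', 'late')]
```

Only element "c" is late: its timestamp (8) is smaller than the watermark (12) at the moment it arrives, which is the less-than comparison the proposed wording is meant to convey.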
-Rui

On Thu, Jan 17, 2019 at 1:39 PM Jeff Klukas <jklu...@mozilla.com> wrote:

> Reuven - I don't think I realized it was possible to have late data with
> the global window, so I'm definitely learning things through this
> discussion.
>
> New suggested wording, then:
>
> Elements that arrive with a smaller timestamp than the current
> watermark are considered late data.
>
> That says basically the same thing as the wording currently in the guide,
> but uses "smaller" (which implies a less-than-watermark comparison) rather
> than "later" (which folks have interpreted as a greater-than-watermark
> comparison).
>
> On Thu, Jan 17, 2019 at 3:40 PM Reuven Lax <re...@google.com> wrote:
>
>> Though it's not tied to a window. You could be in the global window, so the
>> watermark never advances past the end of the window, yet still get late
>> data.
>>
>> On Thu, Jan 17, 2019, 11:14 AM Jeff Klukas <jklu...@mozilla.com> wrote:
>>
>>> How about: "Once the watermark progresses past the end of a window, any
>>> further elements that arrive with a timestamp in that window are considered
>>> late data."
>>>
>>> On Thu, Jan 17, 2019 at 1:43 PM Rui Wang <ruw...@google.com> wrote:
>>>
>>>> Hi Community,
>>>>
>>>> In the Beam programming guide [1], there is a sentence: "Data that arrives
>>>> with a timestamp after the watermark is considered *late data*"
>>>>
>>>> It seems that people get confused by it. For example, see the Stack Overflow
>>>> comment [2]. Basically, it makes people think that an event timestamp that is
>>>> bigger than the watermark is considered late (because of that "after").
>>>>
>>>> Although there is an example right after this sentence to explain late
>>>> data, it seems to me that this sentence is incomplete. The complete sentence,
>>>> to me, could be: "The watermark consistently advances from -inf to +inf. Data
>>>> that arrives with a timestamp after the watermark is considered late data."
>>>>
>>>> Am I understanding this correctly? Is there a better description for the order of
>>>> late data and the watermark? I would be happy to send a PR to update the Beam
>>>> documentation.
>>>>
>>>> -Rui
>>>>
>>>> [1]: https://beam.apache.org/documentation/programming-guide/#windowing
>>>> [2]:
>>>> https://stackoverflow.com/questions/54141352/dataflow-to-process-late-and-out-of-order-data-for-batch-and-stream-messages/54188971?noredirect=1#comment95302476_54188971