It might be better to keep something like "the watermark usually moves
forward consistently". That said, "Elements that arrive with a smaller
timestamp than the current watermark are considered late data." already
states the ordering between a late element's timestamp and the watermark.
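
As a rough illustration of that ordering, here is a toy Python sketch. It
does not use the Beam API: `label_late`, the `skew` parameter, and the rule
that the watermark trails the largest timestamp seen so far are all
assumptions made up for this example. Real runners estimate the watermark in
their own ways.

```python
from typing import Iterable, List, Tuple


def label_late(
    event_timestamps: Iterable[int], skew: int = 2
) -> List[Tuple[int, float, bool]]:
    """Label each element, in arrival order, as late or not.

    Returns (timestamp, watermark_on_arrival, is_late) tuples.
    """
    # The watermark starts at -inf and only ever moves forward.
    watermark = float("-inf")
    results = []
    for ts in event_timestamps:
        # Late means: timestamp is SMALLER than the current watermark.
        is_late = ts < watermark
        results.append((ts, watermark, is_late))
        # Hypothetical heuristic: trail the max timestamp seen by `skew`,
        # never moving backward.
        watermark = max(watermark, ts - skew)
    return results


if __name__ == "__main__":
    for ts, wm, late in label_late([10, 11, 5, 12, 9, 10]):
        print(f"ts={ts} watermark={wm} late={late}")
```

In this sketch the elements with timestamps 5 and 9 are labeled late because
each arrives after the watermark has already moved past it. The final element
(timestamp 10, watermark 10) is not late, since the comparison is strictly
less-than.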


-Rui

On Thu, Jan 17, 2019 at 1:39 PM Jeff Klukas <jklu...@mozilla.com> wrote:

> Reuven - I don't think I realized it was possible to have late data with
> the global window, so I'm definitely learning things through this
> discussion.
>
> New suggested wording, then:
>
>     Elements that arrive with a smaller timestamp than the current
> watermark are considered late data.
>
> That says basically the same thing as the wording currently in the guide,
> but uses "smaller" (which implies a less-than-watermark comparison) rather
> than "later" (which folks have interpreted as a greater-than-watermark
> comparison).
>
> On Thu, Jan 17, 2019 at 3:40 PM Reuven Lax <re...@google.com> wrote:
>
>> Though it's not tied to a window. You could be in the global window, so the
>> watermark never advances past the end of the window, yet still get late
>> data.
>>
>> On Thu, Jan 17, 2019, 11:14 AM Jeff Klukas <jklu...@mozilla.com> wrote:
>>
>>> How about: "Once the watermark progresses past the end of a window, any
>>> further elements that arrive with a timestamp in that window are considered
>>> late data."
>>>
>>> On Thu, Jan 17, 2019 at 1:43 PM Rui Wang <ruw...@google.com> wrote:
>>>
>>>> Hi Community,
>>>>
>>>> In the Beam programming guide [1], there is a sentence: "Data that arrives
>>>> with a timestamp after the watermark is considered *late data*"
>>>>
>>>> It seems people get confused by it. For example, see the Stack Overflow
>>>> comment [2]. Basically, it makes people think that an event timestamp that
>>>> is bigger than the watermark is considered late (because of that "after").
>>>>
>>>> Although there is an example right after this sentence to explain late
>>>> data, it seems to me that this sentence is incomplete. A complete version
>>>> could be: "The watermark consistently advances from -inf to +inf. Data
>>>> that arrives with a timestamp after the watermark is considered late data."
>>>>
>>>> Am I understanding this correctly? Is there a better description of the
>>>> ordering of late data and the watermark? I would be happy to send a PR to
>>>> update the Beam documentation.
>>>>
>>>> -Rui
>>>>
>>>> [1]: https://beam.apache.org/documentation/programming-guide/#windowing
>>>> [2]:
>>>> https://stackoverflow.com/questions/54141352/dataflow-to-process-late-and-out-of-order-data-for-batch-and-stream-messages/54188971?noredirect=1#comment95302476_54188971
>>>>
>>>>
>>>>
