Yes, that is correct.

On Tue, Feb 6, 2018 at 4:56 PM, Jiewen Shao <fifistorm...@gmail.com> wrote:

> Vishnu, thanks for the reply
> so "event time" and "window end time" have nothing to do with current
> system timestamp, watermark moves with the higher value of "timestamp"
> field of the input and never moves down, is that correct understanding?
>
>
> On Tue, Feb 6, 2018 at 11:47 AM, Vishnu Viswanath <
> vishnu.viswanat...@gmail.com> wrote:
>
>> Hi
>>
>> 20 second corresponds to when the window state should be cleared. For the
>> late message to be dropped, it should come in after you receive a message
>> with event time >= window end time + 20 seconds.
>>
>> I wrote a post on this recently: http://vishnuviswana
>> th.com/spark_structured_streaming.html#watermark
>>
>> Thanks,
>> Vishnu
>>
>> On Tue, Feb 6, 2018 at 12:11 PM Jiewen Shao <fifistorm...@gmail.com>
>> wrote:
>>
>>> sample code:
>>>
>>> Let's say Xyz is POJO with a field called timestamp,
>>>
>>> regarding code withWatermark("timestamp", "20 seconds")
>>>
>>> I expect the msg with timestamp 20 seconds or older will be dropped,
>>> what does 20 seconds compare to? based on my test nothing was dropped no
>>> matter how old the timestamp is, what did i miss?
>>>
>>> Dataset<Xyz> xyz = lines
>>>         .as(Encoders.STRING())
>>>         .map((MapFunction<String, Xyz>) value -> mapper.readValue(value, 
>>> Xyz.class), Encoders.bean(Xyz.class));
>>>
>>> Dataset<Row> aggregated = xyz.withWatermark("timestamp", "20 seconds")
>>>         .groupBy(functions.window(xyz.col("timestamp"), "5 seconds"), 
>>> xyz.col("x") //tumbling window of size 5 seconds (timestamp)
>>>         ).count();
>>>
>>> Thanks
>>>
>>>
>

Reply via email to