Re: Question about watermark and window

2017-08-28 Thread Tony Wei
Hi Alijoscha,

It is very helpful to me to understand the behavior on such scenario. Thank
you very much!!!

Best Regards,
Tony Wei

2017-08-28 20:00 GMT+08:00 Aljoscha Krettek :

> Hi Tony,
>
> I think your analyses are correct. Especially, yes, if you re-read the
> data the (ts=3) data should still be considered late if both consumers read
> with the same speed. If, however, (ts=3) is read before the other consumer
> reads (ts=8) then it should not be considered late, as you said.
>
> Best,
> Aljoscha
>
> > On 24. Aug 2017, at 15:49, Tony Wei  wrote:
> >
> > Hi,
> >
> > Recently, I studied about watermark from Flink documents and blogs.
> >
> > I have some question about this scenario below.
> >
> > Suppose there are five clients sending events with different time to the
> topic on Kafka.
> > Topic has two partitions and five events' timestamp are (ts=1), (ts=2),
> (ts=3), (ts=8), (ts=9).
> > The Flink streaming job uses the following setting:
> > 1. use AscendingTimestampExtractor
> > 2. client time as timestamp
> > 3. use tumbling window with 5 unit window size
> > 4. not allow late event
> >
> > If the client events out of order like this.
> >   Partition A [(ts=1), (ts=8)]
> >   Partition B [(ts=2), (ts=9)]  <= (ts=3) delay
> > Should the window function emit [(ts=1), (ts=2)], keep [(ts=8), (ts=9]
> in state and drop out (ts=3) ?
> >
> > If all events has come, and then replay the job from the beginning, the
> partition state would be
> >   Partition A [(ts=1), (ts=8)]
> >   Partition B [(ts=2), (ts=9), (ts=3)]
> > Suppose two consumers fetch events with same speed, should the result be
> the same as above?
> > If consumer B reads (ts=3) earlier than consumer A reads (ts=8), would
> (ts=3) be placed in the window before watermark becomes to 8 and then emit
> [(ts=1), (ts=2), (ts=3)] as result?
> >
> > I wonder if those questions are all correct. If not, is there any
> mechanisms about watermark and window in Flink that I missed.
> >
> > Thank for your help.
> >
> > Best Regards,
> > Tony Wei
> >
>
>


Re: Question about watermark and window

2017-08-28 Thread Aljoscha Krettek
Hi Tony,

I think your analyses are correct. Especially, yes, if you re-read the data the 
(ts=3) data should still be considered late if both consumers read with the 
same speed. If, however, (ts=3) is read before the other consumer reads (ts=8) 
then it should not be considered late, as you said.

Best,
Aljoscha

> On 24. Aug 2017, at 15:49, Tony Wei  wrote:
> 
> Hi,
> 
> Recently, I studied about watermark from Flink documents and blogs.
> 
> I have some question about this scenario below.
> 
> Suppose there are five clients sending events with different time to the 
> topic on Kafka.
> Topic has two partitions and five events' timestamp are (ts=1), (ts=2), 
> (ts=3), (ts=8), (ts=9).
> The Flink streaming job uses the following setting:
> 1. use AscendingTimestampExtractor
> 2. client time as timestamp
> 3. use tumbling window with 5 unit window size
> 4. not allow late event
> 
> If the client events out of order like this.
>   Partition A [(ts=1), (ts=8)]
>   Partition B [(ts=2), (ts=9)]  <= (ts=3) delay
> Should the window function emit [(ts=1), (ts=2)], keep [(ts=8), (ts=9] in 
> state and drop out (ts=3) ?
> 
> If all events has come, and then replay the job from the beginning, the 
> partition state would be
>   Partition A [(ts=1), (ts=8)]
>   Partition B [(ts=2), (ts=9), (ts=3)]
> Suppose two consumers fetch events with same speed, should the result be the 
> same as above?
> If consumer B reads (ts=3) earlier than consumer A reads (ts=8), would (ts=3) 
> be placed in the window before watermark becomes to 8 and then emit [(ts=1), 
> (ts=2), (ts=3)] as result?
> 
> I wonder if those questions are all correct. If not, is there any mechanisms 
> about watermark and window in Flink that I missed.
> 
> Thank for your help.
> 
> Best Regards,
> Tony Wei
> 



Question about watermark and window

2017-08-24 Thread Tony Wei
Hi,

Recently, I studied about watermark from Flink documents and blogs.

I have some question about this scenario below.

Suppose there are five clients sending events with different time to the
topic on Kafka.
Topic has two partitions and five events' timestamp are (ts=1), (ts=2),
(ts=3), (ts=8), (ts=9).
The Flink streaming job uses the following setting:
1. use AscendingTimestampExtractor
2. client time as timestamp
3. use tumbling window with 5 unit window size
4. not allow late event

If the client events out of order like this.
  Partition A [(ts=1), (ts=8)]
  Partition B [(ts=2), (ts=9)]  <= (ts=3) delay
Should the window function emit [(ts=1), (ts=2)], keep [(ts=8), (ts=9] in
state and drop out (ts=3) ?

If all events has come, and then replay the job from the beginning, the
partition state would be
  Partition A [(ts=1), (ts=8)]
  Partition B [(ts=2), (ts=9), (ts=3)]
Suppose two consumers fetch events with same speed, should the result be
the same as above?
If consumer B reads (ts=3) earlier than consumer A reads (ts=8), would
(ts=3) be placed in the window before watermark becomes to 8 and then emit
[(ts=1), (ts=2), (ts=3)] as result?

I wonder if those questions are all correct. If not, is there any
mechanisms about watermark and window in Flink that I missed.

Thank for your help.

Best Regards,
Tony Wei