Hi Xintong,

Thanks for the reply.

I have considered using the timestamp in the records to determine the backlog 
status, and decided to use watermark at the end. By definition, watermark is 
the time progress indication in the data stream. It indicates the stream’s 
event time has progressed to some specific time. On the other hand, timestamp 
in the records is usually used to generate the watermark. Therefore, it appears 
more appropriate and intuitive to calculate the event time lag by watermark and 
determine the backlog status. And by using the watermark, we can easily deal 
with the out-of-order and the idleness of the data.

Please let me know if you have further questions.

Best,
Xuannan
On Aug 10, 2023, 20:23 +0800, Xintong Song <tonysong...@gmail.com>, wrote:
> Thanks for preparing the FLIP, Xuannan.
>
> +1 in general.
>
> A quick question, could you explain why we are relying on the watermark for
> emitting the record attribute? Why not use timestamps in the records? I
> don't see any concern in using watermarks. Just wondering if there's any
> deep considerations behind this.
>
> Best,
>
> Xintong
>
>
>
> On Thu, Aug 3, 2023 at 3:03 PM Xuannan Su <suxuanna...@gmail.com> wrote:
>
> > Hi all,
> >
> > I am opening this thread to discuss FLIP-328: Allow source operators to
> > determine isProcessingBacklog based on watermark lag[1]. We had a several
> > discussions with Dong Ling about the design, and thanks for all the
> > valuable advice.
> >
> > The FLIP aims to target the use-case where user want to run a Flink job to
> > backfill historical data in a high throughput manner and continue
> > processing real-time data with low latency. Building upon the backlog
> > concept introduced in FLIP-309[2], this proposal enables sources to report
> > their status of processing backlog based on the watermark lag.
> >
> > We would greatly appreciate any comments or feedback you may have on this
> > proposal.
> >
> > Best,
> > Xuannan
> >
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-328%3A+Allow+source+operators+to+determine+isProcessingBacklog+based+on+watermark+lag
> > [2]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-309%3A+Support+using+larger+checkpointing+interval+when+source+is+processing+backlog
> >

Reply via email to