OK, that answers my questions.

What are you using for the source and sink? Is it Kafka for both?

--------------------------------------------------
Dhruv Kumar
PhD Candidate
Department of Computer Science and Engineering
University of Minnesota
www.dhruvkumar.me

> On Apr 26, 2018, at 16:37, TechnoMage <mla...@technomage.com> wrote:
> 
> Yes, NTP can still have skew. It may be measured in fractions of a second,
> but with Flink that can be significant if you care about sub-second latency
> accuracy. Since I have a 20-stage stream with 0.002 second latency, it can
> matter.
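
To put numbers on this (a hypothetical illustration, not a measurement from this thread): suppose the source runs on machine A and the sink on machine B, and each machine's clock is off from true time by δ_A and δ_B respectively. Subtracting the two stamps adds the relative offset δ straight into the result, so an assumed 1 ms offset against a 2 ms true latency is a 50% error:

```latex
\begin{aligned}
L_{\text{measured}} &= t^{B}_{\text{sink}} - t^{A}_{\text{source}}
  = (t_{\text{sink}} + \delta_B) - (t_{\text{source}} + \delta_A)
  = L_{\text{true}} + \delta, \qquad \delta = \delta_B - \delta_A \\
L_{\text{true}} = 2\ \text{ms},\ \delta = 1\ \text{ms} &\;\Rightarrow\; L_{\text{measured}} = 3\ \text{ms}
\end{aligned}
```
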
> 
> Back pressure is the limiting of input due to the inability of downstream
> tasks to accept input. For example, if you have a map that reads from a
> database to enrich an element, that may limit the performance of earlier
> steps, as they cannot push elements to it faster than it can read from the
> database. This can flow all the way back to the source and slow the records
> coming into the system.
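
A toy model may help show the mechanism. This is plain Java, not Flink code, and every name in it is hypothetical: a fast "source" hands records through a small bounded buffer to a slow stage standing in for the database-lookup map. Once the buffer fills, the source's put() blocks, so the slow stage sets the pace. If the ingestion timestamp is taken only when the source accepts a record, any time a record spends waiting upstream because of this throttling never shows up in the computed latency.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy model of back pressure (plain Java, not Flink): a fast source feeds a slow
// downstream stage through a small bounded buffer. When the buffer is full,
// put() blocks, so the slow stage throttles how fast the source can emit.
public class BackPressureDemo {

    // Returns the total time (ms) the source spent inside put(), which is dominated
    // by waiting for free buffer space once the slow stage falls behind.
    static long sourceBlockedMillis(int records, int bufferSize, long stageDelayMs) {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(bufferSize);

        // Hypothetical slow stage, e.g. a map that looks up each element in a database.
        Thread slowStage = new Thread(() -> {
            try {
                for (int i = 0; i < records; i++) {
                    buffer.take();
                    Thread.sleep(stageDelayMs); // simulated per-element lookup cost
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        slowStage.start();

        long insidePutNanos = 0;
        try {
            for (int i = 0; i < records; i++) {
                long before = System.nanoTime();
                buffer.put(i); // blocks while the buffer is full: this is the back pressure
                insidePutNanos += System.nanoTime() - before;
            }
            slowStage.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return insidePutNanos / 1_000_000;
    }

    public static void main(String[] args) {
        // 50 records, room for only 4 in flight, 5 ms per element downstream.
        System.out.println("source blocked for ~" + sourceBlockedMillis(50, 4, 5) + " ms");
    }
}
```

With these parameters the source spends most of the run blocked, even though it could emit all 50 records almost instantly on its own.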
> 
> Michael
> 
>> On Apr 26, 2018, at 12:38 PM, Dhruv Kumar <gargdhru...@gmail.com> wrote:
>> 
>> What do you mean by the time skew from one machine (source) to another (sink)?
>> Do you mean the system clocks of the source and sink may not be in
>> sync? If I regularly use NTP to keep the system clocks in sync, will time
>> skew still happen?
>> 
>> Could you also elaborate on what you mean by back pressure on the source and
>> how it will impact the latency calculations?
>> 
>> Sorry if these are trivial questions. I am a bit new to real-world
>> streaming systems.
>> 
>>> On Apr 26, 2018, at 13:26, TechnoMage <mla...@technomage.com> wrote:
>>> 
>>> In a single-machine system this may work OK. In a multi-machine system
>>> this is not as reliable, as the time skew from one machine (source) to
>>> another (sink) can impact the measurements. This also does not account for
>>> back pressure on the source. We are using an external process that reads,
>>> in parallel, the source and the output of the sink to measure the latency on
>>> a single system clock. It does account for those issues, but of course does
>>> not account for delivery delays in the messaging system (Kafka in our
>>> case). But it does measure real-world latency as seen by the rest of the
>>> system, which is ultimately what matters to us.
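
The external-observer approach can be sketched roughly as follows. This is a hypothetical outline, not the poster's actual tool: it assumes each record carries an ID that survives through the pipeline, and it leaves out how the input and output topics are actually consumed. The key point is that both observations are stamped by the same clock in one process, so cross-machine skew cancels out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical external latency probe: one process observes records on the input
// and on the output, and timestamps both observations with the same clock.
public class SingleClockProbe {
    private final LongSupplier clock;                        // the one clock used for both sides
    private final Map<String, Long> seenAtInput = new HashMap<>();

    public SingleClockProbe(LongSupplier clock) {
        this.clock = clock;
    }

    // Call when a record with this (assumed) ID is read from the source side.
    public void onInput(String id) {
        seenAtInput.putIfAbsent(id, clock.getAsLong());
    }

    // Call when the matching result is read from the sink side.
    // Returns the latency in clock units, or -1 if no input was seen for this ID.
    public long onOutput(String id) {
        Long start = seenAtInput.remove(id);
        return start == null ? -1 : clock.getAsLong() - start;
    }

    public static void main(String[] args) {
        // System.nanoTime is monotonic within a single JVM, which suits a one-process probe.
        SingleClockProbe probe = new SingleClockProbe(System::nanoTime);
        probe.onInput("order-42");
        System.out.println("latency ns: " + probe.onOutput("order-42"));
    }
}
```

Injecting the clock as a LongSupplier also makes the probe easy to test with a fake clock.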
>>> 
>>> Michael
>>> 
>>>> On Apr 26, 2018, at 12:01 PM, Dhruv Kumar <gargdhru...@gmail.com> wrote:
>>>> 
>>>> Hi
>>>> 
>>>> I was trying to compute the end-to-end latency for each record processed
>>>> by Flink. By end-to-end latency, I mean the difference between the time at
>>>> which the record entered the Flink system (arrived at the source) and the
>>>> time at which the record is finally emitted into the sink. What is the best
>>>> way to measure this? I was thinking of doing the following:
>>>> 1. Add the current system timestamp to the record when the record arrives 
>>>> at Flink.
>>>> 2. Add the current system timestamp to the record when the record is 
>>>> finally being emitted into the sink.
>>>> 3. Take the difference between 2 and 1 offline when all the records have 
>>>> been written into the sink.
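
The three steps above might look something like this in plain Java. StampedRecord, its fields, and the percentile helper are hypothetical names for illustration, not a Flink API. In a Flink job the two stamps would be set in operators right after the source and right before the sink; they are wall-clock readings from whichever machines run those operators, so the clock-skew caveat discussed in this thread applies whenever those are different machines.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the three-step plan: carry two timestamps on each record, then
// compute per-record latency offline once everything is in the sink.
public class OfflineLatency {

    static final class StampedRecord {
        final String payload;
        final long ingestMillis; // step 1: e.g. System.currentTimeMillis() when the record enters Flink
        final long emitMillis;   // step 2: e.g. System.currentTimeMillis() just before the sink writes it

        StampedRecord(String payload, long ingestMillis, long emitMillis) {
            this.payload = payload;
            this.ingestMillis = ingestMillis;
            this.emitMillis = emitMillis;
        }
    }

    // Step 3: per-record latency = emit time - ingest time.
    static long[] latencies(List<StampedRecord> records) {
        long[] out = new long[records.size()];
        for (int i = 0; i < out.length; i++) {
            StampedRecord r = records.get(i);
            out[i] = r.emitMillis - r.ingestMillis;
        }
        return out;
    }

    // Nearest-rank percentile for p in (0, 100]; does not modify the input array.
    static long percentile(long[] values, double p) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        List<StampedRecord> sinkOutput = Arrays.asList(
                new StampedRecord("a", 1_000, 1_003),
                new StampedRecord("b", 1_010, 1_012),
                new StampedRecord("c", 1_020, 1_030));
        long[] lat = latencies(sinkOutput);
        System.out.println("p50=" + percentile(lat, 50) + " ms, max=" + percentile(lat, 100) + " ms");
    }
}
```

Reporting percentiles rather than only a mean keeps occasional slow records (for example, during back pressure) from being averaged away.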
>>>> 
>>>> Does this sound ok?
>>>> 
>>>> Also, if I use the processing-time characteristic for this end-to-end
>>>> latency measurement, will it be fine?
>>>> 
>>>> Thanks
>>> 
>> 
> 
