OK, thanks Michael for all your help!
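
For anyone reading this thread later, here is a minimal sketch of the measurement approach Michael describes below: one observer process timestamps each record when it appears on the input and again when it appears on the output, so both timestamps come from the same clock. It is only an illustration, not Michael's actual tool. The function name `latencies`, the record ids, and the sample timestamps are all made up; in a real setup the two dictionaries would be filled by consumers reading the source and sink topics.

```python
from statistics import mean, median


def latencies(source_seen, sink_seen):
    """Return per-record latency in seconds for ids seen on both sides.

    Both arguments map a record id to the time (in seconds) at which the
    single observer process saw that record, so skew between the clocks of
    the source and sink machines does not enter the result.
    """
    return {
        rid: sink_seen[rid] - t_in
        for rid, t_in in source_seen.items()
        if rid in sink_seen  # records still in flight are skipped
    }


# Hypothetical observations; in practice the observer would stamp each
# record with something like time.monotonic() as it reads it.
source_seen = {"a": 10.000, "b": 10.010, "c": 10.020}
sink_seen = {"a": 10.002, "b": 10.013}  # "c" has not reached the sink yet

result = latencies(source_seen, sink_seen)
print(f"matched={len(result)} "
      f"mean={mean(result.values()):.4f}s "
      f"median={median(result.values()):.4f}s")
```

Because the observer sits outside Flink, the numbers include any delivery delay in the messaging layer, which matches the "latency as seen by the rest of the system" view discussed below.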

--------------------------------------------------
Dhruv Kumar
PhD Candidate
Department of Computer Science and Engineering
University of Minnesota
www.dhruvkumar.me

> On Apr 26, 2018, at 19:24, TechnoMage <mla...@technomage.com> wrote:
> 
> Yes, Kafka for both source and sink, which makes monitoring Flink's input and output easy.
> 
> Michael
> 
>> On Apr 26, 2018, at 5:27 PM, Dhruv Kumar <gargdhru...@gmail.com> wrote:
>> 
>> Ok that answers my questions.
>> 
>> What are you keeping the source and sink as? Is it Kafka for both?
>> 
>> --------------------------------------------------
>> Dhruv Kumar
>> PhD Candidate
>> Department of Computer Science and Engineering
>> University of Minnesota
>> www.dhruvkumar.me
>> 
>>> On Apr 26, 2018, at 16:37, TechnoMage <mla...@technomage.com> wrote:
>>> 
>>> Yes, clocks synced with NTP can still have skew.  It may be only a fraction 
>>> of a second, but with Flink that can be significant if you care about 
>>> sub-second latency accuracy.  Since I have a 20-stage stream with 0.002 
>>> second latency, it can matter.
>>> 
>>> Back pressure is the throttling of input when downstream tasks cannot 
>>> accept records as fast as they arrive.  For example, if you have a map that 
>>> reads from a database to enrich an element, that may limit the performance 
>>> of earlier steps, as they cannot push elements to it faster than it can 
>>> read from the database.  This can propagate all the way back to the source 
>>> and slow the records coming into the system.
>>> 
>>> Michael
>>> 
>>>> On Apr 26, 2018, at 12:38 PM, Dhruv Kumar <gargdhru...@gmail.com> wrote:
>>>> 
>>>> What do you mean by the time skew from one machine (source) to 
>>>> another (sink)? Do you mean that the system clocks of the source and sink 
>>>> may not be in sync? If I regularly use NTP to keep the system clocks in 
>>>> sync, will time skew still happen?
>>>> 
>>>> Could you also elaborate on what you mean by back pressure on the source 
>>>> and how it will impact the latency calculations?
>>>> 
>>>> Sorry if these are trivial questions. I am a bit new to real-world 
>>>> streaming systems.
>>>> 
>>>> --------------------------------------------------
>>>> Dhruv Kumar
>>>> PhD Candidate
>>>> Department of Computer Science and Engineering
>>>> University of Minnesota
>>>> www.dhruvkumar.me
>>>> 
>>>>> On Apr 26, 2018, at 13:26, TechnoMage <mla...@technomage.com> wrote:
>>>>> 
>>>>> In a single-machine system this may work OK.  In a multi-machine system 
>>>>> it is less reliable, because the time skew from one machine (source) to 
>>>>> another (sink) can distort the measurements.  It also does not account 
>>>>> for back pressure on the source.  We use an external process that reads 
>>>>> the source input and the sink output in parallel to measure the latency on 
>>>>> a single system clock.  That does account for those issues, but of course 
>>>>> does not account for delivery delays in the messaging system (Kafka in 
>>>>> our case).  It does, however, measure real-world latency as seen by the 
>>>>> rest of the system, which is ultimately what matters to us.
>>>>> 
>>>>> Michael
>>>>> 
>>>>>> On Apr 26, 2018, at 12:01 PM, Dhruv Kumar <gargdhru...@gmail.com> wrote:
>>>>>> 
>>>>>> Hi
>>>>>> 
>>>>>> I was trying to compute the end-to-end latency for each record processed 
>>>>>> by Flink. By end-to-end latency, I mean the difference between the time 
>>>>>> at which the record entered the Flink system (came at source) and the 
>>>>>> time at which the record is finally emitted into the sink. What is the 
>>>>>> best way to measure this? I was thinking of doing the following:
>>>>>> 1. Add the current system timestamp to the record when the record 
>>>>>> arrives at Flink.
>>>>>> 2. Add the current system timestamp to the record when the record is 
>>>>>> finally being emitted into the sink.
>>>>>> 3. Take the difference between 2 and 1 offline when all the records have 
>>>>>> been written into the sink.
>>>>>> 
>>>>>> Does this sound ok?
>>>>>> 
>>>>>> Also, will it be fine if I use the processing-time characteristic for 
>>>>>> this end-to-end latency measurement?
>>>>>> 
>>>>>> Thanks
>>>>>> --------------------------------------------------
>>>>>> Dhruv Kumar
>>>>>> PhD Candidate
>>>>>> Department of Computer Science and Engineering
>>>>>> University of Minnesota
>>>>>> www.dhruvkumar.me
>>>>> 
>>>> 
>>> 
>> 
> 
