Ok that answers my questions. What are you keeping the source and sink as? Is it Kafka for both?
--------------------------------------------------
Dhruv Kumar
PhD Candidate
Department of Computer Science and Engineering
University of Minnesota
www.dhruvkumar.me

> On Apr 26, 2018, at 16:37, TechnoMage <mla...@technomage.com> wrote:
>
> Yes, NTP can still have skew. It may be measured in fractions of a
> second, but with Flink that can be significant if you care about
> sub-second latency accuracy. Since I have a 20-stage stream with
> 0.002 second latency, it can matter.
>
> Back pressure is the limiting of input due to the inability of
> downstream tasks to accept input. For example, if you have a map that
> reads from a database to enhance an element, that may limit the
> performance of earlier steps, as they cannot push elements to it
> faster than it can read from the database. This can flow all the way
> back to the source and slow records coming into the system.
>
> Michael
>
>> On Apr 26, 2018, at 12:38 PM, Dhruv Kumar <gargdhru...@gmail.com> wrote:
>>
>> What do you mean by the time skew from one machine (source) to
>> another (sink)? Do you mean the system clocks of the source and sink
>> may not be in sync? If I regularly use NTP to keep the system clocks
>> in sync, will time skew still happen?
>>
>> Could you also elaborate on what you mean by back pressure on the
>> source and how it will impact the latency calculations?
>>
>> Sorry if these are trivial questions. I am a bit new to real-world
>> streaming systems.
>>
>> Dhruv Kumar
>>
>>> On Apr 26, 2018, at 13:26, TechnoMage <mla...@technomage.com> wrote:
>>>
>>> In a single-machine system this may work ok. In a multi-machine
>>> system this is not as reliable, as the time skew from one machine
>>> (source) to another (sink) can impact the measurements. This also
>>> does not account for back pressure on the source. We are using an
>>> external process to read the source and the output of the sink in
>>> parallel and measure the latency on a single system clock. It does
>>> account for those issues, but of course does not account for
>>> delivery delays in the messaging system (Kafka in our case). But it
>>> does measure real-world latency as seen by the rest of the system,
>>> which is ultimately what matters to us.
>>>
>>> Michael
>>>
>>>> On Apr 26, 2018, at 12:01 PM, Dhruv Kumar <gargdhru...@gmail.com> wrote:
>>>>
>>>> Hi
>>>>
>>>> I was trying to compute the end-to-end latency for each record
>>>> processed by Flink. By end-to-end latency, I mean the difference
>>>> between the time at which the record entered the Flink system
>>>> (arrived at the source) and the time at which the record is
>>>> finally emitted into the sink. What is the best way to measure
>>>> this? I was thinking of doing the following:
>>>> 1. Add the current system timestamp to the record when the record
>>>>    arrives at Flink.
>>>> 2. Add the current system timestamp to the record when the record
>>>>    is finally being emitted into the sink.
>>>> 3. Take the difference between 2 and 1 offline when all the
>>>>    records have been written into the sink.
>>>>
>>>> Does this sound ok?
>>>>
>>>> Also, if I use the processing time characteristic for this
>>>> end-to-end latency, will it be fine?
>>>>
>>>> Thanks
>>>> Dhruv Kumar
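
For reference, a minimal Python sketch of the offline step discussed in this thread: given the time at which each record was seen at the source and at the sink, compute per-record latency. It assumes both timestamps were taken by one observer process on a single clock (the external-process approach Michael describes), so machine-to-machine skew does not enter the result. The record ids, the dict inputs, and the function names per_record_latency and summarize are hypothetical, chosen for illustration only; nothing here is a Flink or Kafka API.

```python
from statistics import median


def per_record_latency(source_seen, sink_seen):
    """Return {record_id: latency} for ids observed at both ends.

    Both inputs map a record id to a timestamp (here: integer
    milliseconds) taken by the SAME observer clock. Records not yet
    seen at the sink are skipped rather than guessed.
    """
    return {
        rid: sink_seen[rid] - source_seen[rid]
        for rid in source_seen
        if rid in sink_seen
    }


def summarize(latencies):
    """Return count, median and max of the latencies, or None if empty."""
    values = sorted(latencies.values())
    if not values:
        return None
    return {"count": len(values), "median": median(values), "max": values[-1]}


if __name__ == "__main__":
    # Hypothetical observations in milliseconds on the observer's clock.
    source_seen = {"a": 1000, "b": 1500, "c": 2000}
    sink_seen = {"a": 1002, "b": 1750}  # "c" has not reached the sink yet
    print(summarize(per_record_latency(source_seen, sink_seen)))
```

Because the observer sees a record only after the messaging system has delivered it, the numbers include delivery delay on both ends, which matches the caveat raised above.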