> @Matthias
> And that's what he says about the 95th percentile! Imagine what he would say about the 50th :P
I should have kept watching another 10 minutes. He gets around to covering the 50th, and let's just say he is not a fan: https://youtu.be/lJ8ydIuPFeU?t=786

On Fri, May 15, 2020 at 3:15 PM John Roesler <vvcep...@apache.org> wrote:

> Thanks, Sophie!
>
> I think this makes perfect sense. It will be much more intuitive to use the metric for the stated motivation this way. I'd be in favor of the proposal after this update.
>
> Thanks again for taking this on,
> -John
>
> On Fri, May 15, 2020, at 17:07, Sophie Blee-Goldman wrote:
> > Hi all,
> >
> > I'd like to clarify/modify one aspect of this KIP, which is to record the staleness at the *completion* of the record's processing by the operator/task in question, rather than on the intake. The task-level metrics will be recorded at the sink node instead of at the source, and the operator-level metrics will be recorded at the end of the operation.
> >
> > The stated purpose and intended usefulness of this KIP is to give users a way to gauge roughly how long it takes for a record to be reflected in the "results", whether these results are being read from an output topic or through IQ. To take the IQ example, the results of a record are obviously not visible until *after* that node has finished processing it. The staleness, while still potentially useful as it can impact the *way* a record is processed in a stateful and time-dependent operator, is not part of the problem this KIP specifically set out to solve.
> >
> > In light of this I think it's appropriate to revert the name change back to include latency, since "staleness" as described above only makes sense when measuring relative to the arrival of a record at a task/node. I'd propose to save the term "staleness" for that particular meaning, and adopt Matthias's suggestion of "record-e2e-latency" for this.
> >
> > Thanks for hanging in there all. Please let me know if you have any concerns about this change!
> >
> > Sophie
> >
> > On Fri, May 15, 2020 at 2:25 PM Matthias J. Sax <mj...@apache.org> wrote:
> > > I am also happy with max/min/99/90. And I buy your naming argument about staleness vs latency.
> > >
> > > -Matthias
> > >
> > > On 5/15/20 12:24 PM, Boyang Chen wrote:
> > > > Hey Sophie,
> > > >
> > > > 90/99/min/max make sense to me.
> > > >
> > > > On Fri, May 15, 2020 at 12:20 PM Sophie Blee-Goldman <sop...@confluent.io> wrote:
> > > > > @Matthias
> > > > > Regarding tracking the 50th percentile, I'll refer you to the 4:53 mark of the video *you* linked: https://youtu.be/lJ8ydIuPFeU?t=293
> > > > >
> > > > > And that's what he says about the 95th percentile! Imagine what he would say about the 50th :P
> > > > >
> > > > > But seriously, since we can't seem to agree that the mean or 50th percentile is actually useful, I'm inclined to resurrect my original proposal: neither. But I think that's a good argument against the 75th, which I admittedly chose somewhat arbitrarily as an intermediate between the 50th and the higher percentiles. How about:
> > > > >
> > > > > -max
> > > > > -p99
> > > > > -p90
> > > > > -min
> > > > >
> > > > > with p50/mean still up for debate if anyone feels strongly for either of them.
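For concreteness, a minimal sketch (Processor API; illustrative only, not the KIP's actual implementation) of the per-record measurement that would feed the statistics above: wallclock time at the *completion* of processing, minus the record's event timestamp. The recordE2eLatency hook is hypothetical.

    import org.apache.kafka.streams.processor.AbstractProcessor;

    public class E2eLatencyMeasuringProcessor<K, V> extends AbstractProcessor<K, V> {

        @Override
        public void process(final K key, final V value) {
            // ... the operator's actual work on the record happens first ...

            // measured on *completion* of processing, per the amendment above
            final long e2eLatencyMs = System.currentTimeMillis() - context().timestamp();
            recordE2eLatency(e2eLatencyMs);
        }

        private void recordE2eLatency(final long latencyMs) {
            // hypothetical hook: update whatever sensor tracks min/p90/p99/max
        }
    }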
> > > > > Regarding the name, I've already flip-flopped on this so I'm definitely still open to further arguments. But the reason for changing it from end-to-end-latency (which is similar to what you propose) is that this metric technically reflects how old (i.e. how "stale") the record is when it's *received* by the operator, not when it's processed by the operator. It seemed like there was the potential for confusion that "end-to-end-latency" might represent the latency from the event creation to the time the processor is done processing it.
> > > > >
> > > > > @John
> > > > > I'd rather err on the side of "not-enough" metrics as we can always add this to the stateless metrics later on. If we decide to measure the time at every node and don't find any evidence of a serious performance impact, and users indicate they would like to see this metric at all nodes, then we can easily start reporting them as well. WDYT?
> > > > >
> > > > > That said, sink nodes seem like a reasonable exception to the rule. Obviously users should be able to detect the time when the record reaches the output topic, but that still leaves a gap in understanding how long the production latency was. This mirrors the consumption latency that is exposed by the task-level metrics, which are measured at the source node. For good symmetry, what if we actually expose both the source and sink latency at the task level? I.e. report both sets of statistical measurements with the additional tag -source/-sink.
> > > > >
> > > > > @Bill
> > > > > Thanks for the comment regarding the min! I hadn't considered that, and it's quite useful to think about how and what is useful from a user's point of view.
> > > > >
> > > > > Regarding your second point, I'm inclined to leave that as an implementation detail, but my take would be that the user should be allowed to control the record timestamp used for this with the timestamp extractor. My impression is that users may often embed the actual event time in the payload for whatever reason, and this represents the "true" timestamp as far as the Streams topology is concerned.
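A minimal sketch of the kind of payload-based extractor being discussed, assuming a hypothetical OrderEvent value type with an embedded eventTimeMs() accessor; it would be wired in via the default.timestamp.extractor config:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.streams.processor.TimestampExtractor;

    public class PayloadTimestampExtractor implements TimestampExtractor {

        @Override
        public long extract(final ConsumerRecord<Object, Object> record, final long partitionTime) {
            if (record.value() instanceof OrderEvent) {
                // prefer the event time the producer embedded in the payload
                return ((OrderEvent) record.value()).eventTimeMs();
            }
            return record.timestamp();  // otherwise fall back to the record's own timestamp
        }
    }

Whatever this extractor returns is the timestamp the staleness measurement would be relative to.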
> > > >>> I was wondering if we should specify in the KIP if setting the > arrival > > > >> time > > > >>> is always going to come from the record timestamp, or is this an > > > >>> implementation detail we can cover in the PR? > > > >>> > > > >>> Thanks! > > > >>> Bill > > > >>> > > > >>> On Fri, May 15, 2020 at 1:11 AM Matthias J. Sax <mj...@apache.org> > > > >> wrote: > > > >>> > > > >>>> Thanks for the KIP Sophie. > > > >>>> > > > >>>> I think it's not useful to record the avg/mean; it sensitive to > > > >>>> outliers. We should rather track the median (50th percentile). > > > >>>> > > > >>>> Not sure if tracking min is useful, but I am also ok to track it. > > > >>>> > > > >>>> However, I find it odd to track 75th percentile. Standard measures > > > >> would > > > >>>> the 90th or 95th -- I guess we don't need both, so maybe picking > 90th > > > >>>> might be more useful? > > > >>>> > > > >>>> About the name: "staleness" wound really odd, and if fact the > metric > > > >>>> does capture "latency" so we should call it "latency". I > understand > > > the > > > >>>> issue that we already have a latency metric. So maybe we could > call it > > > >>>> `record-e2e-latency-*` ? > > > >>>> > > > >>>> While I agree that we should include out-or-order data (the KIP > should > > > >>>> talk about `out-of-order` data, not `late` data; data is only > `late` > > > if > > > >>>> it's out-of-order and if it's dropped), I don't really understand > why > > > >>>> the new metric would help to configure grace period or retention > time? > > > >>>> As you mention in the KIP, both are define as max difference of > > > >>>> `event-time - stream-time` and thus the new metric that takes > > > >>>> system-/wallclock-time into account does not seem to help at all. > > > >>>> > > > >>>> > > > >>>> Btw: there is a great talk about "How NOT to Measure Latency" by > Gil > > > >>>> Tene: https://www.youtube.com/watch?v=lJ8ydIuPFeU > > > >>>> > > > >>>> > > > >>>> -Matthias > > > >>>> > > > >>>> > > > >>>> On 5/14/20 7:17 PM, John Roesler wrote: > > > >>>>> Hi Sophie, > > > >>>>> > > > >>>>> It seems like there would still be plenty of use cases for > recording > > > >>>>> this metric at all processors and not just stateful ones, but I'm > > > >> happy > > > >>>>> to suspend my arguments for now. Since you're proposing to keep > > > >>>>> them at the processor-node level, it will be seamless later to > add > > > >>>>> in the stateless processors if we want. As a wise man once said, > > > >>>>> "Adding is always easier than removing." > > > >>>>> > > > >>>>> Regarding the time measurement, it's an implementation detail > > > >>>>> we don't need to consider in the KIP. Nevertheless, I'd greatly > > > >>>>> prefer to measure the system time again when recording the > > > >>>>> metric. I don't think we've seen any evidence that proves this > > > >>>>> would harm performance, and the amount of inaccuracy using > > > >>>>> the cached system time could incur is actually substantial. But, > > > >>>>> if you want to just "not mention this" in the KIP, we can defer > to > > > >>>>> the actual PR discussion, at which time we're in a better > position > > > >>>>> to use benchmarks, etc., to make the call. 
> > > > > > > On 5/14/20 7:17 PM, John Roesler wrote:
> > > > > > > > Hi Sophie,
> > > > > > > >
> > > > > > > > It seems like there would still be plenty of use cases for recording this metric at all processors and not just stateful ones, but I'm happy to suspend my arguments for now. Since you're proposing to keep them at the processor-node level, it will be seamless later to add in the stateless processors if we want. As a wise man once said, "Adding is always easier than removing."
> > > > > > > >
> > > > > > > > Regarding the time measurement, it's an implementation detail we don't need to consider in the KIP. Nevertheless, I'd greatly prefer to measure the system time again when recording the metric. I don't think we've seen any evidence that proves this would harm performance, and the amount of inaccuracy that using the cached system time could incur is actually substantial. But if you want to just "not mention this" in the KIP, we can defer to the actual PR discussion, at which time we're in a better position to use benchmarks, etc., to make the call.
> > > > > > > >
> > > > > > > > Along the lines of the measurement accuracy discussion, one minor thought I had is that maybe we should consider measuring the task staleness metric at the sink, rather than the source, so that it includes the processing latency of the task itself, not just the latency of everything up to, but not including, the task (which seems confusing for users). I guess this could also be an implementation detail, though.
> > > > > > > >
> > > > > > > > Thanks for the update,
> > > > > > > > -John
> > > > > > > >
> > > > > > > > On Thu, May 14, 2020, at 13:31, Sophie Blee-Goldman wrote:
> > > > > > > > > Hey all,
> > > > > > > > >
> > > > > > > > > After discussing with Bruno I'd like to propose a small amendment, which is to record the processor-node-level metrics only for *stateful operators*. They would still be considered a "processor-node-level" metric and not a "state-store-level" metric, as the staleness is still a property of the node rather than of the state itself. However, it seems that this information is primarily useful for stateful operators that might be exposing state via IQ or are otherwise dependent on the record time, unlike a stateless operator.
> > > > > > > > >
> > > > > > > > > It's worth calling out that recent performance improvements to the metrics framework mean that we no longer fetch the system time at the operator level, but only once per task. In other words, the system time is not updated between each process as a record flows through the subtopology, so debugging the processor-level latency via the staleness will not be possible. Note that this doesn't mean the operator-level metrics are not *useful* relative to the task-level metric. Upstream caching and/or suppression can still cause a record's staleness at some downstream stateful operator to deviate from the task-level staleness (recorded at the source node).
> > > > > > > > >
> > > > > > > > > Please let me know if you have any concerns about this change. The KIP has been updated with the new proposal.
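A small illustration of the consequence described above (names made up): with the cached time, every operator in the subtopology compares the record against the same "now", so per-operator differences can only come from the record itself, not from processing latency.

    final long now = cachedTaskWallclockMs;  // fetched once per task, not per operator

    final long stalenessAtSource   = now - record.timestamp();
    // ... record flows through the subtopology; `now` is not refreshed ...
    final long stalenessDownstream = now - record.timestamp();  // identical, unless an
                                                                // upstream cache/suppression
                                                                // swapped in a different record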
> > > > > > > > > On Thu, May 14, 2020 at 3:04 AM Bruno Cadonna <br...@confluent.io> wrote:
> > > > > > > > > > Hi Sophie,
> > > > > > > > > >
> > > > > > > > > > Thank you for the KIP.
> > > > > > > > > >
> > > > > > > > > > The KIP looks good to me.
> > > > > > > > > >
> > > > > > > > > > 50th percentile:
> > > > > > > > > > I think we do not need it now. If we need it, we can add it. Here the old truism applies: Adding is always easier than removing.
> > > > > > > > > >
> > > > > > > > > > processor-node-level metrics:
> > > > > > > > > > I think it is good to have the staleness metrics also on processor-node level. If we do not want to record them on all processor nodes, you could restrict the recording to stateful processor nodes, since those are the ones that would benefit most from the staleness metrics.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Bruno
> > > > > > > > > >
> > > > > > > > > > On Thu, May 14, 2020 at 4:15 AM Sophie Blee-Goldman <sop...@confluent.io> wrote:
> > > > > > > > > > > Yeah, the specific reason was just to align with the current metrics.
> > > > > > > > > > >
> > > > > > > > > > > Is it better to conform than to be right? History has a lot to say on that matter, but I'm not sure how much of it applies to the fine details of metrics naming :P
> > > > > > > > > > >
> > > > > > > > > > > More seriously, I figured if people are looking at this metric they're likely to be looking at all the others. Then naming this one "-mean" would probably lead some to conclude that the "-avg" suffix in the other metrics has a different meaning.
> > > > > > > > > > >
> > > > > > > > > > > As for the percentiles, I actually like p99 (and p75) better. I'll swap that out.
> > > > > > > > > > >
> > > > > > > > > > > On Wed, May 13, 2020 at 7:07 PM John Roesler <vvcep...@apache.org> wrote:
> > > > > > > > > > > > Thanks Sophie,
> > > > > > > > > > > >
> > > > > > > > > > > > I hope this isn't too nit-picky, but is there a reason to choose "avg" instead of "mean"? Maybe this is too paranoid, and I might be oversensitive because of the mistake I just made earlier, but it strikes me that "avg" is actually ambiguous, as it refers to a family of statistics, whereas "mean" is specific. I see other Kafka metrics with "avg", but none with "mean"; was that the reason? If so, I'm +1.
> > > > > > > > > > > >
> > > > > > > > > > > > Regarding the names of the percentiles, I actually couldn't find _any_ other metrics that use percentiles. Was there a reason to choose "99th" as opposed to "p99" or any other scheme? This is not a criticism, I'm just primarily asking for consistency's sake.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks again,
> > > > > > > > > > > > -John
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, May 13, 2020, at 19:19, Sophie Blee-Goldman wrote:
> > > > > > > > > > > > > Alright, I can get behind adding the min metric for the sake of pretty graphs (and trivial computation).
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm still on the fence regarding the mean (or 50th percentile), but I can see how users might expect it and find it a bit disorienting not to have.
> > > > > > > > > > > > > So the updated proposed metrics are:
> > > > > > > > > > > > >
> > > > > > > > > > > > > - record-staleness-max [ms]
> > > > > > > > > > > > > - record-staleness-99th [ms] *(99th percentile)*
> > > > > > > > > > > > > - record-staleness-75th [ms] *(75th percentile)*
> > > > > > > > > > > > > - record-staleness-avg [ms] *(mean)*
> > > > > > > > > > > > > - record-staleness-min [ms]
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, May 13, 2020 at 4:42 PM John Roesler <vvcep...@apache.org> wrote:
> > > > > > > > > > > > > > Oh boy, I never miss an opportunity to embarrass myself. I guess the mean seems more interesting to me than the median, but neither is as interesting as the higher percentiles (99th and max).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Min isn't really important for any SLAs, but it does round out the mental picture of the distribution. I've always graphed min along with the other metrics to help me understand how fast the system can be, which helps in optimization decisions. It's also a relatively inexpensive metric to compute, so it might be nice to just throw it in.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, May 13, 2020, at 18:18, Sophie Blee-Goldman wrote:
> > > > > > > > > > > > > > > G1:
> > > > > > > > > > > > > > > I was considering it as the "end-to-end latency *up* to the specific task", but I'm happy with "record-staleness" if that drives the point home better. So it's the "staleness of the record when it is received by that task" -- will update the KIP.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > B1/J:
> > > > > > > > > > > > > > > I'm struggling to imagine a case where the min would actually be useful, rather than just intellectually interesting. I don't feel strongly that we shouldn't add it, but that's why I didn't include it from the start. Can you enlighten me with an example?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I was also vaguely concerned about the overhead of adding multiple percentile metrics. Do we have any data to indicate what kind of performance hit we take on metrics computation?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Also, not to be too pedantic, but the 50th percentile would be the median, not the mean. Would you propose to add the mean *and* the 50th percentile, or just one of the two?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks all!
> > > > > > > > > > > > > > > Sophie
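On the overhead question: Kafka's metrics library computes percentiles from a fixed-size histogram, so the per-metric cost is bounded. A rough sketch of how min/max plus two percentiles could hang off a single sensor (metric names, group, histogram size, and value range are all illustrative, not the KIP's actual registration code):

    import org.apache.kafka.common.metrics.Metrics;
    import org.apache.kafka.common.metrics.Sensor;
    import org.apache.kafka.common.metrics.stats.Max;
    import org.apache.kafka.common.metrics.stats.Min;
    import org.apache.kafka.common.metrics.stats.Percentile;
    import org.apache.kafka.common.metrics.stats.Percentiles;
    import org.apache.kafka.common.metrics.stats.Percentiles.BucketSizing;

    public class StalenessSensorSketch {
        public static void main(final String[] args) {
            final Metrics metrics = new Metrics();
            final Sensor sensor = metrics.sensor("record-staleness");

            // single-value stats are cheap to maintain
            sensor.add(metrics.metricName("record-staleness-min", "stream-task-metrics"), new Min());
            sensor.add(metrics.metricName("record-staleness-max", "stream-task-metrics"), new Max());

            // the percentiles share one fixed-size histogram
            sensor.add(new Percentiles(
                4096,             // bytes of histogram state
                10 * 60 * 1000d,  // assumed max staleness of 10 minutes
                BucketSizing.LINEAR,
                new Percentile(metrics.metricName("record-staleness-90th", "stream-task-metrics"), 90),
                new Percentile(metrics.metricName("record-staleness-99th", "stream-task-metrics"), 99)));

            sensor.record(123d);  // one observation per processed record
        }
    }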
> > > >>>>>>>>>>>> Sophie > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> On Wed, May 13, 2020 at 3:34 PM John Roesler < > > > >>>>>>> vvcep...@apache.org> > > > >>>>>>>>>>> wrote: > > > >>>>>>>>>>>> > > > >>>>>>>>>>>>> Hello all, and thanks for the KIP, Sophie, > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> Just some comments on the discussion so far: > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> B2/G1: > > > >>>>>>>>>>>>> In principle, it shouldn't matter whether we report > "spans" > > > >> or > > > >>>>>>>>>>>>> "end-to-end" latency. But in practice, some of the spans > are > > > >>>>>>> pretty > > > >>>>>>>>>>>>> difficult to really measure (like time spent waiting in > the > > > >>>>>>>>> topics, or > > > >>>>>>>>>>>>> time from the event happening to the ETL producer > choosing to > > > >>>>>>> send > > > >>>>>>>>> it, > > > >>>>>>>>>>>>> or time spent in send/receive buffers, etc., etc. > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> In other words, it's practically easier to compute spans > by > > > >>>>>>>>> subtracting > > > >>>>>>>>>>>>> e2e latencies than it is to compute e2e latencies by > adding > > > >>>>>>> spans. > > > >>>>>>>>> You > > > >>>>>>>>>>>>> can even just consider that the span computation from e2e > > > >>>>>>> always > > > >>>>>>>>> just > > > >>>>>>>>>>>>> involves subtracting two numbers, whereas computing e2e > > > >> latency > > > >>>>>>>>> from > > > >>>>>>>>>>>>> spans involves adding _all_ the spans leading up to the > end > > > >> you > > > >>>>>>>>> care > > > >>>>>>>>>>> about. > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> It seems like people really prefer to have spans when > they > > > >> are > > > >>>>>>>>>>> debugging > > > >>>>>>>>>>>>> latency problems, whereas e2e latency is a more general > > > >>>>>>> measurement > > > >>>>>>>>>>>>> that basically every person/application cares about and > > > >> should > > > >>>>>>> be > > > >>>>>>>>>>>>> monitoring. > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> Altogether, it really seem to provide more value to more > > > >>>>>>> people if > > > >>>>>>>>> we > > > >>>>>>>>>>>>> report > > > >>>>>>>>>>>>> e2e latencies. Regarding "record-staleness" as a name, I > > > >> think > > > >>>>>>> I > > > >>>>>>>>> have > > > >>>>>>>>>>> no > > > >>>>>>>>>>>>> preference, I'd defer to other peoples' intuition. > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> G2: > > > >>>>>>>>>>>>> I think the processor-node metric is nice, since the > inside > > > >> of > > > >>>>>>> a > > > >>>>>>>>> task > > > >>>>>>>>>>> can > > > >>>>>>>>>>>>> introduce a significant amount of latency in some cases. > > > >> Plus, > > > >>>>>>>>> it's a > > > >>>>>>>>>>> more > > > >>>>>>>>>>>>> direct measurement, if you really wanted to know (for the > > > >>>>>>> purposes > > > >>>>>>>>> of > > > >>>>>>>>>>> IQ > > > >>>>>>>>>>>>> or something) how long it takes source events to "show > up" at > > > >>>>>>> the > > > >>>>>>>>>>> store. > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> I think actually recording it at every processor could be > > > >>>>>>>>> expensive, > > > >>>>>>>>>>> but we > > > >>>>>>>>>>>>> already record a bunch of metrics at the node level. > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> B1: > > > >>>>>>>>>>>>> I think 50% could be reasonable to record also. Even if > it's > > > >> a > > > >>>>>>> poor > > > >>>>>>>>>>> metric > > > >>>>>>>>>>>>> for operational purposes, a lot of people might expect > to see > > > >>>>>>>>> "mean". > > > >>>>>>>>>>>>> Actually, > > > >>>>>>>>>>>>> I was surprised not to see "min". Is there a reason to > leave > > > >> it > > > >>>>>>>>> off? 
> > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> I might suggest: > > > >>>>>>>>>>>>> min, mean (50th), 75th, 99th, max > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> B3: > > > >>>>>>>>>>>>> I agree we should include late records (though not the > ones > > > >> we > > > >>>>>>>>> drop). > > > >>>>>>>>>>>>> It may be spiky, but only when there are legitimately > some > > > >>>>>>> records > > > >>>>>>>>>>> with a > > > >>>>>>>>>>>>> high end-to-end latency, which is the whole point of > these > > > >>>>>>> metrics. > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> That's it! I don't think I have any other feedback, other > > > >> than > > > >>>>>>> a > > > >>>>>>>>>>> request to > > > >>>>>>>>>>>>> also report "min". > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> Thanks, > > > >>>>>>>>>>>>> -John > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> On Wed, May 13, 2020, at 16:58, Guozhang Wang wrote: > > > >>>>>>>>>>>>>> Thanks Sophie for the KIP, a few quick thoughts: > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> 1) The end-to-end latency includes both the processing > > > >>>>>>> latency > > > >>>>>>>>> of the > > > >>>>>>>>>>>>> task > > > >>>>>>>>>>>>>> and the latency spent sitting in intermediate topics. I > have > > > >>>>>>> a > > > >>>>>>>>>>> similar > > > >>>>>>>>>>>>>> feeling as Boyang mentioned above that the latency > metric of > > > >>>>>>> a > > > >>>>>>>>> task A > > > >>>>>>>>>>>>>> actually measures the latency of the sub-topology up-to > but > > > >>>>>>> not > > > >>>>>>>>>>> including > > > >>>>>>>>>>>>>> the processing of A, which is a bit weird. > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> Maybe the my feeling comes from the name "latency" > itself, > > > >>>>>>> since > > > >>>>>>>>>>> today we > > > >>>>>>>>>>>>>> already have several "latency" metrics already which are > > > >>>>>>>>> measuring > > > >>>>>>>>>>>>> elapsed > > > >>>>>>>>>>>>>> system-time for processing a record / etc, while here > we are > > > >>>>>>>>>>> comparing > > > >>>>>>>>>>>>> the > > > >>>>>>>>>>>>>> system wallclock time with the record timestamp. > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> Maybe we can consider renaming it as "record-staleness" > > > >>>>>>> (note we > > > >>>>>>>>>>> already > > > >>>>>>>>>>>>>> have a "record-lateness" metric), in which case > recording at > > > >>>>>>> the > > > >>>>>>>>>>>>>> system-time before we start processing the record sounds > > > >> more > > > >>>>>>>>>>> natural. > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> 2) With that in mind, I'm wondering if the > > > >>>>>>> processor-node-level > > > >>>>>>>>> DEBUG > > > >>>>>>>>>>>>>> metric is worth to add, given that we already have a > > > >>>>>>> task-level > > > >>>>>>>>>>>>> processing > > > >>>>>>>>>>>>>> latency metric. Basically, a specific node's e2e > latency is > > > >>>>>>>>> similar > > > >>>>>>>>>>> to > > > >>>>>>>>>>>>> the > > > >>>>>>>>>>>>>> task-level e2e latency + task-level processing latency. > > > >>>>>>>>> Personally I > > > >>>>>>>>>>>>> think > > > >>>>>>>>>>>>>> having a task-level record-staleness metric is > sufficient. > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> Guozhang > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> On Wed, May 13, 2020 at 11:46 AM Sophie Blee-Goldman < > > > >>>>>>>>>>>>> sop...@confluent.io> > > > >>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> 1. 
> > > > > > > > > > > > > > > > > > 1. I felt that 50% was not a particularly useful gauge for this specific metric, as it's presumably most useful for putting an *upper* bound on the latency you can reasonably expect to see. I chose percentiles that would hopefully give a good sense of what *most* records will experience, and what *close to all* records will.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > However, I'm not married to these specific numbers and could be convinced. Would be especially interested in hearing from users on this.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 2. I'm inclined to not include the "hop-to-hop latency" in this KIP since users can always compute it themselves by subtracting the previous node's end-to-end latency. I guess we could do it either way since you can always compute one from the other, but I think the end-to-end latency feels more valuable, as its main motivation is not to debug bottlenecks in the topology but to give users a sense of how long it takes a record to be reflected in certain parts of the topology. For example, this might be useful for users who are wondering roughly when a record that was just produced will be included in their IQ results. Debugging is just a nice side effect -- but maybe I didn't make that clear enough in the KIP's motivation.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 3. Good question, I should address this in the KIP. The short answer is "yes", we will include late records. I added a paragraph to the end of the Proposed Changes section explaining the reasoning here, please let me know if you have any concerns.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 4. Assuming you're referring to the existing metric "process-latency", that metric reflects the time for the literal Node#process method to run, whereas this metric would always be measured relative to the event timestamp.
> > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> That said, the naming collision there is pretty > confusing > > > >>>>>>> so > > > >>>>>>>>> I've > > > >>>>>>>>>>>>> renamed > > > >>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>> metrics in this KIP to "end-to-end-latency" which I > feel > > > >>>>>>> better > > > >>>>>>>>>>>>> reflects > > > >>>>>>>>>>>>>>> the nature > > > >>>>>>>>>>>>>>> of the metric anyway. > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> Thanks for the feedback! > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> On Wed, May 13, 2020 at 10:21 AM Boyang Chen < > > > >>>>>>>>>>>>> reluctanthero...@gmail.com> > > > >>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> Thanks for the KIP Sophie. Getting the E2E latency is > > > >>>>>>>>> important > > > >>>>>>>>>>> for > > > >>>>>>>>>>>>>>>> understanding the bottleneck of the application. > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> A couple of questions and ideas: > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> 1. Could you clarify the rational of picking 75, 99 > and > > > >>>>>>> max > > > >>>>>>>>>>>>> percentiles? > > > >>>>>>>>>>>>>>>> Normally I see cases where we use 50, 90 percentile as > > > >>>>>>> well > > > >>>>>>>>> in > > > >>>>>>>>>>>>> production > > > >>>>>>>>>>>>>>>> systems. > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> 2. The current latency being computed is cumulative, > I.E > > > >>>>>>> if a > > > >>>>>>>>>>> record > > > >>>>>>>>>>>>> goes > > > >>>>>>>>>>>>>>>> through A -> B -> C, then P(C) = T(B->C) + P(B) = > > > >>>>>>> T(B->C) + > > > >>>>>>>>>>> T(A->B) + > > > >>>>>>>>>>>>>>> T(A) > > > >>>>>>>>>>>>>>>> and so on, where P() represents the captured latency, > > > >>>>>>> and T() > > > >>>>>>>>>>>>> represents > > > >>>>>>>>>>>>>>>> the time for transiting the records between two nodes, > > > >>>>>>>>> including > > > >>>>>>>>>>>>>>> processing > > > >>>>>>>>>>>>>>>> time. For monitoring purpose, maybe having T(B->C) and > > > >>>>>>>>> T(A->B) > > > >>>>>>>>>>> are > > > >>>>>>>>>>>>> more > > > >>>>>>>>>>>>>>>> natural to view as "hop-to-hop latency", otherwise if > > > >>>>>>> there > > > >>>>>>>>> is a > > > >>>>>>>>>>>>> spike in > > > >>>>>>>>>>>>>>>> T(A->B), both P(B) and P(C) are affected in the same > > > >>>>>>> time. > > > >>>>>>>>> In the > > > >>>>>>>>>>>>> same > > > >>>>>>>>>>>>>>>> spirit, the E2E latency is meaningful only when the > > > >>>>>>> record > > > >>>>>>>>> exits > > > >>>>>>>>>>>>> from the > > > >>>>>>>>>>>>>>>> sink as this marks the whole time this record spent > > > >>>>>>> inside > > > >>>>>>>>> the > > > >>>>>>>>>>>>> funnel. Do > > > >>>>>>>>>>>>>>>> you think we could have separate treatment for sink > > > >>>>>>> nodes and > > > >>>>>>>>>>> other > > > >>>>>>>>>>>>>>>> nodes, so that other nodes only count the time > receiving > > > >>>>>>> the > > > >>>>>>>>>>> record > > > >>>>>>>>>>>>> from > > > >>>>>>>>>>>>>>>> last hop? I'm not proposing a solution here, just > want to > > > >>>>>>>>> discuss > > > >>>>>>>>>>>>> this > > > >>>>>>>>>>>>>>>> alternative to see if it is reasonable. > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> 3. As we are going to monitor late arrival records as > > > >>>>>>> well, > > > >>>>>>>>> they > > > >>>>>>>>>>>>> would > > > >>>>>>>>>>>>>>>> create some really spiky graphs when the out-of-order > > > >>>>>>>>> records are > > > >>>>>>>>>>>>>>>> interleaving with on time records. 
> > > > > > > > > > > > > > > > > > > Should we also supply a smoothed version of the latency metrics, or should users just take care of that themselves?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 4. Regarding these new metrics, we haven't discussed their relation to our existing processing latency metrics; could you add some context on the comparison and a simple "when to use which" tutorial for the best?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Boyang
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Tue, May 12, 2020 at 7:28 PM Sophie Blee-Goldman <sop...@confluent.io> wrote:
> > > > > > > > > > > > > > > > > > > > Hey all,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I'd like to kick off discussion on KIP-613 which aims to add end-to-end latency metrics to Streams. Please take a look:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-613%3A+Add+end-to-end+latency+metrics+to+Streams
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > Sophie