Hey Sophie, 90/99/min/max make sense to me.
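
For anyone who wants to eyeball these on a running app once the KIP lands, something like this should do it -- just a sketch, and the "record-staleness" prefix is whatever final name we settle on here:

    import java.util.Map;
    import org.apache.kafka.common.Metric;
    import org.apache.kafka.common.MetricName;
    import org.apache.kafka.streams.KafkaStreams;

    public final class StalenessMetricsDump {
        // print the staleness distribution for every task/node reporting it
        public static void dump(final KafkaStreams streams) {
            for (final Map.Entry<MetricName, ? extends Metric> entry : streams.metrics().entrySet()) {
                final MetricName name = entry.getKey();
                if (name.name().startsWith("record-staleness")) {
                    System.out.println(name.name() + " " + name.tags() + " = " + entry.getValue().metricValue());
                }
            }
        }
    }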

On Fri, May 15, 2020 at 12:20 PM Sophie Blee-Goldman <sop...@confluent.io> wrote:
> @Matthias
> Regarding tracking the 50th percentile, I'll refer you to the 4:53 mark of
> the video *you* linked: https://youtu.be/lJ8ydIuPFeU?t=293
>
> And that's what he says about the 95th percentile! Imagine what he would
> say about the 50th :P
>
> But seriously, since we can't seem to agree that the mean or 50th
> percentile is actually useful, I'm inclined to resurrect my original
> proposal: neither. But I think that's a good argument against the 75th,
> which I admittedly chose somewhat arbitrarily as an intermediate between
> the 50th and the higher percentiles. How about:
>
> -max
> -p99
> -p90
> -min
>
> with p50/mean still up for debate if anyone feels strongly for either of
> them.
>
> Regarding the name, I've already flip-flopped on this so I'm definitely
> still open to further arguments. But the reason for changing it from
> end-to-end-latency (which is similar to what you propose) is that this
> metric technically reflects how old (ie how "stale") the record is when
> it's *received* by the operator, not when it's processed by the operator.
> It seemed like there was the potential for confusion that
> "end-to-end-latency" might represent the latency from the event creation
> to the time the processor is done processing it.
>
> @John
> I'd rather err on the side of "not enough" metrics, as we can always add
> this to the stateless metrics later on. If we decide to measure the time
> at every node and don't find any evidence of a serious performance impact,
> and users indicate they would like to see this metric at all nodes, then
> we can easily start reporting them as well. WDYT?
>
> That said, sink nodes seem like a reasonable exception to the rule.
> Obviously users should be able to detect the time when the record reaches
> the output topic, but that still leaves a gap in understanding how long
> the production latency was. This mirrors the consumption latency that is
> exposed by the task-level metrics, which are measured at the source node.
> For good symmetry, what if we actually expose both the source and sink
> latency at the task level? ie report both sets of statistical measurements
> with the additional tag -source/-sink
>
> @Bill
> Thanks for the comment regarding the min! I hadn't considered that, and
> it's quite useful to think about how and what is useful from a user's
> point of view.
>
> Regarding your second point, I'm inclined to leave that as an
> implementation detail, but my take would be that the user should be
> allowed to control the record timestamp used for this with the timestamp
> extractor. My impression is that users may often embed the actual event
> time in the payload for whatever reason, and this represents the "true"
> timestamp as far as the Streams topology is concerned.
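>
> (To make that last point concrete: a user could already point this
> calculation at an embedded event time with an extractor along these
> lines -- just a sketch, and the OrderEvent type is made up:)
>
>     import org.apache.kafka.clients.consumer.ConsumerRecord;
>     import org.apache.kafka.streams.processor.TimestampExtractor;
>
>     public class PayloadTimeExtractor implements TimestampExtractor {
>         // made-up user type carrying the embedded "true" event time
>         public interface OrderEvent { long eventTimeMs(); }
>
>         @Override
>         public long extract(final ConsumerRecord<Object, Object> record, final long partitionTime) {
>             final Object value = record.value();
>             if (value instanceof OrderEvent) {
>                 return ((OrderEvent) value).eventTimeMs(); // embedded event time
>             }
>             return record.timestamp(); // fall back to the record timestamp
>         }
>     }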
>
> On Fri, May 15, 2020 at 11:05 AM Bill Bejeck <bbej...@gmail.com> wrote:
>> Thanks for the KIP, Sophie, this will be a useful metric to add.
>>
>> Regarding tracking min, I think it could be valuable for users to discern
>> which parts of their topologies are more efficient, since this is a
>> task-level metric. I realize everyone seems to be on board with including
>> min anyway, but I wanted to add my 2 cents on this topic should we decide
>> to revisit adding min or not.
>>
>> I do have a question regarding the calculation of staleness. Is there
>> going to be a consideration for timestamp extractors? Users could prefer
>> to use a timestamp embedded in the payload, and it could skew the
>> measurements. I was wondering if we should specify in the KIP whether the
>> arrival time is always going to come from the record timestamp, or
>> whether this is an implementation detail we can cover in the PR.
>>
>> Thanks!
>> Bill
>>
>> On Fri, May 15, 2020 at 1:11 AM Matthias J. Sax <mj...@apache.org> wrote:
>>> Thanks for the KIP Sophie.
>>>
>>> I think it's not useful to record the avg/mean; it's sensitive to
>>> outliers. We should rather track the median (50th percentile).
>>>
>>> Not sure if tracking min is useful, but I am also ok to track it.
>>>
>>> However, I find it odd to track the 75th percentile. Standard measures
>>> would be the 90th or 95th -- I guess we don't need both, so maybe
>>> picking the 90th might be more useful?
>>>
>>> About the name: "staleness" sounds really odd, and in fact the metric
>>> does capture "latency", so we should call it "latency". I understand
>>> the issue that we already have a latency metric. So maybe we could call
>>> it `record-e2e-latency-*`?
>>>
>>> While I agree that we should include out-of-order data (the KIP should
>>> talk about `out-of-order` data, not `late` data; data is only `late` if
>>> it's out-of-order and if it's dropped), I don't really understand why
>>> the new metric would help to configure grace period or retention time.
>>> As you mention in the KIP, both are defined as the max difference of
>>> `event-time - stream-time`, and thus the new metric, which takes
>>> system-/wallclock-time into account, does not seem to help at all.
>>>
>>> Btw: there is a great talk about "How NOT to Measure Latency" by Gil
>>> Tene: https://www.youtube.com/watch?v=lJ8ydIuPFeU
>>>
>>> -Matthias
>>>
>>> On 5/14/20 7:17 PM, John Roesler wrote:
>>>> Hi Sophie,
>>>>
>>>> It seems like there would still be plenty of use cases for recording
>>>> this metric at all processors and not just stateful ones, but I'm
>>>> happy to suspend my arguments for now. Since you're proposing to keep
>>>> them at the processor-node level, it will be seamless later to add in
>>>> the stateless processors if we want. As a wise man once said, "Adding
>>>> is always easier than removing."
>>>>
>>>> Regarding the time measurement, it's an implementation detail we
>>>> don't need to consider in the KIP. Nevertheless, I'd greatly prefer
>>>> to measure the system time again when recording the metric. I don't
>>>> think we've seen any evidence that proves this would harm
>>>> performance, and the amount of inaccuracy that using the cached
>>>> system time could incur is actually substantial. But, if you want to
>>>> just "not mention this" in the KIP, we can defer to the actual PR
>>>> discussion, at which time we're in a better position to use
>>>> benchmarks, etc., to make the call.
>>>>
>>>> Along the lines of the measurement accuracy discussion, one minor
>>>> thought I had is that maybe we should consider measuring the task
>>>> staleness metric at the sink, rather than the source, so that it
>>>> includes the processing latency of the task itself, not just the
>>>> latency of everything up to, but not including, the task (which seems
>>>> confusing for users). I guess this could also be an implementation
>>>> detail, though.
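>>>>
>>>> (To illustrate, a sketch with made-up sensor names -- not actual
>>>> Streams internals:)
>>>>
>>>>     import org.apache.kafka.common.metrics.Sensor;
>>>>     import org.apache.kafka.common.utils.Time;
>>>>
>>>>     static void recordStaleness(final Sensor sourceStaleness,
>>>>                                 final Sensor sinkStaleness,
>>>>                                 final Time time,
>>>>                                 final long recordTimestamp,
>>>>                                 final Runnable processTask) {
>>>>         // at the source node: everything upstream of this task,
>>>>         // but none of the task's own processing
>>>>         sourceStaleness.record(time.milliseconds() - recordTimestamp);
>>>>         processTask.run();
>>>>         // at the sink: also includes the task's processing latency
>>>>         sinkStaleness.record(time.milliseconds() - recordTimestamp);
>>>>     }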
>>>>
>>>> Thanks for the update,
>>>> -John
>>>>
>>>> On Thu, May 14, 2020, at 13:31, Sophie Blee-Goldman wrote:
>>>>> Hey all,
>>>>>
>>>>> After discussing with Bruno I'd like to propose a small amendment,
>>>>> which is to record the processor-node-level metrics only for
>>>>> *stateful* *operators*. They would still be considered a
>>>>> "processor-node-level" metric and not a "state-store-level" metric,
>>>>> as the staleness is still a property of the node rather than of the
>>>>> state itself. However, it seems that this information is primarily
>>>>> useful for stateful operators that might be exposing state via IQ or
>>>>> are otherwise dependent on the record time, unlike a stateless
>>>>> operator.
>>>>>
>>>>> It's worth calling out that recent performance improvements to the
>>>>> metrics framework mean that we no longer fetch the system time at
>>>>> the operator level, but only once per task. In other words, the
>>>>> system time is not updated between each process as a record flows
>>>>> through the subtopology, so debugging the processor-level latency
>>>>> via the staleness will not be possible. Note that this doesn't mean
>>>>> the operator-level metrics are not *useful* relative to the
>>>>> task-level metric. Upstream caching and/or suppression can still
>>>>> cause a record's staleness at some downstream stateful operator to
>>>>> deviate from the task-level staleness (recorded at the source node).
>>>>>
>>>>> Please let me know if you have any concerns about this change. The
>>>>> KIP has been updated with the new proposal
>>>>>
>>>>> On Thu, May 14, 2020 at 3:04 AM Bruno Cadonna <br...@confluent.io> wrote:
>>>>>> Hi Sophie,
>>>>>>
>>>>>> Thank you for the KIP.
>>>>>>
>>>>>> The KIP looks good to me.
>>>>>>
>>>>>> 50th percentile:
>>>>>> I think we do not need it now. If we need it, we can add it. Here
>>>>>> the old truism applies: Adding is always easier than removing.
>>>>>>
>>>>>> processor-node-level metrics:
>>>>>> I think it is good to have the staleness metrics also on the
>>>>>> processor-node level. If we do not want to record them on all
>>>>>> processor nodes, you could restrict the recording to stateful
>>>>>> processor nodes, since those are the ones that would benefit most
>>>>>> from the staleness metrics.
>>>>>>
>>>>>> Best,
>>>>>> Bruno
>>>>>>
>>>>>> On Thu, May 14, 2020 at 4:15 AM Sophie Blee-Goldman <sop...@confluent.io> wrote:
>>>>>>> Yeah, the specific reason was just to align with the current
>>>>>>> metrics.
>>>>>>>
>>>>>>> Is it better to conform than to be right? History has a lot to say
>>>>>>> on that matter, but I'm not sure how much of it applies to the
>>>>>>> fine details of metrics naming :P
>>>>>>>
>>>>>>> More seriously, I figured if people are looking at this metric
>>>>>>> they're likely to be looking at all the others. Then naming this
>>>>>>> one "-mean" would probably lead some to conclude that the "-avg"
>>>>>>> suffix in the other metrics has a different meaning.
>>>>>>>
>>>>>>> As for the percentiles, I actually like p99 (and p75) better. I'll
>>>>>>> swap that out
>>>>>>>
>>>>>>> On Wed, May 13, 2020 at 7:07 PM John Roesler <vvcep...@apache.org> wrote:
>>>>>>>> Thanks Sophie,
>>>>>>>>
>>>>>>>> I hope this isn't too nit-picky, but is there a reason to choose
>>>>>>>> "avg" instead of "mean"? Maybe this is too paranoid, and I might
>>>>>>>> be oversensitive because of the mistake I just made earlier, but
>>>>>>>> it strikes me that "avg" is actually ambiguous, as it refers to a
>>>>>>>> family of statistics, whereas "mean" is specific. I see other
>>>>>>>> Kafka metrics with "avg", but none with "mean"; was that the
>>>>>>>> reason? If so, I'm +1.
>>>>>>>>
>>>>>>>> Regarding the names of the percentiles, I actually couldn't find
>>>>>>>> _any_ other metrics that use percentiles. Was there a reason to
>>>>>>>> choose "99th" as opposed to "p99" or any other scheme? This is
>>>>>>>> not a criticism, I'm just asking for consistency's sake.
>>>>>>>>
>>>>>>>> Thanks again,
>>>>>>>> -John
>>>>>>>>
>>>>>>>> On Wed, May 13, 2020, at 19:19, Sophie Blee-Goldman wrote:
>>>>>>>>> Alright, I can get behind adding the min metric for the sake of
>>>>>>>>> pretty graphs (and trivial computation).
>>>>>>>>>
>>>>>>>>> I'm still on the fence regarding the mean (or 50th percentile),
>>>>>>>>> but I can see how users might expect it and find it a bit
>>>>>>>>> disorienting not to have. So the updated proposed metrics are
>>>>>>>>>
>>>>>>>>> - record-staleness-max [ms]
>>>>>>>>> - record-staleness-99th [ms] *(99th percentile)*
>>>>>>>>> - record-staleness-75th [ms] *(75th percentile)*
>>>>>>>>> - record-staleness-avg [ms] *(mean)*
>>>>>>>>> - record-staleness-min [ms]
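>>>>>>>>>
>>>>>>>>> (If anyone is curious how these could be wired up, the common
>>>>>>>>> metrics library already ships Min/Max/Percentiles stats; roughly
>>>>>>>>> like this -- just a sketch, the group name and histogram sizing
>>>>>>>>> are made up, not the actual implementation:)
>>>>>>>>>
>>>>>>>>>     import org.apache.kafka.common.metrics.Metrics;
>>>>>>>>>     import org.apache.kafka.common.metrics.Sensor;
>>>>>>>>>     import org.apache.kafka.common.metrics.stats.Max;
>>>>>>>>>     import org.apache.kafka.common.metrics.stats.Min;
>>>>>>>>>     import org.apache.kafka.common.metrics.stats.Percentile;
>>>>>>>>>     import org.apache.kafka.common.metrics.stats.Percentiles;
>>>>>>>>>     import org.apache.kafka.common.metrics.stats.Percentiles.BucketSizing;
>>>>>>>>>
>>>>>>>>>     static Sensor stalenessSensor(final Metrics metrics) {
>>>>>>>>>         final String group = "stream-task-metrics"; // assumption
>>>>>>>>>         final Sensor sensor = metrics.sensor("record-staleness");
>>>>>>>>>         sensor.add(metrics.metricName("record-staleness-min", group), new Min());
>>>>>>>>>         sensor.add(metrics.metricName("record-staleness-max", group), new Max());
>>>>>>>>>         sensor.add(new Percentiles(
>>>>>>>>>             4096,       // histogram memory in bytes (made up)
>>>>>>>>>             0.0,        // smallest expected staleness, in ms
>>>>>>>>>             600_000.0,  // largest expected staleness, in ms (made-up 10 min cap)
>>>>>>>>>             BucketSizing.LINEAR,
>>>>>>>>>             new Percentile(metrics.metricName("record-staleness-99th", group), 99),
>>>>>>>>>             new Percentile(metrics.metricName("record-staleness-75th", group), 75)));
>>>>>>>>>         return sensor; // per record: sensor.record(nowMs - record.timestamp())
>>>>>>>>>     }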
>>>>>>>>>
>>>>>>>>> On Wed, May 13, 2020 at 4:42 PM John Roesler <vvcep...@apache.org> wrote:
>>>>>>>>>> Oh boy, I never miss an opportunity to embarrass myself. I
>>>>>>>>>> guess the mean seems more interesting to me than the median,
>>>>>>>>>> but neither is as interesting as the higher percentiles (99th
>>>>>>>>>> and max).
>>>>>>>>>>
>>>>>>>>>> Min isn't really important for any SLAs, but it does round out
>>>>>>>>>> the mental picture of the distribution. I've always graphed min
>>>>>>>>>> along with the other metrics to help me understand how fast the
>>>>>>>>>> system can be, which helps in optimization decisions. It's also
>>>>>>>>>> a relatively inexpensive metric to compute, so it might be nice
>>>>>>>>>> to just throw it in.
>>>>>>>>>>
>>>>>>>>>> On Wed, May 13, 2020, at 18:18, Sophie Blee-Goldman wrote:
>>>>>>>>>>> G1:
>>>>>>>>>>> I was considering it as the "end-to-end latency *up* to the
>>>>>>>>>>> specific task", but I'm happy with "record-staleness" if that
>>>>>>>>>>> drives the point home better. So it's the "staleness of the
>>>>>>>>>>> record when it is received by that task" -- will update the
>>>>>>>>>>> KIP.
>>>>>>>>>>>
>>>>>>>>>>> B1/J:
>>>>>>>>>>> I'm struggling to imagine a case where the min would actually
>>>>>>>>>>> be useful, rather than just intellectually interesting. I
>>>>>>>>>>> don't feel strongly that we shouldn't add it, but that's why I
>>>>>>>>>>> didn't include it from the start. Can you enlighten me with an
>>>>>>>>>>> example?
>>>>>>>>>>>
>>>>>>>>>>> I was also vaguely concerned about the overhead of adding
>>>>>>>>>>> multiple percentile metrics. Do we have any data to indicate
>>>>>>>>>>> what kind of performance hit we take on metrics computation?
>>>>>>>>>>>
>>>>>>>>>>> Also, not to be too pedantic, but the 50th percentile would be
>>>>>>>>>>> the median, not the mean. Would you propose to add the mean
>>>>>>>>>>> *and* the 50th percentile, or just one of the two?
>>>>>>>>>>>
>>>>>>>>>>> Thanks all!
>>>>>>>>>>> Sophie
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 13, 2020 at 3:34 PM John Roesler <vvcep...@apache.org> wrote:
>>>>>>>>>>>> Hello all, and thanks for the KIP, Sophie,
>>>>>>>>>>>>
>>>>>>>>>>>> Just some comments on the discussion so far:
>>>>>>>>>>>>
>>>>>>>>>>>> B2/G1:
>>>>>>>>>>>> In principle, it shouldn't matter whether we report "spans"
>>>>>>>>>>>> or "end-to-end" latency. But in practice, some of the spans
>>>>>>>>>>>> are pretty difficult to really measure (like time spent
>>>>>>>>>>>> waiting in the topics, or time from the event happening to
>>>>>>>>>>>> the ETL producer choosing to send it, or time spent in
>>>>>>>>>>>> send/receive buffers, etc., etc.).
>>>>>>>>>>>>
>>>>>>>>>>>> In other words, it's practically easier to compute spans by
>>>>>>>>>>>> subtracting e2e latencies than it is to compute e2e latencies
>>>>>>>>>>>> by adding spans. You can even just consider that the span
>>>>>>>>>>>> computation from e2e always just involves subtracting two
>>>>>>>>>>>> numbers, whereas computing e2e latency from spans involves
>>>>>>>>>>>> adding _all_ the spans leading up to the end you care about.
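>>>>>>>>>>>>
>>>>>>>>>>>> (A sketch of that arithmetic, with made-up names:)
>>>>>>>>>>>>
>>>>>>>>>>>>     // span from e2e: one subtraction, regardless of how much
>>>>>>>>>>>>     // of the pipeline lies upstream
>>>>>>>>>>>>     static long spanAtoB(final long e2eAtA, final long e2eAtB) {
>>>>>>>>>>>>         return e2eAtB - e2eAtA;
>>>>>>>>>>>>     }
>>>>>>>>>>>>
>>>>>>>>>>>>     // e2e from spans: must add every span on the path to C
>>>>>>>>>>>>     static long e2eAtC(final long spanSourceToA, final long spanAtoB, final long spanBtoC) {
>>>>>>>>>>>>         return spanSourceToA + spanAtoB + spanBtoC;
>>>>>>>>>>>>     }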
>>>>>>>>>>>>
>>>>>>>>>>>> It seems like people really prefer to have spans when they
>>>>>>>>>>>> are debugging latency problems, whereas e2e latency is a more
>>>>>>>>>>>> general measurement that basically every person/application
>>>>>>>>>>>> cares about and should be monitoring.
>>>>>>>>>>>>
>>>>>>>>>>>> Altogether, it really seems to provide more value to more
>>>>>>>>>>>> people if we report e2e latencies. Regarding
>>>>>>>>>>>> "record-staleness" as a name, I think I have no preference;
>>>>>>>>>>>> I'd defer to other peoples' intuition.
>>>>>>>>>>>>
>>>>>>>>>>>> G2:
>>>>>>>>>>>> I think the processor-node metric is nice, since the inside
>>>>>>>>>>>> of a task can introduce a significant amount of latency in
>>>>>>>>>>>> some cases. Plus, it's a more direct measurement, if you
>>>>>>>>>>>> really wanted to know (for the purposes of IQ or something)
>>>>>>>>>>>> how long it takes source events to "show up" at the store.
>>>>>>>>>>>>
>>>>>>>>>>>> I think actually recording it at every processor could be
>>>>>>>>>>>> expensive, but we already record a bunch of metrics at the
>>>>>>>>>>>> node level.
>>>>>>>>>>>>
>>>>>>>>>>>> B1:
>>>>>>>>>>>> I think 50% could be reasonable to record also. Even if it's
>>>>>>>>>>>> a poor metric for operational purposes, a lot of people might
>>>>>>>>>>>> expect to see "mean". Actually, I was surprised not to see
>>>>>>>>>>>> "min". Is there a reason to leave it off?
>>>>>>>>>>>>
>>>>>>>>>>>> I might suggest:
>>>>>>>>>>>> min, mean (50th), 75th, 99th, max
>>>>>>>>>>>>
>>>>>>>>>>>> B3:
>>>>>>>>>>>> I agree we should include late records (though not the ones
>>>>>>>>>>>> we drop). It may be spiky, but only when there are
>>>>>>>>>>>> legitimately some records with a high end-to-end latency,
>>>>>>>>>>>> which is the whole point of these metrics.
>>>>>>>>>>>>
>>>>>>>>>>>> That's it! I don't think I have any other feedback, other
>>>>>>>>>>>> than a request to also report "min".
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> -John
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 13, 2020, at 16:58, Guozhang Wang wrote:
>>>>>>>>>>>>> Thanks Sophie for the KIP, a few quick thoughts:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) The end-to-end latency includes both the processing
>>>>>>>>>>>>> latency of the task and the latency spent sitting in
>>>>>>>>>>>>> intermediate topics. I have a similar feeling as Boyang
>>>>>>>>>>>>> mentioned above that the latency metric of a task A actually
>>>>>>>>>>>>> measures the latency of the sub-topology up to but not
>>>>>>>>>>>>> including the processing of A, which is a bit weird.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe my feeling comes from the name "latency" itself, since
>>>>>>>>>>>>> we already have several "latency" metrics today which are
>>>>>>>>>>>>> measuring elapsed system-time for processing a record / etc,
>>>>>>>>>>>>> while here we are comparing the system wallclock time with
>>>>>>>>>>>>> the record timestamp.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe we can consider renaming it as "record-staleness"
>>>>>>>>>>>>> (note we already have a "record-lateness" metric), in which
>>>>>>>>>>>>> case recording at the system-time before we start processing
>>>>>>>>>>>>> the record sounds more natural.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) With that in mind, I'm wondering if the
>>>>>>>>>>>>> processor-node-level DEBUG metric is worth adding, given
>>>>>>>>>>>>> that we already have a task-level processing latency metric.
>>>>>>>>>>>>> Basically, a specific node's e2e latency is similar to the
>>>>>>>>>>>>> task-level e2e latency + task-level processing latency.
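>>>>>>>>>>>>>
>>>>>>>>>>>>> (i.e., roughly, with made-up names:
>>>>>>>>>>>>>     nodeE2eLatency ~= taskE2eLatency + taskProcessingLatency
>>>>>>>>>>>>> so the node-level number is mostly derivable from metrics we
>>>>>>>>>>>>> already report.)
>>>>>>>>>>>>>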
>>>>>>>>>>>>> Personally I think having a task-level record-staleness
>>>>>>>>>>>>> metric is sufficient.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, May 13, 2020 at 11:46 AM Sophie Blee-Goldman <sop...@confluent.io> wrote:
>>>>>>>>>>>>>> 1. I felt that 50% was not a particularly useful gauge for
>>>>>>>>>>>>>> this specific metric, as it's presumably most useful at
>>>>>>>>>>>>>> putting an *upper* bound on the latency you can reasonably
>>>>>>>>>>>>>> expect to see. I chose percentiles that would hopefully
>>>>>>>>>>>>>> give a good sense of what *most* records will experience,
>>>>>>>>>>>>>> and what *close to all* records will.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> However, I'm not married to these specific numbers and
>>>>>>>>>>>>>> could be convinced. Would be especially interested in
>>>>>>>>>>>>>> hearing from users on this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. I'm inclined to not include the "hop-to-hop latency" in
>>>>>>>>>>>>>> this KIP, since users can always compute it themselves by
>>>>>>>>>>>>>> subtracting the previous node's end-to-end latency. I guess
>>>>>>>>>>>>>> we could do it either way since you can always compute one
>>>>>>>>>>>>>> from the other, but I think the end-to-end latency feels
>>>>>>>>>>>>>> more valuable, as its main motivation is not to debug
>>>>>>>>>>>>>> bottlenecks in the topology but to give users a sense of
>>>>>>>>>>>>>> how long it takes a record to be reflected in certain parts
>>>>>>>>>>>>>> of the topology. For example, this might be useful for
>>>>>>>>>>>>>> users who are wondering roughly when a record that was just
>>>>>>>>>>>>>> produced will be included in their IQ results. Debugging is
>>>>>>>>>>>>>> just a nice side effect -- but maybe I didn't make that
>>>>>>>>>>>>>> clear enough in the KIP's motivation.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3. Good question, I should address this in the KIP. The
>>>>>>>>>>>>>> short answer is "yes", we will include late records. I
>>>>>>>>>>>>>> added a paragraph to the end of the Proposed Changes
>>>>>>>>>>>>>> section explaining the reasoning here; please let me know
>>>>>>>>>>>>>> if you have any concerns.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 4. Assuming you're referring to the existing metric
>>>>>>>>>>>>>> "process-latency", that metric reflects the time for the
>>>>>>>>>>>>>> literal Node#process method to run, whereas this metric
>>>>>>>>>>>>>> would always be measured relative to the event timestamp.
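>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (Roughly this distinction -- an illustrative sketch with
>>>>>>>>>>>>>> made-up sensor names, not the actual implementation:)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     import org.apache.kafka.common.metrics.Sensor;
>>>>>>>>>>>>>>     import org.apache.kafka.common.utils.Time;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     static void recordBoth(final Sensor processLatency,
>>>>>>>>>>>>>>                            final Sensor e2eLatency,
>>>>>>>>>>>>>>                            final Time time,
>>>>>>>>>>>>>>                            final long recordTimestamp,
>>>>>>>>>>>>>>                            final Runnable process) {
>>>>>>>>>>>>>>         final long startNs = time.nanoseconds();
>>>>>>>>>>>>>>         process.run(); // the literal Node#process call
>>>>>>>>>>>>>>         // existing metric: elapsed system time around the call
>>>>>>>>>>>>>>         processLatency.record(time.nanoseconds() - startNs);
>>>>>>>>>>>>>>         // this KIP: anchored to the record's event timestamp
>>>>>>>>>>>>>>         e2eLatency.record(time.milliseconds() - recordTimestamp);
>>>>>>>>>>>>>>     }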
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That said, the naming collision there is pretty confusing,
>>>>>>>>>>>>>> so I've renamed the metrics in this KIP to
>>>>>>>>>>>>>> "end-to-end-latency", which I feel better reflects the
>>>>>>>>>>>>>> nature of the metric anyway.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the feedback!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, May 13, 2020 at 10:21 AM Boyang Chen <reluctanthero...@gmail.com> wrote:
>>>>>>>>>>>>>>> Thanks for the KIP Sophie. Getting the E2E latency is
>>>>>>>>>>>>>>> important for understanding the bottleneck of the
>>>>>>>>>>>>>>> application.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A couple of questions and ideas:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. Could you clarify the rationale for picking the 75th
>>>>>>>>>>>>>>> and 99th percentiles plus max? Normally I see cases where
>>>>>>>>>>>>>>> we use the 50th and 90th percentiles as well in production
>>>>>>>>>>>>>>> systems.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. The current latency being computed is cumulative, i.e.
>>>>>>>>>>>>>>> if a record goes through A -> B -> C, then
>>>>>>>>>>>>>>> P(C) = T(B->C) + P(B) = T(B->C) + T(A->B) + P(A), and so
>>>>>>>>>>>>>>> on, where P() represents the captured latency and T()
>>>>>>>>>>>>>>> represents the time for transiting the records between two
>>>>>>>>>>>>>>> nodes, including processing time. For monitoring purposes,
>>>>>>>>>>>>>>> maybe having T(B->C) and T(A->B) is more natural to view
>>>>>>>>>>>>>>> as "hop-to-hop latency"; otherwise, if there is a spike in
>>>>>>>>>>>>>>> T(A->B), both P(B) and P(C) are affected at the same time.
>>>>>>>>>>>>>>> In the same spirit, the E2E latency is meaningful only
>>>>>>>>>>>>>>> when the record exits from the sink, as this marks the
>>>>>>>>>>>>>>> whole time this record spent inside the funnel. Do you
>>>>>>>>>>>>>>> think we could have separate treatment for sink nodes and
>>>>>>>>>>>>>>> other nodes, so that other nodes only count the time
>>>>>>>>>>>>>>> receiving the record from the last hop? I'm not proposing
>>>>>>>>>>>>>>> a solution here, just want to discuss this alternative to
>>>>>>>>>>>>>>> see if it is reasonable.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 3. As we are going to monitor late arrival records as
>>>>>>>>>>>>>>> well, they would create some really spiky graphs when the
>>>>>>>>>>>>>>> out-of-order records are interleaving with on-time
>>>>>>>>>>>>>>> records. Should we also supply a smoothed version of the
>>>>>>>>>>>>>>> latency metrics, or should users just take care of it
>>>>>>>>>>>>>>> themselves?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 4. Regarding this new metric, we haven't discussed its
>>>>>>>>>>>>>>> relation to our existing processing latency metrics. Could
>>>>>>>>>>>>>>> you add some context comparing them, and a simple `when to
>>>>>>>>>>>>>>> use which` tutorial?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Boyang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, May 12, 2020 at 7:28 PM Sophie Blee-Goldman <sop...@confluent.io> wrote:
>>>>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd like to kick off discussion on KIP-613, which aims to
>>>>>>>>>>>>>>>> add end-to-end latency metrics to Streams. Please take a
>>>>>>>>>>>>>>>> look:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-613%3A+Add+end-to-end+latency+metrics+to+Streams
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> Sophie
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> -- Guozhang