Hey Sophie, 90/99/min/max make sense to me.
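
For anyone who wants to eyeball these on a running app once the KIP lands, something like this should do it -- just a sketch, and the "record-staleness" prefix is whatever final name we settle on here:

    import java.util.Map;
    import org.apache.kafka.common.Metric;
    import org.apache.kafka.common.MetricName;
    import org.apache.kafka.streams.KafkaStreams;

    public final class StalenessMetricsDump {
        // print the staleness distribution for every task/node reporting it
        public static void dump(final KafkaStreams streams) {
            for (final Map.Entry<MetricName, ? extends Metric> entry : streams.metrics().entrySet()) {
                final MetricName name = entry.getKey();
                if (name.name().startsWith("record-staleness")) {
                    System.out.println(name.name() + " " + name.tags() + " = " + entry.getValue().metricValue());
                }
            }
        }
    }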

On Fri, May 15, 2020 at 12:20 PM Sophie Blee-Goldman <sop...@confluent.io> wrote:
> @Matthias
> Regarding tracking the 50th percentile, I'll refer you to the 4:53 mark of
> the video *you* linked: https://youtu.be/lJ8ydIuPFeU?t=293
>
> And that's what he says about the 95th percentile! Imagine what he would
> say about the 50th :P
>
> But seriously, since we can't seem to agree that the mean or 50th
> percentile is actually useful, I'm inclined to resurrect my original
> proposal: neither. But I think that's a good argument against the 75th,
> which I admittedly chose somewhat arbitrarily as an intermediate between
> the 50th and the higher percentiles. How about:
>
> -max
> -p99
> -p90
> -min
>
> with p50/mean still up for debate if anyone feels strongly for either of
> them.
>
> Regarding the name, I've already flip-flopped on this so I'm definitely
> still open to further arguments. But the reason for changing it from
> end-to-end-latency (which is similar to what you propose) is that this
> metric technically reflects how old (ie how "stale") the record is when
> it's *received* by the operator, not when it's processed by the operator.
> It seemed like there was the potential for confusion that
> "end-to-end-latency" might represent the latency from the event creation
> to the time the processor is done processing it.
>
> @John
> I'd rather err on the side of "not enough" metrics, as we can always add
> this to the stateless metrics later on. If we decide to measure the time
> at every node and don't find any evidence of a serious performance impact,
> and users indicate they would like to see this metric at all nodes, then
> we can easily start reporting them as well. WDYT?
>
> That said, sink nodes seem like a reasonable exception to the rule.
> Obviously users should be able to detect the time when the record reaches
> the output topic, but that still leaves a gap in understanding how long
> the production latency was. This mirrors the consumption latency that is
> exposed by the task-level metrics, which are measured at the source node.
> For good symmetry, what if we actually expose both the source and sink
> latency at the task level? ie report both sets of statistical measurements
> with the additional tag -source/-sink
>
> @Bill
> Thanks for the comment regarding the min! I hadn't considered that, and
> it's quite useful to think about how and what is useful from a user's
> point of view.
>
> Regarding your second point, I'm inclined to leave that as an
> implementation detail, but my take would be that the user should be
> allowed to control the record timestamp used for this with the timestamp
> extractor. My impression is that users may often embed the actual event
> time in the payload for whatever reason, and this represents the "true"
> timestamp as far as the Streams topology is concerned.
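>
> (To make that last point concrete: a user could already point this
> calculation at an embedded event time with an extractor along these
> lines -- just a sketch, and the OrderEvent type is made up:)
>
>     import org.apache.kafka.clients.consumer.ConsumerRecord;
>     import org.apache.kafka.streams.processor.TimestampExtractor;
>
>     public class PayloadTimeExtractor implements TimestampExtractor {
>         // made-up user type carrying the embedded "true" event time
>         public interface OrderEvent { long eventTimeMs(); }
>
>         @Override
>         public long extract(final ConsumerRecord<Object, Object> record, final long partitionTime) {
>             final Object value = record.value();
>             if (value instanceof OrderEvent) {
>                 return ((OrderEvent) value).eventTimeMs(); // embedded event time
>             }
>             return record.timestamp(); // fall back to the record timestamp
>         }
>     }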
>
> On Fri, May 15, 2020 at 11:05 AM Bill Bejeck <bbej...@gmail.com> wrote:
>> Thanks for the KIP, Sophie, this will be a useful metric to add.
>>
>> Regarding tracking min, I think it could be valuable for users to discern
>> which parts of their topologies are more efficient, since this is a
>> task-level metric. I realize everyone seems to be on board with including
>> min anyway, but I wanted to add my 2 cents on this topic should we decide
>> to revisit adding min or not.
>>
>> I do have a question regarding the calculation of staleness. Is there
>> going to be a consideration for timestamp extractors? Users could prefer
>> to use a timestamp embedded in the payload, and it could skew the
>> measurements. I was wondering if we should specify in the KIP whether the
>> arrival time is always going to come from the record timestamp, or
>> whether this is an implementation detail we can cover in the PR.
>>
>> Thanks!
>> Bill
>>
>> On Fri, May 15, 2020 at 1:11 AM Matthias J. Sax <mj...@apache.org> wrote:
>>> Thanks for the KIP Sophie.
>>>
>>> I think it's not useful to record the avg/mean; it's sensitive to
>>> outliers. We should rather track the median (50th percentile).
>>>
>>> Not sure if tracking min is useful, but I am also ok to track it.
>>>
>>> However, I find it odd to track the 75th percentile. Standard measures
>>> would be the 90th or 95th -- I guess we don't need both, so maybe
>>> picking the 90th might be more useful?
>>>
>>> About the name: "staleness" sounds really odd, and in fact the metric
>>> does capture "latency", so we should call it "latency". I understand
>>> the issue that we already have a latency metric. So maybe we could call
>>> it `record-e2e-latency-*`?
>>>
>>> While I agree that we should include out-of-order data (the KIP should
>>> talk about `out-of-order` data, not `late` data; data is only `late` if
>>> it's out-of-order and if it's dropped), I don't really understand why
>>> the new metric would help to configure grace period or retention time.
>>> As you mention in the KIP, both are defined as the max difference of
>>> `event-time - stream-time`, and thus the new metric, which takes
>>> system-/wallclock-time into account, does not seem to help at all.
>>>
>>> Btw: there is a great talk about "How NOT to Measure Latency" by Gil
>>> Tene: https://www.youtube.com/watch?v=lJ8ydIuPFeU
>>>
>>> -Matthias
>>>
>>> On 5/14/20 7:17 PM, John Roesler wrote:
>>>> Hi Sophie,
>>>>
>>>> It seems like there would still be plenty of use cases for recording
>>>> this metric at all processors and not just stateful ones, but I'm
>>>> happy to suspend my arguments for now. Since you're proposing to keep
>>>> them at the processor-node level, it will be seamless later to add in
>>>> the stateless processors if we want. As a wise man once said, "Adding
>>>> is always easier than removing."
>>>>
>>>> Regarding the time measurement, it's an implementation detail we
>>>> don't need to consider in the KIP. Nevertheless, I'd greatly prefer
>>>> to measure the system time again when recording the metric. I don't
>>>> think we've seen any evidence that proves this would harm
>>>> performance, and the amount of inaccuracy that using the cached
>>>> system time could incur is actually substantial. But, if you want to
>>>> just "not mention this" in the KIP, we can defer to the actual PR
>>>> discussion, at which time we're in a better position to use
>>>> benchmarks, etc., to make the call.
>>>>
>>>> Along the lines of the measurement accuracy discussion, one minor
>>>> thought I had is that maybe we should consider measuring the task
>>>> staleness metric at the sink, rather than the source, so that it
>>>> includes the processing latency of the task itself, not just the
>>>> latency of everything up to, but not including, the task (which seems
>>>> confusing for users). I guess this could also be an implementation
>>>> detail, though.
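>>>>
>>>> (To illustrate, a sketch with made-up sensor names -- not actual
>>>> Streams internals:)
>>>>
>>>>     import org.apache.kafka.common.metrics.Sensor;
>>>>     import org.apache.kafka.common.utils.Time;
>>>>
>>>>     static void recordStaleness(final Sensor sourceStaleness,
>>>>                                 final Sensor sinkStaleness,
>>>>                                 final Time time,
>>>>                                 final long recordTimestamp,
>>>>                                 final Runnable processTask) {
>>>>         // at the source node: everything upstream of this task,
>>>>         // but none of the task's own processing
>>>>         sourceStaleness.record(time.milliseconds() - recordTimestamp);
>>>>         processTask.run();
>>>>         // at the sink: also includes the task's processing latency
>>>>         sinkStaleness.record(time.milliseconds() - recordTimestamp);
>>>>     }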
>>>>
>>>> Thanks for the update,
>>>> -John
>>>>
>>>> On Thu, May 14, 2020, at 13:31, Sophie Blee-Goldman wrote:
>>>>> Hey all,
>>>>>
>>>>> After discussing with Bruno I'd like to propose a small amendment,
>>>>> which is to record the processor-node-level metrics only for
>>>>> *stateful* *operators*. They would still be considered a
>>>>> "processor-node-level" metric and not a "state-store-level" metric,
>>>>> as the staleness is still a property of the node rather than of the
>>>>> state itself. However, it seems that this information is primarily
>>>>> useful for stateful operators that might be exposing state via IQ or
>>>>> are otherwise dependent on the record time, unlike a stateless
>>>>> operator.
>>>>>
>>>>> It's worth calling out that recent performance improvements to the
>>>>> metrics framework mean that we no longer fetch the system time at
>>>>> the operator level, but only once per task. In other words, the
>>>>> system time is not updated between each process as a record flows
>>>>> through the subtopology, so debugging the processor-level latency
>>>>> via the staleness will not be possible. Note that this doesn't mean
>>>>> the operator-level metrics are not *useful* relative to the
>>>>> task-level metric. Upstream caching and/or suppression can still
>>>>> cause a record's staleness at some downstream stateful operator to
>>>>> deviate from the task-level staleness (recorded at the source node).
>>>>>
>>>>> Please let me know if you have any concerns about this change. The
>>>>> KIP has been updated with the new proposal
>>>>>
>>>>> On Thu, May 14, 2020 at 3:04 AM Bruno Cadonna <br...@confluent.io> wrote:
>>>>>> Hi Sophie,
>>>>>>
>>>>>> Thank you for the KIP.
>>>>>>
>>>>>> The KIP looks good to me.
>>>>>>
>>>>>> 50th percentile:
>>>>>> I think we do not need it now. If we need it, we can add it. Here
>>>>>> the old truism applies: Adding is always easier than removing.
>>>>>>
>>>>>> processor-node-level metrics:
>>>>>> I think it is good to have the staleness metrics also on the
>>>>>> processor-node level. If we do not want to record them on all
>>>>>> processor nodes, you could restrict the recording to stateful
>>>>>> processor nodes, since those are the ones that would benefit most
>>>>>> from the staleness metrics.
>>>>>>
>>>>>> Best,
>>>>>> Bruno
>>>>>>
>>>>>> On Thu, May 14, 2020 at 4:15 AM Sophie Blee-Goldman <sop...@confluent.io> wrote:
>>>>>>> Yeah, the specific reason was just to align with the current
>>>>>>> metrics.
>>>>>>>
>>>>>>> Is it better to conform than to be right? History has a lot to say
>>>>>>> on that matter, but I'm not sure how much of it applies to the
>>>>>>> fine details of metrics naming :P
>>>>>>>
>>>>>>> More seriously, I figured if people are looking at this metric
>>>>>>> they're likely to be looking at all the others. Then naming this
>>>>>>> one "-mean" would probably lead some to conclude that the "-avg"
>>>>>>> suffix in the other metrics has a different meaning.
>>>>>>>
>>>>>>> As for the percentiles, I actually like p99 (and p75) better. I'll
>>>>>>> swap that out
>>>>>>>
>>>>>>> On Wed, May 13, 2020 at 7:07 PM John Roesler <vvcep...@apache.org> wrote:
>>>>>>>> Thanks Sophie,
>>>>>>>>
>>>>>>>> I hope this isn't too nit-picky, but is there a reason to choose
>>>>>>>> "avg" instead of "mean"? Maybe this is too paranoid, and I might
>>>>>>>> be oversensitive because of the mistake I just made earlier, but
>>>>>>>> it strikes me that "avg" is actually ambiguous, as it refers to a
>>>>>>>> family of statistics, whereas "mean" is specific. I see other
>>>>>>>> Kafka metrics with "avg", but none with "mean"; was that the
>>>>>>>> reason? If so, I'm +1.
>>>>>>>>
>>>>>>>> Regarding the names of the percentiles, I actually couldn't find
>>>>>>>> _any_ other metrics that use percentiles. Was there a reason to
>>>>>>>> choose "99th" as opposed to "p99" or any other scheme? This is
>>>>>>>> not a criticism, I'm just asking for consistency's sake.
>>>>>>>>
>>>>>>>> Thanks again,
>>>>>>>> -John
>>>>>>>>
>>>>>>>> On Wed, May 13, 2020, at 19:19, Sophie Blee-Goldman wrote:
>>>>>>>>> Alright, I can get behind adding the min metric for the sake of
>>>>>>>>> pretty graphs (and trivial computation).
>>>>>>>>>
>>>>>>>>> I'm still on the fence regarding the mean (or 50th percentile),
>>>>>>>>> but I can see how users might expect it and find it a bit
>>>>>>>>> disorienting not to have. So the updated proposed metrics are
>>>>>>>>>
>>>>>>>>> - record-staleness-max [ms]
>>>>>>>>> - record-staleness-99th [ms] *(99th percentile)*
>>>>>>>>> - record-staleness-75th [ms] *(75th percentile)*
>>>>>>>>> - record-staleness-avg [ms] *(mean)*
>>>>>>>>> - record-staleness-min [ms]
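>>>>>>>>>
>>>>>>>>> (If anyone is curious how these could be wired up, the common
>>>>>>>>> metrics library already ships Min/Max/Percentiles stats; roughly
>>>>>>>>> like this -- just a sketch, the group name and histogram sizing
>>>>>>>>> are made up, not the actual implementation:)
>>>>>>>>>
>>>>>>>>>     import org.apache.kafka.common.metrics.Metrics;
>>>>>>>>>     import org.apache.kafka.common.metrics.Sensor;
>>>>>>>>>     import org.apache.kafka.common.metrics.stats.Max;
>>>>>>>>>     import org.apache.kafka.common.metrics.stats.Min;
>>>>>>>>>     import org.apache.kafka.common.metrics.stats.Percentile;
>>>>>>>>>     import org.apache.kafka.common.metrics.stats.Percentiles;
>>>>>>>>>     import org.apache.kafka.common.metrics.stats.Percentiles.BucketSizing;
>>>>>>>>>
>>>>>>>>>     static Sensor stalenessSensor(final Metrics metrics) {
>>>>>>>>>         final String group = "stream-task-metrics"; // assumption
>>>>>>>>>         final Sensor sensor = metrics.sensor("record-staleness");
>>>>>>>>>         sensor.add(metrics.metricName("record-staleness-min", group), new Min());
>>>>>>>>>         sensor.add(metrics.metricName("record-staleness-max", group), new Max());
>>>>>>>>>         sensor.add(new Percentiles(
>>>>>>>>>             4096,       // histogram memory in bytes (made up)
>>>>>>>>>             0.0,        // smallest expected staleness, in ms
>>>>>>>>>             600_000.0,  // largest expected staleness, in ms (made-up 10 min cap)
>>>>>>>>>             BucketSizing.LINEAR,
>>>>>>>>>             new Percentile(metrics.metricName("record-staleness-99th", group), 99),
>>>>>>>>>             new Percentile(metrics.metricName("record-staleness-75th", group), 75)));
>>>>>>>>>         return sensor; // per record: sensor.record(nowMs - record.timestamp())
>>>>>>>>>     }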
>>>>>>>>>
>>>>>>>>> On Wed, May 13, 2020 at 4:42 PM John Roesler <vvcep...@apache.org> wrote:
>>>>>>>>>> Oh boy, I never miss an opportunity to embarrass myself. I
>>>>>>>>>> guess the mean seems more interesting to me than the median,
>>>>>>>>>> but neither is as interesting as the higher percentiles (99th
>>>>>>>>>> and max).
>>>>>>>>>>
>>>>>>>>>> Min isn't really important for any SLAs, but it does round out
>>>>>>>>>> the mental picture of the distribution. I've always graphed min
>>>>>>>>>> along with the other metrics to help me understand how fast the
>>>>>>>>>> system can be, which helps in optimization decisions. It's also
>>>>>>>>>> a relatively inexpensive metric to compute, so it might be nice
>>>>>>>>>> to just throw it in.
>>>>>>>>>>
>>>>>>>>>> On Wed, May 13, 2020, at 18:18, Sophie Blee-Goldman wrote:
>>>>>>>>>>> G1:
>>>>>>>>>>> I was considering it as the "end-to-end latency *up* to the
>>>>>>>>>>> specific task", but I'm happy with "record-staleness" if that
>>>>>>>>>>> drives the point home better. So it's the "staleness of the
>>>>>>>>>>> record when it is received by that task" -- will update the
>>>>>>>>>>> KIP.
>>>>>>>>>>>
>>>>>>>>>>> B1/J:
>>>>>>>>>>> I'm struggling to imagine a case where the min would actually
>>>>>>>>>>> be useful, rather than just intellectually interesting. I
>>>>>>>>>>> don't feel strongly that we shouldn't add it, but that's why I
>>>>>>>>>>> didn't include it from the start. Can you enlighten me with an
>>>>>>>>>>> example?
>>>>>>>>>>>
>>>>>>>>>>> I was also vaguely concerned about the overhead of adding
>>>>>>>>>>> multiple percentile metrics. Do we have any data to indicate
>>>>>>>>>>> what kind of performance hit we take on metrics computation?
>>>>>>>>>>>
>>>>>>>>>>> Also, not to be too pedantic, but the 50th percentile would be
>>>>>>>>>>> the median, not the mean. Would you propose to add the mean
>>>>>>>>>>> *and* the 50th percentile, or just one of the two?
>>>>>>>>>>>
>>>>>>>>>>> Thanks all!
>>>>>>>>>>> Sophie
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 13, 2020 at 3:34 PM John Roesler <vvcep...@apache.org> wrote:
>>>>>>>>>>>> Hello all, and thanks for the KIP, Sophie,
>>>>>>>>>>>>
>>>>>>>>>>>> Just some comments on the discussion so far:
>>>>>>>>>>>>
>>>>>>>>>>>> B2/G1:
>>>>>>>>>>>> In principle, it shouldn't matter whether we report "spans"
>>>>>>>>>>>> or "end-to-end" latency. But in practice, some of the spans
>>>>>>>>>>>> are pretty difficult to really measure (like time spent
>>>>>>>>>>>> waiting in the topics, or time from the event happening to
>>>>>>>>>>>> the ETL producer choosing to send it, or time spent in
>>>>>>>>>>>> send/receive buffers, etc., etc.).
>>>>>>>>>>>>
>>>>>>>>>>>> In other words, it's practically easier to compute spans by
>>>>>>>>>>>> subtracting e2e latencies than it is to compute e2e latencies
>>>>>>>>>>>> by adding spans. You can even just consider that the span
>>>>>>>>>>>> computation from e2e always just involves subtracting two
>>>>>>>>>>>> numbers, whereas computing e2e latency from spans involves
>>>>>>>>>>>> adding _all_ the spans leading up to the end you care about.
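>>>>>>>>>>>>
>>>>>>>>>>>> (A sketch of that arithmetic, with made-up names:)
>>>>>>>>>>>>
>>>>>>>>>>>>     // span from e2e: one subtraction, regardless of how much
>>>>>>>>>>>>     // of the pipeline lies upstream
>>>>>>>>>>>>     static long spanAtoB(final long e2eAtA, final long e2eAtB) {
>>>>>>>>>>>>         return e2eAtB - e2eAtA;
>>>>>>>>>>>>     }
>>>>>>>>>>>>
>>>>>>>>>>>>     // e2e from spans: must add every span on the path to C
>>>>>>>>>>>>     static long e2eAtC(final long spanSourceToA, final long spanAtoB, final long spanBtoC) {
>>>>>>>>>>>>         return spanSourceToA + spanAtoB + spanBtoC;
>>>>>>>>>>>>     }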
>>>>>>>>>>>>
>>>>>>>>>>>> It seems like people really prefer to have spans when they
>>>>>>>>>>>> are debugging latency problems, whereas e2e latency is a more
>>>>>>>>>>>> general measurement that basically every person/application
>>>>>>>>>>>> cares about and should be monitoring.
>>>>>>>>>>>>
>>>>>>>>>>>> Altogether, it really seems to provide more value to more
>>>>>>>>>>>> people if we report e2e latencies. Regarding
>>>>>>>>>>>> "record-staleness" as a name, I think I have no preference;
>>>>>>>>>>>> I'd defer to other peoples' intuition.
>>>>>>>>>>>>
>>>>>>>>>>>> G2:
>>>>>>>>>>>> I think the processor-node metric is nice, since the inside
>>>>>>>>>>>> of a task can introduce a significant amount of latency in
>>>>>>>>>>>> some cases. Plus, it's a more direct measurement, if you
>>>>>>>>>>>> really wanted to know (for the purposes of IQ or something)
>>>>>>>>>>>> how long it takes source events to "show up" at the store.
>>>>>>>>>>>>
>>>>>>>>>>>> I think actually recording it at every processor could be
>>>>>>>>>>>> expensive, but we already record a bunch of metrics at the
>>>>>>>>>>>> node level.
>>>>>>>>>>>>
>>>>>>>>>>>> B1:
>>>>>>>>>>>> I think 50% could be reasonable to record also. Even if it's
>>>>>>>>>>>> a poor metric for operational purposes, a lot of people might
>>>>>>>>>>>> expect to see "mean". Actually, I was surprised not to see
>>>>>>>>>>>> "min". Is there a reason to leave it off?
>>>>>>>>>>>>
>>>>>>>>>>>> I might suggest:
>>>>>>>>>>>> min, mean (50th), 75th, 99th, max
>>>>>>>>>>>>
>>>>>>>>>>>> B3:
>>>>>>>>>>>> I agree we should include late records (though not the ones
>>>>>>>>>>>> we drop). It may be spiky, but only when there are
>>>>>>>>>>>> legitimately some records with a high end-to-end latency,
>>>>>>>>>>>> which is the whole point of these metrics.
>>>>>>>>>>>>
>>>>>>>>>>>> That's it! I don't think I have any other feedback, other
>>>>>>>>>>>> than a request to also report "min".
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> -John
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 13, 2020, at 16:58, Guozhang Wang wrote:
>>>>>>>>>>>>> Thanks Sophie for the KIP, a few quick thoughts:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) The end-to-end latency includes both the processing
>>>>>>>>>>>>> latency of the task and the latency spent sitting in
>>>>>>>>>>>>> intermediate topics. I have a similar feeling as Boyang
>>>>>>>>>>>>> mentioned above that the latency metric of a task A actually
>>>>>>>>>>>>> measures the latency of the sub-topology up to but not
>>>>>>>>>>>>> including the processing of A, which is a bit weird.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe my feeling comes from the name "latency" itself, since
>>>>>>>>>>>>> we already have several "latency" metrics today which are
>>>>>>>>>>>>> measuring elapsed system-time for processing a record / etc,
>>>>>>>>>>>>> while here we are comparing the system wallclock time with
>>>>>>>>>>>>> the record timestamp.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe we can consider renaming it as "record-staleness"
>>>>>>>>>>>>> (note we already have a "record-lateness" metric), in which
>>>>>>>>>>>>> case recording at the system-time before we start processing
>>>>>>>>>>>>> the record sounds more natural.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) With that in mind, I'm wondering if the
>>>>>>>>>>>>> processor-node-level DEBUG metric is worth adding, given
>>>>>>>>>>>>> that we already have a task-level processing latency metric.
>>>>>>>>>>>>> Basically, a specific node's e2e latency is similar to the
>>>>>>>>>>>>> task-level e2e latency + task-level processing latency.
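>>>>>>>>>>>>>
>>>>>>>>>>>>> (i.e., roughly, with made-up names:
>>>>>>>>>>>>>     nodeE2eLatency ~= taskE2eLatency + taskProcessingLatency
>>>>>>>>>>>>> so the node-level number is mostly derivable from metrics we
>>>>>>>>>>>>> already report.)
>>>>>>>>>>>>>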
>>>>>>>>>>>>> Personally I think having a task-level record-staleness
>>>>>>>>>>>>> metric is sufficient.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, May 13, 2020 at 11:46 AM Sophie Blee-Goldman <sop...@confluent.io> wrote:
>>>>>>>>>>>>>> 1. I felt that 50% was not a particularly useful gauge for
>>>>>>>>>>>>>> this specific metric, as it's presumably most useful at
>>>>>>>>>>>>>> putting an *upper* bound on the latency you can reasonably
>>>>>>>>>>>>>> expect to see. I chose percentiles that would hopefully
>>>>>>>>>>>>>> give a good sense of what *most* records will experience,
>>>>>>>>>>>>>> and what *close to all* records will.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> However, I'm not married to these specific numbers and
>>>>>>>>>>>>>> could be convinced. Would be especially interested in
>>>>>>>>>>>>>> hearing from users on this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. I'm inclined to not include the "hop-to-hop latency" in
>>>>>>>>>>>>>> this KIP, since users can always compute it themselves by
>>>>>>>>>>>>>> subtracting the previous node's end-to-end latency. I guess
>>>>>>>>>>>>>> we could do it either way since you can always compute one
>>>>>>>>>>>>>> from the other, but I think the end-to-end latency feels
>>>>>>>>>>>>>> more valuable, as its main motivation is not to debug
>>>>>>>>>>>>>> bottlenecks in the topology but to give users a sense of
>>>>>>>>>>>>>> how long it takes a record to be reflected in certain parts
>>>>>>>>>>>>>> of the topology. For example, this might be useful for
>>>>>>>>>>>>>> users who are wondering roughly when a record that was just
>>>>>>>>>>>>>> produced will be included in their IQ results. Debugging is
>>>>>>>>>>>>>> just a nice side effect -- but maybe I didn't make that
>>>>>>>>>>>>>> clear enough in the KIP's motivation.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3. Good question, I should address this in the KIP. The
>>>>>>>>>>>>>> short answer is "yes", we will include late records. I
>>>>>>>>>>>>>> added a paragraph to the end of the Proposed Changes
>>>>>>>>>>>>>> section explaining the reasoning here; please let me know
>>>>>>>>>>>>>> if you have any concerns.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 4. Assuming you're referring to the existing metric
>>>>>>>>>>>>>> "process-latency", that metric reflects the time for the
>>>>>>>>>>>>>> literal Node#process method to run, whereas this metric
>>>>>>>>>>>>>> would always be measured relative to the event timestamp.
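>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (Roughly this distinction -- an illustrative sketch with
>>>>>>>>>>>>>> made-up sensor names, not the actual implementation:)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     import org.apache.kafka.common.metrics.Sensor;
>>>>>>>>>>>>>>     import org.apache.kafka.common.utils.Time;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     static void recordBoth(final Sensor processLatency,
>>>>>>>>>>>>>>                            final Sensor e2eLatency,
>>>>>>>>>>>>>>                            final Time time,
>>>>>>>>>>>>>>                            final long recordTimestamp,
>>>>>>>>>>>>>>                            final Runnable process) {
>>>>>>>>>>>>>>         final long startNs = time.nanoseconds();
>>>>>>>>>>>>>>         process.run(); // the literal Node#process call
>>>>>>>>>>>>>>         // existing metric: elapsed system time around the call
>>>>>>>>>>>>>>         processLatency.record(time.nanoseconds() - startNs);
>>>>>>>>>>>>>>         // this KIP: anchored to the record's event timestamp
>>>>>>>>>>>>>>         e2eLatency.record(time.milliseconds() - recordTimestamp);
>>>>>>>>>>>>>>     }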
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That said, the naming collision there is pretty confusing,
>>>>>>>>>>>>>> so I've renamed the metrics in this KIP to
>>>>>>>>>>>>>> "end-to-end-latency", which I feel better reflects the
>>>>>>>>>>>>>> nature of the metric anyway.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the feedback!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, May 13, 2020 at 10:21 AM Boyang Chen <reluctanthero...@gmail.com> wrote:
>>>>>>>>>>>>>>> Thanks for the KIP Sophie. Getting the E2E latency is
>>>>>>>>>>>>>>> important for understanding the bottleneck of the
>>>>>>>>>>>>>>> application.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A couple of questions and ideas:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. Could you clarify the rationale for picking the 75th
>>>>>>>>>>>>>>> and 99th percentiles plus max? Normally I see cases where
>>>>>>>>>>>>>>> we use the 50th and 90th percentiles as well in production
>>>>>>>>>>>>>>> systems.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. The current latency being computed is cumulative, i.e.
>>>>>>>>>>>>>>> if a record goes through A -> B -> C, then
>>>>>>>>>>>>>>> P(C) = T(B->C) + P(B) = T(B->C) + T(A->B) + P(A), and so
>>>>>>>>>>>>>>> on, where P() represents the captured latency and T()
>>>>>>>>>>>>>>> represents the time for transiting the records between two
>>>>>>>>>>>>>>> nodes, including processing time. For monitoring purposes,
>>>>>>>>>>>>>>> maybe having T(B->C) and T(A->B) is more natural to view
>>>>>>>>>>>>>>> as "hop-to-hop latency"; otherwise, if there is a spike in
>>>>>>>>>>>>>>> T(A->B), both P(B) and P(C) are affected at the same time.
>>>>>>>>>>>>>>> In the same spirit, the E2E latency is meaningful only
>>>>>>>>>>>>>>> when the record exits from the sink, as this marks the
>>>>>>>>>>>>>>> whole time this record spent inside the funnel. Do you
>>>>>>>>>>>>>>> think we could have separate treatment for sink nodes and
>>>>>>>>>>>>>>> other nodes, so that other nodes only count the time
>>>>>>>>>>>>>>> receiving the record from the last hop? I'm not proposing
>>>>>>>>>>>>>>> a solution here, just want to discuss this alternative to
>>>>>>>>>>>>>>> see if it is reasonable.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 3. As we are going to monitor late arrival records as
>>>>>>>>>>>>>>> well, they would create some really spiky graphs when the
>>>>>>>>>>>>>>> out-of-order records are interleaving with on-time
>>>>>>>>>>>>>>> records. Should we also supply a smoothed version of the
>>>>>>>>>>>>>>> latency metrics, or should users just take care of it
>>>>>>>>>>>>>>> themselves?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 4. Regarding this new metric, we haven't discussed its
>>>>>>>>>>>>>>> relation to our existing processing latency metrics. Could
>>>>>>>>>>>>>>> you add some context comparing them, and a simple `when to
>>>>>>>>>>>>>>> use which` tutorial?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Boyang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, May 12, 2020 at 7:28 PM Sophie Blee-Goldman <sop...@confluent.io> wrote:
>>>>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd like to kick off discussion on KIP-613, which aims to
>>>>>>>>>>>>>>>> add end-to-end latency metrics to Streams. Please take a
>>>>>>>>>>>>>>>> look:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-613%3A+Add+end-to-end+latency+metrics+to+Streams
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> Sophie
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> -- Guozhang