Re: [DISCUSS] KIP-613: Add end-to-end latency metrics to Streams

Sophie Blee-Goldman Fri, 15 May 2020 15:22:24 -0700

> @Matthias
> And that's what he says about the 95th percentile! Imagine what he would
say about the 50th :P


I should have kept watching another 10 minutes. He gets around to covering
the 50th, and let's
just say he is not a fan: https://youtu.be/lJ8ydIuPFeU?t=786

On Fri, May 15, 2020 at 3:15 PM John Roesler <[email protected]> wrote:

> Thanks, Sophie!
>
> I think this makes perfect sense. It will be much more intuitive to use
> the metric for the stated motivation this way.  I’d be in favor of the
> proposal after this update.
>
> Thanks again for taking this on,
> -John
>
> On Fri, May 15, 2020, at 17:07, Sophie Blee-Goldman wrote:
> > Hi all,
> >
> > I'd like to clarify/modify one aspect of this KIP, which is to record the
> > staleness
> > at the *completion* of the record's processing by the operator/task in
> > question,
> > rather than on the intake. The task-level metrics will be recorded at the
> > sink
> > node instead of at the source, and the operator-level metrics will be
> > recorded
> > at the end of the operation.
> >
> > The stated purpose and intended usefulness of this KIP is to give users a
> > way
> > to gauge roughly how long it takes for a record to be reflected in the
> > "results",
> > whether these results are being read from an output topic or through IQ.
> To
> > take the IQ example, the results of a record are obviously not visible
> until
> > *after* that node has finished processing it. The staleness, while still
> > potentially
> > useful as it can impact the *way* a record is processed in a stateful and
> > time-
> > dependent operator, is not part of the problem this KIP specifically set
> out
> > to solve.
> >
> > In light of this I think it's appropriate to revert the name change back
> to
> > include
> > latency, since "staleness" as described above only makes sense when
> > measuring
> > relative to the arrival of a record at a task/node. I'd propose to save
> the
> > term
> > "staleness" for that particular meaning, and adopt Matthias's suggestion
> of
> > "record-e2e-latency" for this.
> >
> > Thanks for hanging in there all. Please let me know if you have any
> > concerns about
> > this change!
> >
> > Sophie
> >
> > On Fri, May 15, 2020 at 2:25 PM Matthias J. Sax <[email protected]>
> wrote:
> >
> > > I am also happy with max/min/99/90. And I buy your naming argument
> about
> > > staleness vs latency.
> > >
> > >
> > > -Matthias
> > >
> > > On 5/15/20 12:24 PM, Boyang Chen wrote:
> > > > Hey Sophie,
> > > >
> > > > 90/99/min/max make sense to me.
> > > >
> > > > On Fri, May 15, 2020 at 12:20 PM Sophie Blee-Goldman <
> > > [email protected]>
> > > > wrote:
> > > >
> > > >> @Matthias
> > > >> Regarding tracking the 50th percentile, I'll refer you to the 4:53
> mark
> > > of
> > > >> the video
> > > >> *you* linked: https://youtu.be/lJ8ydIuPFeU?t=293
> > > >>
> > > >> And that's what he says about the 95th percentile! Imagine what he
> would
> > > >> say about
> > > >> the 50th :P
> > > >>
> > > >> But seriously, since we can't seem to agree that the mean or 50th
> > > >> percentile is actually
> > > >> useful I'm inclined to resurrect my original proposal, neither. But
> I
> > > think
> > > >> that's a good
> > > >> argument against the 75th, which I admittedly chose somewhat
> > > arbitrarily as
> > > >> an
> > > >> intermediate between the 50th and the higher percentiles. How about:
> > > >>
> > > >> -max
> > > >> -p99
> > > >> -p90
> > > >> -min
> > > >>
> > > >> with p50/mean still up for debate if anyone feels strongly for
> either of
> > > >> them.
> > > >>
> > > >> Regarding the name, I've already flip-flopped on this so I'm
> definitely
> > > >> still open to
> > > >> further arguments. But the reason for changing it from
> > > end-to-end-latency
> > > >> (which
> > > >> is similar to what you propose) is that this metric technically
> reflects
> > > >> how old (ie how "stale")
> > > >> the record is when it's *received* by the operator, not when it's
> > > processed
> > > >> by the operator.
> > > >> It seemed like there was the potential for confusion that
> > > >> "end-to-end-latency" might
> > > >> represent the latency from the event creation to the time the
> processor
> > > is
> > > >> done
> > > >> processing it.
> > > >>
> > > >> @John
> > > >> I'd rather err on the side of "not-enough" metrics as we can always
> add
> > > >> this to the
> > > >> stateless metrics later on. If we decide to measure the time at
> every
> > > node
> > > >> and don't
> > > >> find any evidence of a serious performance impact, and users
> indicate
> > > they
> > > >> would
> > > >> like to see this metric at all nodes, then we can easily start
> reporting
> > > >> them as well.
> > > >> WDYT?
> > > >>
> > > >> That said, sink nodes seem like a reasonable exception to the rule.
> > > >> Obviously users
> > > >> should be able to detect the time when the record reaches the output
> > > topic
> > > >> but that
> > > >> still leaves a gap in understanding how long the production latency
> was.
> > > >> This mirrors
> > > >> the consumption latency that is exposed by the task-level metrics,
> which
> > > >> are measured
> > > >> at the source node. For good symmetry what if we actually expose
> both
> > > the
> > > >> source
> > > >> and sink latency at the task-level? ie report both sets of
> statistical
> > > >> measurements with
> > > >> the additional tag -source/-sink
> > > >>
> > > >> @Bill
> > > >> Thanks for the comment about regarding the min! I hadn't considered
> that
> > > >> and it's
> > > >> quite useful to think about how and what is useful from a users
> point of
> > > >> view.
> > > >>
> > > >> Regarding your second. point, I'm inclined to leave that as an
> > > >> implementation detail
> > > >> but my take would be that the user should be allowed to control the
> > > record
> > > >> timestamp
> > > >> used for this with the timestamp extractor. My impression is that
> users
> > > may
> > > >> often embed
> > > >> the actual event time in the payload for whatever reason, and this
> > > >> represents the "true"
> > > >> timestamp as far as the Streams topology is concerned.
> > > >>
> > > >>
> > > >> On Fri, May 15, 2020 at 11:05 AM Bill Bejeck <[email protected]>
> wrote:
> > > >>
> > > >>> Thanks for the KIP, Sophie, this will be a useful metric to add.
> > > >>>
> > > >>> Regarding tracking min, I  think it could be valuable for users to
> > > >> discern
> > > >>> which part of their topologies are more efficient since this is a
> > > >>> task-level metric.  I realize everyone seems to be on board with
> > > >> including
> > > >>> min anyway, but I wanted to add my 2 cents on this topic should we
> > > decide
> > > >>> to revisit adding min or not.
> > > >>>
> > > >>> I do have a question regarding the calculation of staleness.
> > > >>> Is there going to be a consideration for timestamp extractors?
> Users
> > > >> could
> > > >>> prefer to use a timestamp embedded in the payload, and it could
> skew
> > > the
> > > >>> measurements.
> > > >>> I was wondering if we should specify in the KIP if setting the
> arrival
> > > >> time
> > > >>> is always going to come from the record timestamp, or is this an
> > > >>> implementation detail we can cover in the PR?
> > > >>>
> > > >>> Thanks!
> > > >>> Bill
> > > >>>
> > > >>> On Fri, May 15, 2020 at 1:11 AM Matthias J. Sax <[email protected]>
> > > >> wrote:
> > > >>>
> > > >>>> Thanks for the KIP Sophie.
> > > >>>>
> > > >>>> I think it's not useful to record the avg/mean; it sensitive to
> > > >>>> outliers. We should rather track the median (50th percentile).
> > > >>>>
> > > >>>> Not sure if tracking min is useful, but I am also ok to track it.
> > > >>>>
> > > >>>> However, I find it odd to track 75th percentile. Standard measures
> > > >> would
> > > >>>> the 90th or 95th -- I guess we don't need both, so maybe picking
> 90th
> > > >>>> might be more useful?
> > > >>>>
> > > >>>> About the name: "staleness" wound really odd, and if fact the
> metric
> > > >>>> does capture "latency" so we should call it "latency". I
> understand
> > > the
> > > >>>> issue that we already have a latency metric. So maybe we could
> call it
> > > >>>> `record-e2e-latency-*` ?
> > > >>>>
> > > >>>> While I agree that we should include out-or-order data (the KIP
> should
> > > >>>> talk about `out-of-order` data, not `late` data; data is only
> `late`
> > > if
> > > >>>> it's out-of-order and if it's dropped), I don't really understand
> why
> > > >>>> the new metric would help to configure grace period or retention
> time?
> > > >>>> As you mention in the KIP, both are define as max difference of
> > > >>>> `event-time - stream-time` and thus the new metric that takes
> > > >>>> system-/wallclock-time into account does not seem to help at all.
> > > >>>>
> > > >>>>
> > > >>>> Btw: there is a great talk about "How NOT to Measure Latency" by
> Gil
> > > >>>> Tene: https://www.youtube.com/watch?v=lJ8ydIuPFeU
> > > >>>>
> > > >>>>
> > > >>>> -Matthias
> > > >>>>
> > > >>>>
> > > >>>> On 5/14/20 7:17 PM, John Roesler wrote:
> > > >>>>> Hi Sophie,
> > > >>>>>
> > > >>>>> It seems like there would still be plenty of use cases for
> recording
> > > >>>>> this metric at all processors and not just stateful ones, but I'm
> > > >> happy
> > > >>>>> to suspend my arguments for now. Since you're proposing to keep
> > > >>>>> them at the processor-node level, it will be seamless later to
> add
> > > >>>>> in the stateless processors if we want. As a wise man once said,
> > > >>>>> "Adding is always easier than removing."
> > > >>>>>
> > > >>>>> Regarding the time measurement, it's an implementation detail
> > > >>>>> we don't need to consider in the KIP. Nevertheless, I'd greatly
> > > >>>>> prefer to measure the system time again when recording the
> > > >>>>> metric. I don't think we've seen any evidence that proves this
> > > >>>>> would harm performance, and the amount of inaccuracy using
> > > >>>>> the cached system time could incur is actually substantial. But,
> > > >>>>> if you want to just "not mention this" in the KIP, we can defer
> to
> > > >>>>> the actual PR discussion, at which time we're in a better
> position
> > > >>>>> to use benchmarks, etc., to make the call.
> > > >>>>>
> > > >>>>> Along the lines of the measurement accuracy discussion, one
> > > >>>>> minor thought I had is that maybe we should consider measuring
> > > >>>>> the task staleness metric at the sink, rather than the source, so
> > > >> that
> > > >>>>> it includes the processing latency of the task itself, not just
> the
> > > >>>> latency
> > > >>>>> of everything up to, but not including, the task (which seems
> > > >> confusing
> > > >>>>> for users). I guess this could also be an implementation detail,
> > > >>> though.
> > > >>>>>
> > > >>>>> Thanks for the update,
> > > >>>>> -John
> > > >>>>>
> > > >>>>> On Thu, May 14, 2020, at 13:31, Sophie Blee-Goldman wrote:
> > > >>>>>> Hey all,
> > > >>>>>>
> > > >>>>>> After discussing with Bruno I'd like to propose a small
> amendment,
> > > >>>>>> which is to record the processor-node-level metrics only for
> > > >>> *stateful*
> > > >>>>>> *operators*. They would still be considered a
> "processor-node-level"
> > > >>>>>> metric and not a "state-store-level" metric as the staleness is
> > > >> still
> > > >>>>>> a property of the node rather than of the state itself.
> However, it
> > > >>>> seems
> > > >>>>>> that this information is primarily useful for stateful operators
> > > >> that
> > > >>>> might
> > > >>>>>> be exposing state via IQ or otherwise dependent on the record
> time
> > > >>>>>> unlike a stateless operator.
> > > >>>>>>
> > > >>>>>> It's worth calling out that recent performance improvements to
> the
> > > >>>> metrics
> > > >>>>>> framework mean that we no longer fetch the system time at the
> > > >> operator
> > > >>>>>> level, but only once per task. In other words the system time
> is not
> > > >>>> updated
> > > >>>>>> between each process as a record flows through the subtopology,
> so
> > > >>>>>> debugging the processor-level latency via the stateleness will
> not
> > > >> be
> > > >>>>>> possible.Note that this doesn't mean the operator-level metrics
> are
> > > >>> not
> > > >>>>>> *useful* relative to the task-level metric. Upstream caching
> and/or
> > > >>>>>> suppression
> > > >>>>>> can still cause a record's staleness at some downstream stateful
> > > >>>> operator
> > > >>>>>> to deviate from the task-level staleness (recorded at the source
> > > >>> node).
> > > >>>>>>
> > > >>>>>> Please let me know if you have any concerns about this change.
> The
> > > >>>>>> KIP has been updated with the new proposal
> > > >>>>>>
> > > >>>>>> On Thu, May 14, 2020 at 3:04 AM Bruno Cadonna <
> [email protected]>
> > > >>>> wrote:
> > > >>>>>>
> > > >>>>>>> Hi Sophie,
> > > >>>>>>>
> > > >>>>>>> Thank you for the KIP.
> > > >>>>>>>
> > > >>>>>>> The KIP looks good to me.
> > > >>>>>>>
> > > >>>>>>> 50th percentile:
> > > >>>>>>> I think we do not need it now. If we need it, we can add it.
> Here
> > > >> the
> > > >>>>>>> old truism applies: Adding is always easier than removing.
> > > >>>>>>>
> > > >>>>>>> processor-node-level metrics:
> > > >>>>>>> I think it is good to have the staleness metrics also on
> > > >>>>>>> processor-node-level. If we do not want to record them on all
> > > >>>>>>> processor nodes, you could restrict the recording to stateful
> > > >>>>>>> processor-nodes, since those are the ones that would benefit
> most
> > > >>> from
> > > >>>>>>> the staleness metrics.
> > > >>>>>>>
> > > >>>>>>> Best,
> > > >>>>>>> Bruno
> > > >>>>>>>
> > > >>>>>>> On Thu, May 14, 2020 at 4:15 AM Sophie Blee-Goldman <
> > > >>>> [email protected]>
> > > >>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>> Yeah, the specific reason was just to align with the current
> > > >>> metrics.
> > > >>>>>>>>
> > > >>>>>>>> Is it better to conform than to be right? History has a lot
> to say
> > > >>> on
> > > >>>>>>> that
> > > >>>>>>>> matter
> > > >>>>>>>> but I'm not sure how much of it applies to the fine details of
> > > >>> metrics
> > > >>>>>>>> naming :P
> > > >>>>>>>>
> > > >>>>>>>> More seriously, I figured if people are looking at this metric
> > > >>> they're
> > > >>>>>>>> likely to
> > > >>>>>>>> be looking at all the others. Then naming this one "-mean"
> would
> > > >>>> probably
> > > >>>>>>>> lead some to conclude that the "-avg" suffix in the other
> metrics
> > > >>> has
> > > >>>> a
> > > >>>>>>>> different meaning.
> > > >>>>>>>>
> > > >>>>>>>> As for the percentiles, I actually like p99 (and p75) better.
> I'll
> > > >>>> swap
> > > >>>>>>>> that out
> > > >>>>>>>>
> > > >>>>>>>> On Wed, May 13, 2020 at 7:07 PM John Roesler <
> [email protected]
> > > >>>
> > > >>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Thanks Sophie,
> > > >>>>>>>>>
> > > >>>>>>>>> I hope this isn't too nit-picky, but is there a reason to
> choose
> > > >>>> "avg"
> > > >>>>>>>>> instead
> > > >>>>>>>>> of "mean"? Maybe this is too paranoid, and I might be
> > > >> oversensitive
> > > >>>>>>> because
> > > >>>>>>>>> of the mistake I just made earlier, but it strikes me that
> "avg"
> > > >> is
> > > >>>>>>>>> actually
> > > >>>>>>>>> ambiguous, as it refers to a family of statistics, whereas
> "mean"
> > > >>> is
> > > >>>>>>>>> specific.
> > > >>>>>>>>> I see other Kafka metrics with "avg", but none with "mean";
> was
> > > >>> that
> > > >>>>>>> the
> > > >>>>>>>>> reason? If so, I'm +1.
> > > >>>>>>>>>
> > > >>>>>>>>> Regarding the names of the percentile, I actually couldn't
> find
> > > >>> _any_
> > > >>>>>>> other
> > > >>>>>>>>> metrics that use percentile. Was there a reason to choose
> "99th"
> > > >> as
> > > >>>>>>> opposed
> > > >>>>>>>>> to "p99" or any other scheme? This is not a criticism, I'm
> just
> > > >>>>>>> primarily
> > > >>>>>>>>> asking
> > > >>>>>>>>> for consistency's sake.
> > > >>>>>>>>>
> > > >>>>>>>>> Thanks again,
> > > >>>>>>>>> -John
> > > >>>>>>>>>
> > > >>>>>>>>> On Wed, May 13, 2020, at 19:19, Sophie Blee-Goldman wrote:
> > > >>>>>>>>>> Alright, I can get behind adding the min metric for the
> sake of
> > > >>>>>>> pretty
> > > >>>>>>>>>> graphs
> > > >>>>>>>>>> (and trivial computation).
> > > >>>>>>>>>>
> > > >>>>>>>>>> I'm still on the fence regarding the mean (or 50th
> percentile)
> > > >>> but I
> > > >>>>>>> can
> > > >>>>>>>>> see
> > > >>>>>>>>>> how users might expect it and find it a bit disorienting
> not to
> > > >>>>>>> have. So
> > > >>>>>>>>> the
> > > >>>>>>>>>> updated proposed metrics are
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>    - record-staleness-max [ms]
> > > >>>>>>>>>>    - record-staleness-99th [ms] *(99th percentile)*
> > > >>>>>>>>>>    - record-staleness-75th [ms] *(75th percentile)*
> > > >>>>>>>>>>    - record-staleness-avg [ms] *(mean)*
> > > >>>>>>>>>>    - record-staleness-min [ms]
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Wed, May 13, 2020 at 4:42 PM John Roesler <
> > > >> [email protected]
> > > >>>>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Oh boy, I never miss an opportunity to embarrass myself. I
> > > >> guess
> > > >>>>>>> the
> > > >>>>>>>>> mean
> > > >>>>>>>>>>> seems more interesting to me than the median, but neither
> are
> > > >> as
> > > >>>>>>>>>>> interesting as the higher percentiles (99th and max).
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Min isn’t really important for any SLAs, but it does round
> out
> > > >>> the
> > > >>>>>>>>> mental
> > > >>>>>>>>>>> picture of the distribution. I’ve always graphed min along
> with
> > > >>> the
> > > >>>>>>>>> other
> > > >>>>>>>>>>> metrics to help me understand how fast the system can be,
> which
> > > >>>>>>> helps
> > > >>>>>>>>> in
> > > >>>>>>>>>>> optimization decisions. It’s also a relatively inexpensive
> > > >> metric
> > > >>>>>>> to
> > > >>>>>>>>>>> compute, so it might be nice to just throw it in.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Wed, May 13, 2020, at 18:18, Sophie Blee-Goldman wrote:
> > > >>>>>>>>>>>> G1:
> > > >>>>>>>>>>>> I was considering it as the "end-to-end latency *up* to
> the
> > > >>>>>>> specific
> > > >>>>>>>>>>> task"
> > > >>>>>>>>>>>> but
> > > >>>>>>>>>>>> I'm happy with "record-staleness" if that drives the point
> > > >> home
> > > >>>>>>>>> better.
> > > >>>>>>>>>>> So
> > > >>>>>>>>>>>> it's the
> > > >>>>>>>>>>>> "staleness of the record when it is received by that
> task" --
> > > >>>>>>> will
> > > >>>>>>>>> update
> > > >>>>>>>>>>>> the KIP
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> B1/J:
> > > >>>>>>>>>>>> I'm struggling to imagine a case where the min would
> actually
> > > >> be
> > > >>>>>>>>> useful,
> > > >>>>>>>>>>>> rather than
> > > >>>>>>>>>>>> just intellectually interesting. I don't feel strongly
> that we
> > > >>>>>>>>> shouldn't
> > > >>>>>>>>>>>> add it, but that's
> > > >>>>>>>>>>>> why I didn't include it from the start. Can you enlighten
> me
> > > >>>>>>> with an
> > > >>>>>>>>>>>> example?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I was also vaguely concerned about the overhead of adding
> > > >>>>>>> multiple
> > > >>>>>>>>>>>> percentile
> > > >>>>>>>>>>>> metrics. Do we have any data to indicate what kind of
> > > >>> performance
> > > >>>>>>>>> hit we
> > > >>>>>>>>>>>> take on
> > > >>>>>>>>>>>> metrics computation?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Also, not to be too pedantic but the 50th percentile
> would be
> > > >>> the
> > > >>>>>>>>> median
> > > >>>>>>>>>>>> not the
> > > >>>>>>>>>>>> mean. Would you propose to add the mean *and* the 50th
> > > >>>>>>> percentile, or
> > > >>>>>>>>>>> just
> > > >>>>>>>>>>>> one
> > > >>>>>>>>>>>> of the two?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Thanks all!
> > > >>>>>>>>>>>> Sophie
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Wed, May 13, 2020 at 3:34 PM John Roesler <
> > > >>>>>>> [email protected]>
> > > >>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> Hello all, and thanks for the KIP, Sophie,
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Just some comments on the discussion so far:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> B2/G1:
> > > >>>>>>>>>>>>> In principle, it shouldn't matter whether we report
> "spans"
> > > >> or
> > > >>>>>>>>>>>>> "end-to-end" latency. But in practice, some of the spans
> are
> > > >>>>>>> pretty
> > > >>>>>>>>>>>>> difficult to really measure (like time spent waiting in
> the
> > > >>>>>>>>> topics, or
> > > >>>>>>>>>>>>> time from the event happening to the ETL producer
> choosing to
> > > >>>>>>> send
> > > >>>>>>>>> it,
> > > >>>>>>>>>>>>> or time spent in send/receive buffers, etc., etc.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> In other words, it's practically easier to compute spans
> by
> > > >>>>>>>>> subtracting
> > > >>>>>>>>>>>>> e2e latencies than it is to compute e2e latencies by
> adding
> > > >>>>>>> spans.
> > > >>>>>>>>> You
> > > >>>>>>>>>>>>> can even just consider that the span computation from e2e
> > > >>>>>>> always
> > > >>>>>>>>> just
> > > >>>>>>>>>>>>> involves subtracting two numbers, whereas computing e2e
> > > >> latency
> > > >>>>>>>>> from
> > > >>>>>>>>>>>>> spans involves adding _all_ the spans leading up to the
> end
> > > >> you
> > > >>>>>>>>> care
> > > >>>>>>>>>>> about.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> It seems like people really prefer to have spans when
> they
> > > >> are
> > > >>>>>>>>>>> debugging
> > > >>>>>>>>>>>>> latency problems, whereas e2e latency is a more general
> > > >>>>>>> measurement
> > > >>>>>>>>>>>>> that basically every person/application cares about and
> > > >> should
> > > >>>>>>> be
> > > >>>>>>>>>>>>> monitoring.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Altogether, it really seem to provide more value to more
> > > >>>>>>> people if
> > > >>>>>>>>> we
> > > >>>>>>>>>>>>> report
> > > >>>>>>>>>>>>> e2e latencies. Regarding "record-staleness" as a name, I
> > > >> think
> > > >>>>>>> I
> > > >>>>>>>>> have
> > > >>>>>>>>>>> no
> > > >>>>>>>>>>>>> preference, I'd defer to other peoples' intuition.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> G2:
> > > >>>>>>>>>>>>> I think the processor-node metric is nice, since the
> inside
> > > >> of
> > > >>>>>>> a
> > > >>>>>>>>> task
> > > >>>>>>>>>>> can
> > > >>>>>>>>>>>>> introduce a significant amount of latency in some cases.
> > > >> Plus,
> > > >>>>>>>>> it's a
> > > >>>>>>>>>>> more
> > > >>>>>>>>>>>>> direct measurement, if you really wanted to know (for the
> > > >>>>>>> purposes
> > > >>>>>>>>> of
> > > >>>>>>>>>>> IQ
> > > >>>>>>>>>>>>> or something) how long it takes source events to "show
> up" at
> > > >>>>>>> the
> > > >>>>>>>>>>> store.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> I think actually recording it at every processor could be
> > > >>>>>>>>> expensive,
> > > >>>>>>>>>>> but we
> > > >>>>>>>>>>>>> already record a bunch of metrics at the node level.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> B1:
> > > >>>>>>>>>>>>> I think 50% could be reasonable to record also. Even if
> it's
> > > >> a
> > > >>>>>>> poor
> > > >>>>>>>>>>> metric
> > > >>>>>>>>>>>>> for operational purposes, a lot of people might expect
> to see
> > > >>>>>>>>> "mean".
> > > >>>>>>>>>>>>> Actually,
> > > >>>>>>>>>>>>> I was surprised not to see "min". Is there a reason to
> leave
> > > >> it
> > > >>>>>>>>> off?
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> I might suggest:
> > > >>>>>>>>>>>>> min, mean (50th), 75th, 99th, max
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> B3:
> > > >>>>>>>>>>>>> I agree we should include late records (though not the
> ones
> > > >> we
> > > >>>>>>>>> drop).
> > > >>>>>>>>>>>>> It may be spiky, but only when there are legitimately
> some
> > > >>>>>>> records
> > > >>>>>>>>>>> with a
> > > >>>>>>>>>>>>> high end-to-end latency, which is the whole point of
> these
> > > >>>>>>> metrics.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> That's it! I don't think I have any other feedback, other
> > > >> than
> > > >>>>>>> a
> > > >>>>>>>>>>> request to
> > > >>>>>>>>>>>>> also report "min".
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>> -John
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Wed, May 13, 2020, at 16:58, Guozhang Wang wrote:
> > > >>>>>>>>>>>>>> Thanks Sophie for the KIP, a few quick thoughts:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> 1) The end-to-end latency includes both the processing
> > > >>>>>>> latency
> > > >>>>>>>>> of the
> > > >>>>>>>>>>>>> task
> > > >>>>>>>>>>>>>> and the latency spent sitting in intermediate topics. I
> have
> > > >>>>>>> a
> > > >>>>>>>>>>> similar
> > > >>>>>>>>>>>>>> feeling as Boyang mentioned above that the latency
> metric of
> > > >>>>>>> a
> > > >>>>>>>>> task A
> > > >>>>>>>>>>>>>> actually measures the latency of the sub-topology up-to
> but
> > > >>>>>>> not
> > > >>>>>>>>>>> including
> > > >>>>>>>>>>>>>> the processing of A, which is a bit weird.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Maybe the my feeling comes from the name "latency"
> itself,
> > > >>>>>>> since
> > > >>>>>>>>>>> today we
> > > >>>>>>>>>>>>>> already have several "latency" metrics already which are
> > > >>>>>>>>> measuring
> > > >>>>>>>>>>>>> elapsed
> > > >>>>>>>>>>>>>> system-time for processing a record / etc, while here
> we are
> > > >>>>>>>>>>> comparing
> > > >>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>> system wallclock time with the record timestamp.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Maybe we can consider renaming it as "record-staleness"
> > > >>>>>>> (note we
> > > >>>>>>>>>>> already
> > > >>>>>>>>>>>>>> have a "record-lateness" metric), in which case
> recording at
> > > >>>>>>> the
> > > >>>>>>>>>>>>>> system-time before we start processing the record sounds
> > > >> more
> > > >>>>>>>>>>> natural.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> 2) With that in mind, I'm wondering if the
> > > >>>>>>> processor-node-level
> > > >>>>>>>>> DEBUG
> > > >>>>>>>>>>>>>> metric is worth to add, given that we already have a
> > > >>>>>>> task-level
> > > >>>>>>>>>>>>> processing
> > > >>>>>>>>>>>>>> latency metric. Basically, a specific node's e2e
> latency is
> > > >>>>>>>>> similar
> > > >>>>>>>>>>> to
> > > >>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>> task-level e2e latency + task-level processing latency.
> > > >>>>>>>>> Personally I
> > > >>>>>>>>>>>>> think
> > > >>>>>>>>>>>>>> having a task-level record-staleness metric is
> sufficient.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Guozhang
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Wed, May 13, 2020 at 11:46 AM Sophie Blee-Goldman <
> > > >>>>>>>>>>>>> [email protected]>
> > > >>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> 1. I felt that 50% was not a particularly useful gauge
> for
> > > >>>>>>> this
> > > >>>>>>>>>>>>> specific
> > > >>>>>>>>>>>>>>> metric, as
> > > >>>>>>>>>>>>>>> it's presumably most useful at putting an *upper
> *bound on
> > > >>>>>>> the
> > > >>>>>>>>>>> latency
> > > >>>>>>>>>>>>> you
> > > >>>>>>>>>>>>>>> can
> > > >>>>>>>>>>>>>>> reasonably expect to see. I chose percentiles that
> would
> > > >>>>>>>>> hopefully
> > > >>>>>>>>>>>>> give a
> > > >>>>>>>>>>>>>>> good
> > > >>>>>>>>>>>>>>> sense of what *most* records will experience, and what
> > > >>>>>>> *close
> > > >>>>>>>>> to
> > > >>>>>>>>>>> all*
> > > >>>>>>>>>>>>>>> records
> > > >>>>>>>>>>>>>>> will.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> However I'm not married to these specific numbers and
> > > >>>>>>> could be
> > > >>>>>>>>>>>>> convinced.
> > > >>>>>>>>>>>>>>> Would be especially interested in hearing from users on
> > > >>>>>>> this.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> 2. I'm inclined to not include the "hop-to-hop
> latency" in
> > > >>>>>>>>> this KIP
> > > >>>>>>>>>>>>> since
> > > >>>>>>>>>>>>>>> users
> > > >>>>>>>>>>>>>>> can always compute it themselves by subtracting the
> > > >>>>>>> previous
> > > >>>>>>>>> node's
> > > >>>>>>>>>>>>>>> end-to-end latency. I guess we could do it either way
> since
> > > >>>>>>>>> you can
> > > >>>>>>>>>>>>> always
> > > >>>>>>>>>>>>>>> compute one from the other, but I think the end-to-end
> > > >>>>>>> latency
> > > >>>>>>>>>>> feels
> > > >>>>>>>>>>>>> more
> > > >>>>>>>>>>>>>>> valuable as it's main motivation is not to debug
> > > >>>>>>> bottlenecks
> > > >>>>>>>>> in the
> > > >>>>>>>>>>>>>>> topology but
> > > >>>>>>>>>>>>>>> to give users a sense of how long it takes arecord to
> be
> > > >>>>>>>>> reflected
> > > >>>>>>>>>>> in
> > > >>>>>>>>>>>>>>> certain parts
> > > >>>>>>>>>>>>>>> of the topology. For example this might be useful for
> users
> > > >>>>>>>>> who are
> > > >>>>>>>>>>>>>>> wondering
> > > >>>>>>>>>>>>>>> roughly when a record that was just produced will be
> > > >>>>>>> included
> > > >>>>>>>>> in
> > > >>>>>>>>>>> their
> > > >>>>>>>>>>>>> IQ
> > > >>>>>>>>>>>>>>> results.
> > > >>>>>>>>>>>>>>> Debugging is just a nice side effect -- but maybe I
> didn't
> > > >>>>>>> make
> > > >>>>>>>>>>> that
> > > >>>>>>>>>>>>> clear
> > > >>>>>>>>>>>>>>> enough
> > > >>>>>>>>>>>>>>> in the KIP's motivation.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> 3. Good question, I should address this in the KIP. The
> > > >>>>>>> short
> > > >>>>>>>>>>> answer is
> > > >>>>>>>>>>>>>>> "yes",
> > > >>>>>>>>>>>>>>> we will include late records. I added a paragraph to
> the
> > > >>>>>>> end
> > > >>>>>>>>> of the
> > > >>>>>>>>>>>>>>> Proposed
> > > >>>>>>>>>>>>>>> Changes section explaining the reasoning here, please
> let
> > > >>>>>>> me
> > > >>>>>>>>> know
> > > >>>>>>>>>>> if
> > > >>>>>>>>>>>>> you
> > > >>>>>>>>>>>>>>> have
> > > >>>>>>>>>>>>>>> any concerns.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> 4. Assuming you're referring to the existing metric
> > > >>>>>>>>>>> "process-latency",
> > > >>>>>>>>>>>>> that
> > > >>>>>>>>>>>>>>> metric
> > > >>>>>>>>>>>>>>> reflects the time for the literal Node#process method
> to
> > > >>>>>>> run
> > > >>>>>>>>>>> whereas
> > > >>>>>>>>>>>>> this
> > > >>>>>>>>>>>>>>> metric
> > > >>>>>>>>>>>>>>> would always be measured relative to the event
> timestamp.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> That said, the naming collision there is pretty
> confusing
> > > >>>>>>> so
> > > >>>>>>>>> I've
> > > >>>>>>>>>>>>> renamed
> > > >>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>> metrics in this KIP to "end-to-end-latency" which I
> feel
> > > >>>>>>> better
> > > >>>>>>>>>>>>> reflects
> > > >>>>>>>>>>>>>>> the nature
> > > >>>>>>>>>>>>>>> of the metric anyway.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Thanks for the feedback!
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> On Wed, May 13, 2020 at 10:21 AM Boyang Chen <
> > > >>>>>>>>>>>>> [email protected]>
> > > >>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Thanks for the KIP Sophie. Getting the E2E latency is
> > > >>>>>>>>> important
> > > >>>>>>>>>>> for
> > > >>>>>>>>>>>>>>>> understanding the bottleneck of the application.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> A couple of questions and ideas:
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> 1. Could you clarify the rational of picking 75, 99
> and
> > > >>>>>>> max
> > > >>>>>>>>>>>>> percentiles?
> > > >>>>>>>>>>>>>>>> Normally I see cases where we use 50, 90 percentile as
> > > >>>>>>> well
> > > >>>>>>>>> in
> > > >>>>>>>>>>>>> production
> > > >>>>>>>>>>>>>>>> systems.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> 2. The current latency being computed is cumulative,
> I.E
> > > >>>>>>> if a
> > > >>>>>>>>>>> record
> > > >>>>>>>>>>>>> goes
> > > >>>>>>>>>>>>>>>> through A -> B -> C, then P(C) = T(B->C) + P(B) =
> > > >>>>>>> T(B->C) +
> > > >>>>>>>>>>> T(A->B) +
> > > >>>>>>>>>>>>>>> T(A)
> > > >>>>>>>>>>>>>>>> and so on, where P() represents the captured latency,
> > > >>>>>>> and T()
> > > >>>>>>>>>>>>> represents
> > > >>>>>>>>>>>>>>>> the time for transiting the records between two nodes,
> > > >>>>>>>>> including
> > > >>>>>>>>>>>>>>> processing
> > > >>>>>>>>>>>>>>>> time. For monitoring purpose, maybe having T(B->C) and
> > > >>>>>>>>> T(A->B)
> > > >>>>>>>>>>> are
> > > >>>>>>>>>>>>> more
> > > >>>>>>>>>>>>>>>> natural to view as "hop-to-hop latency", otherwise if
> > > >>>>>>> there
> > > >>>>>>>>> is a
> > > >>>>>>>>>>>>> spike in
> > > >>>>>>>>>>>>>>>> T(A->B), both P(B) and P(C) are affected in the same
> > > >>>>>>> time.
> > > >>>>>>>>> In the
> > > >>>>>>>>>>>>> same
> > > >>>>>>>>>>>>>>>> spirit, the E2E latency is meaningful only when the
> > > >>>>>>> record
> > > >>>>>>>>> exits
> > > >>>>>>>>>>>>> from the
> > > >>>>>>>>>>>>>>>> sink as this marks the whole time this record spent
> > > >>>>>>> inside
> > > >>>>>>>>> the
> > > >>>>>>>>>>>>> funnel. Do
> > > >>>>>>>>>>>>>>>> you think we could have separate treatment for sink
> > > >>>>>>> nodes and
> > > >>>>>>>>>>> other
> > > >>>>>>>>>>>>>>>> nodes, so that other nodes only count the time
> receiving
> > > >>>>>>> the
> > > >>>>>>>>>>> record
> > > >>>>>>>>>>>>> from
> > > >>>>>>>>>>>>>>>> last hop? I'm not proposing a solution here, just
> want to
> > > >>>>>>>>> discuss
> > > >>>>>>>>>>>>> this
> > > >>>>>>>>>>>>>>>> alternative to see if it is reasonable.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> 3. As we are going to monitor late arrival records as
> > > >>>>>>> well,
> > > >>>>>>>>> they
> > > >>>>>>>>>>>>> would
> > > >>>>>>>>>>>>>>>> create some really spiky graphs when the out-of-order
> > > >>>>>>>>> records are
> > > >>>>>>>>>>>>>>>> interleaving with on time records. Should we also
> supply
> > > >>>>>>> a
> > > >>>>>>>>> smooth
> > > >>>>>>>>>>>>> version
> > > >>>>>>>>>>>>>>>> of the latency metrics, or user should just take care
> of
> > > >>>>>>> it
> > > >>>>>>>>> by
> > > >>>>>>>>>>>>> themself?
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> 4. Regarding this new metrics, we haven't discussed
> its
> > > >>>>>>>>> relation
> > > >>>>>>>>>>>>> with our
> > > >>>>>>>>>>>>>>>> existing processing latency metrics, could you add
> some
> > > >>>>>>>>> context
> > > >>>>>>>>>>> on
> > > >>>>>>>>>>>>>>>> comparison and a simple `when to use which` tutorial
> for
> > > >>>>>>> the
> > > >>>>>>>>>>> best?
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Boyang
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> On Tue, May 12, 2020 at 7:28 PM Sophie Blee-Goldman <
> > > >>>>>>>>>>>>> [email protected]
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Hey all,
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> I'd like to kick off discussion on KIP-613 which aims
> > > >>>>>>> to
> > > >>>>>>>>> add
> > > >>>>>>>>>>>>> end-to-end
> > > >>>>>>>>>>>>>>>>> latency metrics to Streams. Please take a look:
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-613%3A+Add+end-to-end+latency+metrics+to+Streams
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>>>>>>> Sophie
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> --
> > > >>>>>>>>>>>>>> -- Guozhang
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>
> > > >>>>
> > > >>>
> > > >>
> > > >
> > >
> > >
> >
>

Re: [DISCUSS] KIP-613: Add end-to-end latency metrics to Streams

Reply via email to