Re: [DISCUSS] KIP-613: Add end-to-end latency metrics to Streams

Sophie Blee-Goldman Fri, 15 May 2020 19:43:24 -0700

Yeah, I had proposed to replace the "source + sink" task-level metric
with "sink only" task-level metrics plus the stateful node-level ones.


I completely agree that this proposal should and would look different
once we complete those two proposals, ie decouple caching and
consolidate suppression. But as it is right now, caching and suppression
cause the actual time-of-processing of records to vary greatly throughout
a subtopology such that the delta between a source node and an intermediate
one is potentially quite large.

For this exact reason I would argue that this metric is still valuable at
the node-level even if the "current time" is actually only updated once
for a task. Obviously this would not convey any information about the
actual processing latency, but as you stated, this is likely to be a small
delta on the scale of milliseconds. But the latency due to a suppression
can be on the order of minutes, hours, or even days! If there is more than
one suppression or caching in a subtopology, the source and sink e2e
latency could be wildly misrepresentative of the e2e latency of a node in
the middle.

Anyways, I removed this last point from the current KIP document because I
felt it was ultimately an implementation detail. But the larger point I
would argue is that if we are truly worried about the performance hit of
fetching the system time at every node, we should just not fetch the
system time at every node instead of dropping the metric altogether. It
may be slightly off but only on the order of ms and not on the order of
hours, and it's still more useful than no metric at all.
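
To make the scale of the two errors concrete, here is a rough sketch with
made-up numbers. It is not Streams code: the Record class, the
e2e_latency_ms helper, and every timestamp below are hypothetical, chosen
only to illustrate the argument.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Timestamp carried by the record (hypothetical field name).
    event_time_ms: int


def e2e_latency_ms(record: Record, now_ms: int) -> int:
    """Wall-clock time at the measuring node minus the record's timestamp."""
    return now_ms - record.event_time_ms


# Hypothetical timeline for a single record, all values in milliseconds.
record = Record(event_time_ms=0)
task_pickup_ms = 50                      # system time fetched once, when the task picks up the record
node_done_ms = 53                        # true time at which the stateful node finishes
sink_emit_ms = 50 + 2 * 60 * 60 * 1000   # a suppression releases the record two hours later

cached = e2e_latency_ms(record, task_pickup_ms)  # node-level metric using the cached time
exact = e2e_latency_ms(record, node_done_ms)     # node-level metric with a fresh clock read
sink = e2e_latency_ms(record, sink_emit_ms)      # task-level metric recorded at the sink

print(f"node, cached clock: {cached} ms")
print(f"node, fresh clock:  {exact} ms")
print(f"sink:               {sink} ms")
```

With these assumed numbers the cached clock is off by 3 ms for the node in
the middle, while using the sink measurement as a stand-in for that node
would be off by roughly two hours.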


On Fri, May 15, 2020 at 6:40 PM Guozhang Wang <[email protected]> wrote:

> Hi Sophie,
>
> I think your motivation makes sense, in fact before KIP-444 we used to have
> processor-node-level process latency as DEBUG which would holistically
> address the motivation but we removed it since it is indeed quite expensive
> (I do not have concrete numbers at hand, but basically you'd have to call
> system-wall-clock time a couple of times for each record, each processor
> node), and in the long run I feel such scenarios would be less common if we
> 1) decouple caching from emitting and 2) always put suppression right
> before sink only, and that's also why I've proposed to remove them in
> KIP-444.
>
> I missed an update in your wiki that right now you've updated the
> task-level metric to be recorded at the sink node, and thought it is still
> at the source node, but assuming it is still the case, then practically any
> state store's staleness of the task could roughly be measured as that
> task-level staleness plus a small delta (usually tens of millis), and the
> down stream task B's staleness - up stream task A's staleness would capture
> the suppression effect at the end of A plus the intermediate topic
> producing -> consuming latency, which usually would not require to be
> distinguished.
>
>
> Guozhang
>
>
> On Fri, May 15, 2020 at 6:09 PM Sophie Blee-Goldman <[email protected]>
> wrote:
>
> > Hey Guozhang,
> >
> > Thanks for the response. I meant to ask this earlier and forgot, but do
> you
> > have any data or benchmarking results on hand for the performance hit we
> > took due to the metrics? Or maybe better yet, on the performance we
> gained
> > back by dropping some metrics? I think it would help us to better
> > understand
> > the tradeoffs with some concrete numbers to compare.
> >
> > I guess I'm inclined to want to include these metrics at the operator
> level
> > on
> > DEBUG since, in my mind, that's exactly what DEBUG is for: performance-
> > focused users will presumably run at INFO level and even those who do
> > use DEBUG may only do so temporarily, when they actually have something
> > to debug.
> >
> > Of course, I'm not running a Streams app myself so all this is just
> > conjecture.
> > But it seems reasonable to interpret the two metrics levels as "minimal
> > useful
> > information" and "all information that provides insight into the system"
> >
> > If we can agree there, the question becomes "does this provide any
> > additional
> > insight?" I would still argue that it does. To give a concrete example,
> > let's say
> > the user has a basic
> >
> > SOURCE -> TRANSFORM -> AGGREGATE -> SUPPRESSION -> SINK
> >
> > subtopology, the application performing IQ on the transformer's state.
> The
> > user
> > wants to know how long it's taking records to be reflected in the results
> > of a query.
> > With only task-level metrics, they can only guess based on the e2e
> latency
> > of the
> > previous task and of this one. But neither of those is a good
> > representation of the
> > time it will take a record to be processed by the transformer: the
> previous
> > task's
> > latency will not include the time for the record to go through the
> > repartitioning,
> > and the current task's latency is measured after a suppression. The
> > suppression
> > itself will introduce significant latency and doesn't provide an accurate
> > estimate.
> > Caching would also do what it does best, messing with our understanding
> of
> > time :)
> >
> > It also means you can't measure the consumption latency, so there would
> be
> > no way to tell if the e2e latency is so high because the very first
> > subtopology is
> > extremely slow, or the record was severely delayed in reaching the input
> > topic.
> > I guess my earlier proposal to include both source and sink metrics could
> > address this issue, but it seems to solve the problem more holistically
> --
> > and for
> > more complex subtopologies -- by including metrics at the processor level
> >
> > On Fri, May 15, 2020 at 5:44 PM Guozhang Wang <[email protected]>
> wrote:
> >
> > > Hey folks,
> > >
> > > I want to make a final fight on processor-node-level metrics :) Post
> > > KIP-444 we've actually removed a lot of node-level metrics since it is
> > too
> > > expensive to record at that level and their values are not proven worth
> > it.
> > > Again I'd use the "we have to either enable all or none of DEBUG
> metrics"
> > > card, e.g. if we want to look into per-task process-latency which is
> > DEBUG,
> > > we'd have to record all process-node / state-store / cache level
> metrics.
> > > On the other hand, within a task we usually do not have a lot of
> stateful
> > > operator nodes and if we want to find if certain stateful nodes are the
> > > bottleneck of process latency, that is usually discoverable from the
> > > state-store level read / write latencies already. So I'd imagine that
> > > suppose there's a sub-topology with multiple state stores in it, if we
> > > found the task-level process latency is high, we can still tell which
> > state
> > > > stores within it are the main bottleneck from the state store read /
> write
> > > latency (I'm assuming the IO latency is always dominant compared with
> > > others).
> > >
> > > Guozhang
> > >
> > >
> > >
> > >
> > > On Fri, May 15, 2020 at 3:22 PM Sophie Blee-Goldman <
> [email protected]
> > >
> > > wrote:
> > >
> > > > > @Matthias
> > > > > And that's what he says about the 95th percentile! Imagine what he
> > > would
> > > > say about the 50th :P
> > > >
> > > > I should have kept watching another 10 minutes. He gets around to
> > > covering
> > > > the 50th, and let's
> > > > just say he is not a fan: https://youtu.be/lJ8ydIuPFeU?t=786
> > > >
> > > > On Fri, May 15, 2020 at 3:15 PM John Roesler <[email protected]>
> > > wrote:
> > > >
> > > > > Thanks, Sophie!
> > > > >
> > > > > I think this makes perfect sense. It will be much more intuitive to
> > use
> > > > > the metric for the stated motivation this way.  I’d be in favor of
> > the
> > > > > proposal after this update.
> > > > >
> > > > > Thanks again for taking this on,
> > > > > -John
> > > > >
> > > > > On Fri, May 15, 2020, at 17:07, Sophie Blee-Goldman wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > I'd like to clarify/modify one aspect of this KIP, which is to
> > record
> > > > the
> > > > > > staleness
> > > > > > at the *completion* of the record's processing by the
> operator/task
> > > in
> > > > > > question,
> > > > > > rather than on the intake. The task-level metrics will be
> recorded
> > at
> > > > the
> > > > > > sink
> > > > > > node instead of at the source, and the operator-level metrics
> will
> > be
> > > > > > recorded
> > > > > > at the end of the operation.
> > > > > >
> > > > > > The stated purpose and intended usefulness of this KIP is to give
> > > > users a
> > > > > > way
> > > > > > to gauge roughly how long it takes for a record to be reflected
> in
> > > the
> > > > > > "results",
> > > > > > whether these results are being read from an output topic or
> > through
> > > > IQ.
> > > > > To
> > > > > > take the IQ example, the results of a record are obviously not
> > > visible
> > > > > until
> > > > > > *after* that node has finished processing it. The staleness,
> while
> > > > still
> > > > > > potentially
> > > > > > useful as it can impact the *way* a record is processed in a
> > stateful
> > > > and
> > > > > > time-
> > > > > > dependent operator, is not part of the problem this KIP
> > specifically
> > > > set
> > > > > out
> > > > > > to solve.
> > > > > >
> > > > > > In light of this I think it's appropriate to revert the name
> change
> > > > back
> > > > > to
> > > > > > include
> > > > > > latency, since "staleness" as described above only makes sense
> when
> > > > > > measuring
> > > > > > relative to the arrival of a record at a task/node. I'd propose
> to
> > > save
> > > > > the
> > > > > > term
> > > > > > "staleness" for that particular meaning, and adopt Matthias's
> > > > suggestion
> > > > > of
> > > > > > "record-e2e-latency" for this.
> > > > > >
> > > > > > Thanks for hanging in there all. Please let me know if you have
> any
> > > > > > concerns about
> > > > > > this change!
> > > > > >
> > > > > > Sophie
> > > > > >
> > > > > > On Fri, May 15, 2020 at 2:25 PM Matthias J. Sax <
> [email protected]>
> > > > > wrote:
> > > > > >
> > > > > > > I am also happy with max/min/99/90. And I buy your naming
> > argument
> > > > > about
> > > > > > > staleness vs latency.
> > > > > > >
> > > > > > >
> > > > > > > -Matthias
> > > > > > >
> > > > > > > On 5/15/20 12:24 PM, Boyang Chen wrote:
> > > > > > > > Hey Sophie,
> > > > > > > >
> > > > > > > > 90/99/min/max make sense to me.
> > > > > > > >
> > > > > > > > On Fri, May 15, 2020 at 12:20 PM Sophie Blee-Goldman <
> > > > > > > [email protected]>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> @Matthias
> > > > > > > >> Regarding tracking the 50th percentile, I'll refer you to
> the
> > > 4:53
> > > > > mark
> > > > > > > of
> > > > > > > >> the video
> > > > > > > >> *you* linked: https://youtu.be/lJ8ydIuPFeU?t=293
> > > > > > > >>
> > > > > > > >> And that's what he says about the 95th percentile! Imagine
> > what
> > > he
> > > > > would
> > > > > > > >> say about
> > > > > > > >> the 50th :P
> > > > > > > >>
> > > > > > > >> But seriously, since we can't seem to agree that the mean or
> > > 50th
> > > > > > > >> percentile is actually
> > > > > > > >> useful I'm inclined to resurrect my original proposal,
> > neither.
> > > > But
> > > > > I
> > > > > > > think
> > > > > > > >> that's a good
> > > > > > > >> argument against the 75th, which I admittedly chose somewhat
> > > > > > > arbitrarily as
> > > > > > > >> an
> > > > > > > >> intermediate between the 50th and the higher percentiles.
> How
> > > > about:
> > > > > > > >>
> > > > > > > >> -max
> > > > > > > >> -p99
> > > > > > > >> -p90
> > > > > > > >> -min
> > > > > > > >>
> > > > > > > >> with p50/mean still up for debate if anyone feels strongly
> for
> > > > > either of
> > > > > > > >> them.
> > > > > > > >>
> > > > > > > >> Regarding the name, I've already flip-flopped on this so I'm
> > > > > definitely
> > > > > > > >> still open to
> > > > > > > >> further arguments. But the reason for changing it from
> > > > > > > end-to-end-latency
> > > > > > > >> (which
> > > > > > > >> is similar to what you propose) is that this metric
> > technically
> > > > > reflects
> > > > > > > >> how old (ie how "stale")
> > > > > > > >> the record is when it's *received* by the operator, not when
> > > it's
> > > > > > > processed
> > > > > > > >> by the operator.
> > > > > > > >> It seemed like there was the potential for confusion that
> > > > > > > >> "end-to-end-latency" might
> > > > > > > >> represent the latency from the event creation to the time
> the
> > > > > processor
> > > > > > > is
> > > > > > > >> done
> > > > > > > >> processing it.
> > > > > > > >>
> > > > > > > >> @John
> > > > > > > >> I'd rather err on the side of "not-enough" metrics as we can
> > > > always
> > > > > add
> > > > > > > >> this to the
> > > > > > > >> stateless metrics later on. If we decide to measure the time
> > at
> > > > > every
> > > > > > > node
> > > > > > > >> and don't
> > > > > > > >> find any evidence of a serious performance impact, and users
> > > > > indicate
> > > > > > > they
> > > > > > > >> would
> > > > > > > >> like to see this metric at all nodes, then we can easily
> start
> > > > > reporting
> > > > > > > >> them as well.
> > > > > > > >> WDYT?
> > > > > > > >>
> > > > > > > >> That said, sink nodes seem like a reasonable exception to
> the
> > > > rule.
> > > > > > > >> Obviously users
> > > > > > > >> should be able to detect the time when the record reaches
> the
> > > > output
> > > > > > > topic
> > > > > > > >> but that
> > > > > > > >> still leaves a gap in understanding how long the production
> > > > latency
> > > > > was.
> > > > > > > >> This mirrors
> > > > > > > >> the consumption latency that is exposed by the task-level
> > > metrics,
> > > > > which
> > > > > > > >> are measured
> > > > > > > >> at the source node. For good symmetry what if we actually
> > expose
> > > > > both
> > > > > > > the
> > > > > > > >> source
> > > > > > > >> and sink latency at the task-level? ie report both sets of
> > > > > statistical
> > > > > > > >> measurements with
> > > > > > > >> the additional tag -source/-sink
> > > > > > > >>
> > > > > > > >> @Bill
> > > > > > > > >> Thanks for the comment regarding the min! I hadn't
> > > > considered
> > > > > that
> > > > > > > >> and it's
> > > > > > > >> quite useful to think about how and what is useful from a
> > > > > > > > > user's
> > > > > point of
> > > > > > > >> view.
> > > > > > > >>
> > > > > > > > >> Regarding your second point, I'm inclined to leave that as
> an
> > > > > > > >> implementation detail
> > > > > > > >> but my take would be that the user should be allowed to
> > control
> > > > the
> > > > > > > record
> > > > > > > >> timestamp
> > > > > > > >> used for this with the timestamp extractor. My impression is
> > > that
> > > > > users
> > > > > > > may
> > > > > > > >> often embed
> > > > > > > >> the actual event time in the payload for whatever reason,
> and
> > > this
> > > > > > > >> represents the "true"
> > > > > > > >> timestamp as far as the Streams topology is concerned.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On Fri, May 15, 2020 at 11:05 AM Bill Bejeck <
> > [email protected]
> > > >
> > > > > wrote:
> > > > > > > >>
> > > > > > > >>> Thanks for the KIP, Sophie, this will be a useful metric to
> > > add.
> > > > > > > >>>
> > > > > > > >>> Regarding tracking min, I  think it could be valuable for
> > users
> > > > to
> > > > > > > >> discern
> > > > > > > >>> which part of their topologies are more efficient since
> this
> > > is a
> > > > > > > >>> task-level metric.  I realize everyone seems to be on board
> > > with
> > > > > > > >> including
> > > > > > > >>> min anyway, but I wanted to add my 2 cents on this topic
> > should
> > > > we
> > > > > > > decide
> > > > > > > >>> to revisit adding min or not.
> > > > > > > >>>
> > > > > > > >>> I do have a question regarding the calculation of
> staleness.
> > > > > > > >>> Is there going to be a consideration for timestamp
> > extractors?
> > > > > Users
> > > > > > > >> could
> > > > > > > >>> prefer to use a timestamp embedded in the payload, and it
> > could
> > > > > skew
> > > > > > > the
> > > > > > > >>> measurements.
> > > > > > > >>> I was wondering if we should specify in the KIP if setting
> > the
> > > > > arrival
> > > > > > > >> time
> > > > > > > >>> is always going to come from the record timestamp, or is
> this
> > > an
> > > > > > > >>> implementation detail we can cover in the PR?
> > > > > > > >>>
> > > > > > > >>> Thanks!
> > > > > > > >>> Bill
> > > > > > > >>>
> > > > > > > >>> On Fri, May 15, 2020 at 1:11 AM Matthias J. Sax <
> > > > [email protected]>
> > > > > > > >> wrote:
> > > > > > > >>>
> > > > > > > >>>> Thanks for the KIP Sophie.
> > > > > > > >>>>
> > > > > > > > >>>> I think it's not useful to record the avg/mean; it's
> sensitive
> > > to
> > > > > > > >>>> outliers. We should rather track the median (50th
> > percentile).
> > > > > > > >>>>
> > > > > > > >>>> Not sure if tracking min is useful, but I am also ok to
> > track
> > > > it.
> > > > > > > >>>>
> > > > > > > >>>> However, I find it odd to track 75th percentile. Standard
> > > > measures
> > > > > > > >> would
> > > > > > > > >>>> be the 90th or 95th -- I guess we don't need both, so maybe
> > > picking
> > > > > 90th
> > > > > > > >>>> might be more useful?
> > > > > > > >>>>
> > > > > > > > >>>> About the name: "staleness" sounds really odd, and in fact
> > the
> > > > > metric
> > > > > > > >>>> does capture "latency" so we should call it "latency". I
> > > > > understand
> > > > > > > the
> > > > > > > >>>> issue that we already have a latency metric. So maybe we
> > could
> > > > > call it
> > > > > > > >>>> `record-e2e-latency-*` ?
> > > > > > > >>>>
> > > > > > > > >>>> While I agree that we should include out-of-order data
> (the
> > > KIP
> > > > > should
> > > > > > > >>>> talk about `out-of-order` data, not `late` data; data is
> > only
> > > > > `late`
> > > > > > > if
> > > > > > > >>>> it's out-of-order and if it's dropped), I don't really
> > > > understand
> > > > > why
> > > > > > > >>>> the new metric would help to configure grace period or
> > > retention
> > > > > time?
> > > > > > > > >>>> As you mention in the KIP, both are defined as the max
> difference
> > > of
> > > > > > > >>>> `event-time - stream-time` and thus the new metric that
> > takes
> > > > > > > >>>> system-/wallclock-time into account does not seem to help
> at
> > > > all.
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>> Btw: there is a great talk about "How NOT to Measure
> > Latency"
> > > by
> > > > > Gil
> > > > > > > >>>> Tene: https://www.youtube.com/watch?v=lJ8ydIuPFeU
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>> -Matthias
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>> On 5/14/20 7:17 PM, John Roesler wrote:
> > > > > > > >>>>> Hi Sophie,
> > > > > > > >>>>>
> > > > > > > >>>>> It seems like there would still be plenty of use cases
> for
> > > > > recording
> > > > > > > >>>>> this metric at all processors and not just stateful ones,
> > but
> > > > I'm
> > > > > > > >> happy
> > > > > > > >>>>> to suspend my arguments for now. Since you're proposing
> to
> > > keep
> > > > > > > >>>>> them at the processor-node level, it will be seamless
> later
> > > to
> > > > > add
> > > > > > > >>>>> in the stateless processors if we want. As a wise man
> once
> > > > said,
> > > > > > > >>>>> "Adding is always easier than removing."
> > > > > > > >>>>>
> > > > > > > >>>>> Regarding the time measurement, it's an implementation
> > detail
> > > > > > > >>>>> we don't need to consider in the KIP. Nevertheless, I'd
> > > greatly
> > > > > > > >>>>> prefer to measure the system time again when recording
> the
> > > > > > > >>>>> metric. I don't think we've seen any evidence that proves
> > > this
> > > > > > > >>>>> would harm performance, and the amount of inaccuracy
> using
> > > > > > > >>>>> the cached system time could incur is actually
> substantial.
> > > > But,
> > > > > > > >>>>> if you want to just "not mention this" in the KIP, we can
> > > defer
> > > > > to
> > > > > > > >>>>> the actual PR discussion, at which time we're in a better
> > > > > position
> > > > > > > >>>>> to use benchmarks, etc., to make the call.
> > > > > > > >>>>>
> > > > > > > >>>>> Along the lines of the measurement accuracy discussion,
> one
> > > > > > > >>>>> minor thought I had is that maybe we should consider
> > > measuring
> > > > > > > >>>>> the task staleness metric at the sink, rather than the
> > > source,
> > > > so
> > > > > > > >> that
> > > > > > > >>>>> it includes the processing latency of the task itself,
> not
> > > just
> > > > > the
> > > > > > > >>>> latency
> > > > > > > >>>>> of everything up to, but not including, the task (which
> > seems
> > > > > > > >> confusing
> > > > > > > >>>>> for users). I guess this could also be an implementation
> > > > detail,
> > > > > > > >>> though.
> > > > > > > >>>>>
> > > > > > > >>>>> Thanks for the update,
> > > > > > > >>>>> -John
> > > > > > > >>>>>
> > > > > > > >>>>> On Thu, May 14, 2020, at 13:31, Sophie Blee-Goldman
> wrote:
> > > > > > > >>>>>> Hey all,
> > > > > > > >>>>>>
> > > > > > > >>>>>> After discussing with Bruno I'd like to propose a small
> > > > > amendment,
> > > > > > > >>>>>> which is to record the processor-node-level metrics only
> > for
> > > > > > > >>> *stateful*
> > > > > > > >>>>>> *operators*. They would still be considered a
> > > > > "processor-node-level"
> > > > > > > >>>>>> metric and not a "state-store-level" metric as the
> > staleness
> > > > is
> > > > > > > >> still
> > > > > > > >>>>>> a property of the node rather than of the state itself.
> > > > > However, it
> > > > > > > >>>> seems
> > > > > > > >>>>>> that this information is primarily useful for stateful
> > > > operators
> > > > > > > >> that
> > > > > > > >>>> might
> > > > > > > >>>>>> be exposing state via IQ or otherwise dependent on the
> > > record
> > > > > time
> > > > > > > >>>>>> unlike a stateless operator.
> > > > > > > >>>>>>
> > > > > > > >>>>>> It's worth calling out that recent performance
> > improvements
> > > to
> > > > > the
> > > > > > > >>>> metrics
> > > > > > > >>>>>> framework mean that we no longer fetch the system time
> at
> > > the
> > > > > > > >> operator
> > > > > > > >>>>>> level, but only once per task. In other words the system
> > > time
> > > > > is not
> > > > > > > >>>> updated
> > > > > > > >>>>>> between each process as a record flows through the
> > > > subtopology,
> > > > > so
> > > > > > > >>>>>> debugging the processor-level latency via the
> staleness
> > > will
> > > > > not
> > > > > > > >> be
> > > > > > > > >>>>>> possible. Note that this doesn't mean the operator-level
> > > > metrics
> > > > > are
> > > > > > > >>> not
> > > > > > > >>>>>> *useful* relative to the task-level metric. Upstream
> > caching
> > > > > and/or
> > > > > > > >>>>>> suppression
> > > > > > > >>>>>> can still cause a record's staleness at some downstream
> > > > stateful
> > > > > > > >>>> operator
> > > > > > > >>>>>> to deviate from the task-level staleness (recorded at
> the
> > > > source
> > > > > > > >>> node).
> > > > > > > >>>>>>
> > > > > > > >>>>>> Please let me know if you have any concerns about this
> > > change.
> > > > > The
> > > > > > > >>>>>> KIP has been updated with the new proposal
> > > > > > > >>>>>>
> > > > > > > >>>>>> On Thu, May 14, 2020 at 3:04 AM Bruno Cadonna <
> > > > > [email protected]>
> > > > > > > >>>> wrote:
> > > > > > > >>>>>>
> > > > > > > >>>>>>> Hi Sophie,
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Thank you for the KIP.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> The KIP looks good to me.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> 50th percentile:
> > > > > > > >>>>>>> I think we do not need it now. If we need it, we can
> add
> > > it.
> > > > > Here
> > > > > > > >> the
> > > > > > > >>>>>>> old truism applies: Adding is always easier than
> > removing.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> processor-node-level metrics:
> > > > > > > >>>>>>> I think it is good to have the staleness metrics also
> on
> > > > > > > >>>>>>> processor-node-level. If we do not want to record them
> on
> > > all
> > > > > > > >>>>>>> processor nodes, you could restrict the recording to
> > > stateful
> > > > > > > >>>>>>> processor-nodes, since those are the ones that would
> > > benefit
> > > > > most
> > > > > > > >>> from
> > > > > > > >>>>>>> the staleness metrics.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Best,
> > > > > > > >>>>>>> Bruno
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> On Thu, May 14, 2020 at 4:15 AM Sophie Blee-Goldman <
> > > > > > > >>>> [email protected]>
> > > > > > > >>>>>>> wrote:
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> Yeah, the specific reason was just to align with the
> > > current
> > > > > > > >>> metrics.
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> Is it better to conform than to be right? History has
> a
> > > lot
> > > > > to say
> > > > > > > >>> on
> > > > > > > >>>>>>> that
> > > > > > > >>>>>>>> matter
> > > > > > > >>>>>>>> but I'm not sure how much of it applies to the fine
> > > details
> > > > of
> > > > > > > >>> metrics
> > > > > > > >>>>>>>> naming :P
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> More seriously, I figured if people are looking at
> this
> > > > metric
> > > > > > > >>> they're
> > > > > > > >>>>>>>> likely to
> > > > > > > >>>>>>>> be looking at all the others. Then naming this one
> > "-mean"
> > > > > would
> > > > > > > >>>> probably
> > > > > > > >>>>>>>> lead some to conclude that the "-avg" suffix in the
> > other
> > > > > metrics
> > > > > > > >>> has
> > > > > > > >>>> a
> > > > > > > >>>>>>>> different meaning.
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> As for the percentiles, I actually like p99 (and p75)
> > > > better.
> > > > > I'll
> > > > > > > >>>> swap
> > > > > > > >>>>>>>> that out
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> On Wed, May 13, 2020 at 7:07 PM John Roesler <
> > > > > [email protected]
> > > > > > > >>>
> > > > > > > >>>>>>> wrote:
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>>> Thanks Sophie,
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> I hope this isn't too nit-picky, but is there a
> reason
> > to
> > > > > choose
> > > > > > > >>>> "avg"
> > > > > > > >>>>>>>>> instead
> > > > > > > >>>>>>>>> of "mean"? Maybe this is too paranoid, and I might be
> > > > > > > >> oversensitive
> > > > > > > >>>>>>> because
> > > > > > > >>>>>>>>> of the mistake I just made earlier, but it strikes me
> > > that
> > > > > "avg"
> > > > > > > >> is
> > > > > > > >>>>>>>>> actually
> > > > > > > >>>>>>>>> ambiguous, as it refers to a family of statistics,
> > > whereas
> > > > > "mean"
> > > > > > > >>> is
> > > > > > > >>>>>>>>> specific.
> > > > > > > >>>>>>>>> I see other Kafka metrics with "avg", but none with
> > > "mean";
> > > > > was
> > > > > > > >>> that
> > > > > > > >>>>>>> the
> > > > > > > >>>>>>>>> reason? If so, I'm +1.
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> Regarding the names of the percentile, I actually
> > > couldn't
> > > > > find
> > > > > > > >>> _any_
> > > > > > > >>>>>>> other
> > > > > > > >>>>>>>>> metrics that use percentile. Was there a reason to
> > choose
> > > > > "99th"
> > > > > > > >> as
> > > > > > > >>>>>>> opposed
> > > > > > > >>>>>>>>> to "p99" or any other scheme? This is not a
> criticism,
> > > I'm
> > > > > just
> > > > > > > >>>>>>> primarily
> > > > > > > >>>>>>>>> asking
> > > > > > > >>>>>>>>> for consistency's sake.
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> Thanks again,
> > > > > > > >>>>>>>>> -John
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> On Wed, May 13, 2020, at 19:19, Sophie Blee-Goldman
> > > wrote:
> > > > > > > >>>>>>>>>> Alright, I can get behind adding the min metric for
> > the
> > > > > sake of
> > > > > > > >>>>>>> pretty
> > > > > > > >>>>>>>>>> graphs
> > > > > > > >>>>>>>>>> (and trivial computation).
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> I'm still on the fence regarding the mean (or 50th
> > > > > percentile)
> > > > > > > >>> but I
> > > > > > > >>>>>>> can
> > > > > > > >>>>>>>>> see
> > > > > > > >>>>>>>>>> how users might expect it and find it a bit
> > disorienting
> > > > > not to
> > > > > > > >>>>>>> have. So
> > > > > > > >>>>>>>>> the
> > > > > > > >>>>>>>>>> updated proposed metrics are
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>>    - record-staleness-max [ms]
> > > > > > > >>>>>>>>>>    - record-staleness-99th [ms] *(99th percentile)*
> > > > > > > >>>>>>>>>>    - record-staleness-75th [ms] *(75th percentile)*
> > > > > > > >>>>>>>>>>    - record-staleness-avg [ms] *(mean)*
> > > > > > > >>>>>>>>>>    - record-staleness-min [ms]
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> On Wed, May 13, 2020 at 4:42 PM John Roesler <
> > > > > > > >> [email protected]
> > > > > > > >>>>
> > > > > > > >>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>>> Oh boy, I never miss an opportunity to embarrass
> > > myself.
> > > > I
> > > > > > > >> guess
> > > > > > > >>>>>>> the
> > > > > > > >>>>>>>>> mean
> > > > > > > >>>>>>>>>>> seems more interesting to me than the median, but
> > > neither
> > > > > are
> > > > > > > >> as
> > > > > > > >>>>>>>>>>> interesting as the higher percentiles (99th and
> max).
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> Min isn’t really important for any SLAs, but it
> does
> > > > round
> > > > > out
> > > > > > > >>> the
> > > > > > > >>>>>>>>> mental
> > > > > > > >>>>>>>>>>> picture of the distribution. I’ve always graphed
> min
> > > > along
> > > > > with
> > > > > > > >>> the
> > > > > > > >>>>>>>>> other
> > > > > > > >>>>>>>>>>> metrics to help me understand how fast the system
> can
> > > be,
> > > > > which
> > > > > > > >>>>>>> helps
> > > > > > > >>>>>>>>> in
> > > > > > > >>>>>>>>>>> optimization decisions. It’s also a relatively
> > > > inexpensive
> > > > > > > >> metric
> > > > > > > >>>>>>> to
> > > > > > > >>>>>>>>>>> compute, so it might be nice to just throw it in.
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> On Wed, May 13, 2020, at 18:18, Sophie Blee-Goldman
> > > > wrote:
> > > > > > > >>>>>>>>>>>> G1:
> > > > > > > >>>>>>>>>>>> I was considering it as the "end-to-end latency
> *up*
> > > to
> > > > > the
> > > > > > > >>>>>>> specific
> > > > > > > >>>>>>>>>>> task"
> > > > > > > >>>>>>>>>>>> but
> > > > > > > >>>>>>>>>>>> I'm happy with "record-staleness" if that drives
> the
> > > > point
> > > > > > > >> home
> > > > > > > >>>>>>>>> better.
> > > > > > > >>>>>>>>>>> So
> > > > > > > >>>>>>>>>>>> it's the
> > > > > > > >>>>>>>>>>>> "staleness of the record when it is received by
> that
> > > > > task" --
> > > > > > > >>>>>>> will
> > > > > > > >>>>>>>>> update
> > > > > > > >>>>>>>>>>>> the KIP
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> B1/J:
> > > > > > > >>>>>>>>>>>> I'm struggling to imagine a case where the min
> would
> > > > > actually
> > > > > > > >> be
> > > > > > > >>>>>>>>> useful,
> > > > > > > >>>>>>>>>>>> rather than
> > > > > > > >>>>>>>>>>>> just intellectually interesting. I don't feel
> > strongly
> > > > > that we
> > > > > > > >>>>>>>>> shouldn't
> > > > > > > >>>>>>>>>>>> add it, but that's
> > > > > > > >>>>>>>>>>>> why I didn't include it from the start. Can you
> > > > enlighten
> > > > > me
> > > > > > > >>>>>>> with an
> > > > > > > >>>>>>>>>>>> example?
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> I was also vaguely concerned about the overhead of
> > > > adding
> > > > > > > >>>>>>> multiple
> > > > > > > >>>>>>>>>>>> percentile
> > > > > > > >>>>>>>>>>>> metrics. Do we have any data to indicate what kind
> > of
> > > > > > > >>> performance
> > > > > > > >>>>>>>>> hit we
> > > > > > > >>>>>>>>>>>> take on
> > > > > > > >>>>>>>>>>>> metrics computation?
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> Also, not to be too pedantic but the 50th
> percentile
> > > > > would be
> > > > > > > >>> the
> > > > > > > >>>>>>>>> median
> > > > > > > >>>>>>>>>>>> not the
> > > > > > > >>>>>>>>>>>> mean. Would you propose to add the mean *and* the
> > 50th
> > > > > > > >>>>>>> percentile, or
> > > > > > > >>>>>>>>>>> just
> > > > > > > >>>>>>>>>>>> one
> > > > > > > >>>>>>>>>>>> of the two?
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> Thanks all!
> > > > > > > >>>>>>>>>>>> Sophie
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> On Wed, May 13, 2020 at 3:34 PM John Roesler <
> > > > > > > >>>>>>> [email protected]>
> > > > > > > >>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> Hello all, and thanks for the KIP, Sophie,
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> Just some comments on the discussion so far:
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> B2/G1:
> > > > > > > >>>>>>>>>>>>> In principle, it shouldn't matter whether we
> report
> > > > > "spans"
> > > > > > > >> or
> > > > > > > >>>>>>>>>>>>> "end-to-end" latency. But in practice, some of
> the
> > > > spans
> > > > > are
> > > > > > > >>>>>>> pretty
> > > > > > > >>>>>>>>>>>>> difficult to really measure (like time spent
> > waiting
> > > in
> > > > > the
> > > > > > > >>>>>>>>> topics, or
> > > > > > > >>>>>>>>>>>>> time from the event happening to the ETL producer
> > > > > choosing to
> > > > > > > >>>>>>> send
> > > > > > > >>>>>>>>> it,
> > > > > > > >>>>>>>>>>>>> or time spent in send/receive buffers, etc., etc.).
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> In other words, it's practically easier to
> compute
> > > > spans
> > > > > by
> > > > > > > >>>>>>>>> subtracting
> > > > > > > >>>>>>>>>>>>> e2e latencies than it is to compute e2e latencies
> > by
> > > > > adding
> > > > > > > >>>>>>> spans.
> > > > > > > >>>>>>>>> You
> > > > > > > >>>>>>>>>>>>> can even just consider that the span computation
> > from
> > > > e2e
> > > > > > > >>>>>>> always
> > > > > > > >>>>>>>>> just
> > > > > > > >>>>>>>>>>>>> involves subtracting two numbers, whereas
> computing
> > > e2e
> > > > > > > >> latency
> > > > > > > >>>>>>>>> from
> > > > > > > >>>>>>>>>>>>> spans involves adding _all_ the spans leading up
> to
> > > the
> > > > > end
> > > > > > > >> you
> > > > > > > >>>>>>>>> care
> > > > > > > >>>>>>>>>>> about.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> It seems like people really prefer to have spans
> > when
> > > > > they
> > > > > > > >> are
> > > > > > > >>>>>>>>>>> debugging
> > > > > > > >>>>>>>>>>>>> latency problems, whereas e2e latency is a more
> > > general
> > > > > > > >>>>>>> measurement
> > > > > > > >>>>>>>>>>>>> that basically every person/application cares
> about
> > > and
> > > > > > > >> should
> > > > > > > >>>>>>> be
> > > > > > > >>>>>>>>>>>>> monitoring.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> Altogether, it really seems to provide more value
> to
> > > > more
> > > > > > > >>>>>>> people if
> > > > > > > >>>>>>>>> we
> > > > > > > >>>>>>>>>>>>> report
> > > > > > > >>>>>>>>>>>>> e2e latencies. Regarding "record-staleness" as a
> > > name,
> > > > I
> > > > > > > >> think
> > > > > > > >>>>>>> I
> > > > > > > >>>>>>>>> have
> > > > > > > >>>>>>>>>>> no
> > > > > > > >>>>>>>>>>>>> preference, I'd defer to other people's
> intuition.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> G2:
> > > > > > > >>>>>>>>>>>>> I think the processor-node metric is nice, since
> > the
> > > > > inside
> > > > > > > >> of
> > > > > > > >>>>>>> a
> > > > > > > >>>>>>>>> task
> > > > > > > >>>>>>>>>>> can
> > > > > > > >>>>>>>>>>>>> introduce a significant amount of latency in some
> > > > cases.
> > > > > > > >> Plus,
> > > > > > > >>>>>>>>> it's a
> > > > > > > >>>>>>>>>>> more
> > > > > > > >>>>>>>>>>>>> direct measurement, if you really wanted to know
> > (for
> > > > the
> > > > > > > >>>>>>> purposes
> > > > > > > >>>>>>>>> of
> > > > > > > >>>>>>>>>>> IQ
> > > > > > > >>>>>>>>>>>>> or something) how long it takes source events to
> > > "show
> > > > > up" at
> > > > > > > >>>>>>> the
> > > > > > > >>>>>>>>>>> store.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> I think actually recording it at every processor
> > > could
> > > > be
> > > > > > > >>>>>>>>> expensive,
> > > > > > > >>>>>>>>>>> but we
> > > > > > > >>>>>>>>>>>>> already record a bunch of metrics at the node
> > level.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> B1:
> > > > > > > >>>>>>>>>>>>> I think 50% could be reasonable to record also.
> > Even
> > > if
> > > > > it's
> > > > > > > >> a
> > > > > > > >>>>>>> poor
> > > > > > > >>>>>>>>>>> metric
> > > > > > > >>>>>>>>>>>>> for operational purposes, a lot of people might
> > > expect
> > > > > to see
> > > > > > > >>>>>>>>> "mean".
> > > > > > > >>>>>>>>>>>>> Actually,
> > > > > > > >>>>>>>>>>>>> I was surprised not to see "min". Is there a
> reason
> > > to
> > > > > leave
> > > > > > > >> it
> > > > > > > >>>>>>>>> off?
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> I might suggest:
> > > > > > > >>>>>>>>>>>>> min, mean (50th), 75th, 99th, max
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> B3:
> > > > > > > >>>>>>>>>>>>> I agree we should include late records (though
> not
> > > the
> > > > > ones
> > > > > > > >> we
> > > > > > > >>>>>>>>> drop).
> > > > > > > >>>>>>>>>>>>> It may be spiky, but only when there are
> > legitimately
> > > > > some
> > > > > > > >>>>>>> records
> > > > > > > >>>>>>>>>>> with a
> > > > > > > >>>>>>>>>>>>> high end-to-end latency, which is the whole point
> > of
> > > > > these
> > > > > > > >>>>>>> metrics.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> That's it! I don't think I have any other
> feedback,
> > > > other
> > > > > > > >> than
> > > > > > > >>>>>>> a
> > > > > > > >>>>>>>>>>> request to
> > > > > > > >>>>>>>>>>>>> also report "min".
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> Thanks,
> > > > > > > >>>>>>>>>>>>> -John
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> On Wed, May 13, 2020, at 16:58, Guozhang Wang
> > wrote:
> > > > > > > >>>>>>>>>>>>>> Thanks Sophie for the KIP, a few quick thoughts:
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> 1) The end-to-end latency includes both the
> > > processing
> > > > > > > >>>>>>> latency
> > > > > > > >>>>>>>>> of the
> > > > > > > >>>>>>>>>>>>> task
> > > > > > > >>>>>>>>>>>>>> and the latency spent sitting in intermediate
> > > topics.
> > > > I
> > > > > have
> > > > > > > >>>>>>> a
> > > > > > > >>>>>>>>>>> similar
> > > > > > > >>>>>>>>>>>>>> feeling as Boyang mentioned above that the
> latency
> > > > > metric of
> > > > > > > >>>>>>> a
> > > > > > > >>>>>>>>> task A
> > > > > > > >>>>>>>>>>>>>> actually measures the latency of the
> sub-topology
> > > > up-to
> > > > > but
> > > > > > > >>>>>>> not
> > > > > > > >>>>>>>>>>> including
> > > > > > > >>>>>>>>>>>>>> the processing of A, which is a bit weird.
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> Maybe my feeling comes from the name
> "latency"
> > > > > itself,
> > > > > > > >>>>>>> since
> > > > > > > >>>>>>>>>>> today we
> > > > > > > >>>>>>>>>>>>>> already have several "latency" metrics
> > which
> > > > are
> > > > > > > >>>>>>>>> measuring
> > > > > > > >>>>>>>>>>>>> elapsed
> > > > > > > >>>>>>>>>>>>>> system-time for processing a record / etc, while
> > > here
> > > > > we are
> > > > > > > >>>>>>>>>>> comparing
> > > > > > > >>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>> system wallclock time with the record timestamp.
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> Maybe we can consider renaming it as
> > > > "record-staleness"
> > > > > > > >>>>>>> (note we
> > > > > > > >>>>>>>>>>> already
> > > > > > > >>>>>>>>>>>>>> have a "record-lateness" metric), in which case
> > > > > recording at
> > > > > > > >>>>>>> the
> > > > > > > >>>>>>>>>>>>>> system-time before we start processing the
> record
> > > > sounds
> > > > > > > >> more
> > > > > > > >>>>>>>>>>> natural.
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> 2) With that in mind, I'm wondering if the
> > > > > > > >>>>>>> processor-node-level
> > > > > > > >>>>>>>>> DEBUG
> > > > > > > >>>>>>>>>>>>>> metric is worth adding, given that we already
> > have a
> > > > > > > >>>>>>> task-level
> > > > > > > >>>>>>>>>>>>> processing
> > > > > > > >>>>>>>>>>>>>> latency metric. Basically, a specific node's e2e
> > > > > latency is
> > > > > > > >>>>>>>>> similar
> > > > > > > >>>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>> task-level e2e latency + task-level processing
> > > > latency.
> > > > > > > >>>>>>>>> Personally I
> > > > > > > >>>>>>>>>>>>> think
> > > > > > > >>>>>>>>>>>>>> having a task-level record-staleness metric is
> > > > > sufficient.
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> Guozhang
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> On Wed, May 13, 2020 at 11:46 AM Sophie
> > > Blee-Goldman <
> > > > > > > >>>>>>>>>>>>> [email protected]>
> > > > > > > >>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> 1. I felt that 50% was not a particularly
> useful
> > > > gauge
> > > > > for
> > > > > > > >>>>>>> this
> > > > > > > >>>>>>>>>>>>> specific
> > > > > > > >>>>>>>>>>>>>>> metric, as
> > > > > > > >>>>>>>>>>>>>>> it's presumably most useful at putting an
> *upper
> > > > > *bound on
> > > > > > > >>>>>>> the
> > > > > > > >>>>>>>>>>> latency
> > > > > > > >>>>>>>>>>>>> you
> > > > > > > >>>>>>>>>>>>>>> can
> > > > > > > >>>>>>>>>>>>>>> reasonably expect to see. I chose percentiles
> > that
> > > > > would
> > > > > > > >>>>>>>>> hopefully
> > > > > > > >>>>>>>>>>>>> give a
> > > > > > > >>>>>>>>>>>>>>> good
> > > > > > > >>>>>>>>>>>>>>> sense of what *most* records will experience,
> and
> > > > what
> > > > > > > >>>>>>> *close
> > > > > > > >>>>>>>>> to
> > > > > > > >>>>>>>>>>> all*
> > > > > > > >>>>>>>>>>>>>>> records
> > > > > > > >>>>>>>>>>>>>>> will.
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> However I'm not married to these specific
> numbers
> > > and
> > > > > > > >>>>>>> could be
> > > > > > > >>>>>>>>>>>>> convinced.
> > > > > > > >>>>>>>>>>>>>>> Would be especially interested in hearing from
> > > users
> > > > on
> > > > > > > >>>>>>> this.
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> 2. I'm inclined to not include the "hop-to-hop
> > > > > latency" in
> > > > > > > >>>>>>>>> this KIP
> > > > > > > >>>>>>>>>>>>> since
> > > > > > > >>>>>>>>>>>>>>> users
> > > > > > > >>>>>>>>>>>>>>> can always compute it themselves by subtracting
> > the
> > > > > > > >>>>>>> previous
> > > > > > > >>>>>>>>> node's
> > > > > > > >>>>>>>>>>>>>>> end-to-end latency. I guess we could do it
> either
> > > way
> > > > > since
> > > > > > > >>>>>>>>> you can
> > > > > > > >>>>>>>>>>>>> always
> > > > > > > >>>>>>>>>>>>>>> compute one from the other, but I think the
> > > > end-to-end
> > > > > > > >>>>>>> latency
> > > > > > > >>>>>>>>>>> feels
> > > > > > > >>>>>>>>>>>>> more
> > > > > > > >>>>>>>>>>>>>>> valuable as its main motivation is not to
> debug
> > > > > > > >>>>>>> bottlenecks
> > > > > > > >>>>>>>>> in the
> > > > > > > >>>>>>>>>>>>>>> topology but
> > > > > > > >>>>>>>>>>>>>>> to give users a sense of how long it takes
> > a record
> > > to
> > > > > be
> > > > > > > >>>>>>>>> reflected
> > > > > > > >>>>>>>>>>> in
> > > > > > > >>>>>>>>>>>>>>> certain parts
> > > > > > > >>>>>>>>>>>>>>> of the topology. For example this might be
> useful
> > > for
> > > > > users
> > > > > > > >>>>>>>>> who are
> > > > > > > >>>>>>>>>>>>>>> wondering
> > > > > > > >>>>>>>>>>>>>>> roughly when a record that was just produced
> will
> > > be
> > > > > > > >>>>>>> included
> > > > > > > >>>>>>>>> in
> > > > > > > >>>>>>>>>>> their
> > > > > > > >>>>>>>>>>>>> IQ
> > > > > > > >>>>>>>>>>>>>>> results.
> > > > > > > >>>>>>>>>>>>>>> Debugging is just a nice side effect -- but
> > maybe I
> > > > > didn't
> > > > > > > >>>>>>> make
> > > > > > > >>>>>>>>>>> that
> > > > > > > >>>>>>>>>>>>> clear
> > > > > > > >>>>>>>>>>>>>>> enough
> > > > > > > >>>>>>>>>>>>>>> in the KIP's motivation.
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> 3. Good question, I should address this in the
> > KIP.
> > > > The
> > > > > > > >>>>>>> short
> > > > > > > >>>>>>>>>>> answer is
> > > > > > > >>>>>>>>>>>>>>> "yes",
> > > > > > > >>>>>>>>>>>>>>> we will include late records. I added a
> paragraph
> > > to
> > > > > the
> > > > > > > >>>>>>> end
> > > > > > > >>>>>>>>> of the
> > > > > > > >>>>>>>>>>>>>>> Proposed
> > > > > > > >>>>>>>>>>>>>>> Changes section explaining the reasoning here,
> > > please
> > > > > let
> > > > > > > >>>>>>> me
> > > > > > > >>>>>>>>> know
> > > > > > > >>>>>>>>>>> if
> > > > > > > >>>>>>>>>>>>> you
> > > > > > > >>>>>>>>>>>>>>> have
> > > > > > > >>>>>>>>>>>>>>> any concerns.
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> 4. Assuming you're referring to the existing
> > metric
> > > > > > > >>>>>>>>>>> "process-latency",
> > > > > > > >>>>>>>>>>>>> that
> > > > > > > >>>>>>>>>>>>>>> metric
> > > > > > > >>>>>>>>>>>>>>> reflects the time for the literal Node#process
> > > method
> > > > > to
> > > > > > > >>>>>>> run
> > > > > > > >>>>>>>>>>> whereas
> > > > > > > >>>>>>>>>>>>> this
> > > > > > > >>>>>>>>>>>>>>> metric
> > > > > > > >>>>>>>>>>>>>>> would always be measured relative to the event
> > > > > timestamp.
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> That said, the naming collision there is pretty
> > > > > confusing
> > > > > > > >>>>>>> so
> > > > > > > >>>>>>>>> I've
> > > > > > > >>>>>>>>>>>>> renamed
> > > > > > > >>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>> metrics in this KIP to "end-to-end-latency"
> > which I
> > > > > feel
> > > > > > > >>>>>>> better
> > > > > > > >>>>>>>>>>>>> reflects
> > > > > > > >>>>>>>>>>>>>>> the nature
> > > > > > > >>>>>>>>>>>>>>> of the metric anyway.
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> Thanks for the feedback!
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> On Wed, May 13, 2020 at 10:21 AM Boyang Chen <
> > > > > > > >>>>>>>>>>>>> [email protected]>
> > > > > > > >>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> Thanks for the KIP Sophie. Getting the E2E
> > latency
> > > > is
> > > > > > > >>>>>>>>> important
> > > > > > > >>>>>>>>>>> for
> > > > > > > >>>>>>>>>>>>>>>> understanding the bottleneck of the
> application.
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> A couple of questions and ideas:
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> 1. Could you clarify the rationale of picking
> 75,
> > > 99
> > > > > and
> > > > > > > >>>>>>> max
> > > > > > > >>>>>>>>>>>>> percentiles?
> > > > > > > >>>>>>>>>>>>>>>> Normally I see cases where we use 50, 90
> > > percentile
> > > > as
> > > > > > > >>>>>>> well
> > > > > > > >>>>>>>>> in
> > > > > > > >>>>>>>>>>>>> production
> > > > > > > >>>>>>>>>>>>>>>> systems.
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> 2. The current latency being computed is
> > > cumulative,
> > > > > i.e.,
> > > > > > > >>>>>>> if a
> > > > > > > >>>>>>>>>>> record
> > > > > > > >>>>>>>>>>>>> goes
> > > > > > > >>>>>>>>>>>>>>>> through A -> B -> C, then P(C) = T(B->C) +
> P(B)
> > =
> > > > > > > >>>>>>> T(B->C) +
> > > > > > > >>>>>>>>>>> T(A->B) +
> > > > > > > >>>>>>>>>>>>>>> T(A)
> > > > > > > >>>>>>>>>>>>>>>> and so on, where P() represents the captured
> > > > latency,
> > > > > > > >>>>>>> and T()
> > > > > > > >>>>>>>>>>>>> represents
> > > > > > > >>>>>>>>>>>>>>>> the time for transiting the records between
> two
> > > > nodes,
> > > > > > > >>>>>>>>> including
> > > > > > > >>>>>>>>>>>>>>> processing
> > > > > > > >>>>>>>>>>>>>>>> time. For monitoring purposes, maybe having
> > T(B->C)
> > > > and
> > > > > > > >>>>>>>>> T(A->B)
> > > > > > > >>>>>>>>>>> are
> > > > > > > >>>>>>>>>>>>> more
> > > > > > > >>>>>>>>>>>>>>>> natural to view as "hop-to-hop latency",
> > otherwise
> > > > if
> > > > > > > >>>>>>> there
> > > > > > > >>>>>>>>> is a
> > > > > > > >>>>>>>>>>>>> spike in
> > > > > > > >>>>>>>>>>>>>>>> T(A->B), both P(B) and P(C) are affected at
> the
> > > same
> > > > > > > >>>>>>> time.
> > > > > > > >>>>>>>>> In the
> > > > > > > >>>>>>>>>>>>> same
> > > > > > > >>>>>>>>>>>>>>>> spirit, the E2E latency is meaningful only
> when
> > > the
> > > > > > > >>>>>>> record
> > > > > > > >>>>>>>>> exits
> > > > > > > >>>>>>>>>>>>> from the
> > > > > > > >>>>>>>>>>>>>>>> sink as this marks the whole time this record
> > > spent
> > > > > > > >>>>>>> inside
> > > > > > > >>>>>>>>> the
> > > > > > > >>>>>>>>>>>>> funnel. Do
> > > > > > > >>>>>>>>>>>>>>>> you think we could have separate treatment for
> > > sink
> > > > > > > >>>>>>> nodes and
> > > > > > > >>>>>>>>>>> other
> > > > > > > >>>>>>>>>>>>>>>> nodes, so that other nodes only count the time
> > > > > receiving
> > > > > > > >>>>>>> the
> > > > > > > >>>>>>>>>>> record
> > > > > > > >>>>>>>>>>>>> from
> > > > > > > >>>>>>>>>>>>>>>> last hop? I'm not proposing a solution here,
> > just
> > > > > want to
> > > > > > > >>>>>>>>> discuss
> > > > > > > >>>>>>>>>>>>> this
> > > > > > > >>>>>>>>>>>>>>>> alternative to see if it is reasonable.
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> 3. As we are going to monitor late arrival
> > records
> > > > as
> > > > > > > >>>>>>> well,
> > > > > > > >>>>>>>>> they
> > > > > > > >>>>>>>>>>>>> would
> > > > > > > >>>>>>>>>>>>>>>> create some really spiky graphs when the
> > > > out-of-order
> > > > > > > >>>>>>>>> records are
> > > > > > > >>>>>>>>>>>>>>>> interleaving with on time records. Should we
> > also
> > > > > supply
> > > > > > > >>>>>>> a
> > > > > > > >>>>>>>>> smooth
> > > > > > > >>>>>>>>>>>>> version
> > > > > > > >>>>>>>>>>>>>>>> of the latency metrics, or user should just
> take
> > > > care
> > > > > of
> > > > > > > >>>>>>> it
> > > > > > > >>>>>>>>> by
> > > > > > > >>>>>>>>>>>>> themselves?
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> 4. Regarding this new metric, we haven't
> > > discussed
> > > > > its
> > > > > > > >>>>>>>>> relation
> > > > > > > >>>>>>>>>>>>> with our
> > > > > > > >>>>>>>>>>>>>>>> existing processing latency metrics, could you
> > add
> > > > > some
> > > > > > > >>>>>>>>> context
> > > > > > > >>>>>>>>>>> on
> > > > > > > >>>>>>>>>>>>>>>> comparison and a simple `when to use which`
> > > tutorial
> > > > > for
> > > > > > > >>>>>>> the
> > > > > > > >>>>>>>>>>> best?
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> Boyang
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> On Tue, May 12, 2020 at 7:28 PM Sophie
> > > Blee-Goldman
> > > > <
> > > > > > > >>>>>>>>>>>>> [email protected]
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> Hey all,
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> I'd like to kick off discussion on KIP-613
> > which
> > > > aims
> > > > > > > >>>>>>> to
> > > > > > > >>>>>>>>> add
> > > > > > > >>>>>>>>>>>>> end-to-end
> > > > > > > >>>>>>>>>>>>>>>>> latency metrics to Streams. Please take a
> look:
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-613%3A+Add+end-to-end+latency+metrics+to+Streams
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> Cheers,
> > > > > > > >>>>>>>>>>>>>>>>> Sophie
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> --
> > > > > > > >>>>>>>>>>>>>> -- Guozhang
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
> --
> -- Guozhang
>
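
For readers following the arithmetic in the quoted thread, here is a minimal Java sketch of two points made there: the e2e latency ("record staleness") at a node is the wall-clock time minus the record's timestamp, and a hop-to-hop span can be recovered by subtracting two e2e latencies. The class and method names (E2eLatencySketch, e2eLatencyMs, hopLatenciesMs) and all timestamps are hypothetical illustrations; none of this is Kafka Streams API or taken from the KIP itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class E2eLatencySketch {

    // End-to-end latency at a node: wall-clock time when the record is
    // seen there, minus the record's own (event) timestamp.
    static long e2eLatencyMs(long wallClockMs, long recordTimestampMs) {
        return wallClockMs - recordTimestampMs;
    }

    // Derive hop-to-hop spans by subtracting the previous node's e2e latency
    // from the current node's. The input map must iterate in topology order.
    static Map<String, Long> hopLatenciesMs(Map<String, Long> e2eByNodeMs) {
        Map<String, Long> hops = new LinkedHashMap<>();
        String prevNode = null;
        long prevE2eMs = 0L;
        for (Map.Entry<String, Long> entry : e2eByNodeMs.entrySet()) {
            if (prevNode != null) {
                hops.put(prevNode + "->" + entry.getKey(), entry.getValue() - prevE2eMs);
            }
            prevNode = entry.getKey();
            prevE2eMs = entry.getValue();
        }
        return hops;
    }

    public static void main(String[] args) {
        long recordTimestampMs = 1_000L; // made-up event timestamp

        // Made-up wall-clock readings at three nodes A -> B -> C; the large
        // gap before C stands in for a suppression or cache flush delay.
        Map<String, Long> e2e = new LinkedHashMap<>();
        e2e.put("A", e2eLatencyMs(1_020L, recordTimestampMs));
        e2e.put("B", e2eLatencyMs(1_050L, recordTimestampMs));
        e2e.put("C", e2eLatencyMs(61_050L, recordTimestampMs));

        System.out.println("e2e:  " + e2e);
        System.out.println("hops: " + hopLatenciesMs(e2e));
    }
}
```

Each span needs only one subtraction, whereas rebuilding an e2e latency from spans would require summing every span upstream of the node of interest, which matches the asymmetry John describes in the thread.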
