I am also happy with max/min/99/90. And I buy your naming argument about
staleness vs latency.


-Matthias

On 5/15/20 12:24 PM, Boyang Chen wrote:
> Hey Sophie,
> 
> 90/99/min/max make sense to me.
> 
> On Fri, May 15, 2020 at 12:20 PM Sophie Blee-Goldman <sop...@confluent.io>
> wrote:
> 
>> @Matthias
>> Regarding tracking the 50th percentile, I'll refer you to the 4:53 mark of
>> the video
>> *you* linked: https://youtu.be/lJ8ydIuPFeU?t=293
>>
>> And that's what he says about the 95th percentile! Imagine what he would
>> say about
>> the 50th :P
>>
>> But seriously, since we can't seem to agree that the mean or 50th
>> percentile is actually
>> useful, I'm inclined to resurrect my original proposal: neither. But I think
>> that's a good
>> argument against the 75th, which I admittedly chose somewhat arbitrarily as
>> an
>> intermediate between the 50th and the higher percentiles. How about:
>>
>> -max
>> -p99
>> -p90
>> -min
>>
>> with p50/mean still up for debate if anyone feels strongly for either of
>> them.
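>>
>> If it helps to make these concrete: whatever names we land on, they will be
>> readable off a running app just like the existing metrics. A rough sketch
>> (the "record-staleness" prefix and the topology/props variables below are
>> only placeholders, not part of the proposal), assuming the usual
>> org.apache.kafka.streams imports:
>>
>>     // print the proposed staleness metrics from a running Streams app
>>     final KafkaStreams streams = new KafkaStreams(topology, props);
>>     streams.start();
>>     streams.metrics().forEach((name, metric) -> {
>>         if (name.name().startsWith("record-staleness")) {
>>             System.out.println(name.group() + " " + name.tags()
>>                 + " " + name.name() + " = " + metric.metricValue());
>>         }
>>     });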
>>
>> Regarding the name, I've already flip-flopped on this so I'm definitely
>> still open to
>> further arguments. But the reason for changing it from end-to-end-latency
>> (which
>> is similar to what you propose) is that this metric technically reflects
>> how old (i.e. how "stale")
>> the record is when it's *received* by the operator, not when it's processed
>> by the operator.
>> It seemed like there was the potential for confusion that
>> "end-to-end-latency" might
>> represent the latency from the event creation to the time the processor is
>> done
>> processing it.
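>>
>> Or, spelled out with the three relevant points in time:
>>
>>     t0: event created (the record timestamp)
>>     t1: record received by the operator
>>     t2: operator finished processing the record
>>
>> This metric is t1 - t0, while "end-to-end latency" could be misread as
>> t2 - t0.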
>>
>> @John
>> I'd rather err on the side of "not-enough" metrics as we can always add
>> this to the
>> stateless metrics later on. If we decide to measure the time at every node
>> and don't
>> find any evidence of a serious performance impact, and users indicate they
>> would
>> like to see this metric at all nodes, then we can easily start reporting
>> them as well.
>> WDYT?
>>
>> That said, sink nodes seem like a reasonable exception to the rule.
>> Obviously users
>> should be able to detect the time when the record reaches the output topic
>> but that
>> still leaves a gap in understanding how long the production latency was.
>> This mirrors
>> the consumption latency that is exposed by the task-level metrics, which
>> are measured
>> at the source node. For good symmetry, what if we actually expose both the
>> source and sink latency at the task level? I.e., report both sets of
>> statistical measurements with an additional -source/-sink tag.
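>>
>> (Just to illustrate, not a naming proposal: each task would then report
>> e.g. record-staleness-max twice, once tagged as source and once as sink,
>> alongside the existing thread/task tags.)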
>>
>> @Bill
>> Thanks for the comment regarding the min! I hadn't considered that, and
>> it's quite useful to think about how and what is useful from a user's
>> point of view.
>>
>> Regarding your second point, I'm inclined to leave that as an
>> implementation detail
>> but my take would be that the user should be allowed to control the record
>> timestamp
>> used for this with the timestamp extractor. My impression is that users may
>> often embed
>> the actual event time in the payload for whatever reason, and this
>> represents the "true"
>> timestamp as far as the Streams topology is concerned.
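>>
>> For example, a rough sketch of such an extractor, where PayloadEvent and
>> its eventTimeMs() accessor are made up for illustration (imports from
>> org.apache.kafka.streams.processor and org.apache.kafka.clients.consumer
>> omitted):
>>
>>     // pull the "true" event time out of the payload so that stream time,
>>     // and therefore this metric, are based on it rather than on the
>>     // producer/broker timestamp
>>     public class PayloadTimestampExtractor implements TimestampExtractor {
>>         @Override
>>         public long extract(final ConsumerRecord<Object, Object> record,
>>                             final long partitionTime) {
>>             if (record.value() instanceof PayloadEvent) {
>>                 return ((PayloadEvent) record.value()).eventTimeMs();
>>             }
>>             return record.timestamp(); // fall back to the record's own timestamp
>>         }
>>     }
>>
>> plugged in via the default.timestamp.extractor config.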
>>
>>
>> On Fri, May 15, 2020 at 11:05 AM Bill Bejeck <bbej...@gmail.com> wrote:
>>
>>> Thanks for the KIP, Sophie, this will be a useful metric to add.
>>>
>>> Regarding tracking min, I think it could be valuable for users to discern
>>> which parts of their topologies are more efficient since this is a
>>> task-level metric.  I realize everyone seems to be on board with
>> including
>>> min anyway, but I wanted to add my 2 cents on this topic should we decide
>>> to revisit adding min or not.
>>>
>>> I do have a question regarding the calculation of staleness.
>>> Is there going to be a consideration for timestamp extractors? Users
>> could
>>> prefer to use a timestamp embedded in the payload, and it could skew the
>>> measurements.
>>> I was wondering if we should specify in the KIP whether the arrival time
>>> is always going to come from the record timestamp, or if this is an
>>> implementation detail we can cover in the PR?
>>>
>>> Thanks!
>>> Bill
>>>
>>> On Fri, May 15, 2020 at 1:11 AM Matthias J. Sax <mj...@apache.org>
>> wrote:
>>>
>>>> Thanks for the KIP Sophie.
>>>>
>>>> I think it's not useful to record the avg/mean; it's sensitive to
>>>> outliers. We should rather track the median (50th percentile).
>>>>
>>>> Not sure if tracking min is useful, but I am also ok to track it.
>>>>
>>>> However, I find it odd to track the 75th percentile. Standard measures
>>>> would be the 90th or 95th -- I guess we don't need both, so maybe picking
>>>> 90th might be more useful?
>>>>
>>>> About the name: "staleness" sounds really odd, and in fact the metric
>>>> does capture "latency" so we should call it "latency". I understand the
>>>> issue that we already have a latency metric. So maybe we could call it
>>>> `record-e2e-latency-*` ?
>>>>
>>>> While I agree that we should include out-of-order data (the KIP should
>>>> talk about `out-of-order` data, not `late` data; data is only `late` if
>>>> it's out-of-order and if it's dropped), I don't really understand why
>>>> the new metric would help to configure grace period or retention time?
>>>> As you mention in the KIP, both are defined as the max difference of
>>>> `event-time - stream-time` and thus the new metric that takes
>>>> system-/wallclock-time into account does not seem to help at all.
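>>>>
>>>> (To spell it out: grace period and retention time are bounds on
>>>> `event-time - stream-time`, which does not involve the wallclock at all,
>>>> whereas the new metric measures `wallclock-time - event-time`.)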
>>>>
>>>>
>>>> Btw: there is a great talk about "How NOT to Measure Latency" by Gil
>>>> Tene: https://www.youtube.com/watch?v=lJ8ydIuPFeU
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>
>>>> On 5/14/20 7:17 PM, John Roesler wrote:
>>>>> Hi Sophie,
>>>>>
>>>>> It seems like there would still be plenty of use cases for recording
>>>>> this metric at all processors and not just stateful ones, but I'm
>> happy
>>>>> to suspend my arguments for now. Since you're proposing to keep
>>>>> them at the processor-node level, it will be seamless later to add
>>>>> in the stateless processors if we want. As a wise man once said,
>>>>> "Adding is always easier than removing."
>>>>>
>>>>> Regarding the time measurement, it's an implementation detail
>>>>> we don't need to consider in the KIP. Nevertheless, I'd greatly
>>>>> prefer to measure the system time again when recording the
>>>>> metric. I don't think we've seen any evidence that proves this
>>>>> would harm performance, and the amount of inaccuracy using
>>>>> the cached system time could incur is actually substantial. But,
>>>>> if you want to just "not mention this" in the KIP, we can defer to
>>>>> the actual PR discussion, at which time we're in a better position
>>>>> to use benchmarks, etc., to make the call.
>>>>>
>>>>> Along the lines of the measurement accuracy discussion, one
>>>>> minor thought I had is that maybe we should consider measuring
>>>>> the task staleness metric at the sink, rather than the source, so
>> that
>>>>> it includes the processing latency of the task itself, not just the
>>>> latency
>>>>> of everything up to, but not including, the task (which seems
>> confusing
>>>>> for users). I guess this could also be an implementation detail,
>>> though.
>>>>>
>>>>> Thanks for the update,
>>>>> -John
>>>>>
>>>>> On Thu, May 14, 2020, at 13:31, Sophie Blee-Goldman wrote:
>>>>>> Hey all,
>>>>>>
>>>>>> After discussing with Bruno I'd like to propose a small amendment,
>>>>>> which is to record the processor-node-level metrics only for
>>> *stateful*
>>>>>> *operators*. They would still be considered a "processor-node-level"
>>>>>> metric and not a "state-store-level" metric as the staleness is
>> still
>>>>>> a property of the node rather than of the state itself. However, it
>>>> seems
>>>>>> that this information is primarily useful for stateful operators that
>>>>>> might be exposing state via IQ or otherwise be dependent on the record
>>>>>> time, unlike a stateless operator.
>>>>>>
>>>>>> It's worth calling out that recent performance improvements to the
>>>> metrics
>>>>>> framework mean that we no longer fetch the system time at the
>> operator
>>>>>> level, but only once per task. In other words the system time is not
>>>> updated
>>>>>> between each process as a record flows through the subtopology, so
>>>>>> debugging the processor-level latency via the staleness will not be
>>>>>> possible. Note that this doesn't mean the operator-level metrics are
>>> not
>>>>>> *useful* relative to the task-level metric. Upstream caching and/or
>>>>>> suppression
>>>>>> can still cause a record's staleness at some downstream stateful
>>>> operator
>>>>>> to deviate from the task-level staleness (recorded at the source
>>> node).
>>>>>>
>>>>>> Please let me know if you have any concerns about this change. The
>>>>>> KIP has been updated with the new proposal.
>>>>>>
>>>>>> On Thu, May 14, 2020 at 3:04 AM Bruno Cadonna <br...@confluent.io>
>>>> wrote:
>>>>>>
>>>>>>> Hi Sophie,
>>>>>>>
>>>>>>> Thank you for the KIP.
>>>>>>>
>>>>>>> The KIP looks good to me.
>>>>>>>
>>>>>>> 50th percentile:
>>>>>>> I think we do not need it now. If we need it, we can add it. Here
>> the
>>>>>>> old truism applies: Adding is always easier than removing.
>>>>>>>
>>>>>>> processor-node-level metrics:
>>>>>>> I think it is good to have the staleness metrics also on
>>>>>>> processor-node-level. If we do not want to record them on all
>>>>>>> processor nodes, you could restrict the recording to stateful
>>>>>>> processor-nodes, since those are the ones that would benefit most
>>> from
>>>>>>> the staleness metrics.
>>>>>>>
>>>>>>> Best,
>>>>>>> Bruno
>>>>>>>
>>>>>>> On Thu, May 14, 2020 at 4:15 AM Sophie Blee-Goldman <
>>>> sop...@confluent.io>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Yeah, the specific reason was just to align with the current
>>> metrics.
>>>>>>>>
>>>>>>>> Is it better to conform than to be right? History has a lot to say
>>> on
>>>>>>> that
>>>>>>>> matter
>>>>>>>> but I'm not sure how much of it applies to the fine details of
>>> metrics
>>>>>>>> naming :P
>>>>>>>>
>>>>>>>> More seriously, I figured if people are looking at this metric
>>> they're
>>>>>>>> likely to
>>>>>>>> be looking at all the others. Then naming this one "-mean" would
>>>> probably
>>>>>>>> lead some to conclude that the "-avg" suffix in the other metrics
>>> has
>>>> a
>>>>>>>> different meaning.
>>>>>>>>
>>>>>>>> As for the percentiles, I actually like p99 (and p75) better. I'll
>>>> swap
>>>>>>>> that out
>>>>>>>>
>>>>>>>> On Wed, May 13, 2020 at 7:07 PM John Roesler <vvcep...@apache.org
>>>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Sophie,
>>>>>>>>>
>>>>>>>>> I hope this isn't too nit-picky, but is there a reason to choose
>>>> "avg"
>>>>>>>>> instead
>>>>>>>>> of "mean"? Maybe this is too paranoid, and I might be
>> oversensitive
>>>>>>> because
>>>>>>>>> of the mistake I just made earlier, but it strikes me that "avg"
>> is
>>>>>>>>> actually
>>>>>>>>> ambiguous, as it refers to a family of statistics, whereas "mean"
>>> is
>>>>>>>>> specific.
>>>>>>>>> I see other Kafka metrics with "avg", but none with "mean"; was
>>> that
>>>>>>> the
>>>>>>>>> reason? If so, I'm +1.
>>>>>>>>>
>>>>>>>>> Regarding the names of the percentile, I actually couldn't find
>>> _any_
>>>>>>> other
>>>>>>>>> metrics that use percentile. Was there a reason to choose "99th"
>> as
>>>>>>> opposed
>>>>>>>>> to "p99" or any other scheme? This is not a criticism, I'm just
>>>>>>> primarily
>>>>>>>>> asking
>>>>>>>>> for consistency's sake.
>>>>>>>>>
>>>>>>>>> Thanks again,
>>>>>>>>> -John
>>>>>>>>>
>>>>>>>>> On Wed, May 13, 2020, at 19:19, Sophie Blee-Goldman wrote:
>>>>>>>>>> Alright, I can get behind adding the min metric for the sake of
>>>>>>> pretty
>>>>>>>>>> graphs
>>>>>>>>>> (and trivial computation).
>>>>>>>>>>
>>>>>>>>>> I'm still on the fence regarding the mean (or 50th percentile)
>>> but I
>>>>>>> can
>>>>>>>>> see
>>>>>>>>>> how users might expect it and find it a bit disorienting not to
>>>>>>> have. So
>>>>>>>>> the
>>>>>>>>>> updated proposed metrics are
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    - record-staleness-max [ms]
>>>>>>>>>>    - record-staleness-99th [ms] *(99th percentile)*
>>>>>>>>>>    - record-staleness-75th [ms] *(75th percentile)*
>>>>>>>>>>    - record-staleness-avg [ms] *(mean)*
>>>>>>>>>>    - record-staleness-min [ms]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, May 13, 2020 at 4:42 PM John Roesler <
>> vvcep...@apache.org
>>>>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Oh boy, I never miss an opportunity to embarrass myself. I
>> guess
>>>>>>> the
>>>>>>>>> mean
>>>>>>>>>>> seems more interesting to me than the median, but neither are
>> as
>>>>>>>>>>> interesting as the higher percentiles (99th and max).
>>>>>>>>>>>
>>>>>>>>>>> Min isn’t really important for any SLAs, but it does round out
>>> the
>>>>>>>>> mental
>>>>>>>>>>> picture of the distribution. I’ve always graphed min along with
>>> the
>>>>>>>>> other
>>>>>>>>>>> metrics to help me understand how fast the system can be, which
>>>>>>> helps
>>>>>>>>> in
>>>>>>>>>>> optimization decisions. It’s also a relatively inexpensive
>> metric
>>>>>>> to
>>>>>>>>>>> compute, so it might be nice to just throw it in.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 13, 2020, at 18:18, Sophie Blee-Goldman wrote:
>>>>>>>>>>>> G1:
>>>>>>>>>>>> I was considering it as the "end-to-end latency *up* to the
>>>>>>> specific
>>>>>>>>>>> task"
>>>>>>>>>>>> but
>>>>>>>>>>>> I'm happy with "record-staleness" if that drives the point
>> home
>>>>>>>>> better.
>>>>>>>>>>> So
>>>>>>>>>>>> it's the
>>>>>>>>>>>> "staleness of the record when it is received by that task" --
>>>>>>> will
>>>>>>>>> update
>>>>>>>>>>>> the KIP
>>>>>>>>>>>>
>>>>>>>>>>>> B1/J:
>>>>>>>>>>>> I'm struggling to imagine a case where the min would actually
>> be
>>>>>>>>> useful,
>>>>>>>>>>>> rather than
>>>>>>>>>>>> just intellectually interesting. I don't feel strongly that we
>>>>>>>>> shouldn't
>>>>>>>>>>>> add it, but that's
>>>>>>>>>>>> why I didn't include it from the start. Can you enlighten me
>>>>>>> with an
>>>>>>>>>>>> example?
>>>>>>>>>>>>
>>>>>>>>>>>> I was also vaguely concerned about the overhead of adding
>>>>>>> multiple
>>>>>>>>>>>> percentile
>>>>>>>>>>>> metrics. Do we have any data to indicate what kind of
>>> performance
>>>>>>>>> hit we
>>>>>>>>>>>> take on
>>>>>>>>>>>> metrics computation?
>>>>>>>>>>>>
>>>>>>>>>>>> Also, not to be too pedantic but the 50th percentile would be
>>> the
>>>>>>>>> median
>>>>>>>>>>>> not the
>>>>>>>>>>>> mean. Would you propose to add the mean *and* the 50th
>>>>>>> percentile, or
>>>>>>>>>>> just
>>>>>>>>>>>> one
>>>>>>>>>>>> of the two?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks all!
>>>>>>>>>>>> Sophie
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 13, 2020 at 3:34 PM John Roesler <
>>>>>>> vvcep...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello all, and thanks for the KIP, Sophie,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just some comments on the discussion so far:
>>>>>>>>>>>>>
>>>>>>>>>>>>> B2/G1:
>>>>>>>>>>>>> In principle, it shouldn't matter whether we report "spans"
>> or
>>>>>>>>>>>>> "end-to-end" latency. But in practice, some of the spans are
>>>>>>> pretty
>>>>>>>>>>>>> difficult to really measure (like time spent waiting in the
>>>>>>>>> topics, or
>>>>>>>>>>>>> time from the event happening to the ETL producer choosing to
>>>>>>> send
>>>>>>>>> it,
>>>>>>>>>>>>> or time spent in send/receive buffers, etc., etc.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> In other words, it's practically easier to compute spans by
>>>>>>>>> subtracting
>>>>>>>>>>>>> e2e latencies than it is to compute e2e latencies by adding
>>>>>>> spans.
>>>>>>>>> You
>>>>>>>>>>>>> can even just consider that the span computation from e2e
>>>>>>> always
>>>>>>>>> just
>>>>>>>>>>>>> involves subtracting two numbers, whereas computing e2e
>> latency
>>>>>>>>> from
>>>>>>>>>>>>> spans involves adding _all_ the spans leading up to the end
>> you
>>>>>>>>> care
>>>>>>>>>>> about.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It seems like people really prefer to have spans when they
>> are
>>>>>>>>>>> debugging
>>>>>>>>>>>>> latency problems, whereas e2e latency is a more general
>>>>>>> measurement
>>>>>>>>>>>>> that basically every person/application cares about and
>> should
>>>>>>> be
>>>>>>>>>>>>> monitoring.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Altogether, it really seem to provide more value to more
>>>>>>> people if
>>>>>>>>> we
>>>>>>>>>>>>> report
>>>>>>>>>>>>> e2e latencies. Regarding "record-staleness" as a name, I
>> think
>>>>>>> I
>>>>>>>>> have
>>>>>>>>>>> no
>>>>>>>>>>>>> preference, I'd defer to other peoples' intuition.
>>>>>>>>>>>>>
>>>>>>>>>>>>> G2:
>>>>>>>>>>>>> I think the processor-node metric is nice, since the inside
>> of
>>>>>>> a
>>>>>>>>> task
>>>>>>>>>>> can
>>>>>>>>>>>>> introduce a significant amount of latency in some cases.
>> Plus,
>>>>>>>>> it's a
>>>>>>>>>>> more
>>>>>>>>>>>>> direct measurement, if you really wanted to know (for the
>>>>>>> purposes
>>>>>>>>> of
>>>>>>>>>>> IQ
>>>>>>>>>>>>> or something) how long it takes source events to "show up" at
>>>>>>> the
>>>>>>>>>>> store.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think actually recording it at every processor could be
>>>>>>>>> expensive,
>>>>>>>>>>> but we
>>>>>>>>>>>>> already record a bunch of metrics at the node level.
>>>>>>>>>>>>>
>>>>>>>>>>>>> B1:
>>>>>>>>>>>>> I think 50% could be reasonable to record also. Even if it's
>> a
>>>>>>> poor
>>>>>>>>>>> metric
>>>>>>>>>>>>> for operational purposes, a lot of people might expect to see
>>>>>>>>> "mean".
>>>>>>>>>>>>> Actually,
>>>>>>>>>>>>> I was surprised not to see "min". Is there a reason to leave
>> it
>>>>>>>>> off?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I might suggest:
>>>>>>>>>>>>> min, mean (50th), 75th, 99th, max
>>>>>>>>>>>>>
>>>>>>>>>>>>> B3:
>>>>>>>>>>>>> I agree we should include late records (though not the ones
>> we
>>>>>>>>> drop).
>>>>>>>>>>>>> It may be spiky, but only when there are legitimately some
>>>>>>> records
>>>>>>>>>>> with a
>>>>>>>>>>>>> high end-to-end latency, which is the whole point of these
>>>>>>> metrics.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That's it! I don't think I have any other feedback, other
>> than
>>>>>>> a
>>>>>>>>>>> request to
>>>>>>>>>>>>> also report "min".
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> -John
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, May 13, 2020, at 16:58, Guozhang Wang wrote:
>>>>>>>>>>>>>> Thanks Sophie for the KIP, a few quick thoughts:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) The end-to-end latency includes both the processing
>>>>>>> latency
>>>>>>>>> of the
>>>>>>>>>>>>> task
>>>>>>>>>>>>>> and the latency spent sitting in intermediate topics. I have
>>>>>>> a
>>>>>>>>>>> similar
>>>>>>>>>>>>>> feeling as Boyang mentioned above that the latency metric of
>>>>>>> a
>>>>>>>>> task A
>>>>>>>>>>>>>> actually measures the latency of the sub-topology up-to but
>>>>>>> not
>>>>>>>>>>> including
>>>>>>>>>>>>>> the processing of A, which is a bit weird.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe the my feeling comes from the name "latency" itself,
>>>>>>> since
>>>>>>>>>>> today we
>>>>>>>>>>>>>> already have several "latency" metrics already which are
>>>>>>>>> measuring
>>>>>>>>>>>>> elapsed
>>>>>>>>>>>>>> system-time for processing a record / etc, while here we are
>>>>>>>>>>> comparing
>>>>>>>>>>>>> the
>>>>>>>>>>>>>> system wallclock time with the record timestamp.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe we can consider renaming it as "record-staleness"
>>>>>>> (note we
>>>>>>>>>>> already
>>>>>>>>>>>>>> have a "record-lateness" metric), in which case recording at
>>>>>>> the
>>>>>>>>>>>>>> system-time before we start processing the record sounds
>> more
>>>>>>>>>>> natural.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2) With that in mind, I'm wondering if the
>>>>>>> processor-node-level
>>>>>>>>> DEBUG
>>>>>>>>>>>>>> metric is worth adding, given that we already have a
>>>>>>> task-level
>>>>>>>>>>>>> processing
>>>>>>>>>>>>>> latency metric. Basically, a specific node's e2e latency is
>>>>>>>>> similar
>>>>>>>>>>> to
>>>>>>>>>>>>> the
>>>>>>>>>>>>>> task-level e2e latency + task-level processing latency.
>>>>>>>>> Personally I
>>>>>>>>>>>>> think
>>>>>>>>>>>>>> having a task-level record-staleness metric is sufficient.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, May 13, 2020 at 11:46 AM Sophie Blee-Goldman <
>>>>>>>>>>>>> sop...@confluent.io>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. I felt that 50% was not a particularly useful gauge for
>>>>>>> this
>>>>>>>>>>>>> specific
>>>>>>>>>>>>>>> metric, as
>>>>>>>>>>>>>>> it's presumably most useful at putting an *upper *bound on
>>>>>>> the
>>>>>>>>>>> latency
>>>>>>>>>>>>> you
>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>> reasonably expect to see. I chose percentiles that would
>>>>>>>>> hopefully
>>>>>>>>>>>>> give a
>>>>>>>>>>>>>>> good
>>>>>>>>>>>>>>> sense of what *most* records will experience, and what
>>>>>>> *close
>>>>>>>>> to
>>>>>>>>>>> all*
>>>>>>>>>>>>>>> records
>>>>>>>>>>>>>>> will.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However I'm not married to these specific numbers and
>>>>>>> could be
>>>>>>>>>>>>> convinced.
>>>>>>>>>>>>>>> Would be especially interested in hearing from users on
>>>>>>> this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. I'm inclined to not include the "hop-to-hop latency" in
>>>>>>>>> this KIP
>>>>>>>>>>>>> since
>>>>>>>>>>>>>>> users
>>>>>>>>>>>>>>> can always compute it themselves by subtracting the
>>>>>>> previous
>>>>>>>>> node's
>>>>>>>>>>>>>>> end-to-end latency. I guess we could do it either way since
>>>>>>>>> you can
>>>>>>>>>>>>> always
>>>>>>>>>>>>>>> compute one from the other, but I think the end-to-end
>>>>>>> latency
>>>>>>>>>>> feels
>>>>>>>>>>>>> more
>>>>>>>>>>>>>>> valuable as its main motivation is not to debug
>>>>>>> bottlenecks
>>>>>>>>> in the
>>>>>>>>>>>>>>> topology but
>>>>>>>>>>>>>>> to give users a sense of how long it takes a record to be
>>>>>>>>> reflected
>>>>>>>>>>> in
>>>>>>>>>>>>>>> certain parts
>>>>>>>>>>>>>>> of the topology. For example this might be useful for users
>>>>>>>>> who are
>>>>>>>>>>>>>>> wondering
>>>>>>>>>>>>>>> roughly when a record that was just produced will be
>>>>>>> included
>>>>>>>>> in
>>>>>>>>>>> their
>>>>>>>>>>>>> IQ
>>>>>>>>>>>>>>> results.
>>>>>>>>>>>>>>> Debugging is just a nice side effect -- but maybe I didn't
>>>>>>> make
>>>>>>>>>>> that
>>>>>>>>>>>>> clear
>>>>>>>>>>>>>>> enough
>>>>>>>>>>>>>>> in the KIP's motivation.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 3. Good question, I should address this in the KIP. The
>>>>>>> short
>>>>>>>>>>> answer is
>>>>>>>>>>>>>>> "yes",
>>>>>>>>>>>>>>> we will include late records. I added a paragraph to the
>>>>>>> end
>>>>>>>>> of the
>>>>>>>>>>>>>>> Proposed
>>>>>>>>>>>>>>> Changes section explaining the reasoning here, please let
>>>>>>> me
>>>>>>>>> know
>>>>>>>>>>> if
>>>>>>>>>>>>> you
>>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>> any concerns.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 4. Assuming you're referring to the existing metric
>>>>>>>>>>> "process-latency",
>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> metric
>>>>>>>>>>>>>>> reflects the time for the literal Node#process method to
>>>>>>> run
>>>>>>>>>>> whereas
>>>>>>>>>>>>> this
>>>>>>>>>>>>>>> metric
>>>>>>>>>>>>>>> would always be measured relative to the event timestamp.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That said, the naming collision there is pretty confusing
>>>>>>> so
>>>>>>>>> I've
>>>>>>>>>>>>> renamed
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> metrics in this KIP to "end-to-end-latency" which I feel
>>>>>>> better
>>>>>>>>>>>>> reflects
>>>>>>>>>>>>>>> the nature
>>>>>>>>>>>>>>> of the metric anyway.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the feedback!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, May 13, 2020 at 10:21 AM Boyang Chen <
>>>>>>>>>>>>> reluctanthero...@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for the KIP Sophie. Getting the E2E latency is
>>>>>>>>> important
>>>>>>>>>>> for
>>>>>>>>>>>>>>>> understanding the bottleneck of the application.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> A couple of questions and ideas:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. Could you clarify the rational of picking 75, 99 and
>>>>>>> max
>>>>>>>>>>>>> percentiles?
>>>>>>>>>>>>>>>> Normally I see cases where we use 50, 90 percentile as
>>>>>>> well
>>>>>>>>> in
>>>>>>>>>>>>> production
>>>>>>>>>>>>>>>> systems.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2. The current latency being computed is cumulative, I.E
>>>>>>> if a
>>>>>>>>>>> record
>>>>>>>>>>>>> goes
>>>>>>>>>>>>>>>> through A -> B -> C, then P(C) = T(B->C) + P(B) =
>>>>>>> T(B->C) +
>>>>>>>>>>> T(A->B) +
>>>>>>>>>>>>>>> T(A)
>>>>>>>>>>>>>>>> and so on, where P() represents the captured latency,
>>>>>>> and T()
>>>>>>>>>>>>> represents
>>>>>>>>>>>>>>>> the time for transiting the records between two nodes,
>>>>>>>>> including
>>>>>>>>>>>>>>> processing
>>>>>>>>>>>>>>>> time. For monitoring purpose, maybe having T(B->C) and
>>>>>>>>> T(A->B)
>>>>>>>>>>> are
>>>>>>>>>>>>> more
>>>>>>>>>>>>>>>> natural to view as "hop-to-hop latency", otherwise if
>>>>>>> there
>>>>>>>>> is a
>>>>>>>>>>>>> spike in
>>>>>>>>>>>>>>>> T(A->B), both P(B) and P(C) are affected in the same
>>>>>>> time.
>>>>>>>>> In the
>>>>>>>>>>>>> same
>>>>>>>>>>>>>>>> spirit, the E2E latency is meaningful only when the
>>>>>>> record
>>>>>>>>> exits
>>>>>>>>>>>>> from the
>>>>>>>>>>>>>>>> sink as this marks the whole time this record spent
>>>>>>> inside
>>>>>>>>> the
>>>>>>>>>>>>> funnel. Do
>>>>>>>>>>>>>>>> you think we could have separate treatment for sink
>>>>>>> nodes and
>>>>>>>>>>> other
>>>>>>>>>>>>>>>> nodes, so that other nodes only count the time receiving
>>>>>>> the
>>>>>>>>>>> record
>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>> last hop? I'm not proposing a solution here, just want to
>>>>>>>>> discuss
>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>> alternative to see if it is reasonable.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 3. As we are going to monitor late arrival records as
>>>>>>> well,
>>>>>>>>> they
>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>> create some really spiky graphs when the out-of-order
>>>>>>>>> records are
>>>>>>>>>>>>>>>> interleaving with on time records. Should we also supply
>>>>>>> a
>>>>>>>>> smooth
>>>>>>>>>>>>> version
>>>>>>>>>>>>>>>> of the latency metrics, or user should just take care of
>>>>>>> it
>>>>>>>>> by
>>>>>>>>>>>>> themself?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 4. Regarding this new metrics, we haven't discussed its
>>>>>>>>> relation
>>>>>>>>>>>>> with our
>>>>>>>>>>>>>>>> existing processing latency metrics, could you add some
>>>>>>>>> context
>>>>>>>>>>> on
>>>>>>>>>>>>>>>> comparison and a simple `when to use which` tutorial for
>>>>>>> the
>>>>>>>>>>> best?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Boyang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, May 12, 2020 at 7:28 PM Sophie Blee-Goldman <
>>>>>>>>>>>>> sop...@confluent.io
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'd like to kick off discussion on KIP-613 which aims
>>>>>>> to
>>>>>>>>> add
>>>>>>>>>>>>> end-to-end
>>>>>>>>>>>>>>>>> latency metrics to Streams. Please take a look:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>
>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-613%3A+Add+end-to-end+latency+metrics+to+Streams
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Sophie
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>>
>>>
>>
> 
