Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

Sophie Blee-Goldman Tue, 16 Jul 2019 14:26:17 -0700

I'm +1 on Windowed, was about to suggest that as I was catching up on the
discussion but Bill beat me to it :)


On Tue, Jul 16, 2019 at 2:23 PM Bill Bejeck <[email protected]> wrote:

> Hi John,
>
> Thanks for the updates.
>
> I like RunningCount and RunningSum.
>
> What about WindowedCount, WindowedSum instead of Moving?
> I'm just throwing this out there as Windowed seems more intuitive to me,
> but I'm not married to the idea.
>
> -Bill
>
> On Tue, Jul 16, 2019 at 5:09 PM John Roesler <[email protected]> wrote:
>
> > No worries! Choosing good public API names is a high-impact design
> > activity.
> >
> > Matthias, Bruno, Bill, and Stanislav,
> >
> > You've all contributed to this discussion or the vote so far... How do
> > you feel about the proposed name change:
> >
> > MovingCount, MovingSum   (instead of Sampled)
> > RunningCount, RunningSum   (Instead of Total)
> >
> > Thanks,
> > -John
> >
> > On Tue, Jul 16, 2019 at 3:04 PM Ryanne Dolan <[email protected]>
> > wrote:
> > >
> > > John, that makes sense to me. Sorry for the bikeshedding.
> > >
> > > Ryanne
> > >
> > > On Tue, Jul 16, 2019 at 12:49 PM John Roesler <[email protected]>
> wrote:
> > >
> > > > Thanks for the explanation and the suggestion, Ryanne,
> > > >
> > > > I went with "sampled" just because these are instances of
> SampledStat,
> > > > which in the Kafka Metrics ecosystem are computed from a window of
> > > > recent samples. Thinking more about it, the fact that they are
> sampled
> > > > and the fact that they are windowed are orthogonal, which is what
> > > > you're pointing out... sampling by itself doesn't indicate that it's
> a
> > > > moving average.
> > > >
> > > > Since there is no way in Kafka Metrics for a metric to be sampled and
> > > > not windowed/moving/decaying, calling them Sampled would never be
> > > > incorrect. But to someone unfamiliar with the code, it wouldn't
> > > > immediately suggest the behavior of the metric that actually matters.
> > > > That is, the behavior that distinguishes the two classes of metrics
> we
> > > > want to disambiguate here.
> > > >
> > > > It sounds like you'd suggest a new matrix of names:
> > > > MovingCount, MovingSum
> > > > RunningCount, RunningSum
> > > >
> > > > Are these names unambiguous and self explanatory?
> > > >
> > > > Thanks,
> > > > -John
> > > >
> > > > On Tue, Jul 16, 2019 at 12:32 PM Ryanne Dolan <[email protected]
> >
> > > > wrote:
> > > > >
> > > > > > measurements, which decay/expire over time
> > > > >
> > > > > Thanks John for the clarification. This was my (re-)reading of the
> > code,
> > > > > but this is not what I think of when I hear "sampled". In fact,
> > you'll
> > > > > notice that the Wikipedia pages for "Sample (statistics)" and
> "Sample
> > > > > (signal processing)" do not contain the words decay, expire,
> recent,
> > > > > history, or anything similar.
> > > > >
> > > > > Similar to "running", I'd suggest the more correct "moving", as in
> > > > "moving
> > > > > average" and "moving sum", which involve looking back N samples,
> > > > applying a
> > > > > sliding window, decaying over time etc.
> > > > >
> > > > > Ryanne
> > > > >
> > > > > On Tue, Jul 16, 2019, 11:58 AM John Roesler <[email protected]>
> > wrote:
> > > > >
> > > > > > Thanks for raising this concern, Ryanne,
> > > > > >
> > > > > > "Sampled" indicates that the metrics is sampled, namely that we
> > > > > > maintain a set of samples from recent value measurements, which
> > > > > > decay/expire over time. So, the metric value is only
> > representative of
> > > > > > the recent past.
> > > > > >
> > > > > > "Total" indicates that the metric value contains all the
> > information
> > > > > > from the creation of the metric. For example., the total sum
> would
> > > > > > include all measurements since the app started up.
> > > > > >
> > > > > > It seems like your concern is that the word "total" doesn't
> really
> > > > > > pinpoint this meaning, which is true. It's especially confusing
> > that
> > > > > > another meaning of "total" is synonymous with "sum", rendering
> the
> > > > > > name "TotalSum" sort of absurd.
> > > > > >
> > > > > > We previously considered "cumulative", which was rejected as a
> > > > > > mouthful (it's four syllables) .
> > > > > >
> > > > > > You mentioned "running", which might be a more appropriate
> modifier
> > > > > > (RunningSum and RunningCount).
> > > > > >
> > > > > > What would everyone think about that?
> > > > > >
> > > > > > Thanks,
> > > > > > -John
> > > > > >
> > > > > > On Tue, Jul 16, 2019 at 11:27 AM Ryanne Dolan <
> > [email protected]>
> > > > > > wrote:
> > > > > > >
> > > > > > > John, I mentioned on the VOTE thread that the proposed names
> are
> > a
> > > > bit
> > > > > > > confusing,
> > > > > > >
> > > > > > > > given that "sum", "total", and "count" are roughly
> > synonymous...
> > > > > > >
> > > > > > > In particular, TotalSum is, I think, a "running total", though
> > the
> > > > naming
> > > > > > > doesn't really capture that.
> > > > > > >
> > > > > > > I think to avoid confusion, we should define exactly what
> > "total" and
> > > > > > > "sampled" are supposed to indicate, and perhaps pick
> appropriate
> > > > naming
> > > > > > > from there.
> > > > > > >
> > > > > > > Ryanne
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jul 12, 2019 at 1:42 PM John Roesler <
> [email protected]>
> > > > wrote:
> > > > > > >
> > > > > > > > Hey, thanks Matthias and Bruno,
> > > > > > > >
> > > > > > > > I agree, "Cumulative" is a mouthful. "TotalX" sounds fine to
> > me.
> > > > > > > >
> > > > > > > > Also, yes, I would have liked to not have any modifier for
> > > > > > > > "non-sampled", but there is a name conflict with Sum.
> > > > > > > >
> > > > > > > > I'll update the KIP to reflect "TotalX" and then start the
> vote
> > > > thread.
> > > > > > > >
> > > > > > > > Thanks again,
> > > > > > > > -John
> > > > > > > >
> > > > > > > > On Fri, Jul 12, 2019 at 11:27 AM Bruno Cadonna <
> > [email protected]
> > > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > OK, makes sense. Then, I am in favour of TotalCount and
> > TotalSum.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Bruno
> > > > > > > > >
> > > > > > > > > On Fri, Jul 12, 2019 at 12:57 AM Matthias J. Sax <
> > > > > > [email protected]>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > `Sum` is an existing name, for the "sampled sum" metric,
> > that
> > > > gets
> > > > > > > > > > deprecated. Hence, we cannot use it.
> > > > > > > > > >
> > > > > > > > > > If we cannot use `Sum` and use `TotalSum`, we should also
> > not
> > > > use
> > > > > > > > > > `Count` but `TotalCount` for consistency.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > -Matthias
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On 7/11/19 12:58 PM, Bruno Cadonna wrote:
> > > > > > > > > > > Hi John,
> > > > > > > > > > >
> > > > > > > > > > > Thank you for the KIP.
> > > > > > > > > > >
> > > > > > > > > > > LGTM
> > > > > > > > > > >
> > > > > > > > > > > I also do not like CumulativeSum/Count so much. I
> > propose to
> > > > just
> > > > > > > > call
> > > > > > > > > > > it Sum and Count.
> > > > > > > > > > >
> > > > > > > > > > > I understand that you want to unequivocally distinguish
> > the
> > > > two
> > > > > > > > metric
> > > > > > > > > > > functions by their names, but I have the feeling the
> > names
> > > > become
> > > > > > > > > > > artificially complex. The exact semantics can also be
> > > > documented
> > > > > > in
> > > > > > > > > > > the javadocs, which btw could also be improved in those
> > > > classes.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Bruno
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jul 11, 2019 at 8:25 PM Matthias J. Sax <
> > > > > > > > [email protected]> wrote:
> > > > > > > > > > >>
> > > > > > > > > > >> Thanks for the KIP. Overall LGTM.
> > > > > > > > > > >>
> > > > > > > > > > >> The only though I have is, if we may want to use
> > `TotalSum`
> > > > and
> > > > > > > > > > >> `TotalCount` instead of `CumulativeSum/Count` as
> names?
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >> -Matthias
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >> On 7/11/19 9:31 AM, John Roesler wrote:
> > > > > > > > > > >>> Hi Kafka devs,
> > > > > > > > > > >>>
> > > > > > > > > > >>> I'd like to propose KIP-488 as a minor cleanup of
> some
> > of
> > > > our
> > > > > > > > metric
> > > > > > > > > > >>> implementations.
> > > > > > > > > > >>>
> > > > > > > > > > >>> KIP-488:
> https://cwiki.apache.org/confluence/x/kkAyBw
> > > > > > > > > > >>>
> > > > > > > > > > >>> Over time, iterative updates to these metrics has
> > resulted
> > > > in a
> > > > > > > > pretty
> > > > > > > > > > >>> confusing little collection of classes, and I've
> > personally
> > > > > > been
> > > > > > > > > > >>> involved in three separate moderately time-consuming
> > > > > > iterations of
> > > > > > > > me
> > > > > > > > > > >>> or someone else trying to work out which metrics are
> > > > > > available, and
> > > > > > > > > > >>> which ones are desired for a given use case. One of
> > these
> > > > was
> > > > > > > > actually
> > > > > > > > > > >>> a long-running bug in Kafka Streams' metrics, so not
> > only
> > > > has
> > > > > > this
> > > > > > > > > > >>> confusion been a time sink, but it has also led to
> > bugs.
> > > > > > > > > > >>>
> > > > > > > > > > >>> I'm hoping this change won't be too controversial.
> > > > > > > > > > >>>
> > > > > > > > > > >>> Thanks,
> > > > > > > > > > >>> -John
> > > > > > > > > > >>>
> > > > > > > > > > >>
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > >
> >
>

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

Reply via email to