Re: [DISCUSS] KIP-786: Emit Metric Client Quota Values

2021-11-04 Thread Mason Legere
Hi All,

Thanks for all the comments and suggestions!  Addressing them in order:


- Have you considered enabling the new metrics by default?
> - If you prefer keeping a configuration to enable them, what about
> renaming it to "client.quota.value.metric.enable" or even
> "quota.value.metric.enable"?


- Initially I wanted to stay away from having them enabled directly having
similar thoughts to what Luke mentioned.  That is, how this code will be
called for every Fetch/Produce API request
along with the fact that for most people's quotas are fairly static.
However, as the overhead is minimal, I have no hard opinions on this.
Additionally, I imagine most users are specifically selecting
which metrics are published anyways within their Prometheus agent config
(or something similar) and to Tom's point, this should also
support filtering.

- I believe I must have made a typo somewhere as
"client.quota.value.metric.enable" sounds more correct to me. Thanks for
the catch -- I will update the KIP.
For the scope of this KIP I preferred to have the additional "client" tag,
as quota values that are not scoped to the client level would not be
emitted.


But I think since the quota value won't change from time to time unless
> admin alter it, it might be waste of resources to record it on each
> produce/fetch API.
> It can alternatively be achieved by using the kafka-configs.sh to describe
> ALL users/clients/default to have a overview of the quota values when
> needed.


This is a good point and something that still bugs me a bit about this
design. It seems "unmetricy" to report just the values - especially when
they are largely static. However, with many clusters and many thousands
of clients, architecting a solution that fits into an already existing
metrics/alerting pipeline becomes quite a bit of work. I agree with Tom's
comment on this.


> In the case it would make more sense to recording the "available
> capacity" that a client has available at a given time as a rate. However,
> in order for the rate to have the correct value some additional work would
> be needed
>
> Could you add some more information about the additional work, and why
> we're rejecting it? This seems like it would be pretty useful
>

Good point, I will update the KIP to give more insight into this. The
difficulty with this is largely related to how brittle the quota
implementation currently is (i.e. quotas are attached to a metric) and
instead of being
able to record `Quota - value` incrementally we would need to aggregate
first for the math to work out. Without a more intrusive refactor this
amounts to adding a new `MeasurableStat` (or something equivalent).
That being said, I moved away from that option given that if you have the
quota value (which is already reported as a rate so the dimensional
analysis works out) then you can easily calculate the "available capacity"
remaining.
To my knowledge most alerting and monitoring platforms provide the
transform to take the difference between two time series so that seemed
like an easier option.

Thanks again for your points, I'll update the KIP!

Mason


On Thu, Nov 4, 2021 at 12:12 PM Bob Barrett
 wrote:

> +1 to Tom's point. Having a metric is a lot more convenient than needing to
> periodically call an API, and anyone who isn't interested in the metric
> should be able to just not collect it.
>
> Thanks for the KIP, Mason! I think this will be very useful.
>
> Under the rejected alternatives, you say the following:
>
> > In the case it would make more sense to recording the "available
> capacity" that a client has available at a given time as a rate. However,
> in order for the rate to have the correct value some additional work would
> be needed
>
> Could you add some more information about the additional work, and why
> we're rejecting it? This seems like it would be pretty useful (I could
> imagine plotting both the current and maximum "available capacity" for a
> client to visualize how much of their quota they are using). Maybe it makes
> sense to defer this work to a later KIP, or maybe it isn't practical at
> all, but it's not currently clear from this KIP why that is.
>
> Thanks,
> Bob
>
> On Thu, Nov 4, 2021 at 4:14 AM Tom Bentley  wrote:
>
> > This is a good point Luke. Unrelatedly, I've considered the value of
> > exposing gauges for the expiry time of SSL certificates, which similarly
> > change rarely. The metrics collected are used to build dashboards, create
> > alerts etc. Thus to have, for example, an alert on approaching
> certificate
> > expiry, a metric has to be collected _somehow_, and if Kafka doesn't
> expose
> > it itself, then it forces people to write, deploy and monitor something
> > which bridges, (e.g.) the admin client and the metric store.
> >
> > I would argue that so long as such nearly-static metrics are in the
> > minority then the cost of collecting and storing them is insignificant
> > compared with the whole set of broker metrics. So on balance it's

Re: [DISCUSS] KIP-786: Emit Metric Client Quota Values

2021-11-04 Thread Bob Barrett
+1 to Tom's point. Having a metric is a lot more convenient than needing to
periodically call an API, and anyone who isn't interested in the metric
should be able to just not collect it.

Thanks for the KIP, Mason! I think this will be very useful.

Under the rejected alternatives, you say the following:

> In the case it would make more sense to recording the "available
capacity" that a client has available at a given time as a rate. However,
in order for the rate to have the correct value some additional work would
be needed

Could you add some more information about the additional work, and why
we're rejecting it? This seems like it would be pretty useful (I could
imagine plotting both the current and maximum "available capacity" for a
client to visualize how much of their quota they are using). Maybe it makes
sense to defer this work to a later KIP, or maybe it isn't practical at
all, but it's not currently clear from this KIP why that is.

Thanks,
Bob

On Thu, Nov 4, 2021 at 4:14 AM Tom Bentley  wrote:

> This is a good point Luke. Unrelatedly, I've considered the value of
> exposing gauges for the expiry time of SSL certificates, which similarly
> change rarely. The metrics collected are used to build dashboards, create
> alerts etc. Thus to have, for example, an alert on approaching certificate
> expiry, a metric has to be collected _somehow_, and if Kafka doesn't expose
> it itself, then it forces people to write, deploy and monitor something
> which bridges, (e.g.) the admin client and the metric store.
>
> I would argue that so long as such nearly-static metrics are in the
> minority then the cost of collecting and storing them is insignificant
> compared with the whole set of broker metrics. So on balance it's probably
> better for Kafka to support them out of the box than for people to have to
> invent their components to get the information to the metric store. Also,
> I'm sure many metric collectors and stores will support filtering, so most
> of the cost can probably be avoided for those who don't want the
> functionality.
>
> Kind regards,
>
> Tom
>
> On Thu, Nov 4, 2021 at 10:25 AM Luke Chen  wrote:
>
> > Hi Mason,
> > Thanks for the KIP.
> > But I think since the quota value won't change from time to time unless
> > admin alter it, it might be waste of resources to record it on each
> > produce/fetch API.
> > It can alternatively be achieved by using the kafka-configs.sh to
> describe
> > ALL users/clients/default to have a overview of the quota values when
> > needed.
> >
> > What do you think?
> >
> > Thank you.
> > Luke,
> >
> >
> > On Wed, Nov 3, 2021 at 10:53 PM Mickael Maison  >
> > wrote:
> >
> > > Hi Mason,
> > >
> > > Thanks for the KIP. I think it's a good idea to also emit quota limits
> > > as metrics. It certainly simplifies monitoring/graphing if all the
> > > data come from the same source.
> > >
> > > The KIP looks good overall, just a couple of questions:
> > > - Have you considered enabling the new metrics by default?
> > > - If you prefer keeping a configuration to enable them, what about
> > > renaming it to "client.quota.value.metric.enable" or even
> > > "quota.value.metric.enable"?
> > >
> > > Thanks,
> > > Mickael
> > >
> > > On Wed, Oct 27, 2021 at 11:36 PM Mason Legere
> > >  wrote:
> > > >
> > > > Hi All,
> > > >
> > > > Haven't received any feedback on this yet but as it was a small
> change
> > > have
> > > > made a PR showing the functional components: pull request
> > > > 
> > > > Will update the related documentation outlining the new metric
> > attributes
> > > > in a bit.
> > > >
> > > > Best,
> > > > Mason Legere
> > > >
> > > > On Sat, Oct 23, 2021 at 4:00 PM Mason Legere <
> > > mason.leg...@salesforce.com>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I would like to start a discussion for my proposed KIP-786
> > > > > <
> > >
> >
> https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=191335406=9a2f3d65-5633-47c8-994c-f5a14738cb1e;
> > >
> > > which
> > > > > aims to allow client quota values to be emitted as a standard jmx
> > MBean
> > > > > attribute - if enabled in the static broker configuration.
> > > > >
> > > > > Please note that I originally misnumbered this KIP and am
> re-creating
> > > this
> > > > > discussion thread for clarity. The original thread can be found at:
> > > Original
> > > > > Email Thread
> > > > > <
> > >
> >
> https://lists.apache.org/thread.html/r44e154761f22a42e4766f2098d1e33cb54865311f41648ebd9406a4f%40%3Cdev.kafka.apache.org%3E
> > > >
> > > > >
> > > > > Best,
> > > > > Mason Legere
> > > > >
> > >
> >
>


Re: [DISCUSS] KIP-786: Emit Metric Client Quota Values

2021-11-04 Thread Tom Bentley
This is a good point Luke. Unrelatedly, I've considered the value of
exposing gauges for the expiry time of SSL certificates, which similarly
change rarely. The metrics collected are used to build dashboards, create
alerts etc. Thus to have, for example, an alert on approaching certificate
expiry, a metric has to be collected _somehow_, and if Kafka doesn't expose
it itself, then it forces people to write, deploy and monitor something
which bridges, (e.g.) the admin client and the metric store.

I would argue that so long as such nearly-static metrics are in the
minority then the cost of collecting and storing them is insignificant
compared with the whole set of broker metrics. So on balance it's probably
better for Kafka to support them out of the box than for people to have to
invent their components to get the information to the metric store. Also,
I'm sure many metric collectors and stores will support filtering, so most
of the cost can probably be avoided for those who don't want the
functionality.

Kind regards,

Tom

On Thu, Nov 4, 2021 at 10:25 AM Luke Chen  wrote:

> Hi Mason,
> Thanks for the KIP.
> But I think since the quota value won't change from time to time unless
> admin alter it, it might be waste of resources to record it on each
> produce/fetch API.
> It can alternatively be achieved by using the kafka-configs.sh to describe
> ALL users/clients/default to have a overview of the quota values when
> needed.
>
> What do you think?
>
> Thank you.
> Luke,
>
>
> On Wed, Nov 3, 2021 at 10:53 PM Mickael Maison 
> wrote:
>
> > Hi Mason,
> >
> > Thanks for the KIP. I think it's a good idea to also emit quota limits
> > as metrics. It certainly simplifies monitoring/graphing if all the
> > data come from the same source.
> >
> > The KIP looks good overall, just a couple of questions:
> > - Have you considered enabling the new metrics by default?
> > - If you prefer keeping a configuration to enable them, what about
> > renaming it to "client.quota.value.metric.enable" or even
> > "quota.value.metric.enable"?
> >
> > Thanks,
> > Mickael
> >
> > On Wed, Oct 27, 2021 at 11:36 PM Mason Legere
> >  wrote:
> > >
> > > Hi All,
> > >
> > > Haven't received any feedback on this yet but as it was a small change
> > have
> > > made a PR showing the functional components: pull request
> > > 
> > > Will update the related documentation outlining the new metric
> attributes
> > > in a bit.
> > >
> > > Best,
> > > Mason Legere
> > >
> > > On Sat, Oct 23, 2021 at 4:00 PM Mason Legere <
> > mason.leg...@salesforce.com>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I would like to start a discussion for my proposed KIP-786
> > > > <
> >
> https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=191335406=9a2f3d65-5633-47c8-994c-f5a14738cb1e;
> >
> > which
> > > > aims to allow client quota values to be emitted as a standard jmx
> MBean
> > > > attribute - if enabled in the static broker configuration.
> > > >
> > > > Please note that I originally misnumbered this KIP and am re-creating
> > this
> > > > discussion thread for clarity. The original thread can be found at:
> > Original
> > > > Email Thread
> > > > <
> >
> https://lists.apache.org/thread.html/r44e154761f22a42e4766f2098d1e33cb54865311f41648ebd9406a4f%40%3Cdev.kafka.apache.org%3E
> > >
> > > >
> > > > Best,
> > > > Mason Legere
> > > >
> >
>


Re: [DISCUSS] KIP-786: Emit Metric Client Quota Values

2021-11-04 Thread Luke Chen
Hi Mason,
Thanks for the KIP.
But I think since the quota value won't change from time to time unless
admin alter it, it might be waste of resources to record it on each
produce/fetch API.
It can alternatively be achieved by using the kafka-configs.sh to describe
ALL users/clients/default to have a overview of the quota values when
needed.

What do you think?

Thank you.
Luke,


On Wed, Nov 3, 2021 at 10:53 PM Mickael Maison 
wrote:

> Hi Mason,
>
> Thanks for the KIP. I think it's a good idea to also emit quota limits
> as metrics. It certainly simplifies monitoring/graphing if all the
> data come from the same source.
>
> The KIP looks good overall, just a couple of questions:
> - Have you considered enabling the new metrics by default?
> - If you prefer keeping a configuration to enable them, what about
> renaming it to "client.quota.value.metric.enable" or even
> "quota.value.metric.enable"?
>
> Thanks,
> Mickael
>
> On Wed, Oct 27, 2021 at 11:36 PM Mason Legere
>  wrote:
> >
> > Hi All,
> >
> > Haven't received any feedback on this yet but as it was a small change
> have
> > made a PR showing the functional components: pull request
> > 
> > Will update the related documentation outlining the new metric attributes
> > in a bit.
> >
> > Best,
> > Mason Legere
> >
> > On Sat, Oct 23, 2021 at 4:00 PM Mason Legere <
> mason.leg...@salesforce.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > I would like to start a discussion for my proposed KIP-786
> > > <
> https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=191335406=9a2f3d65-5633-47c8-994c-f5a14738cb1e;>
> which
> > > aims to allow client quota values to be emitted as a standard jmx MBean
> > > attribute - if enabled in the static broker configuration.
> > >
> > > Please note that I originally misnumbered this KIP and am re-creating
> this
> > > discussion thread for clarity. The original thread can be found at:
> Original
> > > Email Thread
> > > <
> https://lists.apache.org/thread.html/r44e154761f22a42e4766f2098d1e33cb54865311f41648ebd9406a4f%40%3Cdev.kafka.apache.org%3E
> >
> > >
> > > Best,
> > > Mason Legere
> > >
>


Re: [DISCUSS] KIP-786: Emit Metric Client Quota Values

2021-11-03 Thread Mickael Maison
Hi Mason,

Thanks for the KIP. I think it's a good idea to also emit quota limits
as metrics. It certainly simplifies monitoring/graphing if all the
data come from the same source.

The KIP looks good overall, just a couple of questions:
- Have you considered enabling the new metrics by default?
- If you prefer keeping a configuration to enable them, what about
renaming it to "client.quota.value.metric.enable" or even
"quota.value.metric.enable"?

Thanks,
Mickael

On Wed, Oct 27, 2021 at 11:36 PM Mason Legere
 wrote:
>
> Hi All,
>
> Haven't received any feedback on this yet but as it was a small change have
> made a PR showing the functional components: pull request
> 
> Will update the related documentation outlining the new metric attributes
> in a bit.
>
> Best,
> Mason Legere
>
> On Sat, Oct 23, 2021 at 4:00 PM Mason Legere 
> wrote:
>
> > Hi All,
> >
> > I would like to start a discussion for my proposed KIP-786
> > 
> >  which
> > aims to allow client quota values to be emitted as a standard jmx MBean
> > attribute - if enabled in the static broker configuration.
> >
> > Please note that I originally misnumbered this KIP and am re-creating this
> > discussion thread for clarity. The original thread can be found at: Original
> > Email Thread
> > 
> >
> > Best,
> > Mason Legere
> >


Re: [DISCUSS] KIP-786: Emit Metric Client Quota Values

2021-10-27 Thread Mason Legere
Hi All,

Haven't received any feedback on this yet but as it was a small change have
made a PR showing the functional components: pull request

Will update the related documentation outlining the new metric attributes
in a bit.

Best,
Mason Legere

On Sat, Oct 23, 2021 at 4:00 PM Mason Legere 
wrote:

> Hi All,
>
> I would like to start a discussion for my proposed KIP-786
> 
>  which
> aims to allow client quota values to be emitted as a standard jmx MBean
> attribute - if enabled in the static broker configuration.
>
> Please note that I originally misnumbered this KIP and am re-creating this
> discussion thread for clarity. The original thread can be found at: Original
> Email Thread
> 
>
> Best,
> Mason Legere
>