Re: [VOTE] KIP-938: Add more metrics for measuring KRaft performance

2023-06-09 Thread Colin McCabe
Hi all,

Thanks for the votes and discussion. I'm going to close the vote.

With a binding +1 from David Arthur, Ron Dagostino, and Luke Chen, and 
non-binding +1s from Divij Vaidya and Igor Soarez, the vote passes.

regards,
Colin


On Thu, Jun 8, 2023, at 20:17, David Arthur wrote:
> Ok, thanks for the explanations. +1 binding from me
>
> -David
>
> On Thu, Jun 8, 2023 at 6:08 PM Colin McCabe  wrote:
>
>> One note. I added ForwardingManager queue metrics. This should be the last
>> addition!
>>
>> best,
>> Colin
>>
>> On Thu, Jun 8, 2023, at 14:47, Colin McCabe wrote:
>> > On Thu, Jun 8, 2023, at 10:00, David Arthur wrote:
>> >> Colin, thanks for the KIP! These all seem like pretty useful additions.
>> A
>> >> few quick questions
>> >>
>> >> 1) Will the value of TimedOutBrokerHeartbeatCount be zero for inactive
>> >> controllers?
>> >
>> > No. It will just stay at whatever it was when the controller became
>> > inactive (or 0 if it was never active).
>> >
>> >> Will the value reset to zero after an election, or only
>> >> process restart?
>> >
>> > Only process restart.
>> >
>> > The reason is that if the metric resets on a new controller being
>> > elected, a lot of metrics collection systems will potentially miss it.
>> > After all, one pattern we've seen is excessive load leading to
>> > excessive controller elections.
>> >
>> >> 2) Does HandleLoadSnapshotCount include the initial load?
>> >
>> > It does count the initial load.
>> >
>> >> I.e., will it always be at least 1
>> >
>> > Well, there isn't an initial load, if the cluster is brand new. :)
>> >
>> > But yes, in general a value of 1 is expected.
>> >
>> > best,
>> > Colin
>> >
>> >>
>> >> Thanks!
>> >> David
>> >>
>> >> On Wed, Jun 7, 2023 at 11:17 PM Luke Chen  wrote:
>> >>
>> >>> Hi Colin,
>> >>>
>> >>> Thanks for the response.
>> >>> I have no more comments.
>> >>> +1 (binding)
>> >>>
>> >>> Luke
>> >>>
>> >>> On Thu, Jun 8, 2023 at 6:02 AM Colin McCabe 
>> wrote:
>> >>> >
>> >>> > Hi Luke,
>> >>> >
>> >>> > Thanks for the review and the suggestion.
>> >>> >
>> >>> > I think we will add more "handling time" metrics later, but for now I
>> >>> don't want to make this KIP any bigger than it is already...
>> >>> >
>> >>> > best,
>> >>> > Colin
>> >>> >
>> >>> >
>> >>> > On Wed, Jun 7, 2023, at 03:12, Luke Chen wrote:
>> >>> > > Hi Colin,
>> >>> > >
>> >>> > > One comment:
>> >>> > > Should we add a metric to record the snapshot handling time?
>> >>> > > Since we know the snapshot loading might take long if the size is
>> huge.
>> >>> > > We might want to know how much time it is processed. WDYT?
>> >>> > >
>> >>> > > No matter you think we need it or not, the KIP LGTM.
>> >>> > > +1 from me.
>> >>> > >
>> >>> > >
>> >>> > > Thank you.
>> >>> > > Luke
>> >>> > >
>> >>> > > On Wed, Jun 7, 2023 at 1:33 PM Colin McCabe 
>> >>> wrote:
>> >>> > >>
>> >>> > >> Hi all,
>> >>> > >>
>> >>> > >> I added two new metrics to the list:
>> >>> > >>
>> >>> > >> * LatestSnapshotGeneratedBytes
>> >>> > >> * LatestSnapshotGeneratedAgeMs
>> >>> > >>
>> >>> > >> These will help monitor the period snapshot generation process.
>> >>> > >>
>> >>> > >> best,
>> >>> > >> Colin
>> >>> > >>
>> >>> > >>
>> >>> > >> On Tue, Jun 6, 2023, at 22:21, Colin McCabe wrote:
>> >>> > >> > Hi Divij,
>> >>> > >> >
>> >>> > >> > Yes, I am referring to the feature level. I changed the
>> description
>> >>> of
>> >>> > >> > CurrentMetadataVersion to reference the feature level
>> specifically.
>> >>> > >> >
>> >>> > >> > best,
>> >>> > >> > Colin
>> >>> > >> >
>> >>> > >> >
>> >>> > >> > On Tue, Jun 6, 2023, at 05:56, Divij Vaidya wrote:
>> >>> > >> >> "Each metadata version has a corresponding integer in the
>> >>> > >> >> MetadataVersion.java file."
>> >>> > >> >>
>> >>> > >> >> Please correct me if I'm wrong, but are you referring to
>> >>> "featureLevel"
>> >>> > >> >> in
>> >>> > >> >> the enum at
>> >>> > >> >>
>> >>>
>> https://github.com/apache/kafka/blob/trunk/server-common/src/main/java/org/apache/kafka/server/common/MetadataVersion.java#L45
>> >>> > >> >> ? Is yes, can we please update the description of the metric to
>> >>> make it
>> >>> > >> >> easier for the users to understand this? For example, we can
>> say,
>> >>> > >> >> "Represents the current metadata version as an integer value.
>> See
>> >>> > >> >> MetadataVersion (hyperlink) for a mapping between string and
>> >>> integer
>> >>> > >> >> formats of metadata version".
>> >>> > >> >>
>> >>> > >> >> --
>> >>> > >> >> Divij Vaidya
>> >>> > >> >>
>> >>> > >> >>
>> >>> > >> >>
>> >>> > >> >> On Tue, Jun 6, 2023 at 1:51 PM Ron Dagostino <
>> rndg...@gmail.com>
>> >>> wrote:
>> >>> > >> >>
>> >>> > >> >>> Thanks again for the KIP, Colin.  +1 (binding).
>> >>> > >> >>>
>> >>> > >> >>> Ron
>> >>> > >> >>>
>> >>> > >> >>> > On Jun 6, 2023, at 7:02 AM, Igor Soarez
>> >>> 
>> >>> > >> >>> wrote:
>> >>> > >> >>> >
>> >>> > >> >>> > Thanks for the KIP.
>> >>> > >> >>> >
>> >>> > >> >>> > Seems straightforward, LGTM.
>> >>> > 

Re: [VOTE] KIP-938: Add more metrics for measuring KRaft performance

2023-06-08 Thread David Arthur
Ok, thanks for the explanations. +1 binding from me

-David

On Thu, Jun 8, 2023 at 6:08 PM Colin McCabe  wrote:

> One note. I added ForwardingManager queue metrics. This should be the last
> addition!
>
> best,
> Colin
>
> On Thu, Jun 8, 2023, at 14:47, Colin McCabe wrote:
> > On Thu, Jun 8, 2023, at 10:00, David Arthur wrote:
> >> Colin, thanks for the KIP! These all seem like pretty useful additions.
> A
> >> few quick questions
> >>
> >> 1) Will the value of TimedOutBrokerHeartbeatCount be zero for inactive
> >> controllers?
> >
> > No. It will just stay at whatever it was when the controller became
> > inactive (or 0 if it was never active).
> >
> >> Will the value reset to zero after an election, or only
> >> process restart?
> >
> > Only process restart.
> >
> > The reason is that if the metric resets on a new controller being
> > elected, a lot of metrics collection systems will potentially miss it.
> > After all, one pattern we've seen is excessive load leading to
> > excessive controller elections.
> >
> >> 2) Does HandleLoadSnapshotCount include the initial load?
> >
> > It does count the initial load.
> >
> >> I.e., will it always be at least 1
> >
> > Well, there isn't an initial load, if the cluster is brand new. :)
> >
> > But yes, in general a value of 1 is expected.
> >
> > best,
> > Colin
> >
> >>
> >> Thanks!
> >> David
> >>
> >> On Wed, Jun 7, 2023 at 11:17 PM Luke Chen  wrote:
> >>
> >>> Hi Colin,
> >>>
> >>> Thanks for the response.
> >>> I have no more comments.
> >>> +1 (binding)
> >>>
> >>> Luke
> >>>
> >>> On Thu, Jun 8, 2023 at 6:02 AM Colin McCabe 
> wrote:
> >>> >
> >>> > Hi Luke,
> >>> >
> >>> > Thanks for the review and the suggestion.
> >>> >
> >>> > I think we will add more "handling time" metrics later, but for now I
> >>> don't want to make this KIP any bigger than it is already...
> >>> >
> >>> > best,
> >>> > Colin
> >>> >
> >>> >
> >>> > On Wed, Jun 7, 2023, at 03:12, Luke Chen wrote:
> >>> > > Hi Colin,
> >>> > >
> >>> > > One comment:
> >>> > > Should we add a metric to record the snapshot handling time?
> >>> > > Since we know the snapshot loading might take long if the size is
> huge.
> >>> > > We might want to know how much time it is processed. WDYT?
> >>> > >
> >>> > > No matter you think we need it or not, the KIP LGTM.
> >>> > > +1 from me.
> >>> > >
> >>> > >
> >>> > > Thank you.
> >>> > > Luke
> >>> > >
> >>> > > On Wed, Jun 7, 2023 at 1:33 PM Colin McCabe 
> >>> wrote:
> >>> > >>
> >>> > >> Hi all,
> >>> > >>
> >>> > >> I added two new metrics to the list:
> >>> > >>
> >>> > >> * LatestSnapshotGeneratedBytes
> >>> > >> * LatestSnapshotGeneratedAgeMs
> >>> > >>
> >>> > >> These will help monitor the period snapshot generation process.
> >>> > >>
> >>> > >> best,
> >>> > >> Colin
> >>> > >>
> >>> > >>
> >>> > >> On Tue, Jun 6, 2023, at 22:21, Colin McCabe wrote:
> >>> > >> > Hi Divij,
> >>> > >> >
> >>> > >> > Yes, I am referring to the feature level. I changed the
> description
> >>> of
> >>> > >> > CurrentMetadataVersion to reference the feature level
> specifically.
> >>> > >> >
> >>> > >> > best,
> >>> > >> > Colin
> >>> > >> >
> >>> > >> >
> >>> > >> > On Tue, Jun 6, 2023, at 05:56, Divij Vaidya wrote:
> >>> > >> >> "Each metadata version has a corresponding integer in the
> >>> > >> >> MetadataVersion.java file."
> >>> > >> >>
> >>> > >> >> Please correct me if I'm wrong, but are you referring to
> >>> "featureLevel"
> >>> > >> >> in
> >>> > >> >> the enum at
> >>> > >> >>
> >>>
> https://github.com/apache/kafka/blob/trunk/server-common/src/main/java/org/apache/kafka/server/common/MetadataVersion.java#L45
> >>> > >> >> ? Is yes, can we please update the description of the metric to
> >>> make it
> >>> > >> >> easier for the users to understand this? For example, we can
> say,
> >>> > >> >> "Represents the current metadata version as an integer value.
> See
> >>> > >> >> MetadataVersion (hyperlink) for a mapping between string and
> >>> integer
> >>> > >> >> formats of metadata version".
> >>> > >> >>
> >>> > >> >> --
> >>> > >> >> Divij Vaidya
> >>> > >> >>
> >>> > >> >>
> >>> > >> >>
> >>> > >> >> On Tue, Jun 6, 2023 at 1:51 PM Ron Dagostino <
> rndg...@gmail.com>
> >>> wrote:
> >>> > >> >>
> >>> > >> >>> Thanks again for the KIP, Colin.  +1 (binding).
> >>> > >> >>>
> >>> > >> >>> Ron
> >>> > >> >>>
> >>> > >> >>> > On Jun 6, 2023, at 7:02 AM, Igor Soarez
> >>> 
> >>> > >> >>> wrote:
> >>> > >> >>> >
> >>> > >> >>> > Thanks for the KIP.
> >>> > >> >>> >
> >>> > >> >>> > Seems straightforward, LGTM.
> >>> > >> >>> > Non binding +1.
> >>> > >> >>> >
> >>> > >> >>> > --
> >>> > >> >>> > Igor
> >>> > >> >>> >
> >>> > >> >>>
> >>>
>


-- 
-David


Re: [VOTE] KIP-938: Add more metrics for measuring KRaft performance

2023-06-08 Thread Colin McCabe
One note. I added ForwardingManager queue metrics. This should be the last 
addition!

best,
Colin

On Thu, Jun 8, 2023, at 14:47, Colin McCabe wrote:
> On Thu, Jun 8, 2023, at 10:00, David Arthur wrote:
>> Colin, thanks for the KIP! These all seem like pretty useful additions. A
>> few quick questions
>>
>> 1) Will the value of TimedOutBrokerHeartbeatCount be zero for inactive
>> controllers?
>
> No. It will just stay at whatever it was when the controller became 
> inactive (or 0 if it was never active).
>
>> Will the value reset to zero after an election, or only
>> process restart?
>
> Only process restart.
>
> The reason is that if the metric resets on a new controller being 
> elected, a lot of metrics collection systems will potentially miss it. 
> After all, one pattern we've seen is excessive load leading to 
> excessive controller elections.
>
>> 2) Does HandleLoadSnapshotCount include the initial load?
>
> It does count the initial load.
>
>> I.e., will it always be at least 1
>
> Well, there isn't an initial load, if the cluster is brand new. :)
>
> But yes, in general a value of 1 is expected.
>
> best,
> Colin
>
>>
>> Thanks!
>> David
>>
>> On Wed, Jun 7, 2023 at 11:17 PM Luke Chen  wrote:
>>
>>> Hi Colin,
>>>
>>> Thanks for the response.
>>> I have no more comments.
>>> +1 (binding)
>>>
>>> Luke
>>>
>>> On Thu, Jun 8, 2023 at 6:02 AM Colin McCabe  wrote:
>>> >
>>> > Hi Luke,
>>> >
>>> > Thanks for the review and the suggestion.
>>> >
>>> > I think we will add more "handling time" metrics later, but for now I
>>> don't want to make this KIP any bigger than it is already...
>>> >
>>> > best,
>>> > Colin
>>> >
>>> >
>>> > On Wed, Jun 7, 2023, at 03:12, Luke Chen wrote:
>>> > > Hi Colin,
>>> > >
>>> > > One comment:
>>> > > Should we add a metric to record the snapshot handling time?
>>> > > Since we know the snapshot loading might take long if the size is huge.
>>> > > We might want to know how much time it is processed. WDYT?
>>> > >
>>> > > No matter you think we need it or not, the KIP LGTM.
>>> > > +1 from me.
>>> > >
>>> > >
>>> > > Thank you.
>>> > > Luke
>>> > >
>>> > > On Wed, Jun 7, 2023 at 1:33 PM Colin McCabe 
>>> wrote:
>>> > >>
>>> > >> Hi all,
>>> > >>
>>> > >> I added two new metrics to the list:
>>> > >>
>>> > >> * LatestSnapshotGeneratedBytes
>>> > >> * LatestSnapshotGeneratedAgeMs
>>> > >>
>>> > >> These will help monitor the period snapshot generation process.
>>> > >>
>>> > >> best,
>>> > >> Colin
>>> > >>
>>> > >>
>>> > >> On Tue, Jun 6, 2023, at 22:21, Colin McCabe wrote:
>>> > >> > Hi Divij,
>>> > >> >
>>> > >> > Yes, I am referring to the feature level. I changed the description
>>> of
>>> > >> > CurrentMetadataVersion to reference the feature level specifically.
>>> > >> >
>>> > >> > best,
>>> > >> > Colin
>>> > >> >
>>> > >> >
>>> > >> > On Tue, Jun 6, 2023, at 05:56, Divij Vaidya wrote:
>>> > >> >> "Each metadata version has a corresponding integer in the
>>> > >> >> MetadataVersion.java file."
>>> > >> >>
>>> > >> >> Please correct me if I'm wrong, but are you referring to
>>> "featureLevel"
>>> > >> >> in
>>> > >> >> the enum at
>>> > >> >>
>>> https://github.com/apache/kafka/blob/trunk/server-common/src/main/java/org/apache/kafka/server/common/MetadataVersion.java#L45
>>> > >> >> ? Is yes, can we please update the description of the metric to
>>> make it
>>> > >> >> easier for the users to understand this? For example, we can say,
>>> > >> >> "Represents the current metadata version as an integer value. See
>>> > >> >> MetadataVersion (hyperlink) for a mapping between string and
>>> integer
>>> > >> >> formats of metadata version".
>>> > >> >>
>>> > >> >> --
>>> > >> >> Divij Vaidya
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >> On Tue, Jun 6, 2023 at 1:51 PM Ron Dagostino 
>>> wrote:
>>> > >> >>
>>> > >> >>> Thanks again for the KIP, Colin.  +1 (binding).
>>> > >> >>>
>>> > >> >>> Ron
>>> > >> >>>
>>> > >> >>> > On Jun 6, 2023, at 7:02 AM, Igor Soarez
>>> 
>>> > >> >>> wrote:
>>> > >> >>> >
>>> > >> >>> > Thanks for the KIP.
>>> > >> >>> >
>>> > >> >>> > Seems straightforward, LGTM.
>>> > >> >>> > Non binding +1.
>>> > >> >>> >
>>> > >> >>> > --
>>> > >> >>> > Igor
>>> > >> >>> >
>>> > >> >>>
>>>


Re: [VOTE] KIP-938: Add more metrics for measuring KRaft performance

2023-06-08 Thread Colin McCabe
On Thu, Jun 8, 2023, at 10:00, David Arthur wrote:
> Colin, thanks for the KIP! These all seem like pretty useful additions. A
> few quick questions
>
> 1) Will the value of TimedOutBrokerHeartbeatCount be zero for inactive
> controllers?

No. It will just stay at whatever it was when the controller became inactive 
(or 0 if it was never active).

> Will the value reset to zero after an election, or only
> process restart?

Only process restart.

The reason is that if the metric resets on a new controller being elected, a 
lot of metrics collection systems will potentially miss it. After all, one 
pattern we've seen is excessive load leading to excessive controller elections.

> 2) Does HandleLoadSnapshotCount include the initial load?

It does count the initial load.

> I.e., will it always be at least 1

Well, there isn't an initial load, if the cluster is brand new. :)

But yes, in general a value of 1 is expected.

best,
Colin

>
> Thanks!
> David
>
> On Wed, Jun 7, 2023 at 11:17 PM Luke Chen  wrote:
>
>> Hi Colin,
>>
>> Thanks for the response.
>> I have no more comments.
>> +1 (binding)
>>
>> Luke
>>
>> On Thu, Jun 8, 2023 at 6:02 AM Colin McCabe  wrote:
>> >
>> > Hi Luke,
>> >
>> > Thanks for the review and the suggestion.
>> >
>> > I think we will add more "handling time" metrics later, but for now I
>> don't want to make this KIP any bigger than it is already...
>> >
>> > best,
>> > Colin
>> >
>> >
>> > On Wed, Jun 7, 2023, at 03:12, Luke Chen wrote:
>> > > Hi Colin,
>> > >
>> > > One comment:
>> > > Should we add a metric to record the snapshot handling time?
>> > > Since we know the snapshot loading might take long if the size is huge.
>> > > We might want to know how much time it is processed. WDYT?
>> > >
>> > > No matter you think we need it or not, the KIP LGTM.
>> > > +1 from me.
>> > >
>> > >
>> > > Thank you.
>> > > Luke
>> > >
>> > > On Wed, Jun 7, 2023 at 1:33 PM Colin McCabe 
>> wrote:
>> > >>
>> > >> Hi all,
>> > >>
>> > >> I added two new metrics to the list:
>> > >>
>> > >> * LatestSnapshotGeneratedBytes
>> > >> * LatestSnapshotGeneratedAgeMs
>> > >>
>> > >> These will help monitor the period snapshot generation process.
>> > >>
>> > >> best,
>> > >> Colin
>> > >>
>> > >>
>> > >> On Tue, Jun 6, 2023, at 22:21, Colin McCabe wrote:
>> > >> > Hi Divij,
>> > >> >
>> > >> > Yes, I am referring to the feature level. I changed the description
>> of
>> > >> > CurrentMetadataVersion to reference the feature level specifically.
>> > >> >
>> > >> > best,
>> > >> > Colin
>> > >> >
>> > >> >
>> > >> > On Tue, Jun 6, 2023, at 05:56, Divij Vaidya wrote:
>> > >> >> "Each metadata version has a corresponding integer in the
>> > >> >> MetadataVersion.java file."
>> > >> >>
>> > >> >> Please correct me if I'm wrong, but are you referring to
>> "featureLevel"
>> > >> >> in
>> > >> >> the enum at
>> > >> >>
>> https://github.com/apache/kafka/blob/trunk/server-common/src/main/java/org/apache/kafka/server/common/MetadataVersion.java#L45
>> > >> >> ? Is yes, can we please update the description of the metric to
>> make it
>> > >> >> easier for the users to understand this? For example, we can say,
>> > >> >> "Represents the current metadata version as an integer value. See
>> > >> >> MetadataVersion (hyperlink) for a mapping between string and
>> integer
>> > >> >> formats of metadata version".
>> > >> >>
>> > >> >> --
>> > >> >> Divij Vaidya
>> > >> >>
>> > >> >>
>> > >> >>
>> > >> >> On Tue, Jun 6, 2023 at 1:51 PM Ron Dagostino 
>> wrote:
>> > >> >>
>> > >> >>> Thanks again for the KIP, Colin.  +1 (binding).
>> > >> >>>
>> > >> >>> Ron
>> > >> >>>
>> > >> >>> > On Jun 6, 2023, at 7:02 AM, Igor Soarez
>> 
>> > >> >>> wrote:
>> > >> >>> >
>> > >> >>> > Thanks for the KIP.
>> > >> >>> >
>> > >> >>> > Seems straightforward, LGTM.
>> > >> >>> > Non binding +1.
>> > >> >>> >
>> > >> >>> > --
>> > >> >>> > Igor
>> > >> >>> >
>> > >> >>>
>>


Re: [VOTE] KIP-938: Add more metrics for measuring KRaft performance

2023-06-08 Thread David Arthur
Colin, thanks for the KIP! These all seem like pretty useful additions. A
few quick questions

1) Will the value of TimedOutBrokerHeartbeatCount be zero for inactive
controllers? Will the value reset to zero after an election, or only
process restart?
2) Does HandleLoadSnapshotCount include the initial load? I.e., will it
always be at least 1

Thanks!
David

On Wed, Jun 7, 2023 at 11:17 PM Luke Chen  wrote:

> Hi Colin,
>
> Thanks for the response.
> I have no more comments.
> +1 (binding)
>
> Luke
>
> On Thu, Jun 8, 2023 at 6:02 AM Colin McCabe  wrote:
> >
> > Hi Luke,
> >
> > Thanks for the review and the suggestion.
> >
> > I think we will add more "handling time" metrics later, but for now I
> don't want to make this KIP any bigger than it is already...
> >
> > best,
> > Colin
> >
> >
> > On Wed, Jun 7, 2023, at 03:12, Luke Chen wrote:
> > > Hi Colin,
> > >
> > > One comment:
> > > Should we add a metric to record the snapshot handling time?
> > > Since we know the snapshot loading might take long if the size is huge.
> > > We might want to know how much time it is processed. WDYT?
> > >
> > > No matter you think we need it or not, the KIP LGTM.
> > > +1 from me.
> > >
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Wed, Jun 7, 2023 at 1:33 PM Colin McCabe 
> wrote:
> > >>
> > >> Hi all,
> > >>
> > >> I added two new metrics to the list:
> > >>
> > >> * LatestSnapshotGeneratedBytes
> > >> * LatestSnapshotGeneratedAgeMs
> > >>
> > >> These will help monitor the period snapshot generation process.
> > >>
> > >> best,
> > >> Colin
> > >>
> > >>
> > >> On Tue, Jun 6, 2023, at 22:21, Colin McCabe wrote:
> > >> > Hi Divij,
> > >> >
> > >> > Yes, I am referring to the feature level. I changed the description
> of
> > >> > CurrentMetadataVersion to reference the feature level specifically.
> > >> >
> > >> > best,
> > >> > Colin
> > >> >
> > >> >
> > >> > On Tue, Jun 6, 2023, at 05:56, Divij Vaidya wrote:
> > >> >> "Each metadata version has a corresponding integer in the
> > >> >> MetadataVersion.java file."
> > >> >>
> > >> >> Please correct me if I'm wrong, but are you referring to
> "featureLevel"
> > >> >> in
> > >> >> the enum at
> > >> >>
> https://github.com/apache/kafka/blob/trunk/server-common/src/main/java/org/apache/kafka/server/common/MetadataVersion.java#L45
> > >> >> ? Is yes, can we please update the description of the metric to
> make it
> > >> >> easier for the users to understand this? For example, we can say,
> > >> >> "Represents the current metadata version as an integer value. See
> > >> >> MetadataVersion (hyperlink) for a mapping between string and
> integer
> > >> >> formats of metadata version".
> > >> >>
> > >> >> --
> > >> >> Divij Vaidya
> > >> >>
> > >> >>
> > >> >>
> > >> >> On Tue, Jun 6, 2023 at 1:51 PM Ron Dagostino 
> wrote:
> > >> >>
> > >> >>> Thanks again for the KIP, Colin.  +1 (binding).
> > >> >>>
> > >> >>> Ron
> > >> >>>
> > >> >>> > On Jun 6, 2023, at 7:02 AM, Igor Soarez
> 
> > >> >>> wrote:
> > >> >>> >
> > >> >>> > Thanks for the KIP.
> > >> >>> >
> > >> >>> > Seems straightforward, LGTM.
> > >> >>> > Non binding +1.
> > >> >>> >
> > >> >>> > --
> > >> >>> > Igor
> > >> >>> >
> > >> >>>
>


Re: [VOTE] KIP-938: Add more metrics for measuring KRaft performance

2023-06-07 Thread Luke Chen
Hi Colin,

Thanks for the response.
I have no more comments.
+1 (binding)

Luke

On Thu, Jun 8, 2023 at 6:02 AM Colin McCabe  wrote:
>
> Hi Luke,
>
> Thanks for the review and the suggestion.
>
> I think we will add more "handling time" metrics later, but for now I don't 
> want to make this KIP any bigger than it is already...
>
> best,
> Colin
>
>
> On Wed, Jun 7, 2023, at 03:12, Luke Chen wrote:
> > Hi Colin,
> >
> > One comment:
> > Should we add a metric to record the snapshot handling time?
> > Since we know the snapshot loading might take long if the size is huge.
> > We might want to know how much time it is processed. WDYT?
> >
> > No matter you think we need it or not, the KIP LGTM.
> > +1 from me.
> >
> >
> > Thank you.
> > Luke
> >
> > On Wed, Jun 7, 2023 at 1:33 PM Colin McCabe  wrote:
> >>
> >> Hi all,
> >>
> >> I added two new metrics to the list:
> >>
> >> * LatestSnapshotGeneratedBytes
> >> * LatestSnapshotGeneratedAgeMs
> >>
> >> These will help monitor the period snapshot generation process.
> >>
> >> best,
> >> Colin
> >>
> >>
> >> On Tue, Jun 6, 2023, at 22:21, Colin McCabe wrote:
> >> > Hi Divij,
> >> >
> >> > Yes, I am referring to the feature level. I changed the description of
> >> > CurrentMetadataVersion to reference the feature level specifically.
> >> >
> >> > best,
> >> > Colin
> >> >
> >> >
> >> > On Tue, Jun 6, 2023, at 05:56, Divij Vaidya wrote:
> >> >> "Each metadata version has a corresponding integer in the
> >> >> MetadataVersion.java file."
> >> >>
> >> >> Please correct me if I'm wrong, but are you referring to "featureLevel"
> >> >> in
> >> >> the enum at
> >> >> https://github.com/apache/kafka/blob/trunk/server-common/src/main/java/org/apache/kafka/server/common/MetadataVersion.java#L45
> >> >> ? Is yes, can we please update the description of the metric to make it
> >> >> easier for the users to understand this? For example, we can say,
> >> >> "Represents the current metadata version as an integer value. See
> >> >> MetadataVersion (hyperlink) for a mapping between string and integer
> >> >> formats of metadata version".
> >> >>
> >> >> --
> >> >> Divij Vaidya
> >> >>
> >> >>
> >> >>
> >> >> On Tue, Jun 6, 2023 at 1:51 PM Ron Dagostino  wrote:
> >> >>
> >> >>> Thanks again for the KIP, Colin.  +1 (binding).
> >> >>>
> >> >>> Ron
> >> >>>
> >> >>> > On Jun 6, 2023, at 7:02 AM, Igor Soarez 
> >> >>> wrote:
> >> >>> >
> >> >>> > Thanks for the KIP.
> >> >>> >
> >> >>> > Seems straightforward, LGTM.
> >> >>> > Non binding +1.
> >> >>> >
> >> >>> > --
> >> >>> > Igor
> >> >>> >
> >> >>>


Re: [VOTE] KIP-938: Add more metrics for measuring KRaft performance

2023-06-07 Thread Colin McCabe
Hi Luke,

Thanks for the review and the suggestion.

I think we will add more "handling time" metrics later, but for now I don't 
want to make this KIP any bigger than it is already...

best,
Colin


On Wed, Jun 7, 2023, at 03:12, Luke Chen wrote:
> Hi Colin,
>
> One comment:
> Should we add a metric to record the snapshot handling time?
> Since we know the snapshot loading might take long if the size is huge.
> We might want to know how much time it is processed. WDYT?
>
> No matter you think we need it or not, the KIP LGTM.
> +1 from me.
>
>
> Thank you.
> Luke
>
> On Wed, Jun 7, 2023 at 1:33 PM Colin McCabe  wrote:
>>
>> Hi all,
>>
>> I added two new metrics to the list:
>>
>> * LatestSnapshotGeneratedBytes
>> * LatestSnapshotGeneratedAgeMs
>>
>> These will help monitor the period snapshot generation process.
>>
>> best,
>> Colin
>>
>>
>> On Tue, Jun 6, 2023, at 22:21, Colin McCabe wrote:
>> > Hi Divij,
>> >
>> > Yes, I am referring to the feature level. I changed the description of
>> > CurrentMetadataVersion to reference the feature level specifically.
>> >
>> > best,
>> > Colin
>> >
>> >
>> > On Tue, Jun 6, 2023, at 05:56, Divij Vaidya wrote:
>> >> "Each metadata version has a corresponding integer in the
>> >> MetadataVersion.java file."
>> >>
>> >> Please correct me if I'm wrong, but are you referring to "featureLevel"
>> >> in
>> >> the enum at
>> >> https://github.com/apache/kafka/blob/trunk/server-common/src/main/java/org/apache/kafka/server/common/MetadataVersion.java#L45
>> >> ? Is yes, can we please update the description of the metric to make it
>> >> easier for the users to understand this? For example, we can say,
>> >> "Represents the current metadata version as an integer value. See
>> >> MetadataVersion (hyperlink) for a mapping between string and integer
>> >> formats of metadata version".
>> >>
>> >> --
>> >> Divij Vaidya
>> >>
>> >>
>> >>
>> >> On Tue, Jun 6, 2023 at 1:51 PM Ron Dagostino  wrote:
>> >>
>> >>> Thanks again for the KIP, Colin.  +1 (binding).
>> >>>
>> >>> Ron
>> >>>
>> >>> > On Jun 6, 2023, at 7:02 AM, Igor Soarez 
>> >>> wrote:
>> >>> >
>> >>> > Thanks for the KIP.
>> >>> >
>> >>> > Seems straightforward, LGTM.
>> >>> > Non binding +1.
>> >>> >
>> >>> > --
>> >>> > Igor
>> >>> >
>> >>>


Re: [VOTE] KIP-938: Add more metrics for measuring KRaft performance

2023-06-07 Thread Divij Vaidya
"Yes, I am referring to the feature level. I changed the description of
CurrentMetadataVersion to reference the feature level specifically."

Thanks Colin. I have reviewed the KIP after the latest changes (including
addition of two new metrics). It looks good to me.

+1 (non-binding)

--
Divij Vaidya



On Wed, Jun 7, 2023 at 12:12 PM Luke Chen  wrote:

> Hi Colin,
>
> One comment:
> Should we add a metric to record the snapshot handling time?
> Since we know the snapshot loading might take long if the size is huge.
> We might want to know how much time it is processed. WDYT?
>
> No matter you think we need it or not, the KIP LGTM.
> +1 from me.
>
>
> Thank you.
> Luke
>
> On Wed, Jun 7, 2023 at 1:33 PM Colin McCabe  wrote:
> >
> > Hi all,
> >
> > I added two new metrics to the list:
> >
> > * LatestSnapshotGeneratedBytes
> > * LatestSnapshotGeneratedAgeMs
> >
> > These will help monitor the period snapshot generation process.
> >
> > best,
> > Colin
> >
> >
> > On Tue, Jun 6, 2023, at 22:21, Colin McCabe wrote:
> > > Hi Divij,
> > >
> > > Yes, I am referring to the feature level. I changed the description of
> > > CurrentMetadataVersion to reference the feature level specifically.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Tue, Jun 6, 2023, at 05:56, Divij Vaidya wrote:
> > >> "Each metadata version has a corresponding integer in the
> > >> MetadataVersion.java file."
> > >>
> > >> Please correct me if I'm wrong, but are you referring to
> "featureLevel"
> > >> in
> > >> the enum at
> > >>
> https://github.com/apache/kafka/blob/trunk/server-common/src/main/java/org/apache/kafka/server/common/MetadataVersion.java#L45
> > >> ? Is yes, can we please update the description of the metric to make
> it
> > >> easier for the users to understand this? For example, we can say,
> > >> "Represents the current metadata version as an integer value. See
> > >> MetadataVersion (hyperlink) for a mapping between string and integer
> > >> formats of metadata version".
> > >>
> > >> --
> > >> Divij Vaidya
> > >>
> > >>
> > >>
> > >> On Tue, Jun 6, 2023 at 1:51 PM Ron Dagostino 
> wrote:
> > >>
> > >>> Thanks again for the KIP, Colin.  +1 (binding).
> > >>>
> > >>> Ron
> > >>>
> > >>> > On Jun 6, 2023, at 7:02 AM, Igor Soarez 
> > >>> wrote:
> > >>> >
> > >>> > Thanks for the KIP.
> > >>> >
> > >>> > Seems straightforward, LGTM.
> > >>> > Non binding +1.
> > >>> >
> > >>> > --
> > >>> > Igor
> > >>> >
> > >>>
>


Re: [VOTE] KIP-938: Add more metrics for measuring KRaft performance

2023-06-07 Thread Luke Chen
Hi Colin,

One comment:
Should we add a metric to record the snapshot handling time?
Since we know the snapshot loading might take long if the size is huge.
We might want to know how much time it is processed. WDYT?

No matter you think we need it or not, the KIP LGTM.
+1 from me.


Thank you.
Luke

On Wed, Jun 7, 2023 at 1:33 PM Colin McCabe  wrote:
>
> Hi all,
>
> I added two new metrics to the list:
>
> * LatestSnapshotGeneratedBytes
> * LatestSnapshotGeneratedAgeMs
>
> These will help monitor the period snapshot generation process.
>
> best,
> Colin
>
>
> On Tue, Jun 6, 2023, at 22:21, Colin McCabe wrote:
> > Hi Divij,
> >
> > Yes, I am referring to the feature level. I changed the description of
> > CurrentMetadataVersion to reference the feature level specifically.
> >
> > best,
> > Colin
> >
> >
> > On Tue, Jun 6, 2023, at 05:56, Divij Vaidya wrote:
> >> "Each metadata version has a corresponding integer in the
> >> MetadataVersion.java file."
> >>
> >> Please correct me if I'm wrong, but are you referring to "featureLevel"
> >> in
> >> the enum at
> >> https://github.com/apache/kafka/blob/trunk/server-common/src/main/java/org/apache/kafka/server/common/MetadataVersion.java#L45
> >> ? Is yes, can we please update the description of the metric to make it
> >> easier for the users to understand this? For example, we can say,
> >> "Represents the current metadata version as an integer value. See
> >> MetadataVersion (hyperlink) for a mapping between string and integer
> >> formats of metadata version".
> >>
> >> --
> >> Divij Vaidya
> >>
> >>
> >>
> >> On Tue, Jun 6, 2023 at 1:51 PM Ron Dagostino  wrote:
> >>
> >>> Thanks again for the KIP, Colin.  +1 (binding).
> >>>
> >>> Ron
> >>>
> >>> > On Jun 6, 2023, at 7:02 AM, Igor Soarez 
> >>> wrote:
> >>> >
> >>> > Thanks for the KIP.
> >>> >
> >>> > Seems straightforward, LGTM.
> >>> > Non binding +1.
> >>> >
> >>> > --
> >>> > Igor
> >>> >
> >>>


Re: [VOTE] KIP-938: Add more metrics for measuring KRaft performance

2023-06-06 Thread Colin McCabe
Hi all,

I added two new metrics to the list:

* LatestSnapshotGeneratedBytes
* LatestSnapshotGeneratedAgeMs

These will help monitor the period snapshot generation process.

best,
Colin


On Tue, Jun 6, 2023, at 22:21, Colin McCabe wrote:
> Hi Divij,
>
> Yes, I am referring to the feature level. I changed the description of 
> CurrentMetadataVersion to reference the feature level specifically.
>
> best,
> Colin
>
>
> On Tue, Jun 6, 2023, at 05:56, Divij Vaidya wrote:
>> "Each metadata version has a corresponding integer in the
>> MetadataVersion.java file."
>>
>> Please correct me if I'm wrong, but are you referring to "featureLevel" 
>> in
>> the enum at
>> https://github.com/apache/kafka/blob/trunk/server-common/src/main/java/org/apache/kafka/server/common/MetadataVersion.java#L45
>> ? Is yes, can we please update the description of the metric to make it
>> easier for the users to understand this? For example, we can say,
>> "Represents the current metadata version as an integer value. See
>> MetadataVersion (hyperlink) for a mapping between string and integer
>> formats of metadata version".
>>
>> --
>> Divij Vaidya
>>
>>
>>
>> On Tue, Jun 6, 2023 at 1:51 PM Ron Dagostino  wrote:
>>
>>> Thanks again for the KIP, Colin.  +1 (binding).
>>>
>>> Ron
>>>
>>> > On Jun 6, 2023, at 7:02 AM, Igor Soarez 
>>> wrote:
>>> >
>>> > Thanks for the KIP.
>>> >
>>> > Seems straightforward, LGTM.
>>> > Non binding +1.
>>> >
>>> > --
>>> > Igor
>>> >
>>>


Re: [VOTE] KIP-938: Add more metrics for measuring KRaft performance

2023-06-06 Thread Colin McCabe
Hi Divij,

Yes, I am referring to the feature level. I changed the description of 
CurrentMetadataVersion to reference the feature level specifically.

best,
Colin


On Tue, Jun 6, 2023, at 05:56, Divij Vaidya wrote:
> "Each metadata version has a corresponding integer in the
> MetadataVersion.java file."
>
> Please correct me if I'm wrong, but are you referring to "featureLevel" 
> in
> the enum at
> https://github.com/apache/kafka/blob/trunk/server-common/src/main/java/org/apache/kafka/server/common/MetadataVersion.java#L45
> ? Is yes, can we please update the description of the metric to make it
> easier for the users to understand this? For example, we can say,
> "Represents the current metadata version as an integer value. See
> MetadataVersion (hyperlink) for a mapping between string and integer
> formats of metadata version".
>
> --
> Divij Vaidya
>
>
>
> On Tue, Jun 6, 2023 at 1:51 PM Ron Dagostino  wrote:
>
>> Thanks again for the KIP, Colin.  +1 (binding).
>>
>> Ron
>>
>> > On Jun 6, 2023, at 7:02 AM, Igor Soarez 
>> wrote:
>> >
>> > Thanks for the KIP.
>> >
>> > Seems straightforward, LGTM.
>> > Non binding +1.
>> >
>> > --
>> > Igor
>> >
>>


Re: [VOTE] KIP-938: Add more metrics for measuring KRaft performance

2023-06-06 Thread Divij Vaidya
"Each metadata version has a corresponding integer in the
MetadataVersion.java file."

Please correct me if I'm wrong, but are you referring to "featureLevel" in
the enum at
https://github.com/apache/kafka/blob/trunk/server-common/src/main/java/org/apache/kafka/server/common/MetadataVersion.java#L45
? Is yes, can we please update the description of the metric to make it
easier for the users to understand this? For example, we can say,
"Represents the current metadata version as an integer value. See
MetadataVersion (hyperlink) for a mapping between string and integer
formats of metadata version".

--
Divij Vaidya



On Tue, Jun 6, 2023 at 1:51 PM Ron Dagostino  wrote:

> Thanks again for the KIP, Colin.  +1 (binding).
>
> Ron
>
> > On Jun 6, 2023, at 7:02 AM, Igor Soarez 
> wrote:
> >
> > Thanks for the KIP.
> >
> > Seems straightforward, LGTM.
> > Non binding +1.
> >
> > --
> > Igor
> >
>


Re: [VOTE] KIP-938: Add more metrics for measuring KRaft performance

2023-06-06 Thread Ron Dagostino
Thanks again for the KIP, Colin.  +1 (binding).

Ron

> On Jun 6, 2023, at 7:02 AM, Igor Soarez  wrote:
> 
> Thanks for the KIP.
> 
> Seems straightforward, LGTM.
> Non binding +1.
> 
> --
> Igor
> 


Re: [VOTE] KIP-938: Add more metrics for measuring KRaft performance

2023-06-06 Thread Igor Soarez
Thanks for the KIP.

Seems straightforward, LGTM.
Non binding +1.

--
Igor



[VOTE] KIP-938: Add more metrics for measuring KRaft performance

2023-06-05 Thread Colin McCabe
Hi all,

I'd like to open the vote for KIP-938: Add more metrics for measuring KRaft 
performance.

Take a look here: https://cwiki.apache.org/confluence/x/gBU0Dw

Thanks to everyone who reviewed the proposal.

best,
Colin