Hi José,

Thanks for the KIP! I think this will be a nice improvement.

I had the same question as Luke and Jason: what's the default for the 
NoOpRecord write interval? :) We should pick a value now, even if we expect 
to adjust it later, just to give a feel for how much traffic this would 
create. Perhaps 500 ms?
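
For a sense of scale: at 500 ms that's two NoOpRecords per second, or about 
172,800 per day. Assuming each record plus its batch overhead is on the 
order of 100 bytes (an assumption, since the record layout is still TBD), 
that's well under 1 KB/s of extra log traffic.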

Also, how about naming the time configuration something like 
"metadata.max.idle.interval.ms", to make it clear that this is the maximum 
time we can go between writes to the metadata log? I don't get that meaning 
out of "monitor interval"...

> Broker
> kafka.server:type=broker-metadata-metrics,name=last-committed-offset 
> kafka.server:type=broker-metadata-metrics,name=last-committed-timestamp 
> kafka.server:type=broker-metadata-metrics,name=last-committed-lag-ms

I'd suggest renaming these as "last-applied-offset-*" to make it clear that 
the offset we're measuring is the last one the broker applied. The last 
committed offset is something else, a more log-layer-centric concept.

(It would also be good to have a separate metric which purely reflects the 
last metadata offset we've fetched, independent of what we've applied, so 
we can see if the delta between it and last-applied-offset grows...)
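
To illustrate the semantics I mean, here's a minimal sketch in plain Java 
(all names hypothetical, not the actual broker metrics plumbing):

    import java.util.concurrent.atomic.AtomicLong;

    // Illustrative only: track the broker's view of the metadata log.
    class BrokerMetadataProgress {
        // Highest metadata offset fetched from the quorum so far.
        private final AtomicLong lastFetchedOffset = new AtomicLong(-1);
        // Highest metadata offset the broker has actually applied.
        private final AtomicLong lastAppliedOffset = new AtomicLong(-1);

        void onFetch(long offset) { lastFetchedOffset.set(offset); }
        void onApply(long offset) { lastAppliedOffset.set(offset); }

        // Gauge: how far applying trails fetching. A growing value means
        // the broker is falling behind on metadata it has already fetched.
        long applyLag() {
            return lastFetchedOffset.get() - lastAppliedOffset.get();
        }
    }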

regards,
Colin


On Wed, May 11, 2022, at 12:27, David Arthur wrote:
> José, thanks for the KIP! I think this is a good approach for proving
> the liveness of the quorum when metadata is not changing.
>
> 1. Based on the config name "metadata.monitor.write.interval.ms", I'm
> guessing the intention is to have a regularly scheduled write. If the
> quorum is already busy with lots of writes, we wouldn't need this
> NoopRecord, right? Maybe we can instead write a NoopRecord only after some
> amount of idle time (see the sketch after this list).
>
> 2. A small naming suggestion, what about "NoOpRecord"?
>
> 3. Typo in one of the metric names: "MetadataWriteOffses"
>
> 4. We should consider omitting these records from the log dump tool, or at
> least adding an option to skip over them.
>
> 5. If (somehow) one of these NoopRecords appeared in a snapshot, it sounds
> like the broker/controller would skip over it. We should specify this
> behavior in the KIP.
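>
> For point 1, a rough sketch of the idle-driven variant I have in mind
> (plain Java, all names hypothetical, not actual controller code):
>
>     import java.util.concurrent.Executors;
>     import java.util.concurrent.ScheduledExecutorService;
>     import java.util.concurrent.TimeUnit;
>
>     // Illustrative only: append a NoopRecord when the metadata log has
>     // been idle for maxIdleMs, instead of on a fixed schedule.
>     class IdleNoOpWriter {
>         private final long maxIdleMs;
>         private volatile long lastWriteMs = System.currentTimeMillis();
>         private final ScheduledExecutorService scheduler =
>             Executors.newSingleThreadScheduledExecutor();
>
>         IdleNoOpWriter(long maxIdleMs) {
>             this.maxIdleMs = maxIdleMs;
>             // Check periodically, but only write when nothing else has
>             // been appended since the last check.
>             scheduler.scheduleAtFixedRate(this::maybeWriteNoOp,
>                 maxIdleMs, maxIdleMs, TimeUnit.MILLISECONDS);
>         }
>
>         // Call from the normal append path on every metadata write.
>         void onMetadataWrite() { lastWriteMs = System.currentTimeMillis(); }
>
>         private void maybeWriteNoOp() {
>             if (System.currentTimeMillis() - lastWriteMs >= maxIdleMs) {
>                 appendNoOpRecord(); // stands in for the real append path
>                 onMetadataWrite();
>             }
>         }
>
>         private void appendNoOpRecord() { /* placeholder */ }
>     }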
>
> Cheers,
> David
>
> On Wed, May 11, 2022 at 2:37 AM Luke Chen <show...@gmail.com> wrote:
>
>> Hi José,
>>
>> Thanks for the KIP!
>>
>> Some questions:
>> 1. Jason has asked this but you didn't answer: what is the default value
>> for `metadata.monitor.write.interval.ms`?
>> 2. The `noopRecord` API key is `TBD`. Why can't we just use the highest
>> currently used API key number + 1? Is there any concern?
>> 3. typo: name=MetadataWriteOffse[t]s
>> 4. I don't understand the difference between the MetadataLastCommittedOffset
>> and MetadataWriteOffsets metrics. I'd expect the two values to always be
>> identical unless a controller metadata write fails; is that correct?
>>
>>
>> Thank you.
>> Luke
>>
>> On Wed, May 11, 2022 at 5:58 AM José Armando García Sancio
>> <jsan...@confluent.io.invalid> wrote:
>>
>> > Thanks for your feedback Jason, much appreciated.
>> >
>> > Here are the changes to the KIP:
>> >
>> >
>> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=211883219&selectedPageVersions=5&selectedPageVersions=4
>> >
>> > On Tue, May 10, 2022 at 1:34 PM Jason Gustafson
>> > <ja...@confluent.io.invalid> wrote:
>> > > The approach sounds reasonable. By the way, I think one of the gaps
>> > > we have today is when the leader gets partitioned from the remaining
>> > > voters. I believe it continues acting as a leader indefinitely. I was
>> > > considering whether this periodic write can address the issue.
>> > > Basically it can be used to force a leader to prove it is still the
>> > > leader by committing some data. Say, for example, that the leader
>> > > fails to commit the record after the fetch timeout expires; then
>> > > perhaps it could start a new election. What do you think?
>> >
>> > We have an issue for this at
>> > https://issues.apache.org/jira/browse/KAFKA-13621. I updated the issue
>> > with your feedback and included some of my thoughts. Do you mind if we
>> > move this conversation to that issue?
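>> >
>> > To make the idea concrete, here is a rough sketch of the kind of
>> > self-check being described (plain Java, hypothetical names; the real
>> > details belong on the issue):
>> >
>> >     // Illustrative only: if the leader cannot commit its own periodic
>> >     // NoopRecord within the fetch timeout, treat itself as partitioned
>> >     // and resign so that a new election can happen.
>> >     class LeaderLivenessCheck {
>> >         private final long fetchTimeoutMs;
>> >         private volatile long lastCommitTimeMs;
>> >
>> >         LeaderLivenessCheck(long fetchTimeoutMs, long nowMs) {
>> >             this.fetchTimeoutMs = fetchTimeoutMs;
>> >             this.lastCommitTimeMs = nowMs;
>> >         }
>> >
>> >         // Call whenever the leader observes one of its records commit.
>> >         void onCommit(long nowMs) { lastCommitTimeMs = nowMs; }
>> >
>> >         // True when the leader should resign and trigger an election.
>> >         boolean shouldResign(long nowMs) {
>> >             return nowMs - lastCommitTimeMs > fetchTimeoutMs;
>> >         }
>> >     }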
>> >
>> > > A couple additional questions:
>> > >
>> > > - What is the default value for `metadata.monitor.write.interval.ms`?
>> > > Also, I'm wondering if `controller` would be a more suitable prefix?
>> >
>> > Yeah. I am not sure. Looking at the current configuration we have both
>> > prefixes. For example, with the `controller` prefix we have
>> > `controller.quorum.voters`, `controller.listener.names`,
>> > `controller.quota.window.num`, etc. For the `metadata` prefix we have
>> > `metadata.log.dir`, `metadata.log.*`, and `metadata.max.retention.ms`,
>> > etc.
>> > I get the impression that we use `metadata` for things that are kinda
>> > log/disk related and `controller` for things that are not. I am
>> > thinking that the `metadata` prefix is more consistent with the
>> > current situation. What do you think Jason?
>> >
>> > > - Could we avoid letting BrokerMetadataPublisher escape into the metric
>> > > name? Letting the classnames leak into the metrics tends to cause
>> > > compatibility issues over time.
>> >
>> > Good point. For Raft we use `kafka.server:type=raft-metrics,name=...`.
>> > I'll change it to
>> > `kafka.server:type=broker-metadata-metrics,name=...`.
>> >
>> > Thanks,
>> > -José
>> >
>>
>
>
> -- 
> David Arthur
