Hi, Onur,

We probably don't want a 1-to-1 mapping from event type to controller
state, since some of the event types are implementation details.
How about the following mapping?

0 - idle
1 - controller change (Startup, ControllerChange, Reelect)
2 - broker change (BrokerChange)
3 - topic creation/change (TopicChange, PartitionModifications)
4 - topic deletion (TopicDeletion, TopicDeletionStopReplicaResult)
5 - partition reassignment (PartitionReassignment, PartitionReassignmentIsrChange)
6 - auto leader balancing (AutoPreferredReplicaLeaderElection)
7 - manual leader balancing (PreferredReplicaLeaderElection)
8 - controlled shutdown (ControlledShutdown)
9 - ISR change (IsrChangeNotification)

For each state, we will add a corresponding timer to track the rate and
the latency, if one is not there already (e.g., for broker change and
controlled shutdown). If there are future changes to the controller, we
can make a call on whether a new event should be mapped to one of the
existing states or to a new state.
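
To make this concrete, here is a rough sketch, in Scala, of what the
mapping and the per-state timers could look like (the names and the
metric naming scheme below are illustrative, not a final design):

sealed abstract class ControllerState(val value: Byte) {
  // One RateAndTimeMs timer per state (Idle excluded) to track the rate
  // and the latency; the naming scheme here is hypothetical.
  def rateAndTimeMetricName: Option[String] =
    if (value == 0) None else Some(s"${toString}RateAndTimeMs")
}

object ControllerState {
  case object Idle                  extends ControllerState(0)
  case object ControllerChange      extends ControllerState(1)
  case object BrokerChange          extends ControllerState(2)
  case object TopicChange           extends ControllerState(3)
  case object TopicDeletion         extends ControllerState(4)
  case object PartitionReassignment extends ControllerState(5)
  case object AutoLeaderBalance     extends ControllerState(6)
  case object ManualLeaderBalance   extends ControllerState(7)
  case object ControlledShutdown    extends ControllerState(8)
  case object IsrChange             extends ControllerState(9)
}

// Each event carries the state it maps to; several event types share a
// state (e.g. Startup, ControllerChange and Reelect all map to the
// controller change state), so implementation-detail events stay hidden.
trait ControllerEvent {
  def state: ControllerState
  def process(): Unit
}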

Thanks,

Jun

On Tue, May 9, 2017 at 6:17 PM, Onur Karaman <onurkaraman.apa...@gmail.com>
wrote:

> @Ismael, Jun
> I've brought up an earlier point twice now, and it still doesn't feel like
> it's been commented on or addressed, so I'm going to give it one more shot:
> Assuming that ControllerState should reflect the current event being
> processed, the KIP is missing states.
>
> The controller currently has 14 event types:
> BrokerChange
> TopicChange
> PartitionModifications
> TopicDeletion
> PartitionReassignment
> PartitionReassignmentIsrChange
> IsrChangeNotification
> PreferredReplicaLeaderElection
> AutoPreferredReplicaLeaderElection
> ControlledShutdown
> TopicDeletionStopReplicaResult
> Startup
> ControllerChange
> Reelect
>
> The KIP only shows 10 event types (and it's not a proper subset of the
> above set).
>
> I think this mismatch would cause the ControllerState to incorrectly be in
> the Idle state when in fact the controller could be doing a lot of work.
>
> 1. Should ControllerState exactly consist of the 14 controller event types
> + the 1 Idle state?
> 2. If so, what's the process for adding/removing/merging event types
> w.r.t. this metric?
>
> On Tue, May 9, 2017 at 4:45 PM, Ismael Juma <ism...@juma.me.uk> wrote:
>
>> Becket,
>>
>> Are you OK with extending the metrics via a subsequent KIP (assuming that
>> what we're doing here doesn't prevent that)? The KIP freeze is tomorrow
>> (although I will give it an extra day or two as many in the community have
>> been attending the Kafka Summit this week), so we should avoid increasing
>> the scope unless it's important for future improvements.
>>
>> Thanks,
>> Ismael
>>
>> On Wed, May 10, 2017 at 12:09 AM, Jun Rao <j...@confluent.io> wrote:
>>
>> > Hi, Becket,
>> >
>> > q10. The reason why there is no timer metric for the broker change event
>> > is that the controller already has a LeaderElectionRateAndTimeMs timer
>> > metric (in ControllerStats).
>> >
>> > q11. I agree that it's useful to know the queue time in the controller
>> > event queue, and I suggested that earlier. Onur thinks that it's a bit
>> > too early to add that since we are about to change how we queue events
>> > from ZK. Similarly, we will probably also make changes to batch requests
>> > from the controller to the brokers. So, perhaps we can add more metrics
>> > once those changes in the controller have been made. For now, knowing the
>> > controller state and the controller channel queue size is probably good
>> > enough.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> >
>> >
>> > On Mon, May 8, 2017 at 10:05 PM, Becket Qin <becket....@gmail.com>
>> > wrote:
>> >
>> >> @Ismael,
>> >>
>> >> Regarding the stages and event types: yes, I think each event's handling
>> >> should have those stages covered. It is similar to what we are doing for
>> >> the requests on the broker side. We have benefited a lot from such a
>> >> systematic metric structure, so I think it would be worth following the
>> >> same approach in the controller.
>> >>
>> >> As an example, for BrokerChangeEvent (or any event), I am thinking we
>> >> would have the following metrics:
>> >> 1. EventRate
>> >> 2. EventQueueTime: The event queue time
>> >> 3. EventHandlingTime: The event handling time (including zk path
>> >> updates)
>> >> 4. EventControllerChannelQueueTime: The queue time of the corresponding
>> >> LeaderAndIsrRequest and UpdateMetadataRequest in the controller channel
>> >> queue.
>> >> 5. EventControllerChannelSendTime: The time to send the corresponding
>> >> requests; this could be the total time for the corresponding
>> >> LeaderAndIsrRequest and UpdateMetadataRequest. It is typically small,
>> >> but in some cases could be slower than we expect.
>> >> 6. EventAckTime: The time waiting for all the corresponding responses.
>> >> 7. EventHandlingTotalTime: sum of 2-6
>> >>
>> >> Note that all the metrics are from the point of view of the event and
>> >> the cluster-wide state transition, not from the point of view of a
>> >> single request type. Among the above metrics, 4, 5, and 6 could
>> >> potentially be per broker.
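>> >>
>> >> To make 2-7 concrete, here is a rough sketch assuming we record a
>> >> timestamp at each stage boundary (all names here are illustrative,
>> >> not a proposed API):
>> >>
>> >> // Hypothetical timestamps captured as an event moves through the
>> >> // controller and the controller channel.
>> >> case class EventTimings(enqueuedMs: Long,  // added to the event queue
>> >>                         dequeuedMs: Long,  // picked up for processing
>> >>                         processedMs: Long, // zk updates done, requests
>> >>                                            //   queued to the channel
>> >>                         sendStartMs: Long, // requests leave channel queue
>> >>                         sentMs: Long,      // requests on the wire
>> >>                         ackedMs: Long) {   // all responses received
>> >>   def eventQueueTime = dequeuedMs - enqueuedMs      // metric 2
>> >>   def eventHandlingTime = processedMs - dequeuedMs  // metric 3
>> >>   def channelQueueTime = sendStartMs - processedMs  // metric 4
>> >>   def channelSendTime = sentMs - sendStartMs        // metric 5
>> >>   def eventAckTime = ackedMs - sentMs               // metric 6
>> >>   def totalTime = ackedMs - enqueuedMs              // metric 7
>> >> }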
>> >>
>> >> Thanks,
>> >>
>> >> Jiangjie (Becket) Qin
>> >>
>> >> On Mon, May 8, 2017 at 7:24 PM, Onur Karaman <
>> >> onurkaraman.apa...@gmail.com>
>> >> wrote:
>> >>
>> >> > I had a similar comment to Becket but accidentally posted it on the
>> >> > vote thread last Friday. From that thread: "I noticed that both the
>> >> > ControllerState metric and the *RateAndTimeMs metrics only cover a
>> >> > subset of the controller event types. Was this intentional?"
>> >> >
>> >> > I think it makes the most sense to just have the ControllerState
>> >> > metric and the *RateAndTimeMs metrics exactly match up with the event
>> >> > types.
>> >> >
>> >> > On Mon, May 8, 2017 at 4:35 PM, Ismael Juma <ism...@juma.me.uk>
>> >> > wrote:
>> >> >
>> >> > > Hi Becket,
>> >> > >
>> >> > > Thanks for the feedback. Comments inline.
>> >> > >
>> >> > > On Tue, May 9, 2017 at 12:19 AM, Becket Qin <becket....@gmail.com>
>> >> > > wrote:
>> >> > >>
>> >> > >> q10. With the event-loop-based controller design, it seems natural
>> >> > >> to have the processing time for each controller event type. In that
>> >> > >> case, the current metrics do not seem to cover all of the event
>> >> > >> processing times (e.g., the broker joining and broker failure event
>> >> > >> handling times)?
>> >> > >>
>> >> > >
>> >> > > I'll leave it to Jun to explain why some metrics were not included
>> >> > > in the proposal.
>> >> > >
>> >> > >> q11. There are actually a couple of stages for controller state
>> >> > >> transition and propagation. More specifically:
>> >> > >> Stage 1: Event queue time
>> >> > >> Stage 2: Event process time (including all the zk path updates)
>> >> > >> Stage 3: State propagation time (including the state propagation
>> >> > >> queuing time on the controller, and the time from request sent to
>> >> > >> response received)
>> >> > >>
>> >> > >> I think it is worth having metrics for each of the stages. Stage 3
>> >> > >> might be a little tricky because we may potentially need to manage
>> >> > >> the per-broker propagation time. Arguably the state propagation time
>> >> > >> overlaps a little with the broker-side request handling time, but
>> >> > >> for a state change that involves multiple types of requests, it
>> >> > >> could still be useful to know the time for a state transition to be
>> >> > >> propagated to the entire cluster.
>> >> > >>
>> >> > >
>> >> > > Can you please elaborate on how you would like to see this measured?
>> >> > > Do you mean that each event would have a separate metric for each of
>> >> > > these stages? Maybe a worked-out example would make it clear.
>> >> > >
>> >> > >> q12. Regarding (13), the controller actually does not update the
>> >> > >> ISR but just reads it. The error message seems to come from the
>> >> > >> brokers. It usually happens in the following case:
>> >> > >> 1. The current leader broker has cached ZNode version V.
>> >> > >> 2. The controller elects a new leader and updates the ZNode; the
>> >> > >> ZNode version becomes V+1.
>> >> > >> 3. The controller sends the LeaderAndIsrRequest to the replica
>> >> > >> brokers to propagate the new leader information as well as the new
>> >> > >> zkVersion.
>> >> > >> 4. Before the old leader processes the LeaderAndIsrRequest from the
>> >> > >> controller in step 3, it tries to update the ISR and finds that its
>> >> > >> cached zkVersion V is different from the actual zkVersion V+1.
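>> >> > >>
>> >> > >> A minimal sketch of the check behind step 4 (the helper below is
>> >> > >> hypothetical, not Kafka's actual API):
>> >> > >>
>> >> > >> // The old leader only writes the new ISR if its cached zkVersion
>> >> > >> // still matches the actual ZNode version; otherwise it skips the
>> >> > >> // update and logs the mismatch, which is the error message we see.
>> >> > >> def tryUpdateIsr(cachedZkVersion: Int, actualZkVersion: Int): Boolean =
>> >> > >>   if (cachedZkVersion != actualZkVersion) {
>> >> > >>     println(s"Cached zkVersion $cachedZkVersion not equal to that in " +
>> >> > >>       s"zookeeper ($actualZkVersion), skip updating ISR")
>> >> > >>     false
>> >> > >>   } else
>> >> > >>     true // safe to do a conditional write expecting version V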
>> >> > >>
>> >> > >> It looks like this is not a controller metric.
>> >> > >
>> >> > >
>> >> > > Yes, it's not a controller metric; that's why it's under the
>> >> > > "Partition Metrics" section. In the PR, I actually implemented it in
>> >> > > ReplicaManager alongside IsrExpandsPerSec and IsrShrinksPerSec.
>> >> > >
>> >> > > Thanks,
>> >> > > Ismael
>> >> > >
>> >> >
>> >>
>> >
>> >
>>
>
>
