Ismael,

Great, that sounds lovely.

I'd like a `Timer` (using yammer metrics parlance) over how long it took to
process the event, so we can get at p99 and max times spent processing
things. Maybe we could even do a log at warning level if event processing
takes over some timeout?

Thanks

Tom

On Thu, Apr 27, 2017 at 3:59 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Tom,
>
> Yes, the plan is to merge KAFKA-5028 first and then use a lock-free
> approach for the new metrics. I considered mentioning that in the KIP
> given KAFKA-5120, but didn't in the end. I'll add it to make it clear.
>
> Regarding locks, they are removed by KAFKA-5028, as you say. So, if I
> understand correctly, you are suggesting an event processing rate metric
> with event type as a tag? Onur and Jun, what do you think?
>
> Ismael
>
> On Thu, Apr 27, 2017 at 3:47 PM, Tom Crayford <tcrayf...@heroku.com>
> wrote:
>
> > Hi,
> >
> > We (Heroku) are very excited about this KIP, as we've struggled a bit
> > with controller stability recently. Having these additional metrics
> > would be wonderful.
> >
> > I'd like to ensure polling these metrics *doesn't* hold any locks etc,
> > because, as noted in https://issues.apache.org/jira/browse/KAFKA-5120,
> > that lock can be held for quite some time. This may become not an issue
> > as of KAFKA-5028 though.
> >
> > Lastly, I'd love to see some metrics around how long the controller
> > spends inside its lock. We've been tracking an issue (
> > https://issues.apache.org/jira/browse/KAFKA-5116) where it can hold the
> > lock for many, many minutes in a zk client listener thread when
> > responding to a single request. I'm not sure how that plays into
> > https://issues.apache.org/jira/browse/KAFKA-5028 (which I assume will
> > land before this metrics patch), but it feels like there will be
> > equivalent problems ("how long does it spend processing any individual
> > message from the queue, broken down by message type").
> >
> > These are minor improvements though, the addition of more metrics to
> > the controller is already going to be very helpful.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
> > On Thu, Apr 27, 2017 at 3:10 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > Hi all,
> > >
> > > We've posted "KIP-143: Controller Health Metrics" for discussion:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-143%3A+Controller+Health+Metrics
> > >
> > > Please take a look. Your feedback is appreciated.
> > >
> > > Thanks,
> > > Ismael
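
For illustration only, the timer idea at the top of this message (time spent
per event, broken down by event type, with p99 and max, plus a warning-level
log line when processing exceeds some timeout) could be sketched roughly as
below. This is not the Yammer Metrics API and not code from Kafka or the KIP:
`EventTimer`, `timed`, `record`, `warn_after_s`, and the logger name are all
hypothetical names chosen for the sketch, and it keeps every sample in memory,
whereas a real metrics library would typically keep a bounded sample.

```python
import logging
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical logger name, chosen only for this sketch.
log = logging.getLogger("controller.events")


class EventTimer:
    """Hypothetical per-event-type timer (not the Yammer Metrics Timer)."""

    def __init__(self, warn_after_s, clock=time.monotonic):
        self.warn_after_s = warn_after_s   # assumed timeout for the warning
        self._clock = clock                # injectable so it can be tested
        self._samples = defaultdict(list)  # event type -> durations (seconds)

    @contextmanager
    def timed(self, event_type):
        """Time the body of a `with` block and record it under event_type."""
        start = self._clock()
        try:
            yield
        finally:
            self.record(event_type, self._clock() - start)

    def record(self, event_type, elapsed_s):
        """Store one duration and warn if it exceeded the timeout."""
        self._samples[event_type].append(elapsed_s)
        if elapsed_s > self.warn_after_s:
            log.warning(
                "Processing %s took %.3fs (threshold %.3fs)",
                event_type, elapsed_s, self.warn_after_s,
            )

    def max_s(self, event_type):
        """Largest recorded duration, or None if nothing was recorded."""
        samples = self._samples.get(event_type)
        return max(samples) if samples else None

    def p99_s(self, event_type):
        """Nearest-rank 99th percentile, or None if nothing was recorded."""
        samples = self._samples.get(event_type)
        if not samples:
            return None
        ordered = sorted(samples)
        rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integers
        return ordered[rank - 1]
```

Usage would look something like `with timer.timed("TopicChange"): ...` around
the handling of each event, where "TopicChange" is a made-up event type name.
Reading `p99_s` and `max_s` only touches the timer's own data, so polling them
would not need whatever lock the event processing itself holds; a real
implementation would still have to make that storage safe to read from another
thread.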