Hi Jason,

Good point about making the ratio metric consistent with the broker idle metrics. I have updated the KIP to calculate the poll idle ratio as you suggested.
Thanks for pointing this out.

Regards,
Kevin

On Fri, Sep 20, 2019 at 10:09 AM Jason Gustafson <ja...@confluent.io> wrote:

> Hi Kevin,
>
> For the computation of the idle ratio, can we make it consistent with the
> idle ratios on the broker? Basically we use the following:
>
> idle ratio = idle time / total time
>
> So when the consumer is idle (i.e., waiting for records), the idle ratio
> approaches 1. When the application is busy processing, it approaches 0.
> Does that make sense?
>
> Thanks,
> Jason
>
> On Tue, Sep 17, 2019 at 7:26 PM Satish Duggana <satish.dugg...@gmail.com>
> wrote:
>
> > Hi Kevin,
> > Thanks for adding useful metrics with the KIP.
> >
> > On Wed, 18 Sep, 2019, 1:49 AM Kevin Lu, <lu.ke...@berkeley.edu> wrote:
> >
> > > Hi Manikumar,
> > >
> > > Thanks for the support.
> > >
> > > Since we have added a couple of additional metrics, I have renamed the KIP
> > > title to reflect the content better: KIP-517: Add consumer metrics to
> > > observe user poll behavior
> > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-517%3A+Add+consumer+metrics+to+observe+user+poll+behavior>
> > >
> > > Regards,
> > > Kevin
> > >
> > > On Tue, Sep 17, 2019 at 11:07 AM Manikumar <manikumar.re...@gmail.com>
> > > wrote:
> > >
> > > > Hi Kevin,
> > > >
> > > > Thanks for the KIP. LGTM. This will be useful.
> > > >
> > > > Thanks,
> > > >
> > > > On Mon, Sep 16, 2019 at 10:17 PM Harsha Chintalapani <ka...@harsha.io>
> > > > wrote:
> > > >
> > > > > Thanks. +1 LGTM.
> > > > >
> > > > > On Mon, Sep 16, 2019 at 9:19 AM, Kevin Lu <lu.ke...@berkeley.edu>
> > > > > wrote:
> > > > >
> > > > > > Hi Harsha,
> > > > > >
> > > > > > Thanks for the feedback. I have added *last-poll-seconds-ago* to the
> > > > > > KIP (being consistent with *last-heartbeat-seconds-ago*).
> > > > > >
> > > > > > Regards,
> > > > > > Kevin
> > > > > >
> > > > > > On Sat, Sep 14, 2019 at 9:44 AM Harsha Chintalapani <ka...@harsha.io>
> > > > > > wrote:
> > > > > >
> > > > > > Thanks Kevin for the KIP. Overall LGTM.
> > > > > > On your second point, I think the metric will be really useful to
> > > > > > indicate the perf bottlenecks in user code vs. the Kafka consumer/broker.
> > > > > >
> > > > > > Thanks,
> > > > > > Harsha
> > > > > >
> > > > > > On Fri, Sep 13, 2019 at 2:41 PM, Kevin Lu <lu.ke...@berkeley.edu> wrote:
> > > > > >
> > > > > > Hi Radai & Jason,
> > > > > >
> > > > > > Thanks for the support and suggestion.
> > > > > >
> > > > > > 1. I think ratio is a good additional metric since the currently proposed
> > > > > > metrics are only absolute times, which may not be useful in all scenarios.
> > > > > >
> > > > > > I have added this to the KIP:
> > > > > > - *poll-idle-ratio*: The fraction of time the consumer spent waiting for
> > > > > > the user to process records from poll.
> > > > > >
> > > > > > Thoughts on the metric name/description?
> > > > > >
> > > > > > 2. Would it be useful to include a metric measuring the time since poll
> > > > > > was last called? Similar to *heartbeat-last-seconds-ago*, it would be
> > > > > > *poll-last-ms-ago*.
> > > > > > This could be useful if (1) the user has a very high
> > > > > > *max.poll.interval.ms* configured and typically spends a long time
> > > > > > processing, or (2) comparing this metric with others such as
> > > > > > *heartbeat-last-seconds-ago* or something else for gathering data in root
> > > > > > cause analyses (or identifying potential consumer bugs related to poll).
> > > > > >
> > > > > > Regards,
> > > > > > Kevin
> > > > > >
> > > > > > On Fri, Sep 13, 2019 at 10:39 AM Jason Gustafson <ja...@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > Hi Kevin,
> > > > > >
> > > > > > This looks reasonable to me. I'd also +1 Radai's suggestion if you're
> > > > > > willing. Something like an idle ratio for the consumer would be helpful.
> > > > > >
> > > > > > Thanks,
> > > > > > Jason
> > > > > >
> > > > > > On Fri, Sep 13, 2019 at 10:08 AM radai <radai.rosenbl...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > While you're at it, another metric that we have found to be useful is the %
> > > > > > time spent in user code vs. time spent in poll() (so time between poll
> > > > > > calls / time inside poll calls): the higher the % value, the more
> > > > > > indicative of user code being the cause of performance bottlenecks.
> > > > > >
> > > > > > On Fri, Sep 13, 2019 at 9:14 AM Kevin Lu <lu.ke...@berkeley.edu> wrote:
> > > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > Happy Friday! Bumping this. Any thoughts?
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > Regards,
> > > > > > Kevin
> > > > > >
> > > > > > On Thu, Sep 5, 2019 at 9:35 AM Kevin Lu <lu.ke...@berkeley.edu> wrote:
> > > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I'd like to propose a new consumer metric that measures the time between
> > > > > > calls to poll() for use in issues related to hitting max.poll.interval.ms
> > > > > > due to long processing time.
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-517%3A+Add+consumer+metric+indicating+time+between+poll+calls
> > > > > >
> > > > > > Please give it a read and let me know what you think.
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > Regards,
> > > > > > Kevin
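
For reference, the idle ratio Jason describes (idle ratio = idle time / total time) and a "seconds since last poll" value can be sketched roughly as below. This is only an illustration and not the KIP-517 implementation: `PollIdleTracker`, its method names, and the explicit `now` timestamps are hypothetical, and the sketch assumes that "idle time" means time spent inside poll() waiting for records.

```python
class PollIdleTracker:
    """Hypothetical helper, not part of the Kafka client API.

    Timestamps are passed in explicitly (in seconds) so the arithmetic
    is easy to follow and to test.
    """

    def __init__(self):
        self.first_poll_start = None    # start of the measured window
        self.current_poll_start = None  # set while a poll() call is in progress
        self.last_poll_start = None     # start of the most recent poll() call
        self.idle_time = 0.0            # total time spent inside completed polls

    def poll_start(self, now):
        """Record that the application has just called poll()."""
        if self.first_poll_start is None:
            self.first_poll_start = now
        self.current_poll_start = now
        self.last_poll_start = now

    def poll_end(self, now):
        """Record that poll() has returned and user processing begins."""
        self.idle_time += now - self.current_poll_start
        self.current_poll_start = None

    def idle_ratio(self, now):
        """idle ratio = idle time / total time, measured since the first poll."""
        if self.first_poll_start is None:
            return 0.0
        total = now - self.first_poll_start
        if total <= 0:
            return 0.0
        idle = self.idle_time
        if self.current_poll_start is not None:
            # Count the poll that is still in progress.
            idle += now - self.current_poll_start
        return idle / total

    def last_poll_seconds_ago(self, now):
        """Seconds since poll() was last called, or None if it never was."""
        if self.last_poll_start is None:
            return None
        return now - self.last_poll_start


if __name__ == "__main__":
    tracker = PollIdleTracker()
    tracker.poll_start(0.0)   # consumer waits for records
    tracker.poll_end(9.0)     # records arrive after 9 s
    tracker.poll_start(10.0)  # 1 s of user processing, then the next poll
    print(tracker.idle_ratio(10.0))
    print(tracker.last_poll_seconds_ago(12.5))
```

With this definition, a consumer that mostly waits inside poll() has a ratio near 1, and an application that spends most of its time processing records between polls has a ratio near 0, matching the behaviour described in the thread.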