Adrian Moreno <amore...@redhat.com> writes:

> On 6/19/23 10:36, Eelco Chaudron wrote:
>> On 16 Jun 2023, at 19:19, Aaron Conole wrote:
>>
>>> Martin Kennelly <mkenn...@redhat.com> writes:
>>>
>>>> Hey ovs community,
>>>>
>>>> I am a developer working on ovn-kubernetes and I want to
>>>> programmatically consume long poll information
>>>> i.e:
>>>> ovs|00211|timeval(handler25)|WARN|Unreasonably long 52388ms poll
>>>> interval (752ms user, 209ms system)
>>>>
>>>> This is currently exposed via journal logs but it's not practical
>>>> to consume it there programmatically and I was
>>>> hoping you could add it to coverage metrics.
>>>
>>> I think it could be useful. I do want to be careful about exposing
>>> these kinds of data in a way that could be misinterpreted. Already,
>>> that log in particular gets misinterpreted quite a bit, and RH gets
>>> customers claiming OVS is misbehaving when they've oversubscribed the
>>> system.
>> +1
>>
>
> Maybe it's a good time to start documenting coverage counters?
I agree - we should have at least some kind of documentation. Actually,
it would be really nice if we could do something during COVERAGE_DEFINE()
that would be like:

  COVERAGE_DEFINE(ctr, "description")

and then we could generate documentation from the COVERAGE_DEFINE()s as
well as query for it with an ovs-appctl command. That might be trying to
be too fancy, though.

>>> Mechanically, it would be pretty simple to do something like:
>>>
>>> ---
>>> diff --git a/lib/timeval.c b/lib/timeval.c
>>> index 193c7bab17..00e5f2a74d 100644
>>> --- a/lib/timeval.c
>>> +++ b/lib/timeval.c
>>> @@ -40,6 +40,7 @@
>>>  #include "openvswitch/vlog.h"
>>>
>>>  VLOG_DEFINE_THIS_MODULE(timeval);
>>> +COVERAGE_DEFINE(long_poll_interval);
>>>
>>>  #if !defined(HAVE_CLOCK_GETTIME)
>>>  typedef unsigned int clockid_t;
>>> @@ -645,6 +646,8 @@ log_poll_interval(long long int last_wakeup)
>>>      struct rusage rusage;
>>>
>>>      if (!getrusage_thread(&rusage)) {
>>> +        COVERAGE_INC(long_poll_interval);
>>> +
>>>          VLOG_WARN("Unreasonably long %lldms poll interval"
>>>                    " (%lldms user, %lldms system)",
>>>                    interval,
>>> ---
>>>
>>> This would at least expose the coverage data via the coverage framework
>>> and it can be queried via ovs-appctl. Actually, the advantage here is
>>> that the coverage counter can track some details about X/sec over the
>>> last 5 seconds, minute, hour, in addition to the total, so we can see
>>> whether the condition is ongoing.
>> _______________________________________________
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
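To make the COVERAGE_DEFINE(ctr, "description") idea a bit more concrete, here is a rough standalone sketch in C. Everything in it is hypothetical: struct doc_counter, COVERAGE_DEFINE_DOC(), COVERAGE_INC_DOC(), doc_counter_describe() and doc_counter_print_all() are made-up names, not the existing lib/coverage.h API. It also assumes the GCC/Clang constructor attribute so that each counter registers itself before main() runs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counter that carries a description next to its name.
 * This is NOT the existing OVS struct coverage_counter. */
struct doc_counter {
    const char *name;
    const char *description;
    unsigned long long total;
};

#define MAX_DOC_COUNTERS 64

/* Registry of all counters, filled in by constructors before main(). */
static struct doc_counter *doc_counters[MAX_DOC_COUNTERS];
static size_t n_doc_counters;

static void
doc_counter_register(struct doc_counter *c)
{
    assert(n_doc_counters < MAX_DOC_COUNTERS);
    doc_counters[n_doc_counters++] = c;
}

/* Hypothetical two-argument COVERAGE_DEFINE(): defines the counter and a
 * GCC/Clang constructor function that registers it at startup. */
#define COVERAGE_DEFINE_DOC(NAME, DESC)                                  \
    static struct doc_counter doc_counter_##NAME = { #NAME, DESC, 0 };  \
    static void __attribute__((constructor))                            \
    doc_counter_init_##NAME(void)                                       \
    {                                                                   \
        doc_counter_register(&doc_counter_##NAME);                      \
    }

/* Increments counter NAME (hypothetical counterpart of COVERAGE_INC). */
#define COVERAGE_INC_DOC(NAME) (doc_counter_##NAME.total++)

/* Returns the description of counter 'name', or NULL if there is none. */
static const char *
doc_counter_describe(const char *name)
{
    for (size_t i = 0; i < n_doc_counters; i++) {
        if (!strcmp(doc_counters[i]->name, name)) {
            return doc_counters[i]->description;
        }
    }
    return NULL;
}

/* Prints "name: description" for every registered counter. */
static void
doc_counter_print_all(FILE *out)
{
    for (size_t i = 0; i < n_doc_counters; i++) {
        fprintf(out, "%s: %s\n", doc_counters[i]->name,
                doc_counters[i]->description);
    }
}

COVERAGE_DEFINE_DOC(long_poll_interval,
                    "Poll intervals long enough to log the \"Unreasonably "
                    "long poll interval\" warning; can be caused by an "
                    "oversubscribed host rather than OVS itself.")
```

A doc generator could call doc_counter_print_all() at build time, and a (hypothetical) ovs-appctl handler could call doc_counter_describe() for a single counter. Keeping the description right next to the definition makes it harder for the docs and the code to drift apart, and gives a natural place to warn against the kind of misreading mentioned above.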
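On the X/sec point, here is a simplified sketch of how per-5-second samples can be rolled up into 5 s, 1 minute and 1 hour averages. It is a hypothetical illustration, not the lib/coverage.c code: struct rate_counter, rate_counter_tick() and the rate_*() helpers are made-up names. It assumes the caller ticks once every 5 seconds, with 12 slots covering a minute and 60 per-minute slots covering an hour.

```c
#include <assert.h>

/* Simplified, hypothetical sketch; not the lib/coverage.c implementation. */
#define SLOTS_PER_MIN 12   /* 12 slots x 5 s = 1 minute. */
#define MINS_PER_HR 60     /* 60 slots x 1 minute = 1 hour. */

struct rate_counter {
    unsigned long long total;        /* Events since startup. */
    unsigned long long last_total;   /* 'total' as of the previous tick. */
    unsigned int min[SLOTS_PER_MIN]; /* Events in each 5 s slot. */
    unsigned int hr[MINS_PER_HR];    /* Events in each 1 minute slot. */
    unsigned int min_idx;            /* Next slot to fill in 'min'. */
    unsigned int hr_idx;             /* Next slot to fill in 'hr'. */
};

/* Must be called once every 5 seconds. */
static void
rate_counter_tick(struct rate_counter *c)
{
    assert(c->total >= c->last_total);  /* Counters only go up. */
    c->min[c->min_idx] = (unsigned int) (c->total - c->last_total);
    c->last_total = c->total;
    c->min_idx = (c->min_idx + 1) % SLOTS_PER_MIN;

    /* Each time the 5 s ring wraps, a full minute has passed: fold it
     * into the hour ring. */
    if (c->min_idx == 0) {
        unsigned int sum = 0;
        for (int i = 0; i < SLOTS_PER_MIN; i++) {
            sum += c->min[i];
        }
        c->hr[c->hr_idx] = sum;
        c->hr_idx = (c->hr_idx + 1) % MINS_PER_HR;
    }
}

/* Average events/sec over the most recent 5 s slot. */
static double
rate_5s(const struct rate_counter *c)
{
    unsigned int last = (c->min_idx + SLOTS_PER_MIN - 1) % SLOTS_PER_MIN;
    return c->min[last] / 5.0;
}

/* Average events/sec over the last minute (12 slots). */
static double
rate_minute(const struct rate_counter *c)
{
    unsigned long long sum = 0;
    for (int i = 0; i < SLOTS_PER_MIN; i++) {
        sum += c->min[i];
    }
    return sum / 60.0;
}

/* Average events/sec over the last hour; only completed minutes count. */
static double
rate_hour(const struct rate_counter *c)
{
    unsigned long long sum = 0;
    for (int i = 0; i < MINS_PER_HR; i++) {
        sum += c->hr[i];
    }
    return sum / 3600.0;
}
```

Comparing the 5 s rate with the minute and hour averages is what lets you tell a burst that has already ended from a condition that is still ongoing.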