Adrian Moreno <amore...@redhat.com> writes:

> On 6/19/23 10:36, Eelco Chaudron wrote:
>> On 16 Jun 2023, at 19:19, Aaron Conole wrote:
>> 
>>> Martin Kennelly <mkenn...@redhat.com> writes:
>>>
>>>> Hey ovs community,
>>>>
>>>> I am a developer working on ovn-kubernetes and I want to
>>>> programmatically consume long poll information
>>>> i.e:
>>>> ovs|00211|timeval(handler25)|WARN|Unreasonably long 52388ms poll
>>>> interval (752ms user, 209ms system)
>>>>
>>>> This is currently exposed via journal logs but it's not practical
>>>> to consume it there programmatically and I was
>>>> hoping you could add it to coverage metrics.
>>>
>>> I think it could be useful.  I do want to be careful about exposing
>>> these kinds of data in a way that could be misinterpreted.  Already,
>>> that log in particular gets misinterpreted quite a bit, and RH gets
>>> customers claiming OVS is misbehaving when they've oversubscribed the
>>> system.
>> +1
>> 
>
> Maybe it's a good time to start documenting coverage counters?

I agree - we should have at least some kind of documentation.  Actually,
it would be really nice if we could do something during
COVERAGE_DEFINE() that would be like:

  COVERAGE_DEFINE(ctr, "description")

and then we can generate documentation from the COVERAGE_DEFINE()s as
well as querying for it with an ovs-appctl command.

That might be trying to be too fancy, though.

>>> Mechanically, it would be pretty simple to do something like:
>>>
>>> ---
>>> diff --git a/lib/timeval.c b/lib/timeval.c
>>> index 193c7bab17..00e5f2a74d 100644
>>> --- a/lib/timeval.c
>>> +++ b/lib/timeval.c
>>> @@ -40,6 +40,7 @@
>>>   #include "openvswitch/vlog.h"
>>>
>>>   VLOG_DEFINE_THIS_MODULE(timeval);
>>> +COVERAGE_DEFINE(long_poll_interval);
>>>
>>>   #if !defined(HAVE_CLOCK_GETTIME)
>>>   typedef unsigned int clockid_t;
>>> @@ -645,6 +646,8 @@ log_poll_interval(long long int last_wakeup)
>>>           struct rusage rusage;
>>>
>>>           if (!getrusage_thread(&rusage)) {
>>> +            COVERAGE_INC(long_poll_interval);
>>> +
>>>               VLOG_WARN("Unreasonably long %lldms poll interval"
>>>                         " (%lldms user, %lldms system)",
>>>                         interval,
>>> ---
>>>
>>> This would at least expose the coverage data via the coverage framework
>>> and it can be queried via ovs-appctl.  Actually, the advantage here is
>>> that the coverage counter can track some details about X/sec over the
>>> last 5 seconds, minute, hour, in addition to the total, so we can see
>>> whether the condition is ongoing.
>> _______________________________________________
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to