Re: [ovs-discuss] [ovs-dev] Request adding long poll interval metrics to coverage metrics

2023-06-21 Thread Aaron Conole via discuss
Ilya Maximets  writes:

> On 6/20/23 16:10, Aaron Conole wrote:
>> Adrian Moreno  writes:
>> 
>>> On 6/19/23 10:36, Eelco Chaudron wrote:
 On 16 Jun 2023, at 19:19, Aaron Conole wrote:

> Martin Kennelly  writes:
>
>> Hey ovs community,
>>
>> I am a developer working on ovn-kubernetes and I want to
>> programmatically consume long poll information
>> i.e:
>> ovs|00211|timeval(handler25)|WARN|Unreasonably long 52388ms poll
>> interval (752ms user, 209ms system)
>>
>> This is currently exposed via journal logs but it's not practical
>> to consume it there programmatically and I was
>> hoping you could add it to coverage metrics.
>
> I think it could be useful.  I do want to be careful about exposing
> these kinds of data in a way that could be misinterpreted.  Already,
> that log in particular gets misinterpreted quite a bit, and RH gets
> customers claiming OVS is misbehaving when they've oversubscribed the
> system.
 +1

>>>
>>> Maybe it's a good time to start documenting coverage counters?
>> 
>> I agree - we should have at least some kind of documentation.  Actually,
>> it would be really nice if we could do something during
>> COVERAGE_DEFINE() that would be like:
>> 
>>   COVERAGE_DEFINE(ctr, "description")
>> 
>> and then we can generate documentation from the COVERAGE_DEFINE()s as
>> well as querying for it with an ovs-appctl command.
>> 
>> That might be trying to be too fancy, though.
>
>
> The problem with documenting coverage counters is that we can't
> and shouldn't claim that these are in any way a stable API.
> They are mainly for developers.  Code can change and there might
> be no meaningful way preserve current counters or their names.
> They can change in each release including minor ones without
> prior notice.

Agreed that these are developer focused counters, and not proper KPIs.
Even with that, there can be some information for others to take some
useful information back to the developers with.  For example, we can see
how often this particular event is happening to correlate with system
issues.  I guess that is why I prefer the in-code documentation to doing
some kind of "traditional" man-page for these as well.

> Best regards, Ilya Maximets.
>
>> 
> Mechanically, it would be pretty simple to do something like:
>
> ---
> diff --git a/lib/timeval.c b/lib/timeval.c
> index 193c7bab17..00e5f2a74d 100644
> --- a/lib/timeval.c
> +++ b/lib/timeval.c
> @@ -40,6 +40,7 @@
>   #include "openvswitch/vlog.h"
>
>   VLOG_DEFINE_THIS_MODULE(timeval);
> +COVERAGE_DEFINE(long_poll_interval);
>
>   #if !defined(HAVE_CLOCK_GETTIME)
>   typedef unsigned int clockid_t;
> @@ -645,6 +646,8 @@ log_poll_interval(long long int last_wakeup)
>   struct rusage rusage;
>
>   if (!getrusage_thread(&rusage)) {
> +COVERAGE_INC(long_poll_interval);
> +
>   VLOG_WARN("Unreasonably long %lldms poll interval"
> " (%lldms user, %lldms system)",
> interval,
> ---
>
> This would at least expose the coverage data via the coverage framework
> and it can be queried via ovs-appctl.  Actually, the advantage here is
> that the coverage counter can track some details about X/sec over the
> last 5 seconds, minute, hour, in addition to the total, so we can see
> whether the condition is ongoing.
 ___
 dev mailing list
 d...@openvswitch.org
 https://mail.openvswitch.org/mailman/listinfo/ovs-dev

>> 
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>> 

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [ovs-dev] Request adding long poll interval metrics to coverage metrics

2023-06-21 Thread Aaron Conole via discuss
Eelco Chaudron  writes:

> On 20 Jun 2023, at 16:57, Ilya Maximets wrote:
>
>> On 6/20/23 16:10, Aaron Conole wrote:
>>> Adrian Moreno  writes:
>>>
 On 6/19/23 10:36, Eelco Chaudron wrote:
> On 16 Jun 2023, at 19:19, Aaron Conole wrote:
>
>> Martin Kennelly  writes:
>>
>>> Hey ovs community,
>>>
>>> I am a developer working on ovn-kubernetes and I want to
>>> programmatically consume long poll information
>>> i.e:
>>> ovs|00211|timeval(handler25)|WARN|Unreasonably long 52388ms poll
>>> interval (752ms user, 209ms system)
>>>
>>> This is currently exposed via journal logs but it's not practical
>>> to consume it there programmatically and I was
>>> hoping you could add it to coverage metrics.
>>
>> I think it could be useful.  I do want to be careful about exposing
>> these kinds of data in a way that could be misinterpreted.  Already,
>> that log in particular gets misinterpreted quite a bit, and RH gets
>> customers claiming OVS is misbehaving when they've oversubscribed the
>> system.
> +1
>

 Maybe it's a good time to start documenting coverage counters?
>>>
>>> I agree - we should have at least some kind of documentation.  Actually,
>>> it would be really nice if we could do something during
>>> COVERAGE_DEFINE() that would be like:
>>>
>>>   COVERAGE_DEFINE(ctr, "description")
>>>
>>> and then we can generate documentation from the COVERAGE_DEFINE()s as
>>> well as querying for it with an ovs-appctl command.
>
> I think this might be the only way to keep the doc in sync, however,
> who is going to document the 320+ existing ones?

That becomes a bit of a retro-fit challenge I think.  Regardless, maybe
the effort would be worth it.

>>> That might be trying to be too fancy, though.
>>
>>
>> The problem with documenting coverage counters is that we can't
>> and shouldn't claim that these are in any way a stable API.
>> They are mainly for developers.  Code can change and there might
>> be no meaningful way preserve current counters or their names.
>> They can change in each release including minor ones without
>> prior notice.
>
> +1000 as even if the direct code near the counter does not changes, it
> could still be impacted the actual counter. Which I have seen before.
>
>> Best regards, Ilya Maximets.
>>
>>>
>> Mechanically, it would be pretty simple to do something like:
>>
>> ---
>> diff --git a/lib/timeval.c b/lib/timeval.c
>> index 193c7bab17..00e5f2a74d 100644
>> --- a/lib/timeval.c
>> +++ b/lib/timeval.c
>> @@ -40,6 +40,7 @@
>>   #include "openvswitch/vlog.h"
>>
>>   VLOG_DEFINE_THIS_MODULE(timeval);
>> +COVERAGE_DEFINE(long_poll_interval);
>>
>>   #if !defined(HAVE_CLOCK_GETTIME)
>>   typedef unsigned int clockid_t;
>> @@ -645,6 +646,8 @@ log_poll_interval(long long int last_wakeup)
>>   struct rusage rusage;
>>
>>   if (!getrusage_thread(&rusage)) {
>> +COVERAGE_INC(long_poll_interval);
>> +
>>   VLOG_WARN("Unreasonably long %lldms poll interval"
>> " (%lldms user, %lldms system)",
>> interval,
>> ---
>>
>> This would at least expose the coverage data via the coverage framework
>> and it can be queried via ovs-appctl.  Actually, the advantage here is
>> that the coverage counter can track some details about X/sec over the
>> last 5 seconds, minute, hour, in addition to the total, so we can see
>> whether the condition is ongoing.
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>>>
>>> ___
>>> dev mailing list
>>> d...@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
>>
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN North DB (Security)

2023-06-21 Thread Gavin McKee via discuss
Thanks for the advice Frode.

Gav

On Tue, 20 Jun 2023 at 23:33, Frode Nordahl 
wrote:

> Hello, Gavin,
>
> On Tue, Jun 20, 2023 at 10:55 PM Gavin McKee via discuss
>  wrote:
> > Is it possible to control who gets to write to OVN NB ?
>
> With the Southbound DB there is an OVN RBAC feature that may be used [0],
> however no such feature currently exists for the Northbound DB.
>
> > I want to ensure that no hypervisor with ovn-nbctl can write
> configuration into the North DB.  Is there any approach I can use?
>
> With the lack of RBAC for NB DB there are a couple of other approaches that
> could be used:
>
> 1. Set up a firewall on the units providing the NB DB, not allowing
>connections from hypervisors.
>
> 2. Enable TLS/SSL and use a different certificate chain for NB and SB DBs.
>When enabled, the ovsdb-server will verify the clients certificate and
>refuse connections from those it cannot verify.
>
> 0: https://docs.ovn.org/en/latest/tutorials/ovn-rbac.html
>
> --
> Frode Nordahl
>
> > Gav
> > ___
> > discuss mailing list
> > disc...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss