Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no configuration update

2020-08-04 Thread Tony Liu
Hi Han,

Sounds good. I am looking forward to incremental-processing,
and will go from there.

BTW, it would be great if you could let me know how to set the probe
interval for a 3-node cluster, here or in another thread.


Thanks!

Tony

> -Original Message-
> From: Han Zhou 
> Sent: Tuesday, August 4, 2020 4:02 PM
> To: Tony Liu 
> Cc: Han Zhou ; Numan Siddique ; Ben Pfaff
> ; Leonid Ryzhyk ; ovs-dev  d...@openvswitch.org>; ovs-discuss 
> Subject: Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no
> configuration update
> 
> Hi Tony,
> 
> I am glad it is clearer now. Your concern about one round of computing
> taking too much time is valid, but I guess it is not directly related to
> the IDLE probe any more, right?
> The OVSDB IDL in fact already does some of the capping and buffering
> that you proposed. The IDL reads a limited number of messages to process
> in each round (the remaining messages are buffered in the stream).
> However, sometimes a single notification message can contain a huge
> amount of data. It is hard to split the data from one single
> notification, because the data are internally dependent on each other.
>
> Without incremental-processing, the size of the data change doesn't
> matter much because all data is recomputed anyway. I'd suggest seeing
> what the outcome of incremental-processing is, and then whether any
> further improvement is still needed for handling big transactions.
>
> In my opinion, the special cases of a big data change triggered by
> scenarios such as data restore can be handled by operational approaches
> instead of implementation. For example, you could adjust the probe
> interval before doing a data restore, and change it back afterwards. But
> of course, if there are good ways to implement this, we should definitely
> consider them.
> 
> 
> Thanks,
> Han
> 
> 

Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no configuration update

2020-08-04 Thread Tony Liu
Hi Han,

Thanks for clarifications! It's crystal clear.

My concern, in general, is blocking. For ovn-northd, or any OVSDB client
(I assume all OVSDB clients use the same library for connection,
probe, etc.?), when handling the current event, it won't be interrupted to
handle any incoming event, right? How long does it take to handle a
computing event for a big chunk of data? How much data can be buffered
to be computed? Is there an estimated maximum time for handling that much
data?

In case it takes more than 5s to process an event, the peer will
drop the connection because of the probe timeout.

With incremental-processing, if I restore the DB, that could still be a
huge increment, unless the increment size is controlled. That's
probably why you recommend restoring to an existing cluster, to avoid a
huge increment from restoring to a fresh cluster. Am I right?

What I used to do is chop big data into pieces to be handled by
multiple event loops. That way, other events have a chance to get
processed, so a big chunk of data won't cause blocking.

Enlarging the probe interval will sort of resolve the issue, but it
defeats the point of probing. It is just like the election timer: enlarging
the timer avoids frequent failovers, but it also increases the failover time
when a real problem happens. And yes, I agree that this is on the control
plane and doesn't break the data plane, but just as in the networking world,
routing convergence is very important.

I am thinking that, in your incremental-processing, if the time for each
event loop can be capped or controlled, that would be very helpful. The side
effect of that option is memory consumption: you will need to buffer more
data. But today, it's a lot easier to add memory to boost performance.
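
To make the chopping idea concrete, here is a minimal sketch in Python. It is not OVN or OVSDB code; the function name, the chunk size and the event labels are all made up for illustration. It also assumes the big job can be split into independent slices, which may not hold for a single OVSDB notification whose data are interdependent.

```python
from collections import deque


def run_chunked(pending_events, big_job, chunk_size):
    """Process big_job in slices, handling one pending event between slices.

    Hypothetical illustration only: returns the order in which work ran.
    """
    order = []
    events = deque(pending_events)
    for start in range(0, len(big_job), chunk_size):
        chunk = big_job[start:start + chunk_size]
        order.append(("chunk", len(chunk)))
        # Between slices, give another event (for example a probe) a turn.
        if events:
            order.append(("event", events.popleft()))
    # Handle whatever is still queued once the big job is finished.
    while events:
        order.append(("event", events.popleft()))
    return order


if __name__ == "__main__":
    print(run_chunked(["probe"], list(range(10)), chunk_size=4))
```

The cost is the one mentioned above: whatever has not been processed yet has to stay buffered in memory until its slice comes up.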


Thanks!

Tony


Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no configuration update

2020-08-04 Thread Han Zhou
On Tue, Aug 4, 2020 at 11:40 AM Tony Liu  wrote:

> Inline...
>
> Thanks!
>
> Tony
> > -Original Message-
> > From: Han Zhou 
> > Sent: Tuesday, August 4, 2020 11:01 AM
> > To: Numan Siddique ; Ben Pfaff ; Leonid
> > Ryzhyk 
> > Cc: Tony Liu ; Han Zhou ; ovs-
> > dev ; ovs-discuss 
> > Subject: Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no
> > configuration update
> >
> >
> >
> > On Tue, Aug 4, 2020 at 12:38 AM Numan Siddique wrote:
> >
> >
> >
> >
> >   On Tue, Aug 4, 2020 at 9:02 AM Tony Liu wrote:
> >
> >
> >   The probe awakes recomputing?
> >   There is probe every 5 seconds. Without any connection
> > up/down or failover,
> >   ovn-northd will recompute everything every 5 seconds, no
> > matter what?
> >   Really?
> >
> >   Anyways, I will increase the probe interval for now, see if
> > that helps.
> >
> >
> >
> >   I think we should optimise this case. I am planning to look into
> > this.
> >
> >   Thanks
> >   Numan
> >
> >
> > Thanks Numan.
> > I'd like to discuss more on this before we move forward to change
> > anything.
> >
> > 1) Regarding the problem itself, the CPU cost triggered by OVSDB IDLE
> > probe when there is no configuration change to compute, I don't think it
> > matters that much in real production. It simply wastes CPU cycles when
> > there is nothing to do, so what harm would it do here? For ovn-northd,
> > since it is the centralized component, we would always ensure there is
> > enough CPU available for ovn-northd when computing is needed, and this
> > reservation will be wasted anyway when there is no change to compute. So,
> > I'd avoid making any change specifically only to address this issue. I
> > could be wrong, though. I'd like to hear what would be the real concern
> > if this is not addressed.
>
> Are more vCPUs going to help here? Is ovn-northd multi-threaded?
>
>
ovn-northd is single threaded. It can be changed to have a separate thread
for the probe handling, but I don't see any obvious benefit.


> I am probably still missing something here. The probe is there all times,
> every 5s.


The probe is sent only if there is no activity on the OVSDB connection
during the interval, that's why it is called "IDLE" probe. If there is
already interaction during the past interval, no probe will be sent.
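
As a rough illustration of the idle-probe behaviour described above, the following Python sketch models a connection that sends a probe only after a full interval without activity, and gives up if a further interval passes with no activity after the probe. This is a simplified model written for this thread, not the actual OVS reconnect code; the class and method names are hypothetical.

```python
class IdleProbe:
    """Toy model of an inactivity ("IDLE") probe timer. Times are in ms."""

    def __init__(self, interval_ms):
        self.interval_ms = interval_ms
        self.last_activity_ms = 0
        self.probe_sent_ms = None  # None means no probe is outstanding

    def on_activity(self, now_ms):
        # Any message from the peer counts as activity, including an echo reply.
        self.last_activity_ms = now_ms
        self.probe_sent_ms = None

    def tick(self, now_ms):
        """Return "send_probe", "disconnect", or None for this point in time."""
        if self.probe_sent_ms is not None:
            # A probe is outstanding: drop the connection if it goes unanswered.
            if now_ms - self.probe_sent_ms >= self.interval_ms:
                return "disconnect"
            return None
        if now_ms - self.last_activity_ms >= self.interval_ms:
            self.probe_sent_ms = now_ms
            return "send_probe"
        return None


if __name__ == "__main__":
    p = IdleProbe(5000)
    p.on_activity(4000)      # traffic at t=4s, so no probe is due at t=8s
    print(p.tick(8000))
    print(p.tick(9000))      # 5s of silence: a probe goes out
```

In this model a busy connection never sees a probe at all, which is the point of calling it an "IDLE" probe.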


> If ovn-northd is in the middle of a computing, is a probe going
> to make ovn-northd restart the computing?


No, it won't. Firstly, it is unlikely that a probe is received during
computing, unless the probe interval is set too short. Secondly, even when
it happens, the current computing will complete and all needed changes will
be enforced to SB DB regardless of the probe received during the computing.
The probe will be handled in the next round of the main loop, and it will
trigger another round of computing which is useless but not harmful either.
There is probably one case I can think of that causes a little latency -
when another NB DB change (say, change2) comes during the computing
triggered by the probe, then the handling for the change2 will be delayed a
little until the computing triggered by the probe completes. But the chance
is rather low, especially if the probe interval is enlarged, and in the
unlucky case, the impact is just a little delay in the change handling.


> Or the probe only triggers
> computing when ovn-northd is idle? Even with the latter case, what's the
> intention to trigger computing by probe?
>
>
It is not triggered intentionally for the probe. It is just because
ovn-northd doesn't distinguish whether it was woken up by a probe only or
whether there are any changes that need to be processed. Many events can
wake up ovn-northd, and once it is woken up it will compute everything. I
agree it can be optimized (we already optimized this for ovn-controller). I
am just wondering if it is worth optimizing specifically, or whether we just
get it for free as a byproduct when implementing incremental-processing,
which is already on the roadmap.
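
A toy model of the difference, in Python. This is only a sketch of the idea, not ovn-northd code: the wakeup reasons are plain strings invented for the example, and a counter stands in for a full recompute.

```python
def count_recomputes(wakeups, skip_probe_only=False):
    """Count how many full recomputes a sequence of wakeups would cause.

    wakeups is a list of hypothetical reason strings such as "probe" or
    "nb_change". With skip_probe_only=False every wakeup recomputes, which
    is the behaviour described above. With skip_probe_only=True a wakeup
    caused only by a probe is ignored.
    """
    recomputes = 0
    for reason in wakeups:
        if skip_probe_only and reason == "probe":
            continue
        recomputes += 1  # stands in for one full recompute
    return recomputes


if __name__ == "__main__":
    wakeups = ["probe", "probe", "nb_change", "probe"]
    print(count_recomputes(wakeups))
    print(count_recomputes(wakeups, skip_probe_only=True))
```

With incremental processing the same saving would fall out naturally, because a probe-only wakeup carries a zero delta.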

Does this clarify a little?

Thanks,
Han


> >
> > 2) ovn-northd incremental processing would avoid this CPU problem
> > naturally. So let's discuss how to move forward for incremental
> > processing, which is much more important because it also solves the CPU
> > efficiency when handling the changes, and the IDLE probe problem is just
> > a byproduct. I believe the DDlog branch would have solved this problem.
> > However, it

Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no configuration update

2020-08-04 Thread Tony Liu
Inline...

Thanks!

Tony
> -Original Message-
> From: Han Zhou 
> Sent: Tuesday, August 4, 2020 11:01 AM
> To: Numan Siddique ; Ben Pfaff ; Leonid
> Ryzhyk 
> Cc: Tony Liu ; Han Zhou ; ovs-
> dev ; ovs-discuss 
> Subject: Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no
> configuration update
> 
> 
> 
> On Tue, Aug 4, 2020 at 12:38 AM Numan Siddique wrote:
> 
> 
> 
> 
>   On Tue, Aug 4, 2020 at 9:02 AM Tony Liu wrote:
> 
> 
>   The probe awakes recomputing?
>   There is probe every 5 seconds. Without any connection
> up/down or failover,
>   ovn-northd will recompute everything every 5 seconds, no
> matter what?
>   Really?
> 
>   Anyways, I will increase the probe interval for now, see if
> that helps.
> 
> 
> 
>   I think we should optimise this case. I am planning to look into
> this.
> 
>   Thanks
>   Numan
> 
> 
> Thanks Numan.
> I'd like to discuss more on this before we move forward to change
> anything.
> 
> 1) Regarding the problem itself, the CPU cost triggered by OVSDB IDLE
> probe when there is no configuration change to compute, I don't think it
> matters that much in real production. It simply wastes CPU cycles when
> there is nothing to do, so what harm would it do here? For ovn-northd,
> since it is the centralized component, we would always ensure there is
> enough CPU available for ovn-northd when computing is needed, and this
> reservation will be wasted anyway when there is no change to compute. So,
> I'd avoid making any change specifically only to address this issue. I
> could be wrong, though. I'd like to hear what would be the real concern
> if this is not addressed.

Are more vCPUs going to help here? Is ovn-northd multi-threaded?

I am probably still missing something here. The probe is there all the time,
every 5s. If ovn-northd is in the middle of computing, is a probe going
to make ovn-northd restart the computing? Or does the probe only trigger
computing when ovn-northd is idle? Even in the latter case, what's the
intention of triggering computing by a probe?

> 
> 2) ovn-northd incremental processing would avoid this CPU problem
> naturally. So let's discuss how to move forward for incremental
> processing, which is much more important because it also solves the CPU
> efficiency when handling the changes, and the IDLE probe problem is just
> a byproduct. I believe the DDlog branch would have solved this problem.
> However, it seems we are not sure about the current status of DDlog. As
> you proposed at the last OVN meeting, an alternative is to implement
> partial incremental-processing using the I-P engine like ovn-controller.
> While I have no objection to this, we'd better check with Ben and Leonid
> on the plan to avoid overlapping and waste of work. @Ben @Leonid, would
> you mind sharing the status here since you were not at the meeting last
> week?

My point is that a probe is not supposed to trigger computing, whether
it's full or incremental.

> 
> 
> 
> Thanks,
> Han
> 
> 
> 
> 
> 
> 

Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no configuration update

2020-08-04 Thread Tony Liu
Thanks Numan for looking into it!
The probe is for health checking only; it's not supposed to trigger
translation, even with an incremental implementation. Translation should be
triggered only when an ovn-northd becomes active.


Tony

> -Original Message-
> From: Numan Siddique 
> Sent: Tuesday, August 4, 2020 12:38 AM
> To: Tony Liu 
> Cc: Han Zhou ; ovs-dev ; ovs-
> discuss 
> Subject: Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no
> configuration update
> 
> 
> 
> On Tue, Aug 4, 2020 at 9:02 AM Tony Liu wrote:
> 
> 
>   The probe awakes recomputing?
>   There is probe every 5 seconds. Without any connection up/down or
> failover,
>   ovn-northd will recompute everything every 5 seconds, no matter
> what?
>   Really?
> 
>   Anyways, I will increase the probe interval for now, see if that
> helps.
> 
> 
> 
> I think we should optimise this case. I am planning to look into this.
> 
> Thanks
> Numan
> 
> 
> 
> 
Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no configuration update

2020-08-04 Thread Numan Siddique
On Tue, Aug 4, 2020 at 9:02 AM Tony Liu  wrote:

> The probe awakes recomputing?
> There is probe every 5 seconds. Without any connection up/down or failover,
> ovn-northd will recompute everything every 5 seconds, no matter what?
> Really?
>
> Anyways, I will increase the probe interval for now, see if that helps.
>

I think we should optimise this case. I am planning to look into this.

Thanks
Numan


>
>
> Thanks!
>
> Tony
>
Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no configuration update

2020-08-03 Thread Tony Liu
The probe triggers recomputing?
There is a probe every 5 seconds. Without any connection up/down or failover,
ovn-northd will recompute everything every 5 seconds, no matter what?
Really?

Anyway, I will increase the probe interval for now and see if that helps.


Thanks!

Tony

> -Original Message-
> From: Han Zhou 
> Sent: Monday, August 3, 2020 8:22 PM
> To: Tony Liu 
> Cc: Han Zhou ; ovs-discuss ;
> ovs-dev 
> Subject: Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no
> configuration update
> 
> Sorry that I didn't make it clear enough. The OVSDB probe itself doesn't
> take much CPU, but the probe awakes ovn-northd main loop, which
> recompute everything, which is why you see CPU spike.
> It will be solved by incremental-processing, when only delta is
> processed, and in case of probe handling, there is no change in
> configuration, so the delta is zero.
> For now, please follow the steps to adjust probe interval, if the CPU of
> ovn-northd (when there is no configuration change) is a concern for you.
> But please remember that this has no impact to the real CPU usage for
> handling configuration changes.
> 
> 
> Thanks,
> Han
> 
> 
Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no configuration update

2020-08-03 Thread Han Zhou
Sorry that I didn't make it clear enough. The OVSDB probe itself doesn't
take much CPU, but the probe wakes up the ovn-northd main loop, which
recomputes everything, which is why you see the CPU spike.
This will be solved by incremental-processing, where only the delta is
processed; in the case of probe handling, there is no change in
configuration, so the delta is zero.
For now, please follow the steps to adjust the probe interval if the CPU
usage of ovn-northd (when there is no configuration change) is a concern for
you. But please remember that this has no impact on the real CPU usage for
handling configuration changes.
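
The steps in question (quoted further down in this message) set options:northd_probe_interval in the NB_Global table and inactivity_probe in the Connection tables of the NB and SB DBs. As a rough sketch, the Python helper below only builds the command lines one might run and does not execute anything. It assumes the values are in milliseconds, that each Connection table has exactly one row so that "." can select it, and that ovn-nbctl and ovn-sbctl are on the PATH. Please check ovn-nb(5) and your deployment before using anything like this.

```python
def probe_interval_commands(interval_ms):
    """Build (but do not run) commands to set the probe interval on both sides.

    Assumptions: interval is in milliseconds, and the NB_Global and
    Connection tables each hold a single row, selected with ".".
    """
    value = str(int(interval_ms))
    return [
        # Client side: ovn-northd reads this option from the NB DB.
        ["ovn-nbctl", "set", "NB_Global", ".",
         "options:northd_probe_interval=" + value],
        # Server side: NB and SB ovsdb-server connection rows.
        ["ovn-nbctl", "set", "Connection", ".", "inactivity_probe=" + value],
        ["ovn-sbctl", "set", "Connection", ".", "inactivity_probe=" + value],
    ]


if __name__ == "__main__":
    for argv in probe_interval_commands(30000):
        print(" ".join(argv))
```

The 30000 in the usage example is an arbitrary value chosen for illustration, not a recommendation.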

Thanks,
Han

On Mon, Aug 3, 2020 at 8:11 PM Tony Liu  wrote:

> A health check (5 sec interval) taking 30%-100% CPU is definitely not
> acceptable, if that's really the case. There must be some blocking (and
> not yielding CPU) in the code, which is not supposed to be there.
>
> Could you point me to the code for this health check?
> Is it single-threaded? Does it use any event library?
>
>
> Thanks!
>
> Tony
>
> > -Original Message-
> > From: Han Zhou 
> > Sent: Saturday, August 1, 2020 9:11 PM
> > To: Tony Liu 
> > Cc: ovs-discuss ; ovs-dev  > d...@openvswitch.org>
> > Subject: Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no
> > configuration update
> >
> >
> >
> > On Fri, Jul 31, 2020 at 4:14 PM Tony Liu wrote:
> >
> >
> >   Hi,
> >
> >   I see the active ovn-northd takes much CPU (30% - 100%) when there is
> >   no configuration from OpenStack, nothing happening on all chassis
> >   nodes either.
> >
> >   Is this expected? What is it busy with?
> >
> >
> >
> >
> > Yes, this is expected. It is due to the OVSDB probe between ovn-northd
> > and the NB/SB OVSDB servers, which is used to detect OVSDB connection
> > failure.
> > Usually this is not a concern (unlike the probe with a large number of
> > ovn-controller clients), because ovn-northd is a centralized component
> > and the CPU cost when there is no configuration change doesn't matter
> > that much. However, if it is a concern, the probe interval (default 5
> > sec) can be changed.
> > If you change it, remember to change it on both the server side and the
> > client side.
> > For the client side (ovn-northd), it is configured in the NB DB's
> > NB_Global table's options:northd_probe_interval. See the man page of
> > ovn-nb(5).
> > For the server side (NB and SB), it is configured in the NB and SB DB's
> > Connection table's inactivity_probe column.
> >
> > Thanks,
> > Han
> >
> >
> >
> >   
> >   2020-07-31T23:08:09.511Z|04267|poll_loop|DBG|wakeup due to [POLLIN]
> > on fd 8 (10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157
> > (68% CPU usage)
> >   2020-07-31T23:08:09.512Z|04268|jsonrpc|DBG|tcp:10.6.20.84:6641:
> > received request, method="echo", params=[], id="echo"
> >   2020-07-31T23:08:09.512Z|04269|jsonrpc|DBG|tcp:10.6.20.84:6641:
> > send reply, result=[], id="echo"
> >   2020-07-31T23:08:12.777Z|04270|poll_loop|DBG|wakeup due to [POLLIN]
> > on fd 9 (10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157
> > (34% CPU usage)
> >   2020-07-31T23:08:12.777Z|04271|reconnect|DBG|tcp:10.6.20.85:6642:
> > idle 5002 ms, sending inactivity probe
> >   2020-07-31T23:08:12.777Z|04272|reconnect|DBG|tcp:10.6.20.85:6642:
> > entering IDLE
> >   2020-07-31T23:08:12.777Z|04273|jsonrpc|DBG|tcp:10.6.20.85:6642:
> > send request, method="echo", params=[], id="echo"
> >   2020-07-31T23:08:12.777Z|04274|jsonrpc|DBG|tcp:10.6.20.85:6642:
> > received request, method="echo", params=[], id="echo"
> >   2020-07-31T23:08:12.777Z|04275|reconnect|DBG|tcp:10.6.20.85:6642:
> > entering ACTIVE
> >   2020-07-31T23:08:12.777Z|04276|jsonrpc|DBG|tcp:10.6.20.85:6642:
> > send reply, result=[], id="echo"
> >   2020-07-31T23:08:13.635Z|04277|poll_loop|DBG|wakeup due to [POLLIN]
> > on fd 9 (10.6.20.84:49158<->10.6.20.85:6642) at

Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no configuration update

2020-08-03 Thread Tony Liu
Health check (5 sec interval) taking 30%-100% CPU is definitely not acceptable,
if that's really the case. There must be some blocking (and not yielding CPU)
in the code, which is not supposed to be there.

Could you point me to the code for such health check?
Is it single-threaded? Does it use any event library?


Thanks!

Tony

> -----Original Message-----
> From: Han Zhou 
> Sent: Saturday, August 1, 2020 9:11 PM
> To: Tony Liu 
> Cc: ovs-discuss ; ovs-dev  d...@openvswitch.org>
> Subject: Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no
> configuration update
> 
> 
> 
> On Fri, Jul 31, 2020 at 4:14 PM Tony Liu <tonyliu0...@hotmail.com> wrote:
> 
> 
>   Hi,
> 
>   I see the active ovn-northd takes much CPU (30% - 100%) when there
>   is no configuration from OpenStack, nothing happening on all chassis
>   nodes either.
> 
>   Is this expected? What is it busy with?
> 
> 
> 
> 
> Yes, this is expected. It is due to the OVSDB probe between ovn-northd
> and NB/SB OVSDB servers, which is used to detect the OVSDB connection
> failure.
> Usually this is not a concern (unlike the probe with a large number of
> ovn-controller clients), because ovn-northd is a centralized component
> and the CPU cost when there is no configuration change doesn't matter
> that much. However, if it is a concern, the probe interval (default 5
> sec) can be changed.
> If you change, remember to change on both server side and client side.
> For client side (ovn-northd), it is configured in the NB DB's NB_Global
> table's options:northd_probe_interval. See man page of ovn-nb(5).
> For server side (NB and SB), it is configured in the NB and SB DB's
> Connection table's inactivity_probe column.
> 
> Thanks,
> Han
> 
> 
> 
>   
>   2020-07-31T23:08:09.511Z|04267|poll_loop|DBG|wakeup due to [POLLIN]
> on fd 8 (10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157
> (68% CPU usage)
>   2020-07-31T23:08:09.512Z|04268|jsonrpc|DBG|tcp:10.6.20.84:6641:
> received request, method="echo", params=[], id="echo"
>   2020-07-31T23:08:09.512Z|04269|jsonrpc|DBG|tcp:10.6.20.84:6641:
> send reply, result=[], id="echo"
>   2020-07-31T23:08:12.777Z|04270|poll_loop|DBG|wakeup due to [POLLIN]
> on fd 9 (10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157
> (34% CPU usage)
>   2020-07-31T23:08:12.777Z|04271|reconnect|DBG|tcp:10.6.20.85:6642:
> idle 5002 ms, sending inactivity probe
>   2020-07-31T23:08:12.777Z|04272|reconnect|DBG|tcp:10.6.20.85:6642:
> entering IDLE
>   2020-07-31T23:08:12.777Z|04273|jsonrpc|DBG|tcp:10.6.20.85:6642:
> send request, method="echo", params=[], id="echo"
>   2020-07-31T23:08:12.777Z|04274|jsonrpc|DBG|tcp:10.6.20.85:6642:
> received request, method="echo", params=[], id="echo"
>   2020-07-31T23:08:12.777Z|04275|reconnect|DBG|tcp:10.6.20.85:6642:
> entering ACTIVE
>   2020-07-31T23:08:12.777Z|04276|jsonrpc|DBG|tcp:10.6.20.85:6642:
> send reply, result=[], id="echo"
>   2020-07-31T23:08:13.635Z|04277|poll_loop|DBG|wakeup due to [POLLIN]
> on fd 9 (10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157
> (34% CPU usage)
>   2020-07-31T23:08:13.635Z|04278|jsonrpc|DBG|tcp:10.6.20.85:6642:
> received reply, result=[], id="echo"
>   2020-07-31T23:08:14.480Z|04279|hmap|DBG|Dropped 129 log messages in
> last 5 seconds (most recently, 0 seconds ago) due to excessive rate
>   2020-07-31T23:08:14.480Z|04280|hmap|DBG|lib/shash.c:112: 2 buckets
> with 6+ nodes, including 2 buckets with 6 nodes (32 nodes total across
> 32 buckets)
>   2020-07-31T23:08:14.513Z|04281|poll_loop|DBG|wakeup due to 27-ms
> timeout at lib/reconnect.c:643 (34% CPU usage)
>   2020-07-31T23:08:14.513Z|04282|reconnect|DBG|tcp:10.6.20.84:6641:
> idle 5001 ms, sending inactivity probe
>   2020-07-31T23:08:14.513Z|04283|reconnect|DBG|tcp:10.6.20.84:6641:
> entering IDLE
>   2020-07-31T23:08:14.513Z|04284|jsonrpc|DBG|tcp:10.6.20.84:6641:
> send request, method="echo", params=[],

Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no configuration update

2020-08-01 Thread Han Zhou
On Fri, Jul 31, 2020 at 4:14 PM Tony Liu  wrote:

> Hi,
>
> I see the active ovn-northd takes much CPU (30% - 100%) when there is no
> configuration from OpenStack, nothing happening on all chassis nodes
> either.
>
> Is this expected? What is it busy with?
>
>
Yes, this is expected. It is due to the OVSDB probe between ovn-northd and
the NB/SB OVSDB servers, which is used to detect OVSDB connection failures.
Usually this is not a concern (unlike the probe with a large number of
ovn-controller clients), because ovn-northd is a centralized component and
the CPU cost when there is no configuration change doesn't matter that
much. However, if it is a concern, the probe interval (default 5 sec) can
be changed.
If you change it, remember to change it on both the server side and the
client side.
For the client side (ovn-northd), it is configured in the NB DB's NB_Global
table's options:northd_probe_interval. See the ovn-nb(5) man page.
For the server side (NB and SB), it is configured in the NB and SB DB's
Connection table's inactivity_probe column.
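
As a rough illustration of the three settings above, here is a minimal
Python sketch that only builds the ovn-nbctl/ovn-sbctl command lines. The
helper name probe_interval_commands and the 30000 example value are
hypothetical. It assumes the interval is given in milliseconds and that the
Connection table holds exactly one row (so that "." selects it); check both
against your deployment before running anything.

```python
import shlex


def probe_interval_commands(interval_ms):
    """Build the command lines (as argument lists) for one probe interval.

    Hypothetical helper for illustration only. The table, option, and
    column names are the ones mentioned in the message above.
    """
    value = str(int(interval_ms))
    return [
        # Client side: ovn-northd reads this option from the NB_Global table.
        ["ovn-nbctl", "set", "NB_Global", ".",
         f"options:northd_probe_interval={value}"],
        # Server side: the Connection table of the NB DB ...
        ["ovn-nbctl", "set", "Connection", ".", f"inactivity_probe={value}"],
        # ... and the Connection table of the SB DB.
        ["ovn-sbctl", "set", "Connection", ".", f"inactivity_probe={value}"],
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, so they can be reviewed.
    for cmd in probe_interval_commands(30000):
        print(shlex.join(cmd))
```

The sketch prints the commands rather than running them, so the values can
be reviewed first and applied to client and server sides together.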

Thanks,
Han


> 
> 2020-07-31T23:08:09.511Z|04267|poll_loop|DBG|wakeup due to [POLLIN] on fd
> 8 (10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (68% CPU
> usage)
> 2020-07-31T23:08:09.512Z|04268|jsonrpc|DBG|tcp:10.6.20.84:6641: received
> request, method="echo", params=[], id="echo"
> 2020-07-31T23:08:09.512Z|04269|jsonrpc|DBG|tcp:10.6.20.84:6641: send
> reply, result=[], id="echo"
> 2020-07-31T23:08:12.777Z|04270|poll_loop|DBG|wakeup due to [POLLIN] on fd
> 9 (10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157 (34% CPU
> usage)
> 2020-07-31T23:08:12.777Z|04271|reconnect|DBG|tcp:10.6.20.85:6642: idle
> 5002 ms, sending inactivity probe
> 2020-07-31T23:08:12.777Z|04272|reconnect|DBG|tcp:10.6.20.85:6642:
> entering IDLE
> 2020-07-31T23:08:12.777Z|04273|jsonrpc|DBG|tcp:10.6.20.85:6642: send
> request, method="echo", params=[], id="echo"
> 2020-07-31T23:08:12.777Z|04274|jsonrpc|DBG|tcp:10.6.20.85:6642: received
> request, method="echo", params=[], id="echo"
> 2020-07-31T23:08:12.777Z|04275|reconnect|DBG|tcp:10.6.20.85:6642:
> entering ACTIVE
> 2020-07-31T23:08:12.777Z|04276|jsonrpc|DBG|tcp:10.6.20.85:6642: send
> reply, result=[], id="echo"
> 2020-07-31T23:08:13.635Z|04277|poll_loop|DBG|wakeup due to [POLLIN] on fd
> 9 (10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157 (34% CPU
> usage)
> 2020-07-31T23:08:13.635Z|04278|jsonrpc|DBG|tcp:10.6.20.85:6642: received
> reply, result=[], id="echo"
> 2020-07-31T23:08:14.480Z|04279|hmap|DBG|Dropped 129 log messages in last 5
> seconds (most recently, 0 seconds ago) due to excessive rate
> 2020-07-31T23:08:14.480Z|04280|hmap|DBG|lib/shash.c:112: 2 buckets with 6+
> nodes, including 2 buckets with 6 nodes (32 nodes total across 32 buckets)
> 2020-07-31T23:08:14.513Z|04281|poll_loop|DBG|wakeup due to 27-ms timeout
> at lib/reconnect.c:643 (34% CPU usage)
> 2020-07-31T23:08:14.513Z|04282|reconnect|DBG|tcp:10.6.20.84:6641: idle
> 5001 ms, sending inactivity probe
> 2020-07-31T23:08:14.513Z|04283|reconnect|DBG|tcp:10.6.20.84:6641:
> entering IDLE
> 2020-07-31T23:08:14.513Z|04284|jsonrpc|DBG|tcp:10.6.20.84:6641: send
> request, method="echo", params=[], id="echo"
> 2020-07-31T23:08:15.370Z|04285|poll_loop|DBG|wakeup due to [POLLIN] on fd
> 8 (10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (34% CPU
> usage)
> 2020-07-31T23:08:15.370Z|04286|jsonrpc|DBG|tcp:10.6.20.84:6641: received
> request, method="echo", params=[], id="echo"
> 2020-07-31T23:08:15.370Z|04287|reconnect|DBG|tcp:10.6.20.84:6641:
> entering ACTIVE
> 2020-07-31T23:08:15.370Z|04288|jsonrpc|DBG|tcp:10.6.20.84:6641: send
> reply, result=[], id="echo"
> 2020-07-31T23:08:16.236Z|04289|poll_loop|DBG|wakeup due to 0-ms timeout at
> tcp:10.6.20.84:6641 (100% CPU usage)
> 2020-07-31T23:08:16.236Z|04290|jsonrpc|DBG|tcp:10.6.20.84:6641: received
> reply, result=[], id="echo"
> 2020-07-31T23:08:17.778Z|04291|poll_loop|DBG|wakeup due to [POLLIN] on fd
> 9 (10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157 (100% CPU
> usage)
> 2020-07-31T23:08:17.778Z|04292|jsonrpc|DBG|tcp:10.6.20.85:6642: received
> request, method="echo", params=[], id="echo"
> 2020-07-31T23:08:17.778Z|04293|jsonrpc|DBG|tcp:10.6.20.85:6642: send
> reply, result=[], id="echo"
> 2020-07-31T23:08:20.372Z|04294|poll_loop|DBG|wakeup due to [POLLIN] on fd
> 8 (10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (41% CPU
> usage)
> 2020-07-31T23:08:20.372Z|04295|reconnect|DBG|tcp:10.6.20.84:6641: idle
> 5002 ms, sending inactivity probe
> 2020-07-31T23:08:20.372Z|04296|reconnect|DBG|tcp:10.6.20.84:6641:
> entering IDLE
> 2020-07-31T23:08:20.372Z|04297|jsonrpc|DBG|tcp:10.6.20.84:6641: send
> request, method="echo", params=[], id="echo"
> 2020-07-31T23:08:20.372Z|04298|jsonrpc|DBG|tcp:10.6.20.84:6641: received
> request, method="echo", params=[], id="echo"
> 2020-07-31T23:08:20.372Z|04299|reconnect|DBG|tcp:10.6.20.84:6641:
> entering ACTIVE
>