Sorry that I didn't make it clear enough. The OVSDB probe itself doesn't
take much CPU, but the probe wakes up the ovn-northd main loop, which
recomputes everything, and that is why you see the CPU spike.
This will be solved by incremental processing, where only the delta is
processed; in the case of probe handling there is no configuration
change, so the delta is zero.
For now, if the CPU usage of ovn-northd when there is no configuration
change is a concern for you, please follow the steps quoted below to
adjust the probe interval. But please remember that this has no impact
on the real CPU usage for handling configuration changes.
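For example, to relax both probe intervals to 30 seconds (values are in
milliseconds; 30000 is just an illustrative choice, and the commands
assume a single record in each Connection table), something like:

    # Client side (ovn-northd): NB_Global table in the NB DB.
    ovn-nbctl set NB_Global . options:northd_probe_interval=30000

    # Server side: Connection table in the NB and SB DBs.
    ovn-nbctl set Connection . inactivity_probe=30000
    ovn-sbctl set Connection . inactivity_probe=30000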

Thanks,
Han

On Mon, Aug 3, 2020 at 8:11 PM Tony Liu <tonyliu0...@hotmail.com> wrote:

> Health check (5 sec interval) taking 30%-100% CPU is definitely not
> acceptable, if that's really the case. There must be some blocking (not
> yielding the CPU) in the code, which is not supposed to be there.
>
> Could you point me to the code for this health check?
> Is it single-threaded? Does it use any event library?
>
>
> Thanks!
>
> Tony
>
> > -----Original Message-----
> > From: Han Zhou <hz...@ovn.org>
> > Sent: Saturday, August 1, 2020 9:11 PM
> > To: Tony Liu <tonyliu0...@hotmail.com>
> > Cc: ovs-discuss <ovs-discuss@openvswitch.org>; ovs-dev <ovs-d...@openvswitch.org>
> > Subject: Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no configuration update
> >
> >
> >
> > On Fri, Jul 31, 2020 at 4:14 PM Tony Liu <tonyliu0...@hotmail.com> wrote:
> >
> >
> >       Hi,
> >
> >       I see the active ovn-northd taking much CPU (30% - 100%) when
> >       there is no configuration change from OpenStack and nothing
> >       happening on the chassis nodes either.
> >
> >       Is this expected? What is it busy with?
> >
> > Yes, this is expected. It is due to the OVSDB probes between ovn-northd
> > and the NB/SB OVSDB servers, which are used to detect OVSDB connection
> > failures.
> > Usually this is not a concern (unlike the probes with a large number of
> > ovn-controller clients), because ovn-northd is a centralized component
> > and its CPU cost when there is no configuration change doesn't matter
> > that much. However, if it is a concern, the probe interval (default 5
> > sec) can be changed.
> > If you change it, remember to change it on both the server side and the
> > client side.
> > For the client side (ovn-northd), it is configured in the NB DB's
> > NB_Global table's options:northd_probe_interval. See the man page
> > ovn-nb(5).
> > For the server side (NB and SB), it is configured in the NB and SB DBs'
> > Connection table's inactivity_probe column.
> >
> > Thanks,
> > Han
> >
> >       ========
> >       2020-07-31T23:08:09.511Z|04267|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 (10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (68% CPU usage)
> >       2020-07-31T23:08:09.512Z|04268|jsonrpc|DBG|tcp:10.6.20.84:6641: received request, method="echo", params=[], id="echo"
> >       2020-07-31T23:08:09.512Z|04269|jsonrpc|DBG|tcp:10.6.20.84:6641: send reply, result=[], id="echo"
> >       2020-07-31T23:08:12.777Z|04270|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157 (34% CPU usage)
> >       2020-07-31T23:08:12.777Z|04271|reconnect|DBG|tcp:10.6.20.85:6642: idle 5002 ms, sending inactivity probe
> >       2020-07-31T23:08:12.777Z|04272|reconnect|DBG|tcp:10.6.20.85:6642: entering IDLE
> >       2020-07-31T23:08:12.777Z|04273|jsonrpc|DBG|tcp:10.6.20.85:6642: send request, method="echo", params=[], id="echo"
> >       2020-07-31T23:08:12.777Z|04274|jsonrpc|DBG|tcp:10.6.20.85:6642: received request, method="echo", params=[], id="echo"
> >       2020-07-31T23:08:12.777Z|04275|reconnect|DBG|tcp:10.6.20.85:6642: entering ACTIVE
> >       2020-07-31T23:08:12.777Z|04276|jsonrpc|DBG|tcp:10.6.20.85:6642: send reply, result=[], id="echo"
> >       2020-07-31T23:08:13.635Z|04277|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157 (34% CPU usage)
> >       2020-07-31T23:08:13.635Z|04278|jsonrpc|DBG|tcp:10.6.20.85:6642: received reply, result=[], id="echo"
> >       2020-07-31T23:08:14.480Z|04279|hmap|DBG|Dropped 129 log messages in last 5 seconds (most recently, 0 seconds ago) due to excessive rate
> >       2020-07-31T23:08:14.480Z|04280|hmap|DBG|lib/shash.c:112: 2 buckets with 6+ nodes, including 2 buckets with 6 nodes (32 nodes total across 32 buckets)
> >       2020-07-31T23:08:14.513Z|04281|poll_loop|DBG|wakeup due to 27-ms timeout at lib/reconnect.c:643 (34% CPU usage)
> >       2020-07-31T23:08:14.513Z|04282|reconnect|DBG|tcp:10.6.20.84:6641: idle 5001 ms, sending inactivity probe
> >       2020-07-31T23:08:14.513Z|04283|reconnect|DBG|tcp:10.6.20.84:6641: entering IDLE
> >       2020-07-31T23:08:14.513Z|04284|jsonrpc|DBG|tcp:10.6.20.84:6641: send request, method="echo", params=[], id="echo"
> >       2020-07-31T23:08:15.370Z|04285|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 (10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (34% CPU usage)
> >       2020-07-31T23:08:15.370Z|04286|jsonrpc|DBG|tcp:10.6.20.84:6641: received request, method="echo", params=[], id="echo"
> >       2020-07-31T23:08:15.370Z|04287|reconnect|DBG|tcp:10.6.20.84:6641: entering ACTIVE
> >       2020-07-31T23:08:15.370Z|04288|jsonrpc|DBG|tcp:10.6.20.84:6641: send reply, result=[], id="echo"
> >       2020-07-31T23:08:16.236Z|04289|poll_loop|DBG|wakeup due to 0-ms timeout at tcp:10.6.20.84:6641 (100% CPU usage)
> >       2020-07-31T23:08:16.236Z|04290|jsonrpc|DBG|tcp:10.6.20.84:6641: received reply, result=[], id="echo"
> >       2020-07-31T23:08:17.778Z|04291|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157 (100% CPU usage)
> >       2020-07-31T23:08:17.778Z|04292|jsonrpc|DBG|tcp:10.6.20.85:6642: received request, method="echo", params=[], id="echo"
> >       2020-07-31T23:08:17.778Z|04293|jsonrpc|DBG|tcp:10.6.20.85:6642: send reply, result=[], id="echo"
> >       2020-07-31T23:08:20.372Z|04294|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 (10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (41% CPU usage)
> >       2020-07-31T23:08:20.372Z|04295|reconnect|DBG|tcp:10.6.20.84:6641: idle 5002 ms, sending inactivity probe
> >       2020-07-31T23:08:20.372Z|04296|reconnect|DBG|tcp:10.6.20.84:6641: entering IDLE
> >       2020-07-31T23:08:20.372Z|04297|jsonrpc|DBG|tcp:10.6.20.84:6641: send request, method="echo", params=[], id="echo"
> >       2020-07-31T23:08:20.372Z|04298|jsonrpc|DBG|tcp:10.6.20.84:6641: received request, method="echo", params=[], id="echo"
> >       2020-07-31T23:08:20.372Z|04299|reconnect|DBG|tcp:10.6.20.84:6641: entering ACTIVE
> >       2020-07-31T23:08:20.372Z|04300|jsonrpc|DBG|tcp:10.6.20.84:6641: send reply, result=[], id="echo"
> >       2020-07-31T23:08:20.376Z|04301|hmap|DBG|Dropped 181 log messages in last 6 seconds (most recently, 1 seconds ago) due to excessive rate
> >       2020-07-31T23:08:20.376Z|04302|hmap|DBG|northd/ovn-northd.c:595: 2 buckets with 6+ nodes, including 2 buckets with 6 nodes (256 nodes total across 256 buckets)
> >       2020-07-31T23:08:21.222Z|04303|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 (10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (41% CPU usage)
> >       2020-07-31T23:08:21.223Z|04304|jsonrpc|DBG|tcp:10.6.20.84:6641: received reply, result=[], id="echo"
> >       2020-07-31T23:08:22.779Z|04305|poll_loop|DBG|wakeup due to 706-ms timeout at lib/reconnect.c:643 (41% CPU usage)
> >       2020-07-31T23:08:22.779Z|04306|reconnect|DBG|tcp:10.6.20.85:6642: idle 5001 ms, sending inactivity probe
> >       2020-07-31T23:08:22.779Z|04307|reconnect|DBG|tcp:10.6.20.85:6642: entering IDLE
> >       2020-07-31T23:08:22.779Z|04308|jsonrpc|DBG|tcp:10.6.20.85:6642: send request, method="echo", params=[], id="echo"
> >       2020-07-31T23:08:23.632Z|04309|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157 (79% CPU usage)
> >       2020-07-31T23:08:23.632Z|04310|jsonrpc|DBG|tcp:10.6.20.85:6642: received request, method="echo", params=[], id="echo"
> >       2020-07-31T23:08:23.632Z|04311|reconnect|DBG|tcp:10.6.20.85:6642: entering ACTIVE
> >       2020-07-31T23:08:23.632Z|04312|jsonrpc|DBG|tcp:10.6.20.85:6642: send reply, result=[], id="echo"
> >       2020-07-31T23:08:24.481Z|04313|poll_loop|DBG|wakeup due to 0-ms timeout at tcp:10.6.20.85:6642 (79% CPU usage)
> >       2020-07-31T23:08:24.481Z|04314|jsonrpc|DBG|tcp:10.6.20.85:6642: received reply, result=[], id="echo"
> >       2020-07-31T23:08:25.372Z|04315|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 (10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (79% CPU usage)
> >       2020-07-31T23:08:25.372Z|04316|jsonrpc|DBG|tcp:10.6.20.84:6641: received request, method="echo", params=[], id="echo"
> >       2020-07-31T23:08:25.372Z|04317|jsonrpc|DBG|tcp:10.6.20.84:6641: send reply, result=[], id="echo"
> >       2020-07-31T23:08:28.634Z|04318|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157 (50% CPU usage)
> >       2020-07-31T23:08:28.634Z|04319|reconnect|DBG|tcp:10.6.20.85:6642: idle 5002 ms, sending inactivity probe
> >       2020-07-31T23:08:28.634Z|04320|reconnect|DBG|tcp:10.6.20.85:6642: entering IDLE
> >       2020-07-31T23:08:28.634Z|04321|jsonrpc|DBG|tcp:10.6.20.85:6642: send request, method="echo", params=[], id="echo"
> >       2020-07-31T23:08:28.634Z|04322|jsonrpc|DBG|tcp:10.6.20.85:6642: received request, method="echo", params=[], id="echo"
> >       2020-07-31T23:08:28.634Z|04323|reconnect|DBG|tcp:10.6.20.85:6642: entering ACTIVE
> >       2020-07-31T23:08:28.634Z|04324|jsonrpc|DBG|tcp:10.6.20.85:6642: send reply, result=[], id="echo"
> >       2020-07-31T23:08:28.638Z|04325|hmap|DBG|Dropped 266 log messages in last 9 seconds (most recently, 3 seconds ago) due to excessive rate
> >       2020-07-31T23:08:28.638Z|04326|hmap|DBG|northd/ovn-northd.c:595: 2 buckets with 6+ nodes, including 2 buckets with 6 nodes (256 nodes total across 256 buckets)
> >       ========
> >
> >       Thanks!
> >
> >       Tony
> >
>
>