I enabled debug logging, and there are tons of messages.
Note that there are 4353 datapath bindings and 13078 port bindings in the
SB DB, and 4097 LS, 8470 LSP, 256 LR and 4352 LRP in the NB DB. Every 16 LS
connect to one router, and all routers connect to the external network.
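
As a quick sanity check on those numbers, here is a rough sketch of how the
counts relate to the topology. It assumes one extra LS for the external
network, one LRP per tenant LS plus one external-facing LRP per router, and
one SB datapath binding per NB switch or router; those are my assumptions
about the layout, not something read back from the databases.

```python
# Counts reported in this message.
nb = {"LS": 4097, "LSP": 8470, "LR": 256, "LRP": 4352}
sb = {"datapath_bindings": 4353, "port_bindings": 13078}
LS_PER_ROUTER = 16

# Assumption: every 16 LS share one router, plus one LS for the external network.
expected_ls = nb["LR"] * LS_PER_ROUTER + 1
# Assumption: one LRP per tenant LS, plus one per router toward the external network.
expected_lrp = nb["LR"] * LS_PER_ROUTER + nb["LR"]
# Assumption: one SB datapath binding per NB logical switch or logical router.
expected_datapaths = nb["LS"] + nb["LR"]
# SB port bindings that are not explained by NB LSPs and LRPs alone.
extra_port_bindings = sb["port_bindings"] - (nb["LSP"] + nb["LRP"])

print("LS:", expected_ls, "reported", nb["LS"])
print("LRP:", expected_lrp, "reported", nb["LRP"])
print("datapaths:", expected_datapaths, "reported", sb["datapath_bindings"])
print("port bindings beyond LSP + LRP:", extra_port_bindings)
```

The leftover port-binding count equals the number of routers, so it may be
one additional SB port binding per router gateway port, but I have not
verified that.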

ovn-controller on the compute node is fine. ovn-controller on the gateway
node is taking 100% CPU. Could it be related to the ports on the
external network? Are there any specific messages I should check?
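
To sift through that volume of output, a minimal sketch like the following
could pull out just the "Unreasonably long ... poll interval" warnings and
show how much of each interval was user CPU. The regex is based only on the
log lines quoted in this thread, and `long_polls` is a hypothetical helper
name, not part of any OVS/OVN tooling.

```python
import re

# Matches lines such as:
# 2020-08-07T18:46:59.460Z|00297|timeval|WARN|Unreasonably long 10153ms poll interval (10075ms user, 1ms system)
POLL_RE = re.compile(
    r"^(?P<ts>\S+)\|(?P<seq>\d+)\|timeval\|WARN\|Unreasonably long "
    r"(?P<total>\d+)ms poll interval "
    r"\((?P<user>\d+)ms user, (?P<system>\d+)ms system\)"
)


def long_polls(lines):
    """Yield (timestamp, total_ms, user_ms, system_ms) for each long-poll warning."""
    for line in lines:
        m = POLL_RE.match(line.strip())
        if m:
            yield m["ts"], int(m["total"]), int(m["user"]), int(m["system"])


# Sample lines taken from the log snippet quoted in this thread.
sample = [
    "2020-08-07T18:46:49.301Z|00296|poll_loop|INFO|wakeup due to [POLLIN] on fd 19 "
    "(10.6.20.91:60712<->10.6.20.86:6642) at lib/stream-fd.c:157 (99% CPU usage)",
    "2020-08-07T18:46:59.460Z|00297|timeval|WARN|Unreasonably long 10153ms poll "
    "interval (10075ms user, 1ms system)",
]

results = list(long_polls(sample))
for ts, total, user, system in results:
    print(f"{ts}: {total}ms total, {user}ms user, {system}ms system")
```

For a real log, the `sample` list would be replaced by the lines of the
ovn-controller log file.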

Any hints on where to look would be appreciated!


Thanks!

Tony
> -----Original Message-----
> From: Han Zhou <zhou...@gmail.com>
> Sent: Friday, August 7, 2020 12:39 PM
> To: Tony Liu <tonyliu0...@hotmail.com>
> Cc: ovs-discuss <ovs-discuss@openvswitch.org>; ovs-dev <ovs-d...@openvswitch.org>
> Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
> 
> 
> 
> On Fri, Aug 7, 2020 at 12:35 PM Tony Liu <tonyliu0...@hotmail.com> wrote:
> 
> 
>       Inline...
> 
>       Thanks!
> 
>       Tony
>       > -----Original Message-----
>       > From: Han Zhou <zhou...@gmail.com>
>       > Sent: Friday, August 7, 2020 12:29 PM
>       > To: Tony Liu <tonyliu0...@hotmail.com>
>       > Cc: ovs-discuss <ovs-discuss@openvswitch.org>; ovs-dev <ovs-d...@openvswitch.org>
>       > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
>       >
>       >
>       >
>       > On Fri, Aug 7, 2020 at 12:19 PM Tony Liu <tonyliu0...@hotmail.com> wrote:
>       >
>       >
>       >       ovn-controller is using UNIX socket connecting to local ovsdb-server.
>       >
>       > From the log you were showing, you were using tcp:127.0.0.1:6640 to connect the local ovsdb.
>       > >       2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640: connection dropped (Broken pipe)
> 
>       Sorry, what I meant was, given your advice, I just made the change for
>       ovn-controller to use UNIX socket.
> 
> Oh, I see, no worries.
> 
>       >
>       >
>       >       Inactivity probe doesn't seem to be the cause of high cpu usage.
>       >
>       >       The wakeup on connection to sb-db is always followed by an
>       >       "unreasonably long" warning. I guess the pollin event loop is
>       >       stuck for too long, like 10s as below.
>       >       ========
>       >       2020-08-07T18:46:49.301Z|00296|poll_loop|INFO|wakeup due to [POLLIN] on fd 19 (10.6.20.91:60712<->10.6.20.86:6642) at lib/stream-fd.c:157 (99% CPU usage)
>       >       2020-08-07T18:46:59.460Z|00297|timeval|WARN|Unreasonably long 10153ms poll interval (10075ms user, 1ms system)
>       >       ========
>       >
>       >       Could that stuck loop be the cause of high cpu usage?
>       >       What is it polling in?
>       >       Why is it stuck, waiting for message from sb-db?
>       >       Isn't it supposed to release the cpu while waiting?
>       >
>       >
>       >
>       > This log means there are messages received from 10.6.20.86:6642 (the
>       > SB DB). Is there SB change? The CPU is spent on handling the SB change.
>       > Some types of SB changes are not handled incrementally.
> 
>       SB update is driven by ovn-northd in case anything changed in NB,
>       and by ovn-controller in case anything changed on the chassis. No,
>       nothing has changed in NB or on the chassis.
> 
>       Should I bump logging level up to dbg? Is that going to show me
>       what messages ovn-controller is handling?
> 
> 
> 
> Yes, debug log should show the details.
> 
> 
> 
>       >
>       >       Thanks!
>       >
>       >       Tony
>       >
>       >       > -----Original Message-----
>       >       > From: Han Zhou <zhou...@gmail.com>
>       >       > Sent: Friday, August 7, 2020 10:32 AM
>       >       > To: Tony Liu <tonyliu0...@hotmail.com>
>       >       > Cc: ovs-discuss <ovs-discuss@openvswitch.org>; ovs-dev <ovs-d...@openvswitch.org>
>       >       > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
>       >       >
>       >       >
>       >       >
>       >       > On Fri, Aug 7, 2020 at 10:05 AM Tony Liu <tonyliu0...@hotmail.com> wrote:
>       >       >
>       >       >
>       >       >       Hi,
>       >       >
>       >       >       Here are some logging snippets from ovn-controller.
>       >       >       ========
>       >       >       2020-08-07T16:38:04.020Z|29250|timeval|WARN|Unreasonably long 8954ms poll interval (8895ms user, 0ms system)
>       >       >       ========
>       >       >       What does that mean? Is it harmless?
>       >       >
>       >       >       ========
>       >       >       2020-08-07T16:38:04.021Z|29251|timeval|WARN|context switches: 0 voluntary, 6 involuntary
>       >       >       2020-08-07T16:38:04.022Z|29252|poll_loop|INFO|wakeup due to [POLLIN] on fd 19 (10.6.20.91:60398<->10.6.20.86:6642) at lib/stream-fd.c:157 (99% CPU usage)
>       >       >       ========
>       >       >       Is this wakeup caused by changes in sb-db?
>       >       >       Why is ovn-controller so busy?
>       >       >
>       >       >       ========
>       >       >       2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640: connection dropped (Broken pipe)
>       >       >       ========
>       >       >       Connection to local ovsdb-server is dropped.
>       >       >       Is this caused by the timeout of inactivity probe?
>       >       >
>       >       >       ========
>       >       >       2020-08-07T16:38:04.035Z|29254|poll_loop|INFO|wakeup due to [POLLIN] on fd 20 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (99% CPU usage)
>       >       >       ========
>       >       >       What causes this wakeup?
>       >       >
>       >       >       ========
>       >       >       2020-08-07T16:38:04.048Z|29255|poll_loop|INFO|wakeup due to 0-ms timeout at lib/ovsdb-idl.c:5391 (99% CPU usage)
>       >       >       ========
>       >       >       What does this 0-ms wakeup mean?
>       >       >
>       >       >       ========
>       >       >       2020-08-07T16:38:05.022Z|29256|poll_loop|INFO|wakeup due to 962-ms timeout at lib/reconnect.c:643 (99% CPU usage)
>       >       >       2020-08-07T16:38:05.023Z|29257|reconnect|INFO|tcp:127.0.0.1:6640: connecting...
>       >       >       2020-08-07T16:38:05.041Z|29258|poll_loop|INFO|wakeup due to [POLLOUT] on fd 14 (127.0.0.1:51478<->127.0.0.1:6640) at lib/stream-fd.c:153 (99% CPU usage)
>       >       >       2020-08-07T16:38:05.041Z|29259|reconnect|INFO|tcp:127.0.0.1:6640: connected
>       >       >       ========
>       >       >       Retry to connect to local ovsdb-server. A pollout event is
>       >       >       triggered right after connection is established. What's
>       >       >       pollout?
>       >       >
>       >       >       ovn-controller is taking 100% CPU now, and there are no
>       >       >       changes in sb-db (not busy). It seems that it's busy with
>       >       >       local ovsdb-server or vswitchd. I'd like to understand why
>       >       >       ovn-controller is so busy.
>       >       >       All inactivity probe intervals are set to 30s.
>       >       >
>       >       >
>       >       >
>       >       >
>       >       > Is there change from the local ovsdb? You can enable dbg log to
>       >       > see what is happening.
>       >       > For the local ovsdb probe, I have mentioned in the other thread
>       >       > that UNIX socket is recommended (instead of tcp 127.0.0.1).
>       >       > Using UNIX socket disables probe by default.
>       >       >
>       >       > Thanks,
>       >       > Han
>       >
>       >
> 
> 

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
