Inline...

Thanks!

Tony
> -----Original Message-----
> From: Han Zhou <zhou...@gmail.com>
> Sent: Friday, August 7, 2020 12:29 PM
> To: Tony Liu <tonyliu0...@hotmail.com>
> Cc: ovs-discuss <ovs-disc...@openvswitch.org>; ovs-dev <ovs-
> d...@openvswitch.org>
> Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no
> changes in sb-db
> 
> 
> 
> On Fri, Aug 7, 2020 at 12:19 PM Tony Liu <tonyliu0...@hotmail.com
> <mailto:tonyliu0...@hotmail.com> > wrote:
> 
> 
>       ovn-controller is using UNIX socket connecting to local ovsdb-
> server.
> 
> From the log you were showing, you were using tcp:127.0.0.1:6640

Sorry, what I meant was, given your advice, I just made the change for
ovn-controller to use UNIX socket.

> <http://127.0.0.1:6640>  to connect the local ovsdb.
> >       2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640
> > <http://127.0.0.1:6640> <http://127.0.0.1:6640> : connection dropped
> > (Broken pipe)
> 
> 
>       Inactivity probe doesn't seem to be the cause of high cpu usage.
> 
>       The wakeup on connection to sb-db is always followed by a
> "unreasonably
>       long" warning. I guess the pollin event loop is stuck for too long,
> like
>       10s as below.
>       ========
>       2020-08-07T18:46:49.301Z|00296|poll_loop|INFO|wakeup due to [POLLIN]
> on fd 19 (10.6.20.91:60712 <http://10.6.20.91:60712> <->10.6.20.86:6642
> <http://10.6.20.86:6642> ) at lib/stream-fd.c:157 (99% CPU usage)
>       2020-08-07T18:46:59.460Z|00297|timeval|WARN|Unreasonably long
> 10153ms poll interval (10075ms user, 1ms system)
>       ========
> 
>       Could that stuck loop be the cause of high cpu usage?
>       What is it polling in?
>       Why is it stuck, waiting for message from sb-db?
>       Isn't it supposed to release the cpu while waiting?
> 
> 
> 
> This log means there are messages received from 10.6.20.86:6642
> <http://10.6.20.86:6642>  (the SB DB). Is there SB change? The CPU is
> spent on handling the SB change. Some type of SB changes are not handled
> incrementally.

SB update is driven by ovn-northd in case anything changed in NB,
and ovn-controller in case anything changed on chassis. No, there
is nothing changed in NB, neither chassis.

Should I bump logging level up to dbg? Is that going to show me
what messages ovn-controller is handling?

> 
>       Thanks!
> 
>       Tony
> 
>       > -----Original Message-----
>       > From: Han Zhou <zhou...@gmail.com <mailto:zhou...@gmail.com> >
>       > Sent: Friday, August 7, 2020 10:32 AM
>       > To: Tony Liu <tonyliu0...@hotmail.com
> <mailto:tonyliu0...@hotmail.com> >
>       > Cc: ovs-discuss <ovs-disc...@openvswitch.org <mailto:ovs-
> disc...@openvswitch.org> >; ovs-dev <ovs-
>       > d...@openvswitch.org <mailto:d...@openvswitch.org> >
>       > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu
> while no
>       > changes in sb-db
>       >
>       >
>       >
>       > On Fri, Aug 7, 2020 at 10:05 AM Tony Liu <tonyliu0...@hotmail.com
> <mailto:tonyliu0...@hotmail.com>
>       > <mailto:tonyliu0...@hotmail.com
> <mailto:tonyliu0...@hotmail.com> > > wrote:
>       >
>       >
>       >       Hi,
>       >
>       >       Here are some logging snippets from ovn-controller.
>       >       ========
>       >       2020-08-07T16:38:04.020Z|29250|timeval|WARN|Unreasonably
> long
>       > 8954ms poll interval (8895ms user, 0ms system)
>       >       ========
>       >       What's that mean? Is it harmless?
>       >
>       >       ========
>       >       2020-08-07T16:38:04.021Z|29251|timeval|WARN|context
> switches: 0
>       > voluntary, 6 involuntary
>       >       2020-08-07T16:38:04.022Z|29252|poll_loop|INFO|wakeup due to
> [POLLIN]
>       > on fd 19 (10.6.20.91:60398 <http://10.6.20.91:60398>
> <http://10.6.20.91:60398> <->10.6.20.86:6642 <http://10.6.20.86:6642>
>       > <http://10.6.20.86:6642> ) at lib/stream-fd.c:157 (99% CPU usage)
>       >       ========
>       >       Is this wakeup caused by changes in sb-db?
>       >       Why is ovn-controller so busy?
>       >
>       >       ========
>       >       2020-08-
> 07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640
> <http://127.0.0.1:6640>
>       > <http://127.0.0.1:6640> : connection dropped (Broken pipe)
>       >       ========
>       >       Connection to local ovsdb-server is dropped.
>       >       Is this caused by the timeout of inactivity probe?
>       >
>       >       ========
>       >       2020-08-07T16:38:04.035Z|29254|poll_loop|INFO|wakeup due to
> [POLLIN]
>       > on fd 20 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-
> fd.c:157
>       > (99% CPU usage)
>       >       ========
>       >       What causes this wakeup?
>       >
>       >       ========
>       >       2020-08-07T16:38:04.048Z|29255|poll_loop|INFO|wakeup due to
> 0-ms
>       > timeout at lib/ovsdb-idl.c:5391 (99% CPU usage)
>       >       ========
>       >       What's this 0-ms wakeup mean?
>       >
>       >       ========
>       >       2020-08-07T16:38:05.022Z|29256|poll_loop|INFO|wakeup due to
> 962-ms
>       > timeout at lib/reconnect.c:643 (99% CPU usage)
>       >       2020-08-
> 07T16:38:05.023Z|29257|reconnect|INFO|tcp:127.0.0.1:6640
> <http://127.0.0.1:6640>
>       > <http://127.0.0.1:6640> : connecting...
>       >       2020-08-07T16:38:05.041Z|29258|poll_loop|INFO|wakeup due to
>       > [POLLOUT] on fd 14 (127.0.0.1:51478 <http://127.0.0.1:51478>
> <http://127.0.0.1:51478> <-
>       > >127.0.0.1:6640 <http://127.0.0.1:6640>  <http://127.0.0.1:6640> )
> at lib/stream-fd.c:153 (99%
>       > CPU usage)
>       >       2020-08-
> 07T16:38:05.041Z|29259|reconnect|INFO|tcp:127.0.0.1:6640
> <http://127.0.0.1:6640>
>       > <http://127.0.0.1:6640> : connected
>       >       ================
>       >       Retry to connect to local ovsdb-server. A pollout event is
>       > triggered
>       >       right after connection is established. What's poolout?
>       >
>       >       ovn-controller is taking 100% CPU now, and there is no
> changes in
>       >       sb-db (not busy). It seems that it's busy with local ovsdb-
> server
>       >       or vswitchd. I'd like to understand why ovn-controller is
> so busy?
>       >       All inactivity probe intervals are set to 30s.
>       >
>       >
>       >
>       >
>       > Is there change from the local ovsdb? You can enable dbg log to
> see what
>       > is happening.
>       > For the local ovsdb probe, I have mentioned in the other thread
> that
>       > UNIX socket is recommended (instead of tcp 127.0.0.1). Using UNIX
> socket
>       > disables probe by default.
>       >
>       > Thanks,
>       > Han
> 
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to