Here is the output.
================
[root@gateway-1 ~]# docker exec ovn_controller ovs-appctl -t /run/ovn/ovn-controller.6.ctl coverage/show
Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=e70a83c8:
lflow_run                 0.0/sec        0.083/sec        0.0725/sec   total: 295
miniflow_malloc           0.0/sec    44356.817/sec    44527.3975/sec   total: 180635403
hindex_pathological       0.0/sec        0.000/sec        0.0000/sec   total: 7187
hindex_expand             0.0/sec        0.000/sec        0.0000/sec   total: 17
hmap_pathological         0.0/sec        4.167/sec        4.1806/sec   total: 25091
hmap_expand               0.0/sec     5366.500/sec     5390.0800/sec   total: 23680738
txn_unchanged             0.0/sec        0.300/sec        0.3175/sec   total: 11024
txn_incomplete            0.0/sec        0.100/sec        0.0836/sec   total: 974
txn_success               0.0/sec        0.033/sec        0.0308/sec   total: 129
txn_try_again             0.0/sec        0.000/sec        0.0003/sec   total: 1
poll_create_node          0.4/sec        1.933/sec        1.9575/sec   total: 55611
poll_zero_timeout         0.0/sec        0.067/sec        0.0556/sec   total: 241
rconn_queued              0.0/sec        0.050/sec        0.0594/sec   total: 1208720
rconn_sent                0.0/sec        0.050/sec        0.0594/sec   total: 1208720
seq_change                0.2/sec        0.783/sec        0.7492/sec   total: 13962
pstream_open              0.0/sec        0.000/sec        0.0000/sec   total: 1
stream_open               0.0/sec        0.000/sec        0.0003/sec   total: 5
unixctl_received          0.0/sec        0.000/sec        0.0011/sec   total: 4
unixctl_replied           0.0/sec        0.000/sec        0.0011/sec   total: 4
util_xalloc               0.8/sec  1396586.967/sec   240916.6047/sec   total: 5834154064
vconn_open                0.0/sec        0.000/sec        0.0000/sec   total: 2
vconn_received            0.0/sec        0.050/sec        0.0594/sec   total: 632
vconn_sent                0.0/sec        0.050/sec        0.0494/sec   total: 1213248
netlink_received          0.0/sec        0.300/sec        0.2900/sec   total: 1188
netlink_recv_jumbo        0.0/sec        0.083/sec        0.0725/sec   total: 296
netlink_sent              0.0/sec        0.300/sec        0.2900/sec   total: 1188
cmap_expand               0.0/sec        0.000/sec        0.0000/sec   total: 3
82 events never hit

[root@gateway-1 ~]# docker exec ovn_controller ovs-appctl -t /run/ovn/ovn-controller.6.ctl coverage/show
Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=d0107601:
lflow_run                 0.2/sec        0.083/sec        0.0717/sec   total: 296
miniflow_malloc      122834.2/sec    51180.917/sec    43930.2869/sec   total: 181249574
hindex_pathological       0.0/sec        0.000/sec        0.0000/sec   total: 7187
hindex_expand             0.0/sec        0.000/sec        0.0000/sec   total: 17
hmap_pathological        13.2/sec        4.967/sec        4.1264/sec   total: 25157
hmap_expand           14982.2/sec     6205.067/sec     5317.9547/sec   total: 23755649
txn_unchanged             1.4/sec        0.400/sec        0.3144/sec   total: 11031
txn_incomplete            0.4/sec        0.117/sec        0.0825/sec   total: 976
txn_success               0.2/sec        0.050/sec        0.0306/sec   total: 130
txn_try_again             0.0/sec        0.000/sec        0.0003/sec   total: 1
poll_create_node          7.6/sec        2.467/sec        1.9353/sec   total: 55649
poll_zero_timeout         0.4/sec        0.100/sec        0.0547/sec   total: 243
rconn_queued              0.4/sec        0.083/sec        0.0592/sec   total: 1208722
rconn_sent                0.4/sec        0.083/sec        0.0592/sec   total: 1208722
seq_change                2.2/sec        0.900/sec        0.7394/sec   total: 13973
pstream_open              0.0/sec        0.000/sec        0.0000/sec   total: 1
stream_open               0.0/sec        0.000/sec        0.0003/sec   total: 5
unixctl_received          0.2/sec        0.017/sec        0.0014/sec   total: 5
unixctl_replied           0.2/sec        0.017/sec        0.0014/sec   total: 5
util_xalloc         3870474.6/sec  1611767.833/sec   222119.2950/sec   total: 5853506437
vconn_open                0.0/sec        0.000/sec        0.0000/sec   total: 2
vconn_received            0.4/sec        0.083/sec        0.0592/sec   total: 634
vconn_sent                0.4/sec        0.083/sec        0.0492/sec   total: 1213250
netlink_received          0.8/sec        0.333/sec        0.2861/sec   total: 1192
netlink_recv_jumbo        0.2/sec        0.083/sec        0.0717/sec   total: 297
netlink_sent              0.8/sec        0.333/sec        0.2861/sec   total: 1192
cmap_expand               0.0/sec        0.000/sec        0.0000/sec   total: 3
82 events never hit

[root@gateway-1 ~]# docker exec ovn_controller ovs-appctl -t /run/ovn/ovn-controller.6.ctl coverage/show
Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=069e0b25:
lflow_run                 0.2/sec        0.100/sec        0.0719/sec   total: 297
miniflow_malloc      122834.2/sec    61417.100/sec    44100.8900/sec   total: 181863745
hindex_pathological       0.0/sec        0.000/sec        0.0000/sec   total: 7187
hindex_expand             0.0/sec        0.000/sec        0.0000/sec   total: 17
hmap_pathological        10.0/sec        5.800/sec        4.1403/sec   total: 25207
hmap_expand           14756.6/sec     7434.783/sec     5338.4500/sec   total: 23829432
txn_unchanged             0.4/sec        0.433/sec        0.3150/sec   total: 11033
txn_incomplete            0.0/sec        0.117/sec        0.0825/sec   total: 976
txn_success               0.0/sec        0.050/sec        0.0306/sec   total: 130
txn_try_again             0.0/sec        0.000/sec        0.0003/sec   total: 1
poll_create_node          2.6/sec        2.650/sec        1.9389/sec   total: 55662
poll_zero_timeout         0.0/sec        0.100/sec        0.0547/sec   total: 243
rconn_queued              0.0/sec        0.083/sec        0.0592/sec   total: 1208722
rconn_sent                0.0/sec        0.083/sec        0.0592/sec   total: 1208722
seq_change                1.4/sec        1.000/sec        0.7414/sec   total: 13980
pstream_open              0.0/sec        0.000/sec        0.0000/sec   total: 1
stream_open               0.0/sec        0.000/sec        0.0003/sec   total: 5
unixctl_received          0.2/sec        0.033/sec        0.0017/sec   total: 6
unixctl_replied           0.2/sec        0.033/sec        0.0017/sec   total: 6
util_xalloc         3864890.0/sec  1933841.933/sec   227487.1978/sec   total: 5872830887
vconn_open                0.0/sec        0.000/sec        0.0000/sec   total: 2
vconn_received            0.0/sec        0.083/sec        0.0592/sec   total: 634
vconn_sent                0.0/sec        0.083/sec        0.0492/sec   total: 1213250
netlink_received          0.8/sec        0.400/sec        0.2872/sec   total: 1196
netlink_recv_jumbo        0.2/sec        0.100/sec        0.0719/sec   total: 298
netlink_sent              0.8/sec        0.400/sec        0.2872/sec   total: 1196
cmap_expand               0.0/sec        0.000/sec        0.0000/sec   total: 3
82 events never hit

[root@gateway-1 ~]# docker exec ovn_controller ovs-appctl -t /run/ovn/ovn-controller.6.ctl coverage/show
Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=069e0b25:
lflow_run                 0.0/sec        0.083/sec        0.0719/sec   total: 297
miniflow_malloc           0.0/sec    51180.917/sec    44100.8900/sec   total: 181863745
hindex_pathological       0.0/sec        0.000/sec        0.0000/sec   total: 7187
hindex_expand             0.0/sec        0.000/sec        0.0000/sec   total: 17
hmap_pathological         2.4/sec        4.967/sec        4.1436/sec   total: 25219
hmap_expand             171.8/sec     6205.350/sec     5338.6886/sec   total: 23830291
txn_unchanged             1.2/sec        0.433/sec        0.3167/sec   total: 11039
txn_incomplete            0.0/sec        0.100/sec        0.0825/sec   total: 976
txn_success               0.0/sec        0.033/sec        0.0306/sec   total: 130
txn_try_again             0.0/sec        0.000/sec        0.0003/sec   total: 1
poll_create_node          4.6/sec        2.583/sec        1.9453/sec   total: 55685
poll_zero_timeout         0.0/sec        0.067/sec        0.0547/sec   total: 243
rconn_queued              0.2/sec        0.083/sec        0.0594/sec   total: 1208723
rconn_sent                0.2/sec        0.083/sec        0.0594/sec   total: 1208723
seq_change                1.0/sec        0.933/sec        0.7428/sec   total: 13985
pstream_open              0.0/sec        0.000/sec        0.0000/sec   total: 1
stream_open               0.0/sec        0.000/sec        0.0003/sec   total: 5
unixctl_received          0.2/sec        0.050/sec        0.0019/sec   total: 7
unixctl_replied           0.2/sec        0.050/sec        0.0019/sec   total: 7
util_xalloc            4345.0/sec  1611785.933/sec   227493.2325/sec   total: 5872852612
vconn_open                0.0/sec        0.000/sec        0.0000/sec   total: 2
vconn_received            0.2/sec        0.083/sec        0.0594/sec   total: 635
vconn_sent                0.2/sec        0.083/sec        0.0494/sec   total: 1213251
netlink_received          0.0/sec        0.333/sec        0.2872/sec   total: 1196
netlink_recv_jumbo        0.0/sec        0.083/sec        0.0719/sec   total: 298
netlink_sent              0.0/sec        0.333/sec        0.2872/sec   total: 1196
cmap_expand               0.0/sec        0.000/sec        0.0000/sec   total: 3
82 events never hit
================
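For what it's worth, the runs above were taken by hand; a small loop like
this one (same container and ctl path as in the prompts above) captures
them at a fixed 5s interval:

    # Grab three coverage/show snapshots, 5 seconds apart, each preceded
    # by a timestamp so the 5-second rate column lines up with wall-clock
    # time.
    for i in 1 2 3; do
        date
        docker exec ovn_controller \
            ovs-appctl -t /run/ovn/ovn-controller.6.ctl coverage/show
        sleep 5
    done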
Thanks!
Tony

> -----Original Message-----
> From: Han Zhou <zhou...@gmail.com>
> Sent: Friday, August 7, 2020 1:09 PM
> To: Tony Liu <tonyliu0...@hotmail.com>
> Cc: ovs-discuss <ovs-discuss@openvswitch.org>; ovs-dev <ovs-d...@openvswitch.org>
> Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no
> changes in sb-db
>
> On Fri, Aug 7, 2020 at 12:57 PM Tony Liu <tonyliu0...@hotmail.com> wrote:
>
>     Enabled debug logging; there are tons of messages.
>     Note there are 4353 datapath bindings and 13078 port bindings in SB,
>     and 4097 LS, 8470 LSP, 256 LR and 4352 LRP in NB. Every 16 LS connect
>     to a router. All routers connect to the external network.
>
>     The ovn-controller on the compute node is fine; the ovn-controller on
>     the gateway node is taking 100% CPU. Could it be related to the ports
>     on the external network? Are there any specific messages I should
>     check for?
>
>     Any hint on how to look into this is appreciated!
>
> If it happens only on the gateway, it may be related to ARP handling.
> Could you share the output of ovn-appctl -t ovn-controller coverage/show,
> with 2-3 runs at a 5s interval?
> For the debug log, I'd first check whether there is any OVSDB notification
> from the SB DB, and if yes, what the changes are.
>
>     Thanks!
>     Tony
>
> > -----Original Message-----
> > From: Han Zhou <zhou...@gmail.com>
> > Sent: Friday, August 7, 2020 12:39 PM
> > To: Tony Liu <tonyliu0...@hotmail.com>
> > Cc: ovs-discuss <ovs-discuss@openvswitch.org>; ovs-dev <ovs-d...@openvswitch.org>
> > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no
> > changes in sb-db
> >
> > On Fri, Aug 7, 2020 at 12:35 PM Tony Liu <tonyliu0...@hotmail.com> wrote:
> >
> >     Inline...
> >
> >     Thanks!
> >     Tony
> >
> > > -----Original Message-----
> > > From: Han Zhou <zhou...@gmail.com>
> > > Sent: Friday, August 7, 2020 12:29 PM
> > > To: Tony Liu <tonyliu0...@hotmail.com>
> > > Cc: ovs-discuss <ovs-discuss@openvswitch.org>; ovs-dev <ovs-d...@openvswitch.org>
> > > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no
> > > changes in sb-db
> > >
> > > On Fri, Aug 7, 2020 at 12:19 PM Tony Liu <tonyliu0...@hotmail.com> wrote:
> > >
> > >     ovn-controller is using a UNIX socket to connect to the local
> > >     ovsdb-server.
> > >
> > > From the log you were showing, you were using tcp:127.0.0.1:6640 to
> > > connect the local ovsdb.
> >
> >     Sorry, what I meant was, given your advice, I just made the change
> >     for ovn-controller to use a UNIX socket.
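For reference, that change amounts to pointing ovn-controller's ovs-database
argument (its local OVS connection) at the socket instead of TCP loopback; a
minimal sketch, assuming a stock daemon invocation and the usual socket path
(in a containerized deployment like this one, the argument lives in the
container's startup configuration):

    # Before: TCP loopback to the local ovsdb-server, subject to
    # inactivity probes.
    ovn-controller tcp:127.0.0.1:6640

    # After: UNIX socket; as noted later in this thread, probing is
    # disabled by default on UNIX sockets.
    ovn-controller unix:/var/run/openvswitch/db.sock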
> >
> > Oh, I see, no worries.
> >
> > >     2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640:
> > >     connection dropped (Broken pipe)
> > >
> > >     The inactivity probe doesn't seem to be the cause of the high CPU
> > >     usage. The wakeup on the connection to sb-db is always followed by
> > >     an "unreasonably long" warning. I guess the POLLIN event loop is
> > >     stuck for too long, like the 10s below.
> > >     ========
> > >     2020-08-07T18:46:49.301Z|00296|poll_loop|INFO|wakeup due to [POLLIN]
> > >     on fd 19 (10.6.20.91:60712<->10.6.20.86:6642) at lib/stream-fd.c:157
> > >     (99% CPU usage)
> > >     2020-08-07T18:46:59.460Z|00297|timeval|WARN|Unreasonably long 10153ms
> > >     poll interval (10075ms user, 1ms system)
> > >     ========
> > >     Could that stuck loop be the cause of the high CPU usage?
> > >     What is it polling on?
> > >     Why is it stuck waiting for a message from sb-db?
> > >     Isn't it supposed to release the CPU while waiting?
> > >
> > > This log means there are messages received from 10.6.20.86:6642 (the
> > > SB DB). Is there an SB change? The CPU is spent on handling the SB
> > > change. Some types of SB changes are not handled incrementally.
> >
> >     SB updates are driven by ovn-northd in case anything changed in NB,
> >     and by ovn-controller in case anything changed on the chassis. No,
> >     there is nothing changed in NB, nor on the chassis.
> >
> >     Should I bump the logging level up to dbg? Is that going to show me
> >     what messages ovn-controller is handling?
> >
> > Yes, the debug log should show the details.
> >
> > >     Thanks!
> > >     Tony
> > >
> > > > -----Original Message-----
> > > > From: Han Zhou <zhou...@gmail.com>
> > > > Sent: Friday, August 7, 2020 10:32 AM
> > > > To: Tony Liu <tonyliu0...@hotmail.com>
> > > > Cc: ovs-discuss <ovs-discuss@openvswitch.org>; ovs-dev <ovs-d...@openvswitch.org>
> > > > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while
> > > > no changes in sb-db
> > > >
> > > > On Fri, Aug 7, 2020 at 10:05 AM Tony Liu <tonyliu0...@hotmail.com> wrote:
> > > >
> > > >     Hi,
> > > >
> > > >     Here are some logging snippets from ovn-controller.
> > > >     ========
> > > >     2020-08-07T16:38:04.020Z|29250|timeval|WARN|Unreasonably long
> > > >     8954ms poll interval (8895ms user, 0ms system)
> > > >     ========
> > > >     What does that mean? Is it harmless?
> > > >
> > > >     ========
> > > >     2020-08-07T16:38:04.021Z|29251|timeval|WARN|context switches: 0
> > > >     voluntary, 6 involuntary
> > > >     2020-08-07T16:38:04.022Z|29252|poll_loop|INFO|wakeup due to [POLLIN]
> > > >     on fd 19 (10.6.20.91:60398<->10.6.20.86:6642) at lib/stream-fd.c:157
> > > >     (99% CPU usage)
> > > >     ========
> > > >     Is this wakeup caused by changes in sb-db?
> > > >     Why is ovn-controller so busy?
> > > >
> > > >     ========
> > > >     2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640:
> > > >     connection dropped (Broken pipe)
> > > >     ========
> > > >     The connection to the local ovsdb-server was dropped.
> > > >     Is this caused by the inactivity probe timing out?
> > > >
> > > >     ========
> > > >     2020-08-07T16:38:04.035Z|29254|poll_loop|INFO|wakeup due to [POLLIN]
> > > >     on fd 20 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157
> > > >     (99% CPU usage)
> > > >     ========
> > > >     What causes this wakeup?
> > > >
> > > >     ========
> > > >     2020-08-07T16:38:04.048Z|29255|poll_loop|INFO|wakeup due to 0-ms
> > > >     timeout at lib/ovsdb-idl.c:5391 (99% CPU usage)
> > > >     ========
> > > >     What does this 0-ms wakeup mean?
> > > >
> > > >     ========
> > > >     2020-08-07T16:38:05.022Z|29256|poll_loop|INFO|wakeup due to 962-ms
> > > >     timeout at lib/reconnect.c:643 (99% CPU usage)
> > > >     2020-08-07T16:38:05.023Z|29257|reconnect|INFO|tcp:127.0.0.1:6640:
> > > >     connecting...
> > > >     2020-08-07T16:38:05.041Z|29258|poll_loop|INFO|wakeup due to
> > > >     [POLLOUT] on fd 14 (127.0.0.1:51478<->127.0.0.1:6640) at
> > > >     lib/stream-fd.c:153 (99% CPU usage)
> > > >     2020-08-07T16:38:05.041Z|29259|reconnect|INFO|tcp:127.0.0.1:6640:
> > > >     connected
> > > >     ================
> > > >     It retries the connection to the local ovsdb-server, and a POLLOUT
> > > >     event is triggered right after the connection is established.
> > > >     What is POLLOUT?
> > > >
> > > >     ovn-controller is taking 100% CPU now, and there are no changes in
> > > >     sb-db (it is not busy). It seems that ovn-controller is busy with
> > > >     the local ovsdb-server or vswitchd. I'd like to understand why
> > > >     ovn-controller is so busy. All inactivity probe intervals are set
> > > >     to 30s.
> > > >
> > > > Is there any change from the local ovsdb? You can enable the dbg log
> > > > to see what is happening.
> > > > For the local ovsdb probe, I have mentioned in the other thread that a
> > > > UNIX socket is recommended (instead of tcp 127.0.0.1). Using a UNIX
> > > > socket disables the probe by default.
> > > >
> > > > Thanks,
> > > > Han

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
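Following up on Han's suggestion to check the debug log for OVSDB
notifications from the SB DB: the jsonrpc module can be turned up on its
own so the log stays readable. A sketch, using the same ctl path as earlier
in this thread; the log file location is an assumption and varies by
deployment:

    # Log raw JSON-RPC traffic at debug level, which includes any
    # "update"/"update2" notifications arriving from the SB DB.
    docker exec ovn_controller \
        ovs-appctl -t /run/ovn/ovn-controller.6.ctl vlog/set jsonrpc:file:dbg

    # Look for notifications on the SB DB connection (port 6642); no hits
    # here would mean the SB DB really is quiet.
    grep 'update' /var/log/ovn/ovn-controller.log | grep '6642'

    # Revert when done; jsonrpc debug logging is very chatty.
    docker exec ovn_controller \
        ovs-appctl -t /run/ovn/ovn-controller.6.ctl vlog/set jsonrpc:file:info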