Good to know. Yes, I am using 20.03.
Will try to upgrade.

Thanks!

Tony
> -----Original Message-----
> From: Han Zhou <zhou...@gmail.com>
> Sent: Friday, August 7, 2020 3:39 PM
> To: Tony Liu <tonyliu0...@hotmail.com>
> Cc: ovs-dev <ovs-dev@openvswitch.org>; ovs-discuss <ovs-disc...@openvswitch.org>
> Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no
> changes in sb-db
> 
> Changes to external-ids are no longer monitored by ovn-controller as of
> version 20.06. You are probably still running an older version?
> 
> 
> On Fri, Aug 7, 2020 at 3:11 PM Tony Liu <tonyliu0...@hotmail.com> wrote:
> 
> 
>       Are the changes made by the Neutron liveness check in the SB DB
>       properly handled by inc_proc_eng?
> 
>       Given my testing scale, even if the frequency is lowered by that fix,
>       ovn-controller still takes quite a lot of CPU and time to process SB changes.
> 
> 
>       Thanks!
> 
>       Tony
>       > -----Original Message-----
>       > From: Tony Liu <tonyliu0...@hotmail.com>
>       > Sent: Friday, August 7, 2020 1:25 PM
>       > To: Tony Liu <tonyliu0...@hotmail.com>; Han Zhou <zhou...@gmail.com>
>       > Cc: ovs-dev <ovs-dev@openvswitch.org>; ovs-discuss <ovs-disc...@openvswitch.org>
>       > Subject: RE: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no
>       > changes in sb-db
>       >
>       > Thanks for the hint!
>       > ========
>       > 2020-08-07T19:44:18.019Z|616614|jsonrpc|DBG|tcp:10.6.20.84:6642:
>       > received notification, method="update3",
>       > params=[["monid","OVN_Southbound"],"5002cb22-13e5-490a-9a64-5d48914138ca",
>       > {"Chassis":{"0390b621-152b-48a0-a3d1-2973c0b823cc":{"modify":{"external_ids":
>       > ["map",[["neutron:liveness_check_at","2020-08-07T19:44:07.130233+00:00"]]]}}}}]
>       > ========
>       >
>       > Nailed it...
>       > https://bugs.launchpad.net/neutron/+bug/1883554
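>       >
>       > (For anyone reproducing this: the jsonrpc debug logging above can be
>       > toggled at runtime; a sketch, assuming the same ctl socket as in the
>       > coverage commands below:)
>       > ========
>       > ovs-appctl -t /run/ovn/ovn-controller.6.ctl vlog/set jsonrpc:file:dbg
>       > ========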
>       >
>       >
>       > Tony
>       > > -----Original Message-----
>       > > From: dev <ovs-dev-boun...@openvswitch.org> On Behalf Of Tony Liu
>       > > Sent: Friday, August 7, 2020 1:14 PM
>       > > To: Han Zhou <zhou...@gmail.com>
>       > > Cc: ovs-dev <ovs-dev@openvswitch.org>; ovs-discuss <ovs-disc...@openvswitch.org>
>       > > Subject: Re: [ovs-dev] [ovs-discuss] [OVN] ovn-controller takes 100%
>       > > cpu while no changes in sb-db
>       > >
>       > >
>       > > Here is the output.
>       > > ================
>       > > [root@gateway-1 ~]# docker exec ovn_controller ovs-appctl -t /run/ovn/ovn-controller.6.ctl coverage/show
>       > > Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=e70a83c8:
>       > > lflow_run                  0.0/sec     0.083/sec        0.0725/sec   total: 295
>       > > miniflow_malloc            0.0/sec 44356.817/sec    44527.3975/sec   total: 180635403
>       > > hindex_pathological        0.0/sec     0.000/sec        0.0000/sec   total: 7187
>       > > hindex_expand              0.0/sec     0.000/sec        0.0000/sec   total: 17
>       > > hmap_pathological          0.0/sec     4.167/sec        4.1806/sec   total: 25091
>       > > hmap_expand                0.0/sec  5366.500/sec     5390.0800/sec   total: 23680738
>       > > txn_unchanged              0.0/sec     0.300/sec        0.3175/sec   total: 11024
>       > > txn_incomplete             0.0/sec     0.100/sec        0.0836/sec   total: 974
>       > > txn_success                0.0/sec     0.033/sec        0.0308/sec   total: 129
>       > > txn_try_again              0.0/sec     0.000/sec        0.0003/sec   total: 1
>       > > poll_create_node           0.4/sec     1.933/sec        1.9575/sec   total: 55611
>       > > poll_zero_timeout          0.0/sec     0.067/sec        0.0556/sec   total: 241
>       > > rconn_queued               0.0/sec     0.050/sec        0.0594/sec   total: 1208720
>       > > rconn_sent                 0.0/sec     0.050/sec        0.0594/sec   total: 1208720
>       > > seq_change                 0.2/sec     0.783/sec        0.7492/sec   total: 13962
>       > > pstream_open               0.0/sec     0.000/sec        0.0000/sec   total: 1
>       > > stream_open                0.0/sec     0.000/sec        0.0003/sec   total: 5
>       > > unixctl_received           0.0/sec     0.000/sec        0.0011/sec   total: 4
>       > > unixctl_replied            0.0/sec     0.000/sec        0.0011/sec   total: 4
>       > > util_xalloc                0.8/sec 1396586.967/sec 240916.6047/sec   total: 5834154064
>       > > vconn_open                 0.0/sec     0.000/sec        0.0000/sec   total: 2
>       > > vconn_received             0.0/sec     0.050/sec        0.0594/sec   total: 632
>       > > vconn_sent                 0.0/sec     0.050/sec        0.0494/sec   total: 1213248
>       > > netlink_received           0.0/sec     0.300/sec        0.2900/sec   total: 1188
>       > > netlink_recv_jumbo         0.0/sec     0.083/sec        0.0725/sec   total: 296
>       > > netlink_sent               0.0/sec     0.300/sec        0.2900/sec   total: 1188
>       > > cmap_expand                0.0/sec     0.000/sec        0.0000/sec   total: 3
>       > > 82 events never hit
>       > > [root@gateway-1 ~]# docker exec ovn_controller ovs-appctl -t /run/ovn/ovn-controller.6.ctl coverage/show
>       > > Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=d0107601:
>       > > lflow_run                  0.2/sec     0.083/sec        0.0717/sec   total: 296
>       > > miniflow_malloc       122834.2/sec 51180.917/sec    43930.2869/sec   total: 181249574
>       > > hindex_pathological        0.0/sec     0.000/sec        0.0000/sec   total: 7187
>       > > hindex_expand              0.0/sec     0.000/sec        0.0000/sec   total: 17
>       > > hmap_pathological         13.2/sec     4.967/sec        4.1264/sec   total: 25157
>       > > hmap_expand            14982.2/sec  6205.067/sec     5317.9547/sec   total: 23755649
>       > > txn_unchanged              1.4/sec     0.400/sec        0.3144/sec   total: 11031
>       > > txn_incomplete             0.4/sec     0.117/sec        0.0825/sec   total: 976
>       > > txn_success                0.2/sec     0.050/sec        0.0306/sec   total: 130
>       > > txn_try_again              0.0/sec     0.000/sec        0.0003/sec   total: 1
>       > > poll_create_node           7.6/sec     2.467/sec        1.9353/sec   total: 55649
>       > > poll_zero_timeout          0.4/sec     0.100/sec        0.0547/sec   total: 243
>       > > rconn_queued               0.4/sec     0.083/sec        0.0592/sec   total: 1208722
>       > > rconn_sent                 0.4/sec     0.083/sec        0.0592/sec   total: 1208722
>       > > seq_change                 2.2/sec     0.900/sec        0.7394/sec   total: 13973
>       > > pstream_open               0.0/sec     0.000/sec        0.0000/sec   total: 1
>       > > stream_open                0.0/sec     0.000/sec        0.0003/sec   total: 5
>       > > unixctl_received           0.2/sec     0.017/sec        0.0014/sec   total: 5
>       > > unixctl_replied            0.2/sec     0.017/sec        0.0014/sec   total: 5
>       > > util_xalloc          3870474.6/sec 1611767.833/sec 222119.2950/sec   total: 5853506437
>       > > vconn_open                 0.0/sec     0.000/sec        0.0000/sec   total: 2
>       > > vconn_received             0.4/sec     0.083/sec        0.0592/sec   total: 634
>       > > vconn_sent                 0.4/sec     0.083/sec        0.0492/sec   total: 1213250
>       > > netlink_received           0.8/sec     0.333/sec        0.2861/sec   total: 1192
>       > > netlink_recv_jumbo         0.2/sec     0.083/sec        0.0717/sec   total: 297
>       > > netlink_sent               0.8/sec     0.333/sec        0.2861/sec   total: 1192
>       > > cmap_expand                0.0/sec     0.000/sec        0.0000/sec   total: 3
>       > > 82 events never hit
>       > > [root@gateway-1 ~]# docker exec ovn_controller ovs-appctl -t /run/ovn/ovn-controller.6.ctl coverage/show
>       > > Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=069e0b25:
>       > > lflow_run                  0.2/sec     0.100/sec        0.0719/sec   total: 297
>       > > miniflow_malloc       122834.2/sec 61417.100/sec    44100.8900/sec   total: 181863745
>       > > hindex_pathological        0.0/sec     0.000/sec        0.0000/sec   total: 7187
>       > > hindex_expand              0.0/sec     0.000/sec        0.0000/sec   total: 17
>       > > hmap_pathological         10.0/sec     5.800/sec        4.1403/sec   total: 25207
>       > > hmap_expand            14756.6/sec  7434.783/sec     5338.4500/sec   total: 23829432
>       > > txn_unchanged              0.4/sec     0.433/sec        0.3150/sec   total: 11033
>       > > txn_incomplete             0.0/sec     0.117/sec        0.0825/sec   total: 976
>       > > txn_success                0.0/sec     0.050/sec        0.0306/sec   total: 130
>       > > txn_try_again              0.0/sec     0.000/sec        0.0003/sec   total: 1
>       > > poll_create_node           2.6/sec     2.650/sec        1.9389/sec   total: 55662
>       > > poll_zero_timeout          0.0/sec     0.100/sec        0.0547/sec   total: 243
>       > > rconn_queued               0.0/sec     0.083/sec        0.0592/sec   total: 1208722
>       > > rconn_sent                 0.0/sec     0.083/sec        0.0592/sec   total: 1208722
>       > > seq_change                 1.4/sec     1.000/sec        0.7414/sec   total: 13980
>       > > pstream_open               0.0/sec     0.000/sec        0.0000/sec   total: 1
>       > > stream_open                0.0/sec     0.000/sec        0.0003/sec   total: 5
>       > > unixctl_received           0.2/sec     0.033/sec        0.0017/sec   total: 6
>       > > unixctl_replied            0.2/sec     0.033/sec        0.0017/sec   total: 6
>       > > util_xalloc          3864890.0/sec 1933841.933/sec 227487.1978/sec   total: 5872830887
>       > > vconn_open                 0.0/sec     0.000/sec        0.0000/sec   total: 2
>       > > vconn_received             0.0/sec     0.083/sec        0.0592/sec   total: 634
>       > > vconn_sent                 0.0/sec     0.083/sec        0.0492/sec   total: 1213250
>       > > netlink_received           0.8/sec     0.400/sec        0.2872/sec   total: 1196
>       > > netlink_recv_jumbo         0.2/sec     0.100/sec        0.0719/sec   total: 298
>       > > netlink_sent               0.8/sec     0.400/sec        0.2872/sec   total: 1196
>       > > cmap_expand                0.0/sec     0.000/sec        0.0000/sec   total: 3
>       > > 82 events never hit
>       > > [root@gateway-1 ~]# docker exec ovn_controller ovs-appctl -t /run/ovn/ovn-controller.6.ctl coverage/show
>       > > Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=069e0b25:
>       > > lflow_run                  0.0/sec     0.083/sec        0.0719/sec   total: 297
>       > > miniflow_malloc            0.0/sec 51180.917/sec    44100.8900/sec   total: 181863745
>       > > hindex_pathological        0.0/sec     0.000/sec        0.0000/sec   total: 7187
>       > > hindex_expand              0.0/sec     0.000/sec        0.0000/sec   total: 17
>       > > hmap_pathological          2.4/sec     4.967/sec        4.1436/sec   total: 25219
>       > > hmap_expand              171.8/sec  6205.350/sec     5338.6886/sec   total: 23830291
>       > > txn_unchanged              1.2/sec     0.433/sec        0.3167/sec   total: 11039
>       > > txn_incomplete             0.0/sec     0.100/sec        0.0825/sec   total: 976
>       > > txn_success                0.0/sec     0.033/sec        0.0306/sec   total: 130
>       > > txn_try_again              0.0/sec     0.000/sec        0.0003/sec   total: 1
>       > > poll_create_node           4.6/sec     2.583/sec        1.9453/sec   total: 55685
>       > > poll_zero_timeout          0.0/sec     0.067/sec        0.0547/sec   total: 243
>       > > rconn_queued               0.2/sec     0.083/sec        0.0594/sec   total: 1208723
>       > > rconn_sent                 0.2/sec     0.083/sec        0.0594/sec   total: 1208723
>       > > seq_change                 1.0/sec     0.933/sec        0.7428/sec   total: 13985
>       > > pstream_open               0.0/sec     0.000/sec        0.0000/sec   total: 1
>       > > stream_open                0.0/sec     0.000/sec        0.0003/sec   total: 5
>       > > unixctl_received           0.2/sec     0.050/sec        0.0019/sec   total: 7
>       > > unixctl_replied            0.2/sec     0.050/sec        0.0019/sec   total: 7
>       > > util_xalloc             4345.0/sec 1611785.933/sec 227493.2325/sec   total: 5872852612
>       > > vconn_open                 0.0/sec     0.000/sec        0.0000/sec   total: 2
>       > > vconn_received             0.2/sec     0.083/sec        0.0594/sec   total: 635
>       > > vconn_sent                 0.2/sec     0.083/sec        0.0494/sec   total: 1213251
>       > > netlink_received           0.0/sec     0.333/sec        0.2872/sec   total: 1196
>       > > netlink_recv_jumbo         0.0/sec     0.083/sec        0.0719/sec   total: 298
>       > > netlink_sent               0.0/sec     0.333/sec        0.2872/sec   total: 1196
>       > > cmap_expand                0.0/sec     0.000/sec        0.0000/sec   total: 3
>       > > 82 events never hit
>       > > ================
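>       > >
>       > > (A small sketch for eyeballing which counter totals moved between two
>       > > of the runs above; run1.txt and run2.txt are hypothetical captures of
>       > > the coverage/show output:)
>       > > ========
>       > > # print "counter total" pairs from each capture and diff them
>       > > diff <(awk '{print $1, $NF}' run1.txt) <(awk '{print $1, $NF}' run2.txt)
>       > > ========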
>       > >
>       > >
>       > > Thanks!
>       > >
>       > > Tony
>       > > > -----Original Message-----
>       > > > From: Han Zhou <zhou...@gmail.com>
>       > > > Sent: Friday, August 7, 2020 1:09 PM
>       > > > To: Tony Liu <tonyliu0...@hotmail.com>
>       > > > Cc: ovs-discuss <ovs-disc...@openvswitch.org>; ovs-dev <ovs-d...@openvswitch.org>
>       > > > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while
>       > > > no changes in sb-db
>       > > >
>       > > >
>       > > >
>       > > > On Fri, Aug 7, 2020 at 12:57 PM Tony Liu <tonyliu0...@hotmail.com> wrote:
>       > > >
>       > > >
>       > > >   I enabled debug logging and there are tons of messages.
>       > > >   Note there are 4353 datapath bindings and 13078 port bindings in SB,
>       > > >   and 4097 LS, 8470 LSP, 256 LR and 4352 LRP in NB. Every 16 LSes
>       > > >   connect to a router, and all routers connect to the external network.
>       > > >
>       > > >   ovn-controller on the compute nodes is fine; the ovn-controller on the
>       > > >   gateway node is taking 100% CPU. Is it perhaps related to the ports on
>       > > >   the external network? Any specific messages I should check for?
>       > > >
>       > > >   Any hint on how to look into this is appreciated!
>       > > >
>       > > >
>       > > >
>       > > >
>       > > > If it happens only on the gateway, it may be related to ARP handling.
>       > > > Could you share the output of "ovn-appctl -t ovn-controller
>       > > > coverage/show", with 2-3 runs at a 5s interval?
>       > > > For the debug log, I'd first check whether there is any OVSDB
>       > > > notification from the SB DB, and if yes, what the changes are.
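>       > > >
>       > > > (A sketch for collecting those runs, assuming ovn-appctl can reach
>       > > > the daemon's default ctl socket:)
>       > > > ========
>       > > > for i in 1 2 3; do ovn-appctl -t ovn-controller coverage/show; sleep 5; done
>       > > > ========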
>       > > >
>       > > >
>       > > >
>       > > >   Thanks!
>       > > >
>       > > >   Tony
>       > > >   > -----Original Message-----
>       > > >   > From: Han Zhou <zhou...@gmail.com>
>       > > >   > Sent: Friday, August 7, 2020 12:39 PM
>       > > >   > To: Tony Liu <tonyliu0...@hotmail.com>
>       > > >   > Cc: ovs-discuss <ovs-disc...@openvswitch.org>; ovs-dev <ovs-d...@openvswitch.org>
>       > > >   > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu
>       > > >   > while no changes in sb-db
>       > > >   >
>       > > >   >
>       > > >   >
>       > > >   > On Fri, Aug 7, 2020 at 12:35 PM Tony Liu
>       > > >   > <tonyliu0...@hotmail.com> wrote:
>       > > >   >
>       > > >   >
>       > > >   >       Inline...
>       > > >   >
>       > > >   >       Thanks!
>       > > >   >
>       > > >   >       Tony
>       > > >   >       > -----Original Message-----
>       > > >   >       > From: Han Zhou <zhou...@gmail.com>
>       > > >   >       > Sent: Friday, August 7, 2020 12:29 PM
>       > > >   >       > To: Tony Liu <tonyliu0...@hotmail.com>
>       > > >   >       > Cc: ovs-discuss <ovs-disc...@openvswitch.org>; ovs-dev <ovs-d...@openvswitch.org>
>       > > >   >       > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu
>       > > >   >       > while no changes in sb-db
>       > > >   >       >
>       > > >   >       >
>       > > >   >       >
>       > > >   >       > On Fri, Aug 7, 2020 at 12:19 PM Tony Liu
>       > > >   >       > <tonyliu0...@hotmail.com> wrote:
>       > > >   >       >
>       > > >   >       >
>       > > >   >       >       ovn-controller is using a UNIX socket to connect to the
>       > > >   >       >       local ovsdb-server.
>       > > >   >       >
>       > > >   >       > From the log you were showing, you were using
>       > > >   >       > tcp:127.0.0.1:6640
>       > > >   >
>       > > >   >       Sorry, what I meant was: given your advice, I just made the
>       > > >   >       change for ovn-controller to use a UNIX socket.
>       > > >   >
>       > > >   >
>       > > >   >
>       > > >   > Oh, I see, no worries.
>       > > >   >
>       > > >   >
>       > > >   >       > to connect to the local ovsdb.
>       > > >   >       > >       2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640:
>       > > >   >       > > connection dropped (Broken pipe)
>       > > >   >       >
>       > > >   >       >
>       > > >   >       >       The inactivity probe doesn't seem to be the cause of the
>       > > >   >       >       high CPU usage.
>       > > >   >       >
>       > > >   >       >       The wakeup on the connection to sb-db is always followed
>       > > >   >       >       by an "unreasonably long" warning. I guess the POLLIN event
>       > > >   >       >       loop is stuck for too long, like 10s as below.
>       > > >   >       >       ========
>       > > >   >       >       2020-08-07T18:46:49.301Z|00296|poll_loop|INFO|wakeup due to [POLLIN]
>       > > >   >       >       on fd 19 (10.6.20.91:60712<->10.6.20.86:6642) at lib/stream-fd.c:157
>       > > >   >       >       (99% CPU usage)
>       > > >   >       >       2020-08-07T18:46:59.460Z|00297|timeval|WARN|Unreasonably long
>       > > >   >       >       10153ms poll interval (10075ms user, 1ms system)
>       > > >   >       >       ========
>       > > >   >       >
>       > > >   >       >       Could that stuck loop be the cause of the high CPU usage?
>       > > >   >       >       What is it polling on?
>       > > >   >       >       Why is it stuck? Is it waiting for a message from sb-db?
>       > > >   >       >       Isn't it supposed to release the CPU while waiting?
>       > > >   >       >
>       > > >   >       >
>       > > >   >       >
>       > > >   >       > This log means messages were received from 10.6.20.86:6642
>       > > >   >       > (the SB DB). Is there an SB change? The CPU is spent on
>       > > >   >       > handling the SB change. Some types of SB changes are not
>       > > >   >       > handled incrementally.
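>       > > >   >       >
>       > > >   >       > (One way to see exactly which SB rows are changing is to run a
>       > > >   >       > standalone monitor against the SB DB; a sketch, with the address
>       > > >   >       > taken from the logs above:)
>       > > >   >       > ========
>       > > >   >       > ovsdb-client monitor tcp:10.6.20.86:6642 OVN_Southbound Chassis
>       > > >   >       > ========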
>       > > >   >
>       > > >   >       SB updates are driven by ovn-northd when anything changes
>       > > >   >       in NB, and by ovn-controller when anything changes on a
>       > > >   >       chassis. No, nothing changed in NB, nor on any chassis.
>       > > >   >
>       > > >   >       Should I bump the logging level up to dbg? Is that going to
>       > > >   >       show me what messages ovn-controller is handling?
>       > > >   >
>       > > >   >
>       > > >   >
>       > > >   > Yes, debug log should show the details.
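>       > > >   >
>       > > >   > (E.g., a sketch that raises all modules to debug at runtime; a
>       > > >   > specific module such as jsonrpc can be named instead to keep the
>       > > >   > volume down:)
>       > > >   > ========
>       > > >   > ovn-appctl -t ovn-controller vlog/set dbg
>       > > >   > ========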
>       > > >   >
>       > > >   >
>       > > >   >
>       > > >   >       >
>       > > >   >       >       Thanks!
>       > > >   >       >
>       > > >   >       >       Tony
>       > > >   >       >
>       > > >   >       >       > -----Original Message-----
>       > > >   >       >       > From: Han Zhou <zhou...@gmail.com>
>       > > >   >       >       > Sent: Friday, August 7, 2020 10:32 AM
>       > > >   >       >       > To: Tony Liu <tonyliu0...@hotmail.com>
>       > > >   >       >       > Cc: ovs-discuss <ovs-disc...@openvswitch.org>; ovs-dev <ovs-d...@openvswitch.org>
>       > > >   >       >       > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu
>       > > >   >       >       > while no changes in sb-db
>       > > >   >       >       >
>       > > >   >       >       >
>       > > >   >       >       >
>       > > >   >       >       > On Fri, Aug 7, 2020 at 10:05 AM Tony Liu
>       > > >   >       >       > <tonyliu0...@hotmail.com> wrote:
>       > > >   >       >       >
>       > > >   >       >       >
>       > > >   >       >       >       Hi,
>       > > >   >       >       >
>       > > >   >       >       >       Here are some logging snippets from ovn-controller.
>       > > >   >       >       >       ========
>       > > >   >       >       >       2020-08-07T16:38:04.020Z|29250|timeval|WARN|Unreasonably long
>       > > >   >       >       >       8954ms poll interval (8895ms user, 0ms system)
>       > > >   >       >       >       ========
>       > > >   >       >       >       What does that mean? Is it harmless?
>       > > >   >       >       >
>       > > >   >       >       >       ========
>       > > >   >       >       >       2020-08-07T16:38:04.021Z|29251|timeval|WARN|context switches: 0
>       > > >   >       >       >       voluntary, 6 involuntary
>       > > >   >       >       >       2020-08-07T16:38:04.022Z|29252|poll_loop|INFO|wakeup due to [POLLIN]
>       > > >   >       >       >       on fd 19 (10.6.20.91:60398<->10.6.20.86:6642) at lib/stream-fd.c:157
>       > > >   >       >       >       (99% CPU usage)
>       > > >   >       >       >       ========
>       > > >   >       >       >       Is this wakeup caused by changes in sb-db?
>       > > >   >       >       >       Why is ovn-controller so busy?
>       > > >   >       >       >
>       > > >   >       >       >       ========
>       > > >   >       >       >       2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640:
>       > > >   >       >       >       connection dropped (Broken pipe)
>       > > >   >       >       >       ========
>       > > >   >       >       >       The connection to the local ovsdb-server is dropped.
>       > > >   >       >       >       Is this caused by an inactivity probe timeout?
>       > > >   >       >       >
>       > > >   >       >       >       ========
>       > > >   >       >       >       2020-08-07T16:38:04.035Z|29254|poll_loop|INFO|wakeup due to [POLLIN]
>       > > >   >       >       >       on fd 20 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157
>       > > >   >       >       >       (99% CPU usage)
>       > > >   >       >       >       ========
>       > > >   >       >       >       What causes this wakeup?
>       > > >   >       >       >
>       > > >   >       >       >       ========
>       > > >   >       >       >       2020-08-07T16:38:04.048Z|29255|poll_loop|INFO|wakeup due to 0-ms
>       > > >   >       >       >       timeout at lib/ovsdb-idl.c:5391 (99% CPU usage)
>       > > >   >       >       >       ========
>       > > >   >       >       >       What does this 0-ms wakeup mean?
>       > > >   >       >       >
>       > > >   >       >       >       ========
>       > > >   >       >       >       2020-08-07T16:38:05.022Z|29256|poll_loop|INFO|wakeup due to 962-ms
>       > > >   >       >       >       timeout at lib/reconnect.c:643 (99% CPU usage)
>       > > >   >       >       >       2020-08-07T16:38:05.023Z|29257|reconnect|INFO|tcp:127.0.0.1:6640:
>       > > >   >       >       >       connecting...
>       > > >   >       >       >       2020-08-07T16:38:05.041Z|29258|poll_loop|INFO|wakeup due to
>       > > >   >       >       >       [POLLOUT] on fd 14 (127.0.0.1:51478<->127.0.0.1:6640)
>       > > >   >       >       >       at lib/stream-fd.c:153 (99% CPU usage)
>       > > >   >       >       >       2020-08-07T16:38:05.041Z|29259|reconnect|INFO|tcp:127.0.0.1:6640:
>       > > >   >       >       >       connected
>       > > >   >       >       >       ================
>       > > >   >       >       >       It retries the connection to the local ovsdb-server. A POLLOUT
>       > > >   >       >       >       event is triggered right after the connection is established.
>       > > >   >       >       >       What is POLLOUT?
>       > > >   >       >       >
>       > > >   >       >       >       ovn-controller is taking 100% CPU now, and there are no changes
>       > > >   >       >       >       in sb-db (it's not busy). It seems that it's busy with the local
>       > > >   >       >       >       ovsdb-server or vswitchd. I'd like to understand why
>       > > >   >       >       >       ovn-controller is so busy.
>       > > >   >       >       >       All inactivity probe intervals are set to 30s.
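>       > > >   >       >       >
>       > > >   >       >       >       (For reference, the SB-connection probe on the chassis side is
>       > > >   >       >       >       configured like this; a sketch, 30s expressed in milliseconds:)
>       > > >   >       >       >       ========
>       > > >   >       >       >       ovs-vsctl set open_vswitch . external_ids:ovn-remote-probe-interval=30000
>       > > >   >       >       >       ========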
>       > > >   >       >       >
>       > > >   >       >       >
>       > > >   >       >       >
>       > > >   >       >       >
>       > > >   >       >       > Is there a change from the local ovsdb? You can enable
>       > > >   >       >       > dbg logging to see what is happening.
>       > > >   >       >       > For the local ovsdb probe, I mentioned in the other thread
>       > > >   >       >       > that a UNIX socket is recommended (instead of tcp:127.0.0.1).
>       > > >   >       >       > Using a UNIX socket disables the probe by default.
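>       > > >   >       >       >
>       > > >   >       >       > (A sketch of that change: start ovn-controller pointing at the
>       > > >   >       >       > local OVS DB's UNIX socket; the path here is the usual default
>       > > >   >       >       > and may differ on your install:)
>       > > >   >       >       > ========
>       > > >   >       >       > ovn-controller unix:/var/run/openvswitch/db.sock
>       > > >   >       >       > ========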
>       > > >   >       >       >
>       > > >   >       >       > Thanks,
>       > > >   >       >       > Han
>       > > >   >       >
>       > > >   >       >
>       > > >   >
>       > > >   >
>       > > >
>       > > >
>       > >
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
