Following up on this since it is still an issue. strace on the thread in question shows the following:

13:35:47 poll([{fd=23, events=POLLIN}], 1, 0) = 0 (Timeout) <0.000018>
13:35:47 epoll_wait(42, [{EPOLLIN, {u32=3, u64=3}}], 9, 0) = 1 <0.000018>
13:35:47 recvmsg(417, {msg_namelen=0}, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable) <0.000018>
13:35:47 poll([{fd=11, events=POLLIN}, {fd=42, events=POLLIN}, {fd=23, events=POLLIN}], 3, 2147483647) = 1 ([{fd=42, revents=POLLIN}]) <0.000019>
13:35:47 getrusage(RUSAGE_THREAD, {ru_utime={tv_sec=490842, tv_usec=749026}, ru_stime={tv_sec=710657, tv_usec=442946}, ...}) = 0 <0.000018>
13:35:47 poll([{fd=23, events=POLLIN}], 1, 0) = 0 (Timeout) <0.000018>
13:35:47 epoll_wait(42, [{EPOLLIN, {u32=3, u64=3}}], 9, 0) = 1 <0.000019>
13:35:47 recvmsg(417, {msg_namelen=0}, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable) <0.000018>
13:35:47 poll([{fd=11, events=POLLIN}, {fd=42, events=POLLIN}, {fd=23, events=POLLIN}], 3, 2147483647) = 1 ([{fd=42, revents=POLLIN}]) <0.000019>
13:35:47 getrusage(RUSAGE_THREAD, {ru_utime={tv_sec=490842, tv_usec=749108}, ru_stime={tv_sec=710657, tv_usec=442946}, ...}) = 0 <0.000017>

And if I strace with -c to collect a summary, after 4-5 seconds it shows the following:

sudo strace -c -p 1658344
strace: Process 1658344 attached
^Cstrace: Process 1658344 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         9           write
  0.00    0.000000           0     56397           poll
  0.00    0.000000           0     28198     28198 recvmsg
  0.00    0.000000           0     28199           getrusage
  0.00    0.000000           0       148        56 futex
  0.00    0.000000           0     28199           epoll_wait
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                141150     28254 total

I'm really at a loss here as to what's happening, has anyone seen behaviour like this?
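
As a rough sanity check on the summary above, the call counts can be turned into per-loop figures. The sketch below is illustrative only: the 4.5-second window is an assumed midpoint of the 4-5 second capture, and `loop_profile` is a hypothetical helper, not part of strace or Open vSwitch.

```python
# Call and error counts copied from the `strace -c` summary above.
calls = {
    "write": 9,
    "poll": 56397,
    "recvmsg": 28198,
    "getrusage": 28199,
    "futex": 148,
    "epoll_wait": 28199,
}
errors = {"recvmsg": 28198, "futex": 56}


def loop_profile(calls, errors, seconds):
    """Summarise the spin loop: pass count, rate, and recvmsg failure ratio."""
    # The trace shows exactly one epoll_wait per pass through the loop,
    # so its call count is used as the number of iterations.
    iterations = calls["epoll_wait"]
    return {
        "iterations": iterations,
        "iterations_per_sec": iterations / seconds,
        "polls_per_iteration": calls["poll"] / iterations,
        "recvmsg_failure_ratio": errors.get("recvmsg", 0) / calls["recvmsg"],
    }


# ASSUMPTION: 4.5 s is taken as the midpoint of the "4-5 seconds" window.
profile = loop_profile(calls, errors, seconds=4.5)
for key, value in profile.items():
    print(f"{key}: {value:.2f}")
```

With these counts, poll is called about twice per epoll_wait and every recvmsg call fails with EAGAIN, which matches the repeating five-call pattern in the trace: the thread wakes several thousand times per second on fd 42 and never reads any data.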

Cheers,
Jamon

On 02/05/2019 23:12, Jamon Camisso wrote:
> I'm seeing an identical issue to the one posted here a few months ago:
>
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047558.html
> - I'll include the bug report template at the end.
>
> The issue is an ovs-vswitchd thread consuming 100% CPU in a very lightly
> used Openstack Rocky cloud running on Bionic. Logs are filled with
> entries like this, about ~14000 per day:
>
> 2019-05-01T18:34:30.110Z|237220|poll_loop(handler89)|INFO|Dropped
> 1092844 log messages in last 6 seconds (most recently, 0 seconds ago)
> due to excessive rate
>
> 2019-05-01T18:34:30.110Z|237221|poll_loop(handler89)|INFO|wakeup due to
> [POLLIN] on fd 42 (unknown anon_inode:[eventpoll]) at
> ../lib/dpif-netlink.c:2786 (99% CPU usage)
>
> ovs-vswitchd is running alongside various neutron processes
> (lbaasv2-agent, metadata-agent, l3-agent, dhcp-agent, openvswitch-agent)
> inside an LXC container on a physical host. There is a single neutron
> router, and the entire environment including br-tun, br-ex, and br-int
> traffic barely goes over 200KiB/s TX/RX combined.
>
> If it is an issue with the Ubuntu packaged version (the other report is
> the same 2.10.0 package on Bionic which is suspicious), I've also filed
> a bug to track things here:
> https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1827264
>
> Thanks for any feedback or troubleshooting steps anyone can provide.
>
> Cheers, Jamon
>
>
> Bug template:
>
> What you did that make the problem appear.
> - host server was hard rebooted. lxc containers came back up fine, but
>   ovs-vswitchd thread is spinning CPU and has remained that way for 10
>   days
>
> What you expected to happen.
> - negligible CPU usage since the cloud isn't in production
>
> What actually happened.
> - a single ovs-vswitchd thread is spinning at 100% CPU and logs are
>   populated with thousands of messages claiming a million+ messages
>   are dropped every 6 seconds
>
> The Open vSwitch version number (as output by ovs-vswitchd --version).
> - ovs-vswitchd (Open vSwitch) 2.10.0 (the Ubuntu packaged version)
>
> The Git commit number (as output by git rev-parse HEAD)
> - N/A
>
> Any local patches or changes you have applied (if any).
> - N/A
>
> The kernel version on which Open vSwitch is running (from /proc/version)
> - Linux version 4.15.0-47-generic (buildd@lgw01-amd64-001) (gcc
>   version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #50-Ubuntu SMP Wed Mar 13
>   10:44:52 UTC 2019
>
> The distribution and version number of your OS (e.g. “Centos 5.0”).
> - Ubuntu 18.04.2 LTS
>
> The contents of the vswitchd configuration database (usually
> /etc/openvswitch/conf.db).
> - See attached conf.db.txt
>
> The output of ovs-dpctl show.
> - See below:
>
> root@juju-df624b-4-lxd-10:~# ovs-dpctl show
> system@ovs-system:
>   lookups: hit:223561120 missed:5768546 lost:798
>   flows: 131
>   masks: hit:2284286371 total:15 hit/pkt:9.96
>   port 0: ovs-system (internal)
>   port 1: br-ex (internal)
>   port 2: eth1
>   port 3: gre_sys (gre: packet_type=ptap)
>   port 4: br-tun (internal)
>   port 5: br-int (internal)
>   port 6: tapa062d6f1-40
>   port 7: tapf90e3ab6-13
>   port 8: tap45ba891c-4c
>
> root@juju-df624b-4-lxd-10:~# ovs-ofctl show br-int
> OFPT_FEATURES_REPLY (xid=0x2): dpid:00003643edb09542
> n_tables:254, n_buffers:0
> capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
> actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src
> mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
>  1(int-br-ex): addr:4a:b4:cc:dd:aa:ac
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  2(patch-tun): addr:7a:36:76:1f:47:6e
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  3(tapa062d6f1-40): addr:92:d4:6c:26:55:0a
>      config: 0
>      state: 0
>      current: 10GB-FD COPPER
>      speed: 10000 Mbps now, 0 Mbps max
>  4(tapf90e3ab6-13): addr:9e:8b:4f:ae:8f:ba
>      config: 0
>      state: 0
>      current: 10GB-FD COPPER
>      speed: 10000 Mbps now, 0 Mbps max
>  5(tap45ba891c-4c): addr:76:45:39:c8:d7:e0
>      config: 0
>      state: 0
>      current: 10GB-FD COPPER
>      speed: 10000 Mbps now, 0 Mbps max
>  LOCAL(br-int): addr:36:43:ed:b0:95:42
>      config: PORT_DOWN
>      state: LINK_DOWN
>      speed: 0 Mbps now, 0 Mbps max
> OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
>
> root@juju-df624b-4-lxd-10:~# ovs-ofctl show br-tun
> OFPT_FEATURES_REPLY (xid=0x2): dpid:000092c6c8f4a545
> n_tables:254, n_buffers:0
> capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
> actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src
> mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
>  1(patch-int): addr:02:1d:bc:ec:0c:3b
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  2(gre-0a30029b): addr:fe:09:61:11:92:8b
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  3(gre-0a3002a2): addr:a2:64:93:48:e2:82
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  4(gre-0a30029d): addr:12:b1:73:32:19:d4
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  5(gre-0a3002ce): addr:ca:10:51:f2:0f:05
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  6(gre-0a3002a1): addr:de:fe:95:33:d9:67
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  7(gre-0a30029e): addr:5a:fa:ce:15:7b:5c
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  8(gre-0a3002c3): addr:b6:87:66:09:fc:04
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  10(gre-0a3002a0): addr:ce:79:f2:bf:a8:94
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  14(gre-0a30029f): addr:b6:b0:84:7d:9f:aa
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  18(gre-0a30029c): addr:da:14:12:54:1c:ec
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  LOCAL(br-tun): addr:92:c6:c8:f4:a5:45
>      config: PORT_DOWN
>      state: LINK_DOWN
>      speed: 0 Mbps now, 0 Mbps max
> OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
>
> root@juju-df624b-4-lxd-10:~# ovs-ofctl show br-ex
> OFPT_FEATURES_REPLY (xid=0x2): dpid:000000163e9a47a2
> n_tables:254, n_buffers:0
> capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
> actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src
> mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
>  1(eth1): addr:00:16:3e:9a:47:a2
>      config: 0
>      state: 0
>      current: 10GB-FD COPPER
>      speed: 10000 Mbps now, 0 Mbps max
>  2(phy-br-ex): addr:ce:a8:6a:25:55:0e
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  LOCAL(br-ex): addr:00:16:3e:9a:47:a2
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
> OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
>
>
> _______________________________________________
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss