Hi All: As per the discussions/requests by Mark and Numan, I finally reverted the mtu patch (commit-id 8c319e8b73032e06c7dd1832b3b31f8a1189dcd1) on branch-2.9 and re-ran the test with 10k lports to bind on farms, with 8 LRs and 40 LS ;and results improvised. Since ovs did not go super hot, it did result in completion for 10k ports binding to HVs in 5 hrs 28 minutes vs 8 hours with mtu patch. Thus, the extra strcmp did add the overhead. Cpu idle graph of farm with 50 Hvs running 2.9 with/without mtu is available @ https://raw.githubusercontent.com/noah8713/ovn-scale-test/4cef99dbe9a0677a1b2d852b7f4f429ede340875/results/overlay/farm_cpu_2.9.png which indicates ;running non-mtu patch had more idle cpu percentage vs with mtu patch. In addition, I have also captured ovs-vswitchd process cpu util on farm since ovs itself was creating a bottleneck by slowing down the port binding on the computes.Graph is available @ https://raw.githubusercontent.com/noah8713/ovn-scale-test/scale_results/results/overlay/ovs-vswitchd_2.9_util.png Hence, overall performance improved which resulted in faster completion of all 10k port bindings.
On Thu, Feb 15, 2018 at 12:20 PM, Mark Michelson <mmich...@redhat.com> wrote: > > > On 02/08/2018 07:55 PM, Han Zhou wrote: > >> >> >> On Wed, Feb 7, 2018 at 12:47 PM, Han Zhou <zhou...@gmail.com <mailto: >> zhou...@gmail.com>> wrote: >> > >> > When doing scale testing for OVN (using https://github.com/openvswitch >> /ovn-scale-test), we had some interesting findings, and need some help >> here. >> > >> > We ran the test "create and bind lports" against branch 2.9 and branch >> 2.6, and we found that 2.6 was must faster. With some analysis, we found >> out the reason is not because of OVN gets slower in 2.9, but because the >> bottleneck of this test in branch 2.9 is ovs-vswitchd. >> > >> > The testing was run in an environment with 20 farm nodes, each has 50 >> sandbox HVs (I will just mention them as HVs in short). Before the test, >> there are already 9500 lports bound in 950 HVs on 19 farm nodes. The test >> run against the last farm node to bind the lport on the 50 HVs there. The >> steps in the test scenario are: >> > >> > 1. Create 5 new LSs in NB (so that the LSs will not be shared with any >> of HVs on other farm nodes) >> > 2. create 100 lports in NB on a LS >> > 3. bind these lports on HVs, 2 for each HV. They are bound >> sequentially on each HV, and for each HV the 2 ports are bound using one >> command together: ovs-vsctl add-port <port 1> -- set Interface >> external-ids:... -- add-port <port 2> -- set Interface external-ids:... >> (the script didn't set type to internal, but I hope it is not an issue for >> this test). >> > 4. wait the port stated changed to up in NB for all the 100 lports >> (with a single ovn-nbctl command) >> > >> > These steps are repeated for 5 times, one for each LS. So in the end >> we got 500 more lports created and bound (the total scale is then 1k HVs >> and 10k lports). >> > >> > When running with 2.6, the ovn-controllers were taking most of the CPU >> time. However, with 2.9, the CPU of ovn-controllers spikes but there is >> always ovs-vswitchd on the top with 100% CPU. It means the ovs-vswitchd is >> the bottleneck in this testing. There is only one ovs-vswitchd with 100% at >> the same time and different ovs-vswitchd will spike one after another, >> since the ports are bound sequentially on each HV. From the rally log, each >> 2 ports binding takes around 4 - 5 seconds. This is just the ovs-vsctl >> command execution time. The 100% CPU of ovs-vswitchd explains the slowness. >> > >> > So, based on this result, we can not using the total time to evaluate >> the efficiency of OVN, instead we can evaluate by CPU cost of >> ovn-controller processes. In fact, 2.9 ovn-controller costs around 70% less >> CPU than 2.6, which I think is due to some optimization we made earlier. >> (With my work-in-progress patch it saves much more, and I will post later >> as RFC). >> > >> > However, I cannot explain why ovs-vswitchd is getting slower than 2.6 >> when doing port-binding. We need expert suggestions here, for what could be >> the possible reason of this slowness. We can do more testing with different >> versions between 2.6 and 2.9 to find out related change, but with some >> pointers it might save some effort. Below are some logs of ovs-vswitchd >> when port binding is happening: >> > >> > ====== >> > 2018-02-07T00:12:54.558Z|01767|bridge|INFO|bridge br-int: added >> interface lport_bc65cd_QFOU3v on port 1028 >> > 2018-02-07T00:12:55.629Z|01768|timeval|WARN|Unreasonably long 1112ms >> poll interval (1016ms user, 4ms system) >> > 2018-02-07T00:12:55.629Z|01769|timeval|WARN|faults: 336 minor, 0 major >> > 2018-02-07T00:12:55.629Z|01770|timeval|WARN|context switches: 0 >> voluntary, 13 involuntary >> > 2018-02-07T00:12:55.629Z|01771|coverage|INFO|Event coverage, avg rate >> over last: 5 seconds, last minute, last hour, hash=b256889c: >> > 2018-02-07T00:12:55.629Z|01772|coverage|INFO|bridge_reconfigure >> 0.0/sec 0.000/sec 0.0056/sec total: 29 >> > 2018-02-07T00:12:55.629Z|01773|coverage|INFO|ofproto_flush >> 0.0/sec 0.000/sec 0.0000/sec total: 1 >> > 2018-02-07T00:12:55.629Z|01774|coverage|INFO|ofproto_packet_out >> 0.0/sec 0.000/sec 0.0111/sec total: 90 >> > 2018-02-07T00:12:55.629Z|01775|coverage|INFO|ofproto_recv_openflow >> 0.2/sec 0.033/sec 0.4858/sec total: 6673 >> > 2018-02-07T00:12:55.629Z|01776|coverage|INFO|ofproto_update_port >> 0.0/sec 0.000/sec 5.5883/sec total: 28258 >> > 2018-02-07T00:12:55.629Z|01777|coverage|INFO|rev_reconfigure >> 0.0/sec 0.000/sec 0.0056/sec total: 32 >> > 2018-02-07T00:12:55.629Z|01778|coverage|INFO|rev_port_toggled >> 0.0/sec 0.000/sec 0.0011/sec total: 6 >> > 2018-02-07T00:12:55.629Z|01779|coverage|INFO|rev_flow_table >> 0.0/sec 0.000/sec 0.0078/sec total: 124 >> > 2018-02-07T00:12:55.629Z|01780|coverage|INFO|rev_mac_learning >> 0.0/sec 0.000/sec 0.0036/sec total: 13 >> > 2018-02-07T00:12:55.629Z|01781|coverage|INFO|xlate_actions >> 0.0/sec 0.000/sec 0.0111/sec total: 90 >> > 2018-02-07T00:12:55.629Z|01782|coverage|INFO|ccmap_expand >> 0.0/sec 0.000/sec 0.0050/sec total: 36 >> > 2018-02-07T00:12:55.629Z|01783|coverage|INFO|ccmap_shrink >> 0.0/sec 0.000/sec 0.0000/sec total: 49 >> > 2018-02-07T00:12:55.629Z|01784|coverage|INFO|cmap_expand >> 0.0/sec 0.000/sec 0.0217/sec total: 257 >> > 2018-02-07T00:12:55.629Z|01785|coverage|INFO|cmap_shrink >> 0.0/sec 0.000/sec 0.0000/sec total: 141 >> > 2018-02-07T00:12:55.629Z|01786|coverage|INFO|dpif_port_add >> 0.0/sec 0.000/sec 0.0022/sec total: 1020 >> > 2018-02-07T00:12:55.629Z|01787|coverage|INFO|dpif_port_del >> 0.0/sec 0.000/sec 0.0000/sec total: 40 >> > 2018-02-07T00:12:55.629Z|01788|coverage|INFO|dpif_flow_flush >> 0.0/sec 0.000/sec 0.0000/sec total: 2 >> > 2018-02-07T00:12:55.629Z|01789|coverage|INFO|dpif_flow_get >> 0.0/sec 0.000/sec 0.0000/sec total: 20 >> > 2018-02-07T00:12:55.629Z|01790|coverage|INFO|dpif_flow_put >> 0.0/sec 0.000/sec 0.0000/sec total: 21 >> > 2018-02-07T00:12:55.629Z|01791|coverage|INFO|dpif_flow_del >> 0.0/sec 0.000/sec 0.0000/sec total: 20 >> > 2018-02-07T00:12:55.629Z|01792|coverage|INFO|dpif_execute >> 0.0/sec 0.000/sec 0.0100/sec total: 71 >> > 2018-02-07T00:12:55.629Z|01793|coverage|INFO|flow_extract >> 0.0/sec 0.000/sec 0.0000/sec total: 4 >> > 2018-02-07T00:12:55.629Z|01794|coverage|INFO|miniflow_malloc >> 0.0/sec 0.000/sec 0.4361/sec total: 6435 >> > 2018-02-07T00:12:55.629Z|01795|coverage|INFO|hindex_pathological >> 0.0/sec 0.000/sec 0.0000/sec total: 17 >> > 2018-02-07T00:12:55.629Z|01796|coverage|INFO|hindex_expand >> 0.0/sec 0.000/sec 0.0000/sec total: 9 >> > 2018-02-07T00:12:55.629Z|01797|coverage|INFO|hmap_pathological >> 0.0/sec 0.000/sec 0.8131/sec total: 13752 >> > 2018-02-07T00:12:55.630Z|01798|coverage|INFO|hmap_expand >> 21.0/sec 18.333/sec 11253.0167/sec total: 55905271 >> > 2018-02-07T00:12:55.630Z|01799|coverage|INFO|hmap_shrink >> 0.0/sec 0.000/sec 0.0022/sec total: 1029 >> > 2018-02-07T00:12:55.630Z|01800|coverage|INFO|mac_learning_learned >> 0.0/sec 0.000/sec 0.0022/sec total: 18 >> > 2018-02-07T00:12:55.630Z|01801|coverage|INFO|mac_learning_expired >> 0.0/sec 0.000/sec 0.0017/sec total: 6 >> > 2018-02-07T00:12:55.630Z|01802|coverage|INFO|netdev_sent >> 0.0/sec 0.000/sec 0.0400/sec total: 350 >> > 2018-02-07T00:12:55.630Z|01803|coverage|INFO|netdev_get_stats >> 203.4/sec 203.400/sec 198.4978/sec total: 798522 >> > 2018-02-07T00:12:55.630Z|01804|coverage|INFO|txn_unchanged >> 0.2/sec 0.200/sec 0.1911/sec total: 776 >> > 2018-02-07T00:12:55.630Z|01805|coverage|INFO|txn_incomplete >> 0.0/sec 0.000/sec 0.0167/sec total: 106 >> > 2018-02-07T00:12:55.630Z|01806|coverage|INFO|txn_success >> 0.0/sec 0.000/sec 0.0111/sec total: 48 >> > 2018-02-07T00:12:55.630Z|01807|coverage|INFO|poll_create_node >> 162.6/sec 157.267/sec 156.3664/sec total: 636465 >> > 2018-02-07T00:12:55.630Z|01808|coverage|INFO|poll_zero_timeout >> 0.2/sec 0.250/sec 0.4564/sec total: 1936 >> > 2018-02-07T00:12:55.630Z|01809|coverage|INFO|rconn_queued >> 0.2/sec 0.033/sec 0.0453/sec total: 1214 >> > 2018-02-07T00:12:55.630Z|01810|coverage|INFO|rconn_sent >> 0.2/sec 0.033/sec 0.0453/sec total: 1214 >> > 2018-02-07T00:12:55.630Z|01811|coverage|INFO|seq_change >> 1098.8/sec 1099.167/sec 1097.4869/sec total: 4468823 >> > 2018-02-07T00:12:55.630Z|01812|coverage|INFO|pstream_open >> 0.0/sec 0.000/sec 0.0000/sec total: 5 >> > 2018-02-07T00:12:55.630Z|01813|coverage|INFO|stream_open >> 0.0/sec 0.000/sec 0.0000/sec total: 1 >> > 2018-02-07T00:12:55.630Z|01814|coverage|INFO|util_xalloc >> 7223.8/sec 6527.317/sec 51569.9900/sec total: 251999059 >> > 2018-02-07T00:12:55.630Z|01815|coverage|INFO|vconn_received >> 0.2/sec 0.033/sec 0.4858/sec total: 6675 >> > 2018-02-07T00:12:55.630Z|01816|coverage|INFO|vconn_sent >> 0.2/sec 0.033/sec 0.0453/sec total: 1216 >> > 2018-02-07T00:12:55.630Z|01817|coverage|INFO|netlink_received >> 0.0/sec 0.000/sec 0.0000/sec total: 3 >> > 2018-02-07T00:12:55.630Z|01818|coverage|INFO|netlink_sent >> 0.0/sec 0.000/sec 0.0000/sec total: 1 >> > 2018-02-07T00:12:55.630Z|01819|coverage|INFO|57 events never hit >> > 2018-02-07T00:12:56.761Z|01820|timeval|WARN|Unreasonably long 1112ms >> poll interval (948ms user, 4ms system) >> > 2018-02-07T00:12:56.761Z|01821|timeval|WARN|faults: 7 minor, 0 major >> > 2018-02-07T00:12:56.761Z|01822|timeval|WARN|context switches: 2 >> voluntary, 17 involuntary >> > 2018-02-07T00:12:56.761Z|01823|coverage|INFO|Event coverage, avg rate >> over last: 5 seconds, last minute, last hour, hash=fcc57021: >> > 2018-02-07T00:12:56.761Z|01824|coverage|INFO|bridge_reconfigure >> 0.0/sec 0.000/sec 0.0056/sec total: 30 >> > 2018-02-07T00:12:56.761Z|01825|coverage|INFO|ofproto_flush >> 0.0/sec 0.000/sec 0.0000/sec total: 1 >> > 2018-02-07T00:12:56.761Z|01826|coverage|INFO|ofproto_packet_out >> 0.0/sec 0.000/sec 0.0111/sec total: 90 >> > 2018-02-07T00:12:56.761Z|01827|coverage|INFO|ofproto_recv_openflow >> 0.2/sec 0.033/sec 0.4858/sec total: 6683 >> > 2018-02-07T00:12:56.761Z|01828|coverage|INFO|ofproto_update_port >> 0.0/sec 0.000/sec 5.5883/sec total: 29268 >> > 2018-02-07T00:12:56.761Z|01829|coverage|INFO|rev_reconfigure >> 0.0/sec 0.000/sec 0.0056/sec total: 33 >> > 2018-02-07T00:12:56.761Z|01830|coverage|INFO|rev_port_toggled >> 0.0/sec 0.000/sec 0.0011/sec total: 6 >> > 2018-02-07T00:12:56.761Z|01831|coverage|INFO|rev_flow_table >> 0.0/sec 0.000/sec 0.0078/sec total: 124 >> > 2018-02-07T00:12:56.761Z|01832|coverage|INFO|rev_mac_learning >> 0.0/sec 0.000/sec 0.0036/sec total: 13 >> > 2018-02-07T00:12:56.761Z|01833|coverage|INFO|xlate_actions >> 0.0/sec 0.000/sec 0.0111/sec total: 90 >> > 2018-02-07T00:12:56.761Z|01834|coverage|INFO|ccmap_expand >> 0.0/sec 0.000/sec 0.0050/sec total: 36 >> > 2018-02-07T00:12:56.761Z|01835|coverage|INFO|ccmap_shrink >> 0.0/sec 0.000/sec 0.0000/sec total: 49 >> > 2018-02-07T00:12:56.761Z|01836|coverage|INFO|cmap_expand >> 0.0/sec 0.000/sec 0.0217/sec total: 257 >> > 2018-02-07T00:12:56.761Z|01837|coverage|INFO|cmap_shrink >> 0.0/sec 0.000/sec 0.0000/sec total: 141 >> > 2018-02-07T00:12:56.761Z|01838|coverage|INFO|dpif_port_add >> 0.0/sec 0.000/sec 0.0022/sec total: 1020 >> > 2018-02-07T00:12:56.761Z|01839|coverage|INFO|dpif_port_del >> 0.0/sec 0.000/sec 0.0000/sec total: 40 >> > 2018-02-07T00:12:56.761Z|01840|coverage|INFO|dpif_flow_flush >> 0.0/sec 0.000/sec 0.0000/sec total: 2 >> > 2018-02-07T00:12:56.761Z|01841|coverage|INFO|dpif_flow_get >> 0.0/sec 0.000/sec 0.0000/sec total: 20 >> > 2018-02-07T00:12:56.761Z|01842|coverage|INFO|dpif_flow_put >> 0.0/sec 0.000/sec 0.0000/sec total: 21 >> > 2018-02-07T00:12:56.761Z|01843|coverage|INFO|dpif_flow_del >> 0.0/sec 0.000/sec 0.0000/sec total: 20 >> > 2018-02-07T00:12:56.761Z|01844|coverage|INFO|dpif_execute >> 0.0/sec 0.000/sec 0.0100/sec total: 71 >> > 2018-02-07T00:12:56.761Z|01845|coverage|INFO|flow_extract >> 0.0/sec 0.000/sec 0.0000/sec total: 4 >> > 2018-02-07T00:12:56.761Z|01846|coverage|INFO|miniflow_malloc >> 0.0/sec 0.000/sec 0.4361/sec total: 6440 >> > 2018-02-07T00:12:56.761Z|01847|coverage|INFO|hindex_pathological >> 0.0/sec 0.000/sec 0.0000/sec total: 17 >> > 2018-02-07T00:12:56.761Z|01848|coverage|INFO|hindex_expand >> 0.0/sec 0.000/sec 0.0000/sec total: 9 >> > 2018-02-07T00:12:56.761Z|01849|coverage|INFO|hmap_pathological >> 0.0/sec 0.000/sec 0.8131/sec total: 13825 >> > 2018-02-07T00:12:56.761Z|01850|coverage|INFO|hmap_expand >> 21.0/sec 18.333/sec 11253.0167/sec total: 57949607 >> > 2018-02-07T00:12:56.761Z|01851|coverage|INFO|hmap_shrink >> 0.0/sec 0.000/sec 0.0022/sec total: 1029 >> > 2018-02-07T00:12:56.761Z|01852|coverage|INFO|mac_learning_learned >> 0.0/sec 0.000/sec 0.0022/sec total: 18 >> > 2018-02-07T00:12:56.761Z|01853|coverage|INFO|mac_learning_expired >> 0.0/sec 0.000/sec 0.0017/sec total: 6 >> > 2018-02-07T00:12:56.761Z|01854|coverage|INFO|netdev_sent >> 0.0/sec 0.000/sec 0.0400/sec total: 350 >> > 2018-02-07T00:12:56.761Z|01855|coverage|INFO|netdev_get_stats >> 203.4/sec 203.400/sec 198.4978/sec total: 799540 >> > 2018-02-07T00:12:56.761Z|01856|coverage|INFO|txn_unchanged >> 0.2/sec 0.200/sec 0.1911/sec total: 777 >> > 2018-02-07T00:12:56.761Z|01857|coverage|INFO|txn_incomplete >> 0.0/sec 0.000/sec 0.0167/sec total: 108 >> > 2018-02-07T00:12:56.761Z|01858|coverage|INFO|txn_success >> 0.0/sec 0.000/sec 0.0111/sec total: 49 >> > 2018-02-07T00:12:56.761Z|01859|coverage|INFO|poll_create_node >> 162.6/sec 157.267/sec 156.3664/sec total: 636681 >> > 2018-02-07T00:12:56.761Z|01860|coverage|INFO|poll_zero_timeout >> 0.2/sec 0.250/sec 0.4564/sec total: 1937 >> > 2018-02-07T00:12:56.761Z|01861|coverage|INFO|rconn_queued >> 0.2/sec 0.033/sec 0.0453/sec total: 1216 >> > 2018-02-07T00:12:56.761Z|01862|coverage|INFO|rconn_sent >> 0.2/sec 0.033/sec 0.0453/sec total: 1216 >> > 2018-02-07T00:12:56.761Z|01863|coverage|INFO|seq_change >> 1098.8/sec 1099.167/sec 1097.4869/sec total: 4471434 >> > 2018-02-07T00:12:56.761Z|01864|coverage|INFO|pstream_open >> 0.0/sec 0.000/sec 0.0000/sec total: 5 >> > 2018-02-07T00:12:56.761Z|01865|coverage|INFO|stream_open >> 0.0/sec 0.000/sec 0.0000/sec total: 1 >> > 2018-02-07T00:12:56.761Z|01866|coverage|INFO|util_xalloc >> 7223.8/sec 6527.317/sec 51569.9900/sec total: 260206006 >> > 2018-02-07T00:12:56.761Z|01867|coverage|INFO|vconn_received >> 0.2/sec 0.033/sec 0.4858/sec total: 6685 >> > 2018-02-07T00:12:56.761Z|01868|coverage|INFO|vconn_sent >> 0.2/sec 0.033/sec 0.0453/sec total: 1218 >> > 2018-02-07T00:12:56.761Z|01869|coverage|INFO|netlink_received >> 0.0/sec 0.000/sec 0.0000/sec total: 3 >> > 2018-02-07T00:12:56.761Z|01870|coverage|INFO|netlink_sent >> 0.0/sec 0.000/sec 0.0000/sec total: 1 >> > 2018-02-07T00:12:56.761Z|01871|coverage|INFO|57 events never hit >> > 2018-02-07T00:12:57.054Z|01872|bridge|INFO|bridge br-int: added >> interface lport_bc65cd_lykjnm on port 1029 >> > 2018-02-07T00:12:58.202Z|01873|timeval|WARN|Unreasonably long 1179ms >> poll interval (1032ms user, 0ms system) >> > 2018-02-07T00:12:58.202Z|01874|timeval|WARN|faults: 41 minor, 0 major >> > 2018-02-07T00:12:58.202Z|01875|timeval|WARN|context switches: 0 >> voluntary, 19 involuntary >> > 2018-02-07T00:12:58.203Z|01876|coverage|INFO|Event coverage, avg rate >> over last: 5 seconds, last minute, last hour, hash=668121b0: >> > 2018-02-07T00:12:58.203Z|01877|coverage|INFO|bridge_reconfigure >> 0.0/sec 0.000/sec 0.0056/sec total: 31 >> > 2018-02-07T00:12:58.203Z|01878|coverage|INFO|ofproto_flush >> 0.0/sec 0.000/sec 0.0000/sec total: 1 >> > 2018-02-07T00:12:58.203Z|01879|coverage|INFO|ofproto_packet_out >> 0.0/sec 0.000/sec 0.0111/sec total: 90 >> > 2018-02-07T00:12:58.203Z|01880|coverage|INFO|ofproto_recv_openflow >> 0.2/sec 0.033/sec 0.4858/sec total: 6733 >> > 2018-02-07T00:12:58.203Z|01881|coverage|INFO|ofproto_update_port >> 0.0/sec 0.000/sec 5.5883/sec total: 30279 >> > 2018-02-07T00:12:58.203Z|01882|coverage|INFO|rev_reconfigure >> 0.0/sec 0.000/sec 0.0056/sec total: 34 >> > 2018-02-07T00:12:58.203Z|01883|coverage|INFO|rev_port_toggled >> 0.0/sec 0.000/sec 0.0011/sec total: 6 >> > 2018-02-07T00:12:58.203Z|01884|coverage|INFO|rev_flow_table >> 0.0/sec 0.000/sec 0.0078/sec total: 125 >> > 2018-02-07T00:12:58.203Z|01885|coverage|INFO|rev_mac_learning >> 0.0/sec 0.000/sec 0.0036/sec total: 13 >> > 2018-02-07T00:12:58.203Z|01886|coverage|INFO|xlate_actions >> 0.0/sec 0.000/sec 0.0111/sec total: 90 >> > 2018-02-07T00:12:58.203Z|01887|coverage|INFO|ccmap_expand >> 0.0/sec 0.000/sec 0.0050/sec total: 36 >> > 2018-02-07T00:12:58.203Z|01888|coverage|INFO|ccmap_shrink >> 0.0/sec 0.000/sec 0.0000/sec total: 49 >> > 2018-02-07T00:12:58.203Z|01889|coverage|INFO|cmap_expand >> 0.0/sec 0.000/sec 0.0217/sec total: 260 >> > 2018-02-07T00:12:58.203Z|01890|coverage|INFO|cmap_shrink >> 0.0/sec 0.000/sec 0.0000/sec total: 141 >> > 2018-02-07T00:12:58.203Z|01891|coverage|INFO|dpif_port_add >> 0.0/sec 0.000/sec 0.0022/sec total: 1021 >> > 2018-02-07T00:12:58.203Z|01892|coverage|INFO|dpif_port_del >> 0.0/sec 0.000/sec 0.0000/sec total: 40 >> > 2018-02-07T00:12:58.203Z|01893|coverage|INFO|dpif_flow_flush >> 0.0/sec 0.000/sec 0.0000/sec total: 2 >> > 2018-02-07T00:12:58.203Z|01894|coverage|INFO|dpif_flow_get >> 0.0/sec 0.000/sec 0.0000/sec total: 20 >> > 2018-02-07T00:12:58.203Z|01895|coverage|INFO|dpif_flow_put >> 0.0/sec 0.000/sec 0.0000/sec total: 21 >> > 2018-02-07T00:12:58.203Z|01896|coverage|INFO|dpif_flow_del >> 0.0/sec 0.000/sec 0.0000/sec total: 20 >> > 2018-02-07T00:12:58.203Z|01897|coverage|INFO|dpif_execute >> 0.0/sec 0.000/sec 0.0100/sec total: 71 >> > 2018-02-07T00:12:58.203Z|01898|coverage|INFO|flow_extract >> 0.0/sec 0.000/sec 0.0000/sec total: 4 >> > 2018-02-07T00:12:58.203Z|01899|coverage|INFO|miniflow_malloc >> 0.0/sec 0.000/sec 0.4361/sec total: 6489 >> > 2018-02-07T00:12:58.203Z|01900|coverage|INFO|hindex_pathological >> 0.0/sec 0.000/sec 0.0000/sec total: 17 >> > 2018-02-07T00:12:58.203Z|01901|coverage|INFO|hindex_expand >> 0.0/sec 0.000/sec 0.0000/sec total: 9 >> > 2018-02-07T00:12:58.203Z|01902|coverage|INFO|hmap_pathological >> 0.0/sec 0.000/sec 0.8131/sec total: 13945 >> > 2018-02-07T00:12:58.203Z|01903|coverage|INFO|hmap_expand >> 21.0/sec 18.333/sec 11253.0167/sec total: 59998044 >> > 2018-02-07T00:12:58.203Z|01904|coverage|INFO|hmap_shrink >> 0.0/sec 0.000/sec 0.0022/sec total: 1030 >> > 2018-02-07T00:12:58.203Z|01905|coverage|INFO|mac_learning_learned >> 0.0/sec 0.000/sec 0.0022/sec total: 18 >> > 2018-02-07T00:12:58.203Z|01906|coverage|INFO|mac_learning_expired >> 0.0/sec 0.000/sec 0.0017/sec total: 6 >> > 2018-02-07T00:12:58.203Z|01907|coverage|INFO|netdev_sent >> 0.0/sec 0.000/sec 0.0400/sec total: 350 >> > 2018-02-07T00:12:58.203Z|01908|coverage|INFO|netdev_get_stats >> 203.4/sec 203.400/sec 198.4978/sec total: 799541 >> > 2018-02-07T00:12:58.203Z|01909|coverage|INFO|txn_unchanged >> 0.2/sec 0.200/sec 0.1911/sec total: 777 >> > 2018-02-07T00:12:58.203Z|01910|coverage|INFO|txn_incomplete >> 0.0/sec 0.000/sec 0.0167/sec total: 110 >> > 2018-02-07T00:12:58.203Z|01911|coverage|INFO|txn_success >> 0.0/sec 0.000/sec 0.0111/sec total: 50 >> > 2018-02-07T00:12:58.203Z|01912|coverage|INFO|poll_create_node >> 162.6/sec 157.267/sec 156.3664/sec total: 636974 >> > 2018-02-07T00:12:58.203Z|01913|coverage|INFO|poll_zero_timeout >> 0.2/sec 0.250/sec 0.4564/sec total: 1939 >> > 2018-02-07T00:12:58.203Z|01914|coverage|INFO|rconn_queued >> 0.2/sec 0.033/sec 0.0453/sec total: 1217 >> > 2018-02-07T00:12:58.203Z|01915|coverage|INFO|rconn_sent >> 0.2/sec 0.033/sec 0.0453/sec total: 1217 >> > 2018-02-07T00:12:58.203Z|01916|coverage|INFO|seq_change >> 1098.8/sec 1099.167/sec 1097.4869/sec total: 4474605 >> > 2018-02-07T00:12:58.203Z|01917|coverage|INFO|pstream_open >> 0.0/sec 0.000/sec 0.0000/sec total: 5 >> > 2018-02-07T00:12:58.203Z|01918|coverage|INFO|stream_open >> 0.0/sec 0.000/sec 0.0000/sec total: 1 >> > 2018-02-07T00:12:58.203Z|01919|coverage|INFO|util_xalloc >> 7223.8/sec 6527.317/sec 51569.9900/sec total: 268435993 >> > 2018-02-07T00:12:58.203Z|01920|coverage|INFO|vconn_received >> 0.2/sec 0.033/sec 0.4858/sec total: 6735 >> > 2018-02-07T00:12:58.203Z|01921|coverage|INFO|vconn_sent >> 0.2/sec 0.033/sec 0.0453/sec total: 1219 >> > 2018-02-07T00:12:58.203Z|01922|coverage|INFO|netlink_received >> 0.0/sec 0.000/sec 0.0000/sec total: 3 >> > 2018-02-07T00:12:58.203Z|01923|coverage|INFO|netlink_sent >> 0.0/sec 0.000/sec 0.0000/sec total: 1 >> > 2018-02-07T00:12:58.203Z|01924|coverage|INFO|57 events never hit >> > 2018-02-07T00:12:58.221Z|01925|bridge|INFO|bridge br-int: added >> interface patch-br-int-to-provnet-lswitch_bc65cd_Y7sqEw on port 1030 >> > 2018-02-07T00:12:58.221Z|01926|bridge|INFO|bridge br0: added >> interface patch-provnet-lswitch_bc65cd_Y7sqEw-to-br-int on port 15 >> > 2018-02-07T00:12:59.313Z|01927|timeval|WARN|Unreasonably long 1111ms >> poll interval (1092ms user, 0ms system) >> > 2018-02-07T00:12:59.313Z|01928|timeval|WARN|faults: 45 minor, 0 major >> > 2018-02-07T00:12:59.313Z|01929|timeval|WARN|context switches: 6 >> voluntary, 59 involuntary >> > 2018-02-07T00:12:59.313Z|01930|coverage|INFO|Skipping details of >> duplicate event coverage for hash=668121b0 >> > 2018-02-07T00:13:00.430Z|01931|timeval|WARN|Unreasonably long 1117ms >> poll interval (1112ms user, 0ms system) >> > 2018-02-07T00:13:00.430Z|01932|timeval|WARN|faults: 34 minor, 0 major >> > 2018-02-07T00:13:00.430Z|01933|timeval|WARN|context switches: 2 >> voluntary, 11 involuntary >> > 2018-02-07T00:13:00.431Z|01934|coverage|INFO|Event coverage, avg rate >> over last: 5 seconds, last minute, last hour, hash=a612a6e0: >> > 2018-02-07T00:13:00.431Z|01935|coverage|INFO|bridge_reconfigure >> 0.8/sec 0.067/sec 0.0067/sec total: 33 >> > 2018-02-07T00:13:00.431Z|01936|coverage|INFO|ofproto_flush >> 0.0/sec 0.000/sec 0.0000/sec total: 1 >> > 2018-02-07T00:13:00.431Z|01937|coverage|INFO|ofproto_packet_out >> 0.2/sec 0.017/sec 0.0114/sec total: 92 >> > 2018-02-07T00:13:00.431Z|01938|coverage|INFO|ofproto_recv_openflow >> 32.2/sec 2.717/sec 0.5306/sec total: 6935 >> > 2018-02-07T00:13:00.431Z|01939|coverage|INFO|ofproto_update_port >> 808.8/sec 67.400/sec 6.7117/sec total: 32303 >> > 2018-02-07T00:13:00.431Z|01940|coverage|INFO|rev_reconfigure >> 0.8/sec 0.067/sec 0.0067/sec total: 36 >> > 2018-02-07T00:13:00.431Z|01941|coverage|INFO|rev_port_toggled >> 0.0/sec 0.000/sec 0.0011/sec total: 7 >> > 2018-02-07T00:13:00.431Z|01942|coverage|INFO|rev_flow_table >> 0.4/sec 0.033/sec 0.0083/sec total: 126 >> > 2018-02-07T00:13:00.431Z|01943|coverage|INFO|rev_mac_learning >> 0.0/sec 0.000/sec 0.0036/sec total: 13 >> > 2018-02-07T00:13:00.431Z|01944|coverage|INFO|xlate_actions >> 0.2/sec 0.017/sec 0.0114/sec total: 92 >> > 2018-02-07T00:13:00.431Z|01945|coverage|INFO|ccmap_expand >> 0.0/sec 0.000/sec 0.0050/sec total: 36 >> > 2018-02-07T00:13:00.431Z|01946|coverage|INFO|ccmap_shrink >> 0.0/sec 0.000/sec 0.0000/sec total: 49 >> > 2018-02-07T00:13:00.431Z|01947|coverage|INFO|cmap_expand >> 2.0/sec 0.167/sec 0.0244/sec total: 271 >> > 2018-02-07T00:13:00.431Z|01948|coverage|INFO|cmap_shrink >> 0.0/sec 0.000/sec 0.0000/sec total: 141 >> > 2018-02-07T00:13:00.431Z|01949|coverage|INFO|dpif_port_add >> 0.4/sec 0.033/sec 0.0028/sec total: 1021 >> > 2018-02-07T00:13:00.431Z|01950|coverage|INFO|dpif_port_del >> 0.0/sec 0.000/sec 0.0000/sec total: 40 >> > 2018-02-07T00:13:00.431Z|01951|coverage|INFO|dpif_flow_flush >> 0.0/sec 0.000/sec 0.0000/sec total: 2 >> > 2018-02-07T00:13:00.431Z|01952|coverage|INFO|dpif_flow_get >> 0.0/sec 0.000/sec 0.0000/sec total: 20 >> > 2018-02-07T00:13:00.431Z|01953|coverage|INFO|dpif_flow_put >> 0.0/sec 0.000/sec 0.0000/sec total: 21 >> > 2018-02-07T00:13:00.431Z|01954|coverage|INFO|dpif_flow_del >> 0.0/sec 0.000/sec 0.0000/sec total: 20 >> > 2018-02-07T00:13:00.431Z|01955|coverage|INFO|dpif_execute >> 0.0/sec 0.000/sec 0.0100/sec total: 72 >> > 2018-02-07T00:13:00.431Z|01956|coverage|INFO|flow_extract >> 0.0/sec 0.000/sec 0.0000/sec total: 4 >> > 2018-02-07T00:13:00.431Z|01957|coverage|INFO|miniflow_malloc >> 30.8/sec 2.567/sec 0.4789/sec total: 6689 >> > 2018-02-07T00:13:00.431Z|01958|coverage|INFO|hindex_pathological >> 0.0/sec 0.000/sec 0.0000/sec total: 17 >> > 2018-02-07T00:13:00.431Z|01959|coverage|INFO|hindex_expand >> 0.0/sec 0.000/sec 0.0000/sec total: 9 >> > 2018-02-07T00:13:00.431Z|01960|coverage|INFO|hmap_pathological >> 76.8/sec 6.400/sec 0.9197/sec total: 14149 >> > 2018-02-07T00:13:00.431Z|01961|coverage|INFO|hmap_expand >> 1637509.8/sec 136476.267/sec 13527.3358/sec total: 64098893 >> > 2018-02-07T00:13:00.431Z|01962|coverage|INFO|hmap_shrink >> 0.4/sec 0.033/sec 0.0028/sec total: 1030 >> > 2018-02-07T00:13:00.431Z|01963|coverage|INFO|mac_learning_learned >> 0.0/sec 0.000/sec 0.0022/sec total: 19 >> > 2018-02-07T00:13:00.431Z|01964|coverage|INFO|mac_learning_expired >> 0.0/sec 0.000/sec 0.0017/sec total: 6 >> > 2018-02-07T00:13:00.431Z|01965|coverage|INFO|netdev_sent >> 0.0/sec 0.000/sec 0.0400/sec total: 359 >> > 2018-02-07T00:13:00.431Z|01966|coverage|INFO|netdev_get_stats >> 204.4/sec 203.483/sec 198.7817/sec total: 799543 >> > 2018-02-07T00:13:00.431Z|01967|coverage|INFO|txn_unchanged >> 0.2/sec 0.200/sec 0.1914/sec total: 777 >> > 2018-02-07T00:13:00.431Z|01968|coverage|INFO|txn_incomplete >> 1.4/sec 0.117/sec 0.0186/sec total: 113 >> > 2018-02-07T00:13:00.431Z|01969|coverage|INFO|txn_success >> 0.6/sec 0.050/sec 0.0119/sec total: 51 >> > 2018-02-07T00:13:00.431Z|01970|coverage|INFO|poll_create_node >> 241.8/sec 164.833/sec 156.7022/sec total: 637792 >> > 2018-02-07T00:13:00.431Z|01971|coverage|INFO|poll_zero_timeout >> 1.8/sec 0.367/sec 0.4589/sec total: 1944 >> > 2018-02-07T00:13:00.431Z|01972|coverage|INFO|rconn_queued >> 1.2/sec 0.133/sec 0.0469/sec total: 1219 >> > 2018-02-07T00:13:00.431Z|01973|coverage|INFO|rconn_sent >> 1.2/sec 0.133/sec 0.0469/sec total: 1219 >> > 2018-02-07T00:13:00.431Z|01974|coverage|INFO|seq_change >> 2679.2/sec 1230.183/sec 1101.2081/sec total: 4481090 >> > 2018-02-07T00:13:00.431Z|01975|coverage|INFO|pstream_open >> 0.0/sec 0.000/sec 0.0000/sec total: 5 >> > 2018-02-07T00:13:00.431Z|01976|coverage|INFO|stream_open >> 0.0/sec 0.000/sec 0.0000/sec total: 1 >> > 2018-02-07T00:13:00.431Z|01977|coverage|INFO|util_xalloc >> 6573732.0/sec 553875.117/sec 60700.1733/sec total: 284890464 >> > 2018-02-07T00:13:00.431Z|01978|coverage|INFO|vconn_received >> 32.2/sec 2.717/sec 0.5306/sec total: 6937 >> > 2018-02-07T00:13:00.431Z|01979|coverage|INFO|vconn_sent >> 1.2/sec 0.133/sec 0.0469/sec total: 1221 >> > 2018-02-07T00:13:00.431Z|01980|coverage|INFO|netlink_received >> 0.0/sec 0.000/sec 0.0000/sec total: 3 >> > 2018-02-07T00:13:00.431Z|01981|coverage|INFO|netlink_sent >> 0.0/sec 0.000/sec 0.0000/sec total: 1 >> > 2018-02-07T00:13:00.431Z|01982|coverage|INFO|57 events never hit >> > 2018-02-07T00:13:00.431Z|01983|poll_loop|INFO|Dropped 6 log messages >> in last 217 seconds (most recently, 216 seconds ago) due to excessive rate >> > 2018-02-07T00:13:00.431Z|01984|poll_loop|INFO|wakeup due to [POLLIN] >> on fd 112 (/root/sandbox-192.168.64.0/br-int.mgmt<->) at >> ../lib/stream-fd.c:157 (90% CPU usage) >> > >> > I managed to capture a perf record when a ovs-vswitchd is spiking, not >> sure if it helps: >> > Samples: 6K of event 'cycles:pp', Event count (approx.): >> 118132588562209 >> > Children Self Command Shared Object Symbol >> ◆ >> > + 66.01% 0.00% ovs-vswitchd ovs-vswitchd [.] _start >> ▒ >> > + 66.01% 0.00% ovs-vswitchd libc-2.23.so <http://libc-2.23.so> >> [.] __libc_start_main >> ▒ >> > + 66.01% 0.00% ovs-vswitchd ovs-vswitchd [.] main >> ▒ >> > + 66.01% 0.00% ovs-vswitchd ovs-vswitchd [.] >> bridge_run >> ▒ >> > + 66.01% 0.00% ovs-vswitchd ovs-vswitchd [.] >> bridge_run__ >> ▒ >> > + 41.88% 0.00% ovs-vswitchd ovs-vswitchd [.] >> ofproto_type_run >> ▒ >> > + 41.88% 0.00% ovs-vswitchd ovs-vswitchd [.] type_run >> ▒ >> > + 24.13% 0.00% ovs-vswitchd ovs-vswitchd [.] >> bridge_reconfigure >> ▒ >> > + 24.13% 0.00% ovs-vswitchd ovs-vswitchd [.] >> ofproto_run >> ▒ >> > + 24.13% 0.00% ovs-vswitchd ovs-vswitchd [.] >> update_port >> ▒ >> > + 23.31% 0.00% ovs-vswitchd ovs-vswitchd [.] >> update_mtu >> ▒ >> > + 23.31% 0.00% ovs-vswitchd ovs-vswitchd [.] >> update_mtu_ofproto >> ▒ >> > + 23.31% 0.00% ovs-vswitchd ovs-vswitchd [.] >> ofport_is_mtu_overridden >> ▒ >> > + 23.27% 0.00% ovs-vswitchd ovs-vswitchd [.] >> xlate_txn_start >> ▒ >> > + 23.27% 0.00% ovs-vswitchd ovs-vswitchd [.] >> xlate_xport_copy >> ▒ >> > + 23.27% 0.00% ovs-vswitchd libjemalloc.so.1 [.] calloc >> ▒ >> > + 23.27% 0.00% ovs-vswitchd ovs-vswitchd [.] xcalloc >> ▒ >> > + 23.27% 23.27% ovs-vswitchd libc-2.23.so <http://libc-2.23.so> >> [.] 0x000000000008f294 >> ▒ >> > + 22.38% 0.00% ovs-vswitchd ovs-vswitchd [.] >> ofproto_class_find__ >> ▒ >> > + 22.38% 0.00% ovs-vswitchd ovs-vswitchd [.] >> ofproto_port_open_type >> ▒ >> > + 20.96% 0.00% ovs-vswitchd ovs-vswitchd [.] >> dp_enumerate_types >> ▒ >> > + 20.14% 20.14% ovs-vswitchd libjemalloc.so.1 [.] malloc >> ▒ >> > + 19.43% 0.81% ovs-vswitchd ovs-vswitchd [.] >> ovs_mutex_lock_at >> ▒ >> > + 19.33% 0.00% ovs-vswitchd ovs-vswitchd [.] >> sset_add__ >> ▒ >> > + 19.33% 0.00% ovs-vswitchd ovs-vswitchd [.] xmalloc >> ▒ >> > + 18.62% 18.62% ovs-vswitchd libpthread-2.23.so < >> http://libpthread-2.23.so> [.] pthread_mutex_lock >> ▒ >> > + 18.61% 0.00% ovs-vswitchd ovs-vswitchd [.] >> dpif_netdev_run >> ▒ >> > + 18.61% 0.00% ovs-vswitchd ovs-vswitchd [.] >> dp_netdev_process_rxq_port >> ▒ >> > + 18.61% 0.00% ovs-vswitchd ovs-vswitchd [.] >> netdev_rxq_recv >> ▒ >> > + 18.61% 0.00% ovs-vswitchd ovs-vswitchd [.] >> netdev_dummy_rxq_recv >> ▒ >> > + 13.03% 0.00% revalidator66 [kernel.kallsyms] [k] >> entry_SYSCALL_64_fastpath >> ▒ >> > + 13.03% 0.00% revalidator66 libc-2.23.so <http://libc-2.23.so> >> [.] __clone >> ▒ >> > + 13.03% 0.00% revalidator66 libpthread-2.23.so < >> http://libpthread-2.23.so> [.] start_thread >> ▒ >> > >> > Thanks, >> > Han >> >> Interestingly, I located the change that slows down the port-binding: >> === >> commit 2d4beba14ae1075a7baf34fa57424edaf4eb7697 >> Author: Numan Siddique <nusid...@redhat.com <mailto:nusid...@redhat.com>> >> Date: Tue Sep 12 14:22:03 2017 +0530 >> >> ofproto: Include patch ports in mtu overriden check >> >> When a patch port is deleted from the bridge (with no other ports >> in the bridge) and if the bridge was set to an MTU by the user >> earlier, the >> MTU of the bridge is overriden to 1500. Please see the below link >> for the >> steps to reproduce the issue. >> >> This patch fixes this issue. >> >> Reported-at: https://mail.openvswitch.org/p >> ipermail/ovs-dev/2017-September/338665.html >> Signed-off-by: Numan Siddique <nusid...@redhat.com <mailto: >> nusid...@redhat.com>> >> Signed-off-by: Ben Pfaff <b...@ovn.org <mailto:b...@ovn.org>> >> === >> >> It perfectly matches the perf output shared above when ovs-vswitchd is >> running hot. >> > + 24.13% 0.00% ovs-vswitchd ovs-vswitchd [.] >> ofproto_run ▒ >> > + 24.13% 0.00% ovs-vswitchd ovs-vswitchd [.] update_port >> ▒ >> > + 23.31% 0.00% ovs-vswitchd ovs-vswitchd [.] update_mtu >> ▒ >> > + 23.31% 0.00% ovs-vswitchd ovs-vswitchd [.] >> update_mtu_ofproto ▒ >> > + 23.31% 0.00% ovs-vswitchd ovs-vswitchd [.] >> ofport_is_mtu_overridden ▒ >> ... >> >> I will study more on the details of how it is triggered and look at how >> to optimize it. Any suggestions are welcome. >> cc Numan & Ben. >> >> Thanks, >> Han >> > > Hi Han, > > If you investigate the patch you referenced, it's fairly small. It adds an > extra strcmp() call and an extra call to ofproto_port_open_type(). It's > most likely the extra call to ofproto_port_open_type() is causing the extra > CPU usage. Most of the processing done by calling it a second time is > redundant. > > Patch one to optimize the behavior would be to factor out the common parts > of ofproto_port_open_type() so that it may be called once. Then call the > parts that differ for each port type. This should be simple. It would be > worth testing again with this patch alone to see how much of an impact this > makes. > > Patch two would be to start caching results of ofproto_port_open_type() > locally since the result is not likely to differ between repeated calls. > This would be a bit trickier to make 100% correct because you would need to > be able to reliably detect when your cached value is "dirty". This patch > would be worth pursuing if patch one does not do enough on its own. > > Mark! > > _______________________________________________ > discuss mailing list > disc...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss >
_______________________________________________ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss