On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
<dalva...@redhat.com> wrote:
>
> Hi Han, all,
>
> Lucas, Numan and I have been doing some 'scale' testing of OpenStack
> using OVN and wanted to present some results and issues that we've
> found with the Incremental Processing feature in ovn-controller. This
> is the scenario that we executed:
>
> * A 7-node bare-metal setup: 3 controllers (running
>   ovn-northd/ovsdb-servers active/passive under Pacemaker) + 4
>   compute nodes, all on OVS 2.10.
> * Each iteration of the test:
>   - Creates an OpenStack network (an OVN logical switch), a subnet
>     and a router.
>   - Attaches the subnet to the router and sets the gateway to the
>     external network.
>   - Creates an OpenStack port and applies a security group (ACLs to
>     allow UDP, SSH and ICMP).
>   - Binds the port to one of the 4 compute nodes (chosen randomly)
>     by attaching it to a network namespace.
>   - Waits for the port to become ACTIVE in Neutron ('up == True' in
>     the NB database).
>   - Waits until the test can ping the port.
> * Browbeat/Rally runs the test above 150 times with 16 concurrent
>   processes.
> * Once all 150 'fake VMs' have been created, Browbeat deletes all
>   the OpenStack/OVN resources.
>
> We first ran with OVS/OVN 2.10 and the results showed 100% success,
> although ovn-controller is quite loaded (as expected) on all the
> nodes, especially during the deletion phase:
>
> - Compute node: https://imgur.com/a/tzxfrIR
> - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/8ffKKYF
>
> After conducting the tests above, we replaced ovn-controller on all
> 7 nodes with the one built from the current master branch (as of
> last week). We also replaced ovn-northd and the ovsdb-servers, but
> left ovs-vswitchd untouched (still on 2.10). We expected lower
> ovn-controller CPU usage and better times thanks to the recently
> introduced Incremental Processing feature.
> However, the results don't look very good:
>
> - Compute node: https://imgur.com/a/wuq87F1
> - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/99kiyDp
>
> One thing we can tell from the ovs-vswitchd CPU consumption is that
> it's much lower in the Incremental Processing (IP) case, which
> apparently doesn't make much sense. This led us to suspect that
> ovn-controller was not installing all the necessary flows in the
> switch, and we confirmed this hypothesis by looking at the dataplane
> results: out of the 150 VMs, 10% of them were unreachable via ping
> when using ovn-controller from master.
>
> @Han, others, do you have any ideas as to what could be happening
> here? We'll be able to use this setup for a few more days, so let me
> know if you want us to pull some other data/traces, ...
>
> Some other interesting things:
> On each of the compute nodes (with an almost evenly distributed
> number of logical ports bound to each), the maximum number of
> OpenFlow flows in br-int is ~90K (by the end of the test, right
> before deleting the resources).
>
> It looks like the IP version of ovn-controller leaks some memory:
> https://imgur.com/a/trQrhWd
> while with OVS 2.10 memory usage remains pretty flat during the test:
> https://imgur.com/a/KCkIT4O
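[For readers who want to reproduce one iteration of the scenario above
without a full OpenStack deployment, the OVN side can be sketched
directly with ovn-nbctl/ovs-vsctl. This is a hedged sketch, not the
actual Browbeat/Rally code: all names and addresses (sw0, lr0, port0,
vm0, 10.0.0.0/24) are hypothetical, and the external-network gateway
step is elided.]

```shell
# Logical switch, logical router, and the ports connecting them.
ovn-nbctl ls-add sw0
ovn-nbctl lr-add lr0
ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:01:01 10.0.0.1/24
ovn-nbctl lsp-add sw0 sw0-lrp0
ovn-nbctl lsp-set-type sw0-lrp0 router
ovn-nbctl lsp-set-options sw0-lrp0 router-port=lrp0
ovn-nbctl lsp-set-addresses sw0-lrp0 router

# ACLs roughly equivalent to the security group (allow UDP, SSH, ICMP;
# drop everything else destined to the port).
ovn-nbctl acl-add sw0 to-lport 1002 'outport == "port0" && udp' allow-related
ovn-nbctl acl-add sw0 to-lport 1002 'outport == "port0" && tcp.dst == 22' allow-related
ovn-nbctl acl-add sw0 to-lport 1002 'outport == "port0" && icmp4' allow-related
ovn-nbctl acl-add sw0 to-lport 1001 'outport == "port0" && ip' drop

# The 'fake VM': on a compute node, create the logical port and bind it
# into a network namespace via an internal OVS port on br-int.
ovn-nbctl lsp-add sw0 port0
ovn-nbctl lsp-set-addresses port0 "00:00:00:00:01:02 10.0.0.10"
ip netns add vm0
ovs-vsctl add-port br-int port0 -- set Interface port0 type=internal \
    external_ids:iface-id=port0
ip link set port0 netns vm0
ip netns exec vm0 ip link set port0 address 00:00:00:00:01:02
ip netns exec vm0 ip addr add 10.0.0.10/24 dev port0
ip netns exec vm0 ip link set port0 up
```

Once the port binds, `ovn-nbctl lsp-get-up port0` should report `up`,
which corresponds to the 'up == True' condition the test waits for.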
Hi Daniel, Han,

I just sent a small patch for the ovn-controller memory leak:
https://patchwork.ozlabs.org/patch/1113758/

At least on my setup, this is what valgrind was pointing at.

Cheers,
Dumitru

> Looking forward to hearing back :)
> Daniel
>
> PS. Sorry for my previous email, I sent it by mistake without the subject
> _______________________________________________
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
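[For anyone wanting to check whether the leak is gone after the patch,
a hedged sketch: either rerun ovn-controller under valgrind as Dumitru
did, or simply sample its RSS over a test run and look for monotonic
growth like the graphs above. The `sample_rss` helper below is
hypothetical, not part of OVN.]

```shell
# Print a process's RSS (in kB) <samples> times, every <interval>
# seconds. Steadily growing values across a test run suggest a leak.
sample_rss() {
    pid=$1
    samples=${2:-5}
    interval=${3:-1}
    i=0
    while [ "$i" -lt "$samples" ]; do
        ps -o rss= -p "$pid" || return 1
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then sleep "$interval"; fi
    done
}

# Example: watch ovn-controller for ~1 minute on a compute node.
# sample_rss "$(pidof ovn-controller)" 60 1

# For the actual culprit, valgrind gives the allocation stack; this
# requires restarting the daemon under valgrind, so test nodes only:
# valgrind --leak-check=full ovn-controller
```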