On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
<dalva...@redhat.com> wrote:
>
> Hi Han, all,
>
> Lucas, Numan and I have been doing some 'scale' testing of OpenStack
> using OVN and wanted to present some results and issues that we've
> found with the Incremental Processing feature in ovn-controller. Below
> is the scenario that we executed:
>
> * 7 baremetal nodes setup: 3 controllers (running
> ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS
> 2.10.
> * The test consists of:
>   - Create openstack network (OVN LS), subnet and router
>   - Attach subnet to the router and set gw to the external network
>   - Create an OpenStack port and apply a Security Group (ACLs to allow
> UDP, SSH and ICMP).
>   - Bind the port to one of the 4 compute nodes (randomly) by
> attaching it to a network namespace.
>   - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
>   - Wait until the test can ping the port
> * Running browbeat/rally with 16 simultaneous processes to execute the
> test above 150 times.
> * When all the 150 'fake VMs' are created, browbeat will delete all
> the OpenStack/OVN resources.
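
The two "wait" steps in the scenario above amount to polling a condition with a timeout. A minimal sketch of that pattern follows. It is not the actual browbeat/rally code: the helper names `wait_until` and `can_ping`, the timeout values, and the use of the Linux `ping` flags `-c`/`-W` are all assumptions made for illustration.

```python
import subprocess
import time


def wait_until(predicate, timeout=60.0, interval=1.0):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Hypothetical helper: returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def can_ping(address):
    """Return True if one ICMP echo request to `address` is answered.

    Assumes the Linux iputils `ping` binary is available on PATH.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

A caller could then write something like `wait_until(lambda: can_ping("192.0.2.10"), timeout=120)`, where the address is a placeholder, and count a run as failed when it returns False. The same helper would fit the "port is ACTIVE" step given a predicate that queries Neutron.
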
>
> We first tried with OVS/OVN 2.10 and pulled some results which showed
> 100% success, but ovn-controller was quite loaded (as expected) on all
> the nodes, especially during the deletion phase:
>
> - Compute node: https://imgur.com/a/tzxfrIR
> - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/8ffKKYF
>
> After conducting the tests above, we replaced ovn-controller on all 7
> nodes with the one from the current master branch (actually from last
> week). We also replaced ovn-northd and ovsdb-servers, but
> ovs-vswitchd has been left untouched (still on 2.10). The expected
> results were to get less ovn-controller CPU usage and also better
> times due to the Incremental Processing feature introduced recently.
> However, the results don't look very good:
>
> - Compute node: https://imgur.com/a/wuq87F1
> - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/99kiyDp
>
> One thing that we can tell from the ovs-vswitchd CPU consumption is
> that it's much lower in the Incremental Processing (IP) case, which
> doesn't seem to make much sense. This led us to think that perhaps
> ovn-controller was not installing the necessary flows in the switch
> and we confirmed this hypothesis by looking into the dataplane
> results. Out of the 150 VMs, 10% of them were unreachable via ping
> when using ovn-controller from master.
>
> @Han, others, do you have any ideas as to what could be happening
> here? We'll be able to use this setup for a few more days so let me
> know if you want us to pull some other data/traces, ...
>
> Some other interesting things:
> On each of the compute nodes (with an almost evenly distributed
> number of logical ports bound to them), the maximum number of logical
> flows in br-int is ~90K (by the end of the test, right before deleting
> the resources).
>
> It looks like with the IP version, ovn-controller leaks some memory:
> https://imgur.com/a/trQrhWd
> While with OVS 2.10, it remains pretty flat during the test:
> https://imgur.com/a/KCkIT4O

Hi Daniel, Han,

I just sent a small patch for the ovn-controller memory leak:
https://patchwork.ozlabs.org/patch/1113758/

At least on my setup this is what valgrind was pointing at.

Cheers,
Dumitru

>
> Looking forward to hearing back :)
> Daniel
>
> PS. Sorry for my previous email, I sent it by mistake without a subject.
> _______________________________________________
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss