Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing
On Wed, Jul 24, 2019 at 8:38 AM Han Zhou wrote:
>
> On Tue, Jul 23, 2019 at 7:41 AM Numan Siddique wrote:
> >
> > On Mon, Jul 22, 2019 at 12:35 PM Daniel Alvarez Sanchez <dalva...@redhat.com> wrote:
> >>
> >> Neat! Thanks folks :)
> >> I'll try to get an OSP setup where we can patch this and re-run the
> >> same tests as last time to confirm, but it looks promising.
> >>
> >> On Fri, Jul 19, 2019 at 11:12 PM Han Zhou wrote:
> >> >
> >> > On Fri, Jul 19, 2019 at 12:37 PM Numan Siddique wrote:
> >> >>
> >> >> On Fri, Jul 19, 2019 at 6:19 PM Numan Siddique wrote:
> >> >>>
> >> >>> On Fri, Jul 19, 2019 at 6:28 AM Han Zhou wrote:
> >>
> >> On Tue, Jul 9, 2019 at 12:13 AM Numan Siddique <nusid...@redhat.com> wrote:
> >> >
> >> > On Tue, Jul 9, 2019 at 12:25 PM Daniel Alvarez Sanchez <dalva...@redhat.com> wrote:
> >> >>
> >> >> Thanks Numan for running these tests outside OpenStack!
> >> >>
> >> >> On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique <nusid...@redhat.com> wrote:
> >> >> >
> >> >> > On Tue, Jul 9, 2019 at 11:05 AM Han Zhou wrote:
> >> >> >>
> >> >> >> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou wrote:
> >> >> >> >
> >> >> >> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique <nusid...@redhat.com> wrote:
> >> >> >> > >
> >> >> >> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou <zhou...@gmail.com> wrote:
> >> >> >> > >>
> >> >> >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <dalva...@redhat.com> wrote:
> >> >> >> > >> >
> >> >> >> > >> > Thanks a lot Han for the answer!
> >> >> >> > >> >
> >> >> >> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou <zhou...@gmail.com> wrote:
> >> >> >> > >> > >
> >> >> >> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara <dce...@redhat.com> wrote:
> >> >> >> > >> > > >
> >> >> >> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez wrote:
> >> >> >> > >> > > > >
> >> >> >> > >> > > > > Hi Han, all,
> >> >> >> > >> > > > >
> >> >> >> > >> > > > > Lucas, Numan and I have been doing some 'scale' testing of OpenStack
> >> >> >> > >> > > > > using OVN and wanted to present some results and issues that we've
> >> >> >> > >> > > > > found with the Incremental Processing feature in ovn-controller. Below
> >> >> >> > >> > > > > is the scenario that we executed:
> >> >> >> > >> > > > >
> >> >> >> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running
> >> >> >> > >> > > > >   ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS
> >> >> >> > >> > > > >   2.10.
> >> >> >> > >> > > > > * The test consists of:
> >> >> >> > >> > > > >   - Create openstack network (OVN LS), subnet and router
> >> >> >> > >> > > > >   - Attach subnet to the router and set gw to the external network
> >> >> >> > >> > > > >   - Create an OpenStack port and apply a Security Group (ACLs to allow
> >> >> >> > >> > > > >     UDP, SSH and ICMP).
> >> >> >> > >> > > > >   - Bind the port to one of the 4 compute nodes (randomly) by
> >> >> >> > >> > > > >     attaching it to a network namespace.
> >> >> >> > >> > > > >   - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
> >> >> >> > >> > > > >   - Wait until the test can ping the port
> >> >> >> > >> > > > > * Running browbeat/rally with 16 simultaneous processes to execute the
> >> >> >> > >> > > > >   test above 150 times.
> >> >> >> > >> > > > > * When all the 150 'fake VMs' are created, browbeat will delete all
> >> >> >> > >> > > > >   the OpenStack/OVN resources.
> >> >> >> > >> > > > >
> >> >> >> > >> > > > > We first tried with OVS/OVN 2.10 and pulled some results which showed
> >> >> >> > >> > > > > 100% success but ovn-controller is quite loaded (as expected) in all
> >> >> >> > >> > > > > the nodes, especially during the deletion phase:
> >> >> >> > >> > > > >
> >> >> >> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
> >> >> >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/8ffKKYF
> >> >> >> > >> > > > >
> >> >> >> > >> > > > > After conducting the tests above, we replaced ovn-controller in all 7
> >> >> >> > >> > > > > nodes by the one with the current master branch (actually from last
> >> >> >> > >> > > > > week). We also replaced ovn-northd and ovsdb-servers but the
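Editor's sketch: the per-port steps quoted above (create a port, bind it on a compute node by attaching it to a network namespace, wait for 'up == True' in NB) can be approximated without OpenStack using plain OVN/OVS commands. The Python below only builds the command lines and does not run them. The switch name sw0, the port and namespace names, and the MAC/IP scheme are hypothetical and do not come from the thread; the original test drove these steps through Neutron, and the Security Group ACLs, subnet and router are left out here.

```python
from typing import List


def fake_vm_commands(index: int, switch: str = "sw0",
                     bridge: str = "br-int") -> List[List[str]]:
    """Return the commands that create and bind one 'fake VM' port.

    All names and addresses are hypothetical placeholders; the commands
    are only built here, never executed.
    """
    port = f"vm{index}"          # logical port name, reused as iface-id
    netns = f"ns-{port}"
    mac = f"0a:00:00:00:{index // 256:02x}:{index % 256:02x}"
    ip = f"10.0.{index // 250}.{index % 250 + 2}"
    return [
        # Northbound side: logical switch port with a MAC/IP pair.
        ["ovn-nbctl", "lsp-add", switch, port],
        ["ovn-nbctl", "lsp-set-addresses", port, f"{mac} {ip}"],
        # Chassis side: internal OVS port bound to the logical port.
        ["ovs-vsctl", "add-port", bridge, port, "--",
         "set", "Interface", port, "type=internal",
         f"external_ids:iface-id={port}"],
        # Move the interface into its own namespace and configure it.
        ["ip", "netns", "add", netns],
        ["ip", "link", "set", port, "netns", netns],
        ["ip", "netns", "exec", netns,
         "ip", "link", "set", port, "address", mac],
        ["ip", "netns", "exec", netns,
         "ip", "addr", "add", f"{ip}/16", "dev", port],
        ["ip", "netns", "exec", netns, "ip", "link", "set", port, "up"],
        # Block until the port is reported up in the Northbound DB.
        ["ovn-nbctl", "wait-until", "Logical_Switch_Port", port, "up=true"],
    ]


if __name__ == "__main__":
    for argv in fake_vm_commands(0):
        print(" ".join(argv))
```

Each inner list could be handed to subprocess.run() on a node where the logical switch already exists; a ping from another namespace would then cover the last step of the quoted test.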
Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing
Neat! Thanks folks :)
I'll try to get an OSP setup where we can patch this and re-run the
same tests as last time to confirm, but it looks promising.

On Fri, Jul 19, 2019 at 11:12 PM Han Zhou wrote:
>
> On Fri, Jul 19, 2019 at 12:37 PM Numan Siddique wrote:
>>
>> On Fri, Jul 19, 2019 at 6:19 PM Numan Siddique wrote:
>>>
>>> On Fri, Jul 19, 2019 at 6:28 AM Han Zhou wrote:

On Tue, Jul 9, 2019 at 12:13 AM Numan Siddique wrote:
>
> On Tue, Jul 9, 2019 at 12:25 PM Daniel Alvarez Sanchez wrote:
>>
>> Thanks Numan for running these tests outside OpenStack!
>>
>> On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique wrote:
>> >
>> > On Tue, Jul 9, 2019 at 11:05 AM Han Zhou wrote:
>> >>
>> >> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou wrote:
>> >> >
>> >> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique wrote:
>> >> > >
>> >> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou wrote:
>> >> > >>
>> >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez wrote:
>> >> > >> >
>> >> > >> > Thanks a lot Han for the answer!
>> >> > >> >
>> >> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou wrote:
>> >> > >> > >
>> >> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara wrote:
>> >> > >> > > >
>> >> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez wrote:
>> >> > >> > > > >
>> >> > >> > > > > Hi Han, all,
>> >> > >> > > > >
>> >> > >> > > > > Lucas, Numan and I have been doing some 'scale' testing of OpenStack
>> >> > >> > > > > using OVN and wanted to present some results and issues that we've
>> >> > >> > > > > found with the Incremental Processing feature in ovn-controller. Below
>> >> > >> > > > > is the scenario that we executed:
>> >> > >> > > > >
>> >> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running
>> >> > >> > > > >   ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS
>> >> > >> > > > >   2.10.
>> >> > >> > > > > * The test consists of:
>> >> > >> > > > >   - Create openstack network (OVN LS), subnet and router
>> >> > >> > > > >   - Attach subnet to the router and set gw to the external network
>> >> > >> > > > >   - Create an OpenStack port and apply a Security Group (ACLs to allow
>> >> > >> > > > >     UDP, SSH and ICMP).
>> >> > >> > > > >   - Bind the port to one of the 4 compute nodes (randomly) by
>> >> > >> > > > >     attaching it to a network namespace.
>> >> > >> > > > >   - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
>> >> > >> > > > >   - Wait until the test can ping the port
>> >> > >> > > > > * Running browbeat/rally with 16 simultaneous processes to execute the
>> >> > >> > > > >   test above 150 times.
>> >> > >> > > > > * When all the 150 'fake VMs' are created, browbeat will delete all
>> >> > >> > > > >   the OpenStack/OVN resources.
>> >> > >> > > > >
>> >> > >> > > > > We first tried with OVS/OVN 2.10 and pulled some results which showed
>> >> > >> > > > > 100% success but ovn-controller is quite loaded (as expected) in all
>> >> > >> > > > > the nodes, especially during the deletion phase:
>> >> > >> > > > >
>> >> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
>> >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/8ffKKYF
>> >> > >> > > > >
>> >> > >> > > > > After conducting the tests above, we replaced ovn-controller in all 7
>> >> > >> > > > > nodes by the one with the current master branch (actually from last
>> >> > >> > > > > week). We also replaced ovn-northd and ovsdb-servers but the
>> >> > >> > > > > ovs-vswitchd has been left untouched (still on 2.10). The expected
>> >> > >> > > > > results were to get less ovn-controller CPU usage and also better
>> >> > >> > > > > times due to the Incremental Processing feature introduced recently.
>> >> > >> > > > > However, the results don't look very good:
Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing
On Fri, Jul 19, 2019 at 6:28 AM Han Zhou wrote:
>
> On Tue, Jul 9, 2019 at 12:13 AM Numan Siddique wrote:
> >
> > On Tue, Jul 9, 2019 at 12:25 PM Daniel Alvarez Sanchez <dalva...@redhat.com> wrote:
> >>
> >> Thanks Numan for running these tests outside OpenStack!
> >>
> >> On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique wrote:
> >> >
> >> > On Tue, Jul 9, 2019 at 11:05 AM Han Zhou wrote:
> >> >>
> >> >> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou wrote:
> >> >> >
> >> >> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique <nusid...@redhat.com> wrote:
> >> >> > >
> >> >> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou wrote:
> >> >> > >>
> >> >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <dalva...@redhat.com> wrote:
> >> >> > >> >
> >> >> > >> > Thanks a lot Han for the answer!
> >> >> > >> >
> >> >> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou wrote:
> >> >> > >> > >
> >> >> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara <dce...@redhat.com> wrote:
> >> >> > >> > > >
> >> >> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez wrote:
> >> >> > >> > > > >
> >> >> > >> > > > > Hi Han, all,
> >> >> > >> > > > >
> >> >> > >> > > > > Lucas, Numan and I have been doing some 'scale' testing of OpenStack
> >> >> > >> > > > > using OVN and wanted to present some results and issues that we've
> >> >> > >> > > > > found with the Incremental Processing feature in ovn-controller. Below
> >> >> > >> > > > > is the scenario that we executed:
> >> >> > >> > > > >
> >> >> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running
> >> >> > >> > > > >   ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS
> >> >> > >> > > > >   2.10.
> >> >> > >> > > > > * The test consists of:
> >> >> > >> > > > >   - Create openstack network (OVN LS), subnet and router
> >> >> > >> > > > >   - Attach subnet to the router and set gw to the external network
> >> >> > >> > > > >   - Create an OpenStack port and apply a Security Group (ACLs to allow
> >> >> > >> > > > >     UDP, SSH and ICMP).
> >> >> > >> > > > >   - Bind the port to one of the 4 compute nodes (randomly) by
> >> >> > >> > > > >     attaching it to a network namespace.
> >> >> > >> > > > >   - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
> >> >> > >> > > > >   - Wait until the test can ping the port
> >> >> > >> > > > > * Running browbeat/rally with 16 simultaneous processes to execute the
> >> >> > >> > > > >   test above 150 times.
> >> >> > >> > > > > * When all the 150 'fake VMs' are created, browbeat will delete all
> >> >> > >> > > > >   the OpenStack/OVN resources.
> >> >> > >> > > > >
> >> >> > >> > > > > We first tried with OVS/OVN 2.10 and pulled some results which showed
> >> >> > >> > > > > 100% success but ovn-controller is quite loaded (as expected) in all
> >> >> > >> > > > > the nodes, especially during the deletion phase:
> >> >> > >> > > > >
> >> >> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
> >> >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/8ffKKYF
> >> >> > >> > > > >
> >> >> > >> > > > > After conducting the tests above, we replaced ovn-controller in all 7
> >> >> > >> > > > > nodes by the one with the current master branch (actually from last
> >> >> > >> > > > > week). We also replaced ovn-northd and ovsdb-servers but the
> >> >> > >> > > > > ovs-vswitchd has been left untouched (still on 2.10). The expected
> >> >> > >> > > > > results were to get less ovn-controller CPU usage and also better
> >> >> > >> > > > > times due to the Incremental Processing feature introduced recently.
> >> >> > >> > > > > However, the results don't look very good:
> >> >> > >> > > > >
> >> >> > >> > > > > - Compute node: https://imgur.com/a/wuq87F1
> >> >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/99kiyDp
> >> >> > >> > > > >
> >> >> > >> > > > > One thing that we can tell from the ovs-vswitchd CPU consumption is
> >> >> > >> > > > > that it's much less in the Incremental Processing (IP) case which
> >> >> > >> > > > > apparently doesn't make much sense. This led us to think that perhaps
> >> >> > >> > > > > ovn-controller was not installing the necessary flows in the switch
> >> >> > >> > > > > and we confirmed this hypothesis by looking into the dataplane
> >> >> > >> > > > > results. Out of the 150 VMs, 10% of them were unreachable via ping
> >> >> > >> > > > > when using ovn-controller from master.
> >> >> > >> > > > >
> >> >> > >> > > > > @Han, others, do you have any ideas as to what could be happening
> >> >> > >> > > > > here? We'll be able to use this setup for a few more days so let me
> >> >> > >> > > > > know if you want us to pull some other data/traces,
Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing
On Tue, Jul 9, 2019 at 12:25 PM Daniel Alvarez Sanchez wrote: > Thanks Numan for running these tests outside OpenStack! > > On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique wrote: > > > > > > > > On Tue, Jul 9, 2019 at 11:05 AM Han Zhou wrote: > >> > >> > >> > >> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou wrote: > >> > > >> > > >> > > >> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique > wrote: > >> > > > >> > > > >> > > > >> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou wrote: > >> > >> > >> > >> > >> > >> > >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez < > dalva...@redhat.com> wrote: > >> > >> > > >> > >> > Thanks a lot Han for the answer! > >> > >> > > >> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou > wrote: > >> > >> > > > >> > >> > > > >> > >> > > > >> > >> > > > >> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara < > dce...@redhat.com> wrote: > >> > >> > > > > >> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez > >> > >> > > > wrote: > >> > >> > > > > > >> > >> > > > > Hi Han, all, > >> > >> > > > > > >> > >> > > > > Lucas, Numan and I have been doing some 'scale' testing of > OpenStack > >> > >> > > > > using OVN and wanted to present some results and issues > that we've > >> > >> > > > > found with the Incremental Processing feature in > ovn-controller. Below > >> > >> > > > > is the scenario that we executed: > >> > >> > > > > > >> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running > >> > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 > compute nodes. OVS > >> > >> > > > > 2.10. > >> > >> > > > > * The test consists on: > >> > >> > > > > - Create openstack network (OVN LS), subnet and router > >> > >> > > > > - Attach subnet to the router and set gw to the external > network > >> > >> > > > > - Create an OpenStack port and apply a Security Group > (ACLs to allow > >> > >> > > > > UDP, SSH and ICMP). 
> >> > >> > > > > - Bind the port to one of the 4 compute nodes (randomly) > by > >> > >> > > > > attaching it to a network namespace. > >> > >> > > > > - Wait for the port to be ACTIVE in Neutron ('up == > True' in NB) > >> > >> > > > > - Wait until the test can ping the port > >> > >> > > > > * Running browbeat/rally with 16 simultaneous process to > execute the > >> > >> > > > > test above 150 times. > >> > >> > > > > * When all the 150 'fake VMs' are created, browbeat will > delete all > >> > >> > > > > the OpenStack/OVN resources. > >> > >> > > > > > >> > >> > > > > We first tried with OVS/OVN 2.10 and pulled some results > which showed > >> > >> > > > > 100% success but ovn-controller is quite loaded (as > expected) in all > >> > >> > > > > the nodes especially during the deletion phase: > >> > >> > > > > > >> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR > >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers): > https://imgur.com/a/8ffKKYF > >> > >> > > > > > >> > >> > > > > After conducting the tests above, we replaced > ovn-controller in all 7 > >> > >> > > > > nodes by the one with the current master branch (actually > from last > >> > >> > > > > week). We also replaced ovn-northd and ovsdb-servers but > the > >> > >> > > > > ovs-vswitchd has been left untouched (still on 2.10). The > expected > >> > >> > > > > results were to get less ovn-controller CPU usage and also > better > >> > >> > > > > times due to the Incremental Processing feature introduced > recently. 
> >> > >> > > > > However, the results don't look very good: > >> > >> > > > > > >> > >> > > > > - Compute node: https://imgur.com/a/wuq87F1 > >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers): > https://imgur.com/a/99kiyDp > >> > >> > > > > > >> > >> > > > > One thing that we can tell from the ovs-vswitchd CPU > consumption is > >> > >> > > > > that it's much less in the Incremental Processing (IP) > case which > >> > >> > > > > apparently doesn't make much sense. This led us to think > that perhaps > >> > >> > > > > ovn-controller was not installing the necessary flows in > the switch > >> > >> > > > > and we confirmed this hypothesis by looking into the > dataplane > >> > >> > > > > results. Out of the 150 VMs, 10% of them were unreachable > via ping > >> > >> > > > > when using ovn-controller from master. > >> > >> > > > > > >> > >> > > > > @Han, others, do you have any ideas as of what could be > happening > >> > >> > > > > here? We'll be able to use this setup for a few more days > so let me > >> > >> > > > > know if you want us to pull some other data/traces, ... > >> > >> > > > > > >> > >> > > > > Some other interesting things: > >> > >> > > > > On each of the compute nodes, (with an almost evenly > distributed > >> > >> > > > > number of logical ports bound to them), the max amount of > logical > >> > >> > > > > flows in br-int is ~90K (by the end of the test, right > before deleting > >> > >> > > > > the resources). > >> > >> > > > > > >> > >> > > > > It looks like with the IP version,
Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing
Thanks Numan for running these tests outside OpenStack! On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique wrote: > > > > On Tue, Jul 9, 2019 at 11:05 AM Han Zhou wrote: >> >> >> >> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou wrote: >> > >> > >> > >> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique >> > wrote: >> > > >> > > >> > > >> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou wrote: >> > >> >> > >> >> > >> >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez >> > >> wrote: >> > >> > >> > >> > Thanks a lot Han for the answer! >> > >> > >> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou wrote: >> > >> > > >> > >> > > >> > >> > > >> > >> > > >> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara >> > >> > > wrote: >> > >> > > > >> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez >> > >> > > > wrote: >> > >> > > > > >> > >> > > > > Hi Han, all, >> > >> > > > > >> > >> > > > > Lucas, Numan and I have been doing some 'scale' testing of >> > >> > > > > OpenStack >> > >> > > > > using OVN and wanted to present some results and issues that >> > >> > > > > we've >> > >> > > > > found with the Incremental Processing feature in >> > >> > > > > ovn-controller. Below >> > >> > > > > is the scenario that we executed: >> > >> > > > > >> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running >> > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute >> > >> > > > > nodes. OVS >> > >> > > > > 2.10. >> > >> > > > > * The test consists on: >> > >> > > > > - Create openstack network (OVN LS), subnet and router >> > >> > > > > - Attach subnet to the router and set gw to the external >> > >> > > > > network >> > >> > > > > - Create an OpenStack port and apply a Security Group (ACLs >> > >> > > > > to allow >> > >> > > > > UDP, SSH and ICMP). >> > >> > > > > - Bind the port to one of the 4 compute nodes (randomly) by >> > >> > > > > attaching it to a network namespace. 
>> > >> > > > > - Wait for the port to be ACTIVE in Neutron ('up == True' in >> > >> > > > > NB) >> > >> > > > > - Wait until the test can ping the port >> > >> > > > > * Running browbeat/rally with 16 simultaneous process to >> > >> > > > > execute the >> > >> > > > > test above 150 times. >> > >> > > > > * When all the 150 'fake VMs' are created, browbeat will delete >> > >> > > > > all >> > >> > > > > the OpenStack/OVN resources. >> > >> > > > > >> > >> > > > > We first tried with OVS/OVN 2.10 and pulled some results which >> > >> > > > > showed >> > >> > > > > 100% success but ovn-controller is quite loaded (as expected) >> > >> > > > > in all >> > >> > > > > the nodes especially during the deletion phase: >> > >> > > > > >> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers): >> > >> > > > > https://imgur.com/a/8ffKKYF >> > >> > > > > >> > >> > > > > After conducting the tests above, we replaced ovn-controller in >> > >> > > > > all 7 >> > >> > > > > nodes by the one with the current master branch (actually from >> > >> > > > > last >> > >> > > > > week). We also replaced ovn-northd and ovsdb-servers but the >> > >> > > > > ovs-vswitchd has been left untouched (still on 2.10). The >> > >> > > > > expected >> > >> > > > > results were to get less ovn-controller CPU usage and also >> > >> > > > > better >> > >> > > > > times due to the Incremental Processing feature introduced >> > >> > > > > recently. 
>> > >> > > > > However, the results don't look very good: >> > >> > > > > >> > >> > > > > - Compute node: https://imgur.com/a/wuq87F1 >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers): >> > >> > > > > https://imgur.com/a/99kiyDp >> > >> > > > > >> > >> > > > > One thing that we can tell from the ovs-vswitchd CPU >> > >> > > > > consumption is >> > >> > > > > that it's much less in the Incremental Processing (IP) case >> > >> > > > > which >> > >> > > > > apparently doesn't make much sense. This led us to think that >> > >> > > > > perhaps >> > >> > > > > ovn-controller was not installing the necessary flows in the >> > >> > > > > switch >> > >> > > > > and we confirmed this hypothesis by looking into the dataplane >> > >> > > > > results. Out of the 150 VMs, 10% of them were unreachable via >> > >> > > > > ping >> > >> > > > > when using ovn-controller from master. >> > >> > > > > >> > >> > > > > @Han, others, do you have any ideas as of what could be >> > >> > > > > happening >> > >> > > > > here? We'll be able to use this setup for a few more days so >> > >> > > > > let me >> > >> > > > > know if you want us to pull some other data/traces, ... >> > >> > > > > >> > >> > > > > Some other interesting things: >> > >> > > > > On each of the compute nodes, (with an almost evenly distributed >> > >> > > > > number of logical ports bound to them), the max amount of >> > >> > > > > logical >> > >> > > > > flows in br-int is ~90K (by the end of the test, right before >> > >> > > > >
Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing
On Tue, Jul 9, 2019 at 11:05 AM Han Zhou wrote: > > > On Fri, Jun 21, 2019 at 12:31 AM Han Zhou wrote: > > > > > > > > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique > wrote: > > > > > > > > > > > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou wrote: > > >> > > >> > > >> > > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez < > dalva...@redhat.com> wrote: > > >> > > > >> > Thanks a lot Han for the answer! > > >> > > > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou wrote: > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara > wrote: > > >> > > > > > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez > > >> > > > wrote: > > >> > > > > > > >> > > > > Hi Han, all, > > >> > > > > > > >> > > > > Lucas, Numan and I have been doing some 'scale' testing of > OpenStack > > >> > > > > using OVN and wanted to present some results and issues that > we've > > >> > > > > found with the Incremental Processing feature in > ovn-controller. Below > > >> > > > > is the scenario that we executed: > > >> > > > > > > >> > > > > * 7 baremetal nodes setup: 3 controllers (running > > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute > nodes. OVS > > >> > > > > 2.10. > > >> > > > > * The test consists on: > > >> > > > > - Create openstack network (OVN LS), subnet and router > > >> > > > > - Attach subnet to the router and set gw to the external > network > > >> > > > > - Create an OpenStack port and apply a Security Group (ACLs > to allow > > >> > > > > UDP, SSH and ICMP). > > >> > > > > - Bind the port to one of the 4 compute nodes (randomly) by > > >> > > > > attaching it to a network namespace. > > >> > > > > - Wait for the port to be ACTIVE in Neutron ('up == True' > in NB) > > >> > > > > - Wait until the test can ping the port > > >> > > > > * Running browbeat/rally with 16 simultaneous process to > execute the > > >> > > > > test above 150 times. 
> > >> > > > > * When all the 150 'fake VMs' are created, browbeat will > delete all > > >> > > > > the OpenStack/OVN resources. > > >> > > > > > > >> > > > > We first tried with OVS/OVN 2.10 and pulled some results > which showed > > >> > > > > 100% success but ovn-controller is quite loaded (as expected) > in all > > >> > > > > the nodes especially during the deletion phase: > > >> > > > > > > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR > > >> > > > > - Controller node (ovn-northd and ovsdb-servers): > https://imgur.com/a/8ffKKYF > > >> > > > > > > >> > > > > After conducting the tests above, we replaced ovn-controller > in all 7 > > >> > > > > nodes by the one with the current master branch (actually > from last > > >> > > > > week). We also replaced ovn-northd and ovsdb-servers but the > > >> > > > > ovs-vswitchd has been left untouched (still on 2.10). The > expected > > >> > > > > results were to get less ovn-controller CPU usage and also > better > > >> > > > > times due to the Incremental Processing feature introduced > recently. > > >> > > > > However, the results don't look very good: > > >> > > > > > > >> > > > > - Compute node: https://imgur.com/a/wuq87F1 > > >> > > > > - Controller node (ovn-northd and ovsdb-servers): > https://imgur.com/a/99kiyDp > > >> > > > > > > >> > > > > One thing that we can tell from the ovs-vswitchd CPU > consumption is > > >> > > > > that it's much less in the Incremental Processing (IP) case > which > > >> > > > > apparently doesn't make much sense. This led us to think that > perhaps > > >> > > > > ovn-controller was not installing the necessary flows in the > switch > > >> > > > > and we confirmed this hypothesis by looking into the dataplane > > >> > > > > results. Out of the 150 VMs, 10% of them were unreachable via > ping > > >> > > > > when using ovn-controller from master. > > >> > > > > > > >> > > > > @Han, others, do you have any ideas as of what could be > happening > > >> > > > > here? 
We'll be able to use this setup for a few more days so > let me > > >> > > > > know if you want us to pull some other data/traces, ... > > >> > > > > > > >> > > > > Some other interesting things: > > >> > > > > On each of the compute nodes, (with an almost evenly > distributed > > >> > > > > number of logical ports bound to them), the max amount of > logical > > >> > > > > flows in br-int is ~90K (by the end of the test, right before > deleting > > >> > > > > the resources). > > >> > > > > > > >> > > > > It looks like with the IP version, ovn-controller leaks some > memory: > > >> > > > > https://imgur.com/a/trQrhWd > > >> > > > > While with OVS 2.10, it remains pretty flat during the test: > > >> > > > > https://imgur.com/a/KCkIT4O > > >> > > > > > >> > > > Hi Daniel, Han, > > >> > > > > > >> > > > I just sent a small patch for the ovn-controller memory leak: > > >> > > > https://patchwork.ozlabs.org/patch/1113758/ > > >> > > > > > >> > > > At least on my setup this is what valgrind was pointing at. > > >> > > > > > >> > > > Cheers, >
Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing
On Fri, Jun 21, 2019 at 12:31 AM Han Zhou wrote: > > > > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique wrote: > > > > > > > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou wrote: > >> > >> > >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez < dalva...@redhat.com> wrote: > >> > > >> > Thanks a lot Han for the answer! > >> > > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou wrote: > >> > > > >> > > > >> > > > >> > > > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara wrote: > >> > > > > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez > >> > > > wrote: > >> > > > > > >> > > > > Hi Han, all, > >> > > > > > >> > > > > Lucas, Numan and I have been doing some 'scale' testing of OpenStack > >> > > > > using OVN and wanted to present some results and issues that we've > >> > > > > found with the Incremental Processing feature in ovn-controller. Below > >> > > > > is the scenario that we executed: > >> > > > > > >> > > > > * 7 baremetal nodes setup: 3 controllers (running > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS > >> > > > > 2.10. > >> > > > > * The test consists on: > >> > > > > - Create openstack network (OVN LS), subnet and router > >> > > > > - Attach subnet to the router and set gw to the external network > >> > > > > - Create an OpenStack port and apply a Security Group (ACLs to allow > >> > > > > UDP, SSH and ICMP). > >> > > > > - Bind the port to one of the 4 compute nodes (randomly) by > >> > > > > attaching it to a network namespace. > >> > > > > - Wait for the port to be ACTIVE in Neutron ('up == True' in NB) > >> > > > > - Wait until the test can ping the port > >> > > > > * Running browbeat/rally with 16 simultaneous process to execute the > >> > > > > test above 150 times. > >> > > > > * When all the 150 'fake VMs' are created, browbeat will delete all > >> > > > > the OpenStack/OVN resources. 
> >> > > > > > >> > > > > We first tried with OVS/OVN 2.10 and pulled some results which showed > >> > > > > 100% success but ovn-controller is quite loaded (as expected) in all > >> > > > > the nodes especially during the deletion phase: > >> > > > > > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR > >> > > > > - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/8ffKKYF > >> > > > > > >> > > > > After conducting the tests above, we replaced ovn-controller in all 7 > >> > > > > nodes by the one with the current master branch (actually from last > >> > > > > week). We also replaced ovn-northd and ovsdb-servers but the > >> > > > > ovs-vswitchd has been left untouched (still on 2.10). The expected > >> > > > > results were to get less ovn-controller CPU usage and also better > >> > > > > times due to the Incremental Processing feature introduced recently. > >> > > > > However, the results don't look very good: > >> > > > > > >> > > > > - Compute node: https://imgur.com/a/wuq87F1 > >> > > > > - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/99kiyDp > >> > > > > > >> > > > > One thing that we can tell from the ovs-vswitchd CPU consumption is > >> > > > > that it's much less in the Incremental Processing (IP) case which > >> > > > > apparently doesn't make much sense. This led us to think that perhaps > >> > > > > ovn-controller was not installing the necessary flows in the switch > >> > > > > and we confirmed this hypothesis by looking into the dataplane > >> > > > > results. Out of the 150 VMs, 10% of them were unreachable via ping > >> > > > > when using ovn-controller from master. > >> > > > > > >> > > > > @Han, others, do you have any ideas as of what could be happening > >> > > > > here? We'll be able to use this setup for a few more days so let me > >> > > > > know if you want us to pull some other data/traces, ... 
> >> > > > > > >> > > > > Some other interesting things: > >> > > > > On each of the compute nodes, (with an almost evenly distributed > >> > > > > number of logical ports bound to them), the max amount of logical > >> > > > > flows in br-int is ~90K (by the end of the test, right before deleting > >> > > > > the resources). > >> > > > > > >> > > > > It looks like with the IP version, ovn-controller leaks some memory: > >> > > > > https://imgur.com/a/trQrhWd > >> > > > > While with OVS 2.10, it remains pretty flat during the test: > >> > > > > https://imgur.com/a/KCkIT4O > >> > > > > >> > > > Hi Daniel, Han, > >> > > > > >> > > > I just sent a small patch for the ovn-controller memory leak: > >> > > > https://patchwork.ozlabs.org/patch/1113758/ > >> > > > > >> > > > At least on my setup this is what valgrind was pointing at. > >> > > > > >> > > > Cheers, > >> > > > Dumitru > >> > > > > >> > > > > > >> > > > > Looking forward to hearing back :) > >> > > > > Daniel > >> > > > > > >> > > > > PS. Sorry for my previous email, I sent it by mistake without the subject > >> > > > > ___ > >> > > > > discuss mailing list > >> > > > >
Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing
On Mon, Jun 24, 2019 at 1:51 PM aginwala wrote:

> Hi:
> As per irc meeting discussion, some nice findings were already discussed
> by Numan (Thanks for sharing the details). When changing external_ids for
> a claimed port e.g. ovn-nbctl set logical_switch_port sw0-port1
> external_ids:foo=bar triggers re-computation on local compute. I do see the
> same behavior. Numan is proposing a patch to skip computation for
> external_ids column for an already claimed port for port_binding table
> because of runtime_data, can't handle change for input SB_port_binding,
> fall back to recompute (
> https://github.com/openvswitch/ovs/blob/master/ovn/lib/inc-proc-eng.h#L77).
> However, I don't see external_ids in port_binding table for the port being
> set explicitly when setting Interface table in the test code that Daniel
> posted [1] which could trigger extra re-computation in current test
> scenario.
>

ovn-northd just copies the external_ids of a logical switch port to
external_ids of port binding. And networking-ovn makes use of external_ids
a lot.

> Also ovs-vsctl add-br test will also trigger re-computation on local
> compute and yes I can see the same. Since we don't have any handlers for
> Ports and Interfaces table similar to port_binding and other handlers @
> https://github.com/openvswitch/ovs/blob/master/ovn/controller/ovn-controller.c#L1769,
> adding a new bridge also causes re-computation on the local compute. Not
> sure if its required immediately because as per the patch shared by Daniel
> [1], I don't see any new test bridges getting created apart from br-int
> and hence wont be much impact. Or may be I missed to see if they are also
> creating test bridges during testing. Of course, any new ovs-vsctl command
> for attaching/detaching vif will sure trigger recompute on br-int as and
> when VIF(vm) gets added/deleted to program the flow on local compute.
>

It would impact how the CMS creates the ovs port.
Suppose I do something like the following:

    ovs-vsctl add-port br-int foo
    ovs-vsctl set interface foo type=internal
    ovs-vsctl set Interface foo external_ids:iface-id=foo-id

If ovn-controller gets 3 updates from ovsdb-server, this would result in
3 recomputations.

However, if I do

    ovs-vsctl add-port br-int foo -- set interface foo type=internal -- set interface foo external_ids:iface-id=foo-id

this could result in only 1 recomputation.

I think ovn-controller should handle the local ovsdb changes for
1. external_ids of openvswitch table
2. an ovs interface's external_ids:iface-id being updated.

We should try to ignore any other changes to the local ovsdb.

> I didn't get a chance to verify when a chassisredirect port is claimed on
> a gateway chassis, it triggers computation on all computes registered with
> SB as per code
> https://github.com/openvswitch/ovs/blob/master/ovn/controller/binding.c#L722
> which was also raises further optimization for chassisredirect flow that
> Numan is suggesting.
>
> 1.
> https://github.com/danalsan/browbeat/commit/0ff72da52ddf17aa9f7269f191eebd890899bdad
>

I submitted the patches just now to address some of the issues -
https://patchwork.ozlabs.org/project/openvswitch/list/?series=115737

I also ran the test with these patches, but they did not bring any
improvement. Although the patches I submitted avoid recomputation in some
of the scenarios, I still need to dig further to see what's causing the
performance impact when compared with ovn-controller without the IP patches.

Thanks
Numan

On Fri, Jun 21, 2019 at 12:32 AM Han Zhou wrote: > >> >> >> On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique >> wrote: >> > >> > >> > >> > On Fri, Jun 21, 2019, 11:47 AM Han Zhou wrote: >> >> >> >> >> >> >> >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez < >> dalva...@redhat.com> wrote: >> >> > >> >> > Thanks a lot Han for the answer!
>> >> > >> >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou wrote: >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara >> wrote: >> >> > > > >> >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez >> >> > > > wrote: >> >> > > > > >> >> > > > > Hi Han, all, >> >> > > > > >> >> > > > > Lucas, Numan and I have been doing some 'scale' testing of >> OpenStack >> >> > > > > using OVN and wanted to present some results and issues that >> we've >> >> > > > > found with the Incremental Processing feature in >> ovn-controller. Below >> >> > > > > is the scenario that we executed: >> >> > > > > >> >> > > > > * 7 baremetal nodes setup: 3 controllers (running >> >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute >> nodes. OVS >> >> > > > > 2.10. >> >> > > > > * The test consists on: >> >> > > > > - Create openstack network (OVN LS), subnet and router >> >> > > > > - Attach subnet to the router and set gw to the external >> network >> >> > > > > - Create an OpenStack port and apply a Security Group (ACLs >> to allow >> >> > > > > UDP, SSH and ICMP). >>
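
The batching point in Numan's reply above (three separate ovs-vsctl invocations versus one invocation chained with "--") can be illustrated with a small toy model. This is only a sketch under a stated assumption: every update notification that reaches the controller costs one full recompute because no incremental handler exists for the changed tables. ToyController and its fields are hypothetical names made up for this illustration; they are not part of OVS or OVN.

```python
"""Toy model: one full recompute per OVSDB update notification.

Hypothetical illustration only; none of these names exist in OVS/OVN.
"""
from dataclasses import dataclass, field


@dataclass
class ToyController:
    # Number of full recomputations performed so far.
    recomputes: int = 0
    # Local view of interface columns, keyed by interface name.
    interfaces: dict = field(default_factory=dict)

    def on_update(self, changes):
        """Apply one update notification (one committed transaction)."""
        for name, column, value in changes:
            self.interfaces.setdefault(name, {})[column] = value
        # Assumption: no incremental handler exists for these tables,
        # so every notification costs one full recompute.
        self.recomputes += 1


# The three changes made in the example from the thread.
ops = [
    ("foo", "port", "br-int"),
    ("foo", "type", "internal"),
    ("foo", "external_ids:iface-id", "foo-id"),
]

# Three separate invocations: three transactions, three notifications.
separate = ToyController()
for op in ops:
    separate.on_update([op])

# One invocation chained with "--": one transaction, one notification.
batched = ToyController()
batched.on_update(ops)

print(separate.recomputes, batched.recomputes)
```

Both controllers end up with the same interface state; only the number of recomputes differs. Whether the real ovn-controller sees exactly one notification per transaction depends on timing, which is why the reply says "could result in only 1 recomputation".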
Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing
Hi:

As per the IRC meeting discussion, some nice findings were already discussed
by Numan (thanks for sharing the details). Changing external_ids for a
claimed port, e.g.

    ovn-nbctl set logical_switch_port sw0-port1 external_ids:foo=bar

triggers re-computation on the local compute. I do see the same behavior.
Numan is proposing a patch to skip computation for the external_ids column
of the port_binding table for an already claimed port, because runtime_data
can't handle a change for input SB_port_binding and falls back to recompute
(https://github.com/openvswitch/ovs/blob/master/ovn/lib/inc-proc-eng.h#L77).
However, in the test code that Daniel posted [1], I don't see external_ids
in the port_binding table being set explicitly for the port when setting the
Interface table, which is what could trigger extra re-computation in the
current test scenario.

Also, ovs-vsctl add-br test triggers re-computation on the local compute,
and yes, I can see the same. Since we don't have any handlers for the Ports
and Interfaces tables similar to port_binding and the other handlers at
https://github.com/openvswitch/ovs/blob/master/ovn/controller/ovn-controller.c#L1769,
adding a new bridge also causes re-computation on the local compute. Not
sure if handling this is required immediately because, as per the patch
shared by Daniel [1], I don't see any new test bridges being created apart
from br-int, and hence there won't be much impact. Or maybe I missed whether
they are also creating test bridges during testing. Of course, any new
ovs-vsctl command for attaching/detaching a VIF will surely trigger a
recompute on br-int as and when a VIF (VM) gets added/deleted, to program
the flows on the local compute.

I didn't get a chance to verify that when a chassisredirect port is claimed
on a gateway chassis, it triggers computation on all computes registered
with the SB, as per the code at
https://github.com/openvswitch/ovs/blob/master/ovn/controller/binding.c#L722,
which also raises the further optimization for the chassisredirect flow that
Numan is suggesting.

1.
https://github.com/danalsan/browbeat/commit/0ff72da52ddf17aa9f7269f191eebd890899bdad On Fri, Jun 21, 2019 at 12:32 AM Han Zhou wrote: > > > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique > wrote: > > > > > > > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou wrote: > >> > >> > >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez < > dalva...@redhat.com> wrote: > >> > > >> > Thanks a lot Han for the answer! > >> > > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou wrote: > >> > > > >> > > > >> > > > >> > > > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara > wrote: > >> > > > > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez > >> > > > wrote: > >> > > > > > >> > > > > Hi Han, all, > >> > > > > > >> > > > > Lucas, Numan and I have been doing some 'scale' testing of > OpenStack > >> > > > > using OVN and wanted to present some results and issues that > we've > >> > > > > found with the Incremental Processing feature in > ovn-controller. Below > >> > > > > is the scenario that we executed: > >> > > > > > >> > > > > * 7 baremetal nodes setup: 3 controllers (running > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute > nodes. OVS > >> > > > > 2.10. > >> > > > > * The test consists on: > >> > > > > - Create openstack network (OVN LS), subnet and router > >> > > > > - Attach subnet to the router and set gw to the external > network > >> > > > > - Create an OpenStack port and apply a Security Group (ACLs > to allow > >> > > > > UDP, SSH and ICMP). > >> > > > > - Bind the port to one of the 4 compute nodes (randomly) by > >> > > > > attaching it to a network namespace. > >> > > > > - Wait for the port to be ACTIVE in Neutron ('up == True' in > NB) > >> > > > > - Wait until the test can ping the port > >> > > > > * Running browbeat/rally with 16 simultaneous process to > execute the > >> > > > > test above 150 times. 
> >> > > > > * When all the 150 'fake VMs' are created, browbeat will delete > all > >> > > > > the OpenStack/OVN resources. > >> > > > > > >> > > > > We first tried with OVS/OVN 2.10 and pulled some results which > showed > >> > > > > 100% success but ovn-controller is quite loaded (as expected) > in all > >> > > > > the nodes especially during the deletion phase: > >> > > > > > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR > >> > > > > - Controller node (ovn-northd and ovsdb-servers): > https://imgur.com/a/8ffKKYF > >> > > > > > >> > > > > After conducting the tests above, we replaced ovn-controller in > all 7 > >> > > > > nodes by the one with the current master branch (actually from > last > >> > > > > week). We also replaced ovn-northd and ovsdb-servers but the > >> > > > > ovs-vswitchd has been left untouched (still on 2.10). The > expected > >> > > > > results were to get less ovn-controller CPU usage and also > better > >> > > > > times due to the Incremental Processing feature introduced > recently. > >> > > > > However, the results don't look very good: > >> > > > > > >> > > > >
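
The fallback described in the message above (an engine node that cannot handle a change on one of its inputs falls back to a full recompute) can be sketched as a toy model. This is an illustration based only on how the thread describes the behavior, not the actual code in inc-proc-eng.h. ToyEngineNode, port_binding_handler and the shape of the change dictionaries are all hypothetical; the handler rule mirrors the idea of the proposed patch (skip recompute when only external_ids of an already claimed port changes).

```python
"""Toy model of change handlers with fallback to full recompute.

Hypothetical illustration only; not the real OVN engine API.
"""


class ToyEngineNode:
    def __init__(self, name, handlers):
        self.name = name
        # Maps an input name to a callable(change) -> bool.
        # True means the change was handled incrementally.
        self.handlers = handlers
        self.incremental = 0
        self.recomputes = 0

    def input_changed(self, input_name, change):
        """Process a change on one input and report how it was handled."""
        handler = self.handlers.get(input_name)
        if handler is not None and handler(change):
            self.incremental += 1
            return "incremental"
        # No handler, or the handler gave up: fall back to recompute.
        self.recomputes += 1
        return "recompute"


def port_binding_handler(change):
    # Hypothetical rule: an update that only touches external_ids of an
    # already claimed port needs no recompute.
    return change["claimed"] and set(change["columns"]) <= {"external_ids"}


node = ToyEngineNode("runtime_data",
                     {"SB_port_binding": port_binding_handler})

# Only external_ids changed on a claimed port: handled incrementally.
r1 = node.input_changed("SB_port_binding",
                        {"claimed": True, "columns": ["external_ids"]})
# Another column changed: the handler declines, so recompute.
r2 = node.input_changed("SB_port_binding",
                        {"claimed": True, "columns": ["chassis"]})
# Input without any handler (as for the local Interface table): recompute.
r3 = node.input_changed("OVS_interface", {"columns": ["external_ids"]})

print(r1, r2, r3)
```

In this model, adding a handler for an input only helps for the changes that handler accepts; every other change still pays for a full recompute, which matches the observation in the thread that local Ports and Interfaces changes trigger recomputation.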
Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing
Hi Han,

> On 21 Jun 2019, at 08:16, Han Zhou wrote:
Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing
Thanks a lot Han for the answer!

On Tue, Jun 11, 2019 at 5:57 PM Han Zhou wrote:
Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing
On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara wrote:
>
> On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez wrote:
> >
> > Hi Han, all,
> >
> > Lucas, Numan and I have been doing some 'scale' testing of OpenStack
> > using OVN and wanted to present some results and issues that we've
> > found with the Incremental Processing feature in ovn-controller. Below
> > is the scenario that we executed:
> >
> > * 7 baremetal nodes setup: 3 controllers (running
> >   ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS
> >   2.10.
> > * The test consists on:
> >   - Create openstack network (OVN LS), subnet and router
> >   - Attach subnet to the router and set gw to the external network
> >   - Create an OpenStack port and apply a Security Group (ACLs to allow
> >     UDP, SSH and ICMP).
> >   - Bind the port to one of the 4 compute nodes (randomly) by
> >     attaching it to a network namespace.
> >   - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
> >   - Wait until the test can ping the port
> > * Running browbeat/rally with 16 simultaneous process to execute the
> >   test above 150 times.
> > * When all the 150 'fake VMs' are created, browbeat will delete all
> >   the OpenStack/OVN resources.
> >
> > We first tried with OVS/OVN 2.10 and pulled some results which showed
> > 100% success but ovn-controller is quite loaded (as expected) in all
> > the nodes especially during the deletion phase:
> >
> > - Compute node: https://imgur.com/a/tzxfrIR
> > - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/8ffKKYF
> >
> > After conducting the tests above, we replaced ovn-controller in all 7
> > nodes by the one with the current master branch (actually from last
> > week). We also replaced ovn-northd and ovsdb-servers but the
> > ovs-vswitchd has been left untouched (still on 2.10). The expected
> > results were to get less ovn-controller CPU usage and also better
> > times due to the Incremental Processing feature introduced recently.
> > However, the results don't look very good:
> >
> > - Compute node: https://imgur.com/a/wuq87F1
> > - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/99kiyDp
> >
> > One thing that we can tell from the ovs-vswitchd CPU consumption is
> > that it's much less in the Incremental Processing (IP) case which
> > apparently doesn't make much sense. This led us to think that perhaps
> > ovn-controller was not installing the necessary flows in the switch
> > and we confirmed this hypothesis by looking into the dataplane
> > results. Out of the 150 VMs, 10% of them were unreachable via ping
> > when using ovn-controller from master.
> >
> > @Han, others, do you have any ideas as of what could be happening
> > here? We'll be able to use this setup for a few more days so let me
> > know if you want us to pull some other data/traces, ...
> >
> > Some other interesting things:
> > On each of the compute nodes, (with an almost evenly distributed
> > number of logical ports bound to them), the max amount of logical
> > flows in br-int is ~90K (by the end of the test, right before deleting
> > the resources).
> >
> > It looks like with the IP version, ovn-controller leaks some memory:
> > https://imgur.com/a/trQrhWd
> > While with OVS 2.10, it remains pretty flat during the test:
> > https://imgur.com/a/KCkIT4O
>
> Hi Daniel, Han,
>
> I just sent a small patch for the ovn-controller memory leak:
> https://patchwork.ozlabs.org/patch/1113758/
>
> At least on my setup this is what valgrind was pointing at.
>
> Cheers,
> Dumitru
>
> >
> > Looking forward to hearing back :)
> > Daniel
> >
> > PS. Sorry for my previous email, I sent it by mistake without the subject
> > ___
> > discuss mailing list
> > disc...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Thanks Daniel for the testing and reporting, and thanks Dumitru for fixing
the memory leak.

Currently ovn-controller incremental processing only handles below SB changes
incrementally:
- logical_flow
- port_binding (for regular VIF binding NOT on current chassis)
- mc_group
- address_set
- port_group
- mac_binding

So, in test scenario you described, since each iteration creates network (SB
datapath changes) and router ports (port_binding changes for non VIF), the
incremental processing would not help much, because most steps in your test
should trigger recompute. It would help if you create more Fake VMs in each
iteration, e.g. create 10 VMs or more on each LS.

Secondly, when VIF port-binding happens on current chassis, the
ovn-controller will still do re-compute, and because you have only 4 compute
nodes, so 1/4 of the compute node will still recompute even when binding a
regular VIF port. When you have more compute nodes you would see incremental
processing more effective.

However, what really worries me is the 10% VM unreachable. I have one
confusion here on the test steps. The last step you described was:
-
Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing
On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez wrote:
>
> Hi Han, all,
>
> Lucas, Numan and I have been doing some 'scale' testing of OpenStack
> using OVN and wanted to present some results and issues that we've
> found with the Incremental Processing feature in ovn-controller. Below
> is the scenario that we executed:
>
> * 7 baremetal nodes setup: 3 controllers (running
>   ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS
>   2.10.
> * The test consists on:
>   - Create openstack network (OVN LS), subnet and router
>   - Attach subnet to the router and set gw to the external network
>   - Create an OpenStack port and apply a Security Group (ACLs to allow
>     UDP, SSH and ICMP).
>   - Bind the port to one of the 4 compute nodes (randomly) by
>     attaching it to a network namespace.
>   - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
>   - Wait until the test can ping the port
> * Running browbeat/rally with 16 simultaneous process to execute the
>   test above 150 times.
> * When all the 150 'fake VMs' are created, browbeat will delete all
>   the OpenStack/OVN resources.
>
> We first tried with OVS/OVN 2.10 and pulled some results which showed
> 100% success but ovn-controller is quite loaded (as expected) in all
> the nodes especially during the deletion phase:
>
> - Compute node: https://imgur.com/a/tzxfrIR
> - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/8ffKKYF
>
> After conducting the tests above, we replaced ovn-controller in all 7
> nodes by the one with the current master branch (actually from last
> week). We also replaced ovn-northd and ovsdb-servers but the
> ovs-vswitchd has been left untouched (still on 2.10). The expected
> results were to get less ovn-controller CPU usage and also better
> times due to the Incremental Processing feature introduced recently.
> However, the results don't look very good:
>
> - Compute node: https://imgur.com/a/wuq87F1
> - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/99kiyDp
>
> One thing that we can tell from the ovs-vswitchd CPU consumption is
> that it's much less in the Incremental Processing (IP) case which
> apparently doesn't make much sense. This led us to think that perhaps
> ovn-controller was not installing the necessary flows in the switch
> and we confirmed this hypothesis by looking into the dataplane
> results. Out of the 150 VMs, 10% of them were unreachable via ping
> when using ovn-controller from master.
>
> @Han, others, do you have any ideas as of what could be happening
> here? We'll be able to use this setup for a few more days so let me
> know if you want us to pull some other data/traces, ...
>
> Some other interesting things:
> On each of the compute nodes, (with an almost evenly distributed
> number of logical ports bound to them), the max amount of logical
> flows in br-int is ~90K (by the end of the test, right before deleting
> the resources).
>
> It looks like with the IP version, ovn-controller leaks some memory:
> https://imgur.com/a/trQrhWd
> While with OVS 2.10, it remains pretty flat during the test:
> https://imgur.com/a/KCkIT4O

Hi Daniel, Han,

I just sent a small patch for the ovn-controller memory leak:
https://patchwork.ozlabs.org/patch/1113758/

At least on my setup this is what valgrind was pointing at.

Cheers,
Dumitru

>
> Looking forward to hearing back :)
> Daniel
>
> PS. Sorry for my previous email, I sent it by mistake without the subject
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
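
Han's point in this thread (network and router-port creation trigger a recompute, and a VIF binding triggers one only on the chassis it lands on) can be put into rough numbers. The sketch below is a toy model, not anything from ovn-controller itself. The function name, the default of 2 non-incremental events per iteration (one datapath change, one router port binding), and the assumption that VIFs are spread evenly across chassis are all illustrative.

```python
def recompute_share(vifs_per_iteration: int, num_chassis: int,
                    non_incremental_events: int = 2) -> float:
    """Estimate the share of SB change events per test iteration that force
    a full recompute on one given chassis.

    Toy model, all assumptions:
      - non_incremental_events (datapath creation, router port binding)
        always trigger a recompute on every chassis;
      - each VIF binding lands on a uniformly random chassis, so a given
        chassis recomputes for 1/num_chassis of them and handles the
        rest incrementally.
    """
    if vifs_per_iteration < 0 or non_incremental_events < 0:
        raise ValueError("event counts must be non-negative")
    if num_chassis < 1:
        raise ValueError("num_chassis must be at least 1")

    total_events = non_incremental_events + vifs_per_iteration
    if total_events == 0:
        return 0.0

    # Expected number of VIF bindings that land on this chassis.
    local_bindings = vifs_per_iteration / num_chassis
    return (non_incremental_events + local_bindings) / total_events


if __name__ == "__main__":
    # Scenario from the thread: 1 VIF per iteration, 4 compute nodes,
    # then the two changes Han suggests (more VIFs per LS, more chassis).
    for vifs, chassis in [(1, 4), (10, 4), (10, 100)]:
        share = recompute_share(vifs, chassis)
        print(f"{vifs} VIF(s)/iteration, {chassis} chassis: "
              f"{share:.3f} of events force a recompute")
```

Under these assumptions the tested setup (1 VIF per network, 4 compute nodes) has three quarters of the events forcing a recompute. Creating 10 VIFs per LS or adding compute nodes lowers that share, in line with Han's suggestion.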