Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-07-24 Thread Numan Siddique
On Wed, Jul 24, 2019 at 8:38 AM Han Zhou  wrote:

>
>
> On Tue, Jul 23, 2019 at 7:41 AM Numan Siddique 
> wrote:
> >
> >
> >
> > On Mon, Jul 22, 2019 at 12:35 PM Daniel Alvarez Sanchez <
> dalva...@redhat.com> wrote:
> >>
> >> Neat! Thanks folks :)
> >> I'll try to get an OSP setup where we can patch this and re-run the
> >> same tests as last time to confirm, but it looks promising.
> >>
> >> On Fri, Jul 19, 2019 at 11:12 PM Han Zhou  wrote:
> >> >
> >> >
> >> >
> >> > On Fri, Jul 19, 2019 at 12:37 PM Numan Siddique 
> wrote:
> >> >>
> >> >>
> >> >>
> >> >> On Fri, Jul 19, 2019 at 6:19 PM Numan Siddique 
> wrote:
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Fri, Jul 19, 2019 at 6:28 AM Han Zhou  wrote:
> >> 
> >> 
> >> 
> >>  On Tue, Jul 9, 2019 at 12:13 AM Numan Siddique <
> nusid...@redhat.com> wrote:
> >>  >
> >>  >
> >>  >
> >>  > On Tue, Jul 9, 2019 at 12:25 PM Daniel Alvarez Sanchez <
> dalva...@redhat.com> wrote:
> >>  >>
> >>  >> Thanks Numan for running these tests outside OpenStack!
> >>  >>
> >>  >> On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique <
> nusid...@redhat.com> wrote:
> >>  >> >
> >>  >> >
> >>  >> >
> >>  >> > On Tue, Jul 9, 2019 at 11:05 AM Han Zhou 
> wrote:
> >>  >> >>
> >>  >> >>
> >>  >> >>
> >>  >> >> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou 
> wrote:
> >>  >> >> >
> >>  >> >> >
> >>  >> >> >
> >>  >> >> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique <
> nusid...@redhat.com> wrote:
> >>  >> >> > >
> >>  >> >> > >
> >>  >> >> > >
> >>  >> >> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou <
> zhou...@gmail.com> wrote:
> >>  >> >> > >>
> >>  >> >> > >>
> >>  >> >> > >>
> >>  >> >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <
> dalva...@redhat.com> wrote:
> >>  >> >> > >> >
> >>  >> >> > >> > Thanks a lot Han for the answer!
> >>  >> >> > >> >
> >>  >> >> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou <
> zhou...@gmail.com> wrote:
> >>  >> >> > >> > >
> >>  >> >> > >> > >
> >>  >> >> > >> > >
> >>  >> >> > >> > >
> >>  >> >> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara <
> dce...@redhat.com> wrote:
> >>  >> >> > >> > > >
> >>  >> >> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez
> Sanchez
> >>  >> >> > >> > > >  wrote:
> >>  >> >> > >> > > > >
> >>  >> >> > >> > > > > Hi Han, all,
> >>  >> >> > >> > > > >
> >>  >> >> > >> > > > > Lucas, Numan and I have been doing some 'scale'
> testing of OpenStack
> >>  >> >> > >> > > > > using OVN and wanted to present some results
> and issues that we've
> >>  >> >> > >> > > > > found with the Incremental Processing feature
> in ovn-controller. Below
> >>  >> >> > >> > > > > is the scenario that we executed:
> >>  >> >> > >> > > > >
> >>  >> >> > >> > > > > * 7 baremetal nodes setup: 3 controllers
> (running
> >>  >> >> > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker)
> + 4 compute nodes. OVS
> >>  >> >> > >> > > > > 2.10.
> >>  >> >> > >> > > > > * The test consists of:
> >>  >> >> > >> > > > >   - Create openstack network (OVN LS), subnet
> and router
> >>  >> >> > >> > > > >   - Attach subnet to the router and set gw to
> the external network
> >>  >> >> > >> > > > >   - Create an OpenStack port and apply a
> Security Group (ACLs to allow
> >>  >> >> > >> > > > > UDP, SSH and ICMP).
> >>  >> >> > >> > > > >   - Bind the port to one of the 4 compute nodes
> (randomly) by
> >>  >> >> > >> > > > > attaching it to a network namespace.
> >>  >> >> > >> > > > >   - Wait for the port to be ACTIVE in Neutron
> ('up == True' in NB)
> >>  >> >> > >> > > > >   - Wait until the test can ping the port
> >>  >> >> > >> > > > > * Running browbeat/rally with 16 simultaneous
> processes to execute the
> >>  >> >> > >> > > > > test above 150 times.
> >>  >> >> > >> > > > > * When all the 150 'fake VMs' are created,
> browbeat will delete all
> >>  >> >> > >> > > > > the OpenStack/OVN resources.
> >>  >> >> > >> > > > >
> >>  >> >> > >> > > > > We first tried with OVS/OVN 2.10 and pulled
> some results which showed
> >>  >> >> > >> > > > > 100% success but ovn-controller is quite loaded
> (as expected) in all
> >>  >> >> > >> > > > > the nodes especially during the deletion phase:
> >>  >> >> > >> > > > >
> >>  >> >> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
> >>  >> >> > >> > > > > - Controller node (ovn-northd and
> ovsdb-servers): https://imgur.com/a/8ffKKYF
> >>  >> >> > >> > > > >
> >>  >> >> > >> > > > > After conducting the tests above, we replaced
> ovn-controller in all 7
> >>  >> >> > >> > > > > nodes by the one with the current master branch
> (actually from last
> >>  >> >> > >> > > > > week). We also replaced ovn-northd and
> ovsdb-servers but the
> >>  >> >> > >> > > > > 

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-07-23 Thread Han Zhou
On Tue, Jul 23, 2019 at 7:41 AM Numan Siddique  wrote:
>
>
>
> On Mon, Jul 22, 2019 at 12:35 PM Daniel Alvarez Sanchez <
dalva...@redhat.com> wrote:
>>
>> Neat! Thanks folks :)
>> I'll try to get an OSP setup where we can patch this and re-run the
>> same tests as last time to confirm, but it looks promising.
>>
>> On Fri, Jul 19, 2019 at 11:12 PM Han Zhou  wrote:
>> >
>> >
>> >
>> > On Fri, Jul 19, 2019 at 12:37 PM Numan Siddique 
wrote:
>> >>
>> >>
>> >>
>> >> On Fri, Jul 19, 2019 at 6:19 PM Numan Siddique 
wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Fri, Jul 19, 2019 at 6:28 AM Han Zhou  wrote:
>> 
>> 
>> 
>>  On Tue, Jul 9, 2019 at 12:13 AM Numan Siddique 
wrote:
>>  >
>>  >
>>  >
>>  > On Tue, Jul 9, 2019 at 12:25 PM Daniel Alvarez Sanchez <
dalva...@redhat.com> wrote:
>>  >>
>>  >> Thanks Numan for running these tests outside OpenStack!
>>  >>
>>  >> On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique <
nusid...@redhat.com> wrote:
>>  >> >
>>  >> >
>>  >> >
>>  >> > On Tue, Jul 9, 2019 at 11:05 AM Han Zhou 
wrote:
>>  >> >>
>>  >> >>
>>  >> >>
>>  >> >> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou 
wrote:
>>  >> >> >
>>  >> >> >
>>  >> >> >
>>  >> >> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique <
nusid...@redhat.com> wrote:
>>  >> >> > >
>>  >> >> > >
>>  >> >> > >
>>  >> >> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou 
wrote:
>>  >> >> > >>
>>  >> >> > >>
>>  >> >> > >>
>>  >> >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <
dalva...@redhat.com> wrote:
>>  >> >> > >> >
>>  >> >> > >> > Thanks a lot Han for the answer!
>>  >> >> > >> >
>>  >> >> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou <
zhou...@gmail.com> wrote:
>>  >> >> > >> > >
>>  >> >> > >> > >
>>  >> >> > >> > >
>>  >> >> > >> > >
>>  >> >> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara <
dce...@redhat.com> wrote:
>>  >> >> > >> > > >
>>  >> >> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez
Sanchez
>>  >> >> > >> > > >  wrote:
>>  >> >> > >> > > > >
>>  >> >> > >> > > > > Hi Han, all,
>>  >> >> > >> > > > >
>>  >> >> > >> > > > > Lucas, Numan and I have been doing some 'scale'
testing of OpenStack
>>  >> >> > >> > > > > using OVN and wanted to present some results and
issues that we've
>>  >> >> > >> > > > > found with the Incremental Processing feature in
ovn-controller. Below
>>  >> >> > >> > > > > is the scenario that we executed:
>>  >> >> > >> > > > >
>>  >> >> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running
>>  >> >> > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker)
+ 4 compute nodes. OVS
>>  >> >> > >> > > > > 2.10.
>>  >> >> > >> > > > > * The test consists of:
>>  >> >> > >> > > > >   - Create openstack network (OVN LS), subnet
and router
>>  >> >> > >> > > > >   - Attach subnet to the router and set gw to
the external network
>>  >> >> > >> > > > >   - Create an OpenStack port and apply a
Security Group (ACLs to allow
>>  >> >> > >> > > > > UDP, SSH and ICMP).
>>  >> >> > >> > > > >   - Bind the port to one of the 4 compute nodes
(randomly) by
>>  >> >> > >> > > > > attaching it to a network namespace.
>>  >> >> > >> > > > >   - Wait for the port to be ACTIVE in Neutron
('up == True' in NB)
>>  >> >> > >> > > > >   - Wait until the test can ping the port
>>  >> >> > >> > > > > * Running browbeat/rally with 16 simultaneous
processes to execute the
>>  >> >> > >> > > > > test above 150 times.
>>  >> >> > >> > > > > * When all the 150 'fake VMs' are created,
browbeat will delete all
>>  >> >> > >> > > > > the OpenStack/OVN resources.
>>  >> >> > >> > > > >
>>  >> >> > >> > > > > We first tried with OVS/OVN 2.10 and pulled some
results which showed
>>  >> >> > >> > > > > 100% success but ovn-controller is quite loaded
(as expected) in all
>>  >> >> > >> > > > > the nodes especially during the deletion phase:
>>  >> >> > >> > > > >
>>  >> >> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
>>  >> >> > >> > > > > - Controller node (ovn-northd and
ovsdb-servers): https://imgur.com/a/8ffKKYF
>>  >> >> > >> > > > >
>>  >> >> > >> > > > > After conducting the tests above, we replaced
ovn-controller in all 7
>>  >> >> > >> > > > > nodes by the one with the current master branch
(actually from last
>>  >> >> > >> > > > > week). We also replaced ovn-northd and
ovsdb-servers but the
>>  >> >> > >> > > > > ovs-vswitchd has been left untouched (still on
2.10). The expected
>>  >> >> > >> > > > > results were to get less ovn-controller CPU
usage and also better
>>  >> >> > >> > > > > times due to the Incremental Processing feature
introduced recently.
>>  >> >> > >> > > > > However, the results don't look very good:
>>  >> >> > >> > > > >
>>  >> >> > 

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-07-23 Thread Numan Siddique
On Mon, Jul 22, 2019 at 12:35 PM Daniel Alvarez Sanchez 
wrote:

> Neat! Thanks folks :)
> I'll try to get an OSP setup where we can patch this and re-run the
> same tests as last time to confirm, but it looks promising.
>
> On Fri, Jul 19, 2019 at 11:12 PM Han Zhou  wrote:
> >
> >
> >
> > On Fri, Jul 19, 2019 at 12:37 PM Numan Siddique 
> wrote:
> >>
> >>
> >>
> >> On Fri, Jul 19, 2019 at 6:19 PM Numan Siddique 
> wrote:
> >>>
> >>>
> >>>
> >>> On Fri, Jul 19, 2019 at 6:28 AM Han Zhou  wrote:
> 
> 
> 
>  On Tue, Jul 9, 2019 at 12:13 AM Numan Siddique 
> wrote:
>  >
>  >
>  >
>  > On Tue, Jul 9, 2019 at 12:25 PM Daniel Alvarez Sanchez <
> dalva...@redhat.com> wrote:
>  >>
>  >> Thanks Numan for running these tests outside OpenStack!
>  >>
>  >> On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique 
> wrote:
>  >> >
>  >> >
>  >> >
>  >> > On Tue, Jul 9, 2019 at 11:05 AM Han Zhou 
> wrote:
>  >> >>
>  >> >>
>  >> >>
>  >> >> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou 
> wrote:
>  >> >> >
>  >> >> >
>  >> >> >
>  >> >> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique <
> nusid...@redhat.com> wrote:
>  >> >> > >
>  >> >> > >
>  >> >> > >
>  >> >> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou 
> wrote:
>  >> >> > >>
>  >> >> > >>
>  >> >> > >>
>  >> >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <
> dalva...@redhat.com> wrote:
>  >> >> > >> >
>  >> >> > >> > Thanks a lot Han for the answer!
>  >> >> > >> >
>  >> >> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou <
> zhou...@gmail.com> wrote:
>  >> >> > >> > >
>  >> >> > >> > >
>  >> >> > >> > >
>  >> >> > >> > >
>  >> >> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara <
> dce...@redhat.com> wrote:
>  >> >> > >> > > >
>  >> >> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez
> Sanchez
>  >> >> > >> > > >  wrote:
>  >> >> > >> > > > >
>  >> >> > >> > > > > Hi Han, all,
>  >> >> > >> > > > >
>  >> >> > >> > > > > Lucas, Numan and I have been doing some 'scale'
> testing of OpenStack
>  >> >> > >> > > > > using OVN and wanted to present some results and
> issues that we've
>  >> >> > >> > > > > found with the Incremental Processing feature in
> ovn-controller. Below
>  >> >> > >> > > > > is the scenario that we executed:
>  >> >> > >> > > > >
>  >> >> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running
>  >> >> > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) +
> 4 compute nodes. OVS
>  >> >> > >> > > > > 2.10.
>  >> >> > >> > > > > * The test consists of:
>  >> >> > >> > > > >   - Create openstack network (OVN LS), subnet and
> router
>  >> >> > >> > > > >   - Attach subnet to the router and set gw to the
> external network
>  >> >> > >> > > > >   - Create an OpenStack port and apply a Security
> Group (ACLs to allow
>  >> >> > >> > > > > UDP, SSH and ICMP).
>  >> >> > >> > > > >   - Bind the port to one of the 4 compute nodes
> (randomly) by
>  >> >> > >> > > > > attaching it to a network namespace.
>  >> >> > >> > > > >   - Wait for the port to be ACTIVE in Neutron ('up
> == True' in NB)
>  >> >> > >> > > > >   - Wait until the test can ping the port
>  >> >> > >> > > > > * Running browbeat/rally with 16 simultaneous
> processes to execute the
>  >> >> > >> > > > > test above 150 times.
>  >> >> > >> > > > > * When all the 150 'fake VMs' are created,
> browbeat will delete all
>  >> >> > >> > > > > the OpenStack/OVN resources.
>  >> >> > >> > > > >
>  >> >> > >> > > > > We first tried with OVS/OVN 2.10 and pulled some
> results which showed
>  >> >> > >> > > > > 100% success but ovn-controller is quite loaded
> (as expected) in all
>  >> >> > >> > > > > the nodes especially during the deletion phase:
>  >> >> > >> > > > >
>  >> >> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
>  >> >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers):
> https://imgur.com/a/8ffKKYF
>  >> >> > >> > > > >
>  >> >> > >> > > > > After conducting the tests above, we replaced
> ovn-controller in all 7
>  >> >> > >> > > > > nodes by the one with the current master branch
> (actually from last
>  >> >> > >> > > > > week). We also replaced ovn-northd and
> ovsdb-servers but the
>  >> >> > >> > > > > ovs-vswitchd has been left untouched (still on
> 2.10). The expected
>  >> >> > >> > > > > results were to get less ovn-controller CPU usage
> and also better
>  >> >> > >> > > > > times due to the Incremental Processing feature
> introduced recently.
>  >> >> > >> > > > > However, the results don't look very good:
>  >> >> > >> > > > >
>  >> >> > >> > > > > - Compute node: https://imgur.com/a/wuq87F1
>  >> >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers):
> 

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-07-22 Thread Daniel Alvarez Sanchez
Neat! Thanks folks :)
I'll try to get an OSP setup where we can patch this and re-run the
same tests as last time to confirm, but it looks promising.

On Fri, Jul 19, 2019 at 11:12 PM Han Zhou  wrote:
>
>
>
> On Fri, Jul 19, 2019 at 12:37 PM Numan Siddique  wrote:
>>
>>
>>
>> On Fri, Jul 19, 2019 at 6:19 PM Numan Siddique  wrote:
>>>
>>>
>>>
>>> On Fri, Jul 19, 2019 at 6:28 AM Han Zhou  wrote:



 On Tue, Jul 9, 2019 at 12:13 AM Numan Siddique  wrote:
 >
 >
 >
 > On Tue, Jul 9, 2019 at 12:25 PM Daniel Alvarez Sanchez 
 >  wrote:
 >>
 >> Thanks Numan for running these tests outside OpenStack!
 >>
 >> On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique  
 >> wrote:
 >> >
 >> >
 >> >
 >> > On Tue, Jul 9, 2019 at 11:05 AM Han Zhou  wrote:
 >> >>
 >> >>
 >> >>
 >> >> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou  wrote:
 >> >> >
 >> >> >
 >> >> >
 >> >> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique 
 >> >> >  wrote:
 >> >> > >
 >> >> > >
 >> >> > >
 >> >> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou  
 >> >> > > wrote:
 >> >> > >>
 >> >> > >>
 >> >> > >>
 >> >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez 
 >> >> > >>  wrote:
 >> >> > >> >
 >> >> > >> > Thanks a lot Han for the answer!
 >> >> > >> >
 >> >> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou  
 >> >> > >> > wrote:
 >> >> > >> > >
 >> >> > >> > >
 >> >> > >> > >
 >> >> > >> > >
 >> >> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara 
 >> >> > >> > >  wrote:
 >> >> > >> > > >
 >> >> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
 >> >> > >> > > >  wrote:
 >> >> > >> > > > >
 >> >> > >> > > > > Hi Han, all,
 >> >> > >> > > > >
 >> >> > >> > > > > Lucas, Numan and I have been doing some 'scale' testing 
 >> >> > >> > > > > of OpenStack
 >> >> > >> > > > > using OVN and wanted to present some results and issues 
 >> >> > >> > > > > that we've
 >> >> > >> > > > > found with the Incremental Processing feature in 
 >> >> > >> > > > > ovn-controller. Below
 >> >> > >> > > > > is the scenario that we executed:
 >> >> > >> > > > >
 >> >> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running
 >> >> > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 
 >> >> > >> > > > > compute nodes. OVS
 >> >> > >> > > > > 2.10.
 >> >> > >> > > > > * The test consists of:
 >> >> > >> > > > >   - Create openstack network (OVN LS), subnet and router
 >> >> > >> > > > >   - Attach subnet to the router and set gw to the 
 >> >> > >> > > > > external network
 >> >> > >> > > > >   - Create an OpenStack port and apply a Security Group 
 >> >> > >> > > > > (ACLs to allow
 >> >> > >> > > > > UDP, SSH and ICMP).
 >> >> > >> > > > >   - Bind the port to one of the 4 compute nodes 
 >> >> > >> > > > > (randomly) by
 >> >> > >> > > > > attaching it to a network namespace.
 >> >> > >> > > > >   - Wait for the port to be ACTIVE in Neutron ('up == 
 >> >> > >> > > > > True' in NB)
 >> >> > >> > > > >   - Wait until the test can ping the port
 >> >> > >> > > > > * Running browbeat/rally with 16 simultaneous processes
 >> >> > >> > > > > to execute the
 >> >> > >> > > > > test above 150 times.
 >> >> > >> > > > > * When all the 150 'fake VMs' are created, browbeat 
 >> >> > >> > > > > will delete all
 >> >> > >> > > > > the OpenStack/OVN resources.
 >> >> > >> > > > >
 >> >> > >> > > > > We first tried with OVS/OVN 2.10 and pulled some 
 >> >> > >> > > > > results which showed
 >> >> > >> > > > > 100% success but ovn-controller is quite loaded (as 
 >> >> > >> > > > > expected) in all
 >> >> > >> > > > > the nodes especially during the deletion phase:
 >> >> > >> > > > >
 >> >> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
 >> >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers): 
 >> >> > >> > > > > https://imgur.com/a/8ffKKYF
 >> >> > >> > > > >
 >> >> > >> > > > > After conducting the tests above, we replaced 
 >> >> > >> > > > > ovn-controller in all 7
 >> >> > >> > > > > nodes by the one with the current master branch 
 >> >> > >> > > > > (actually from last
 >> >> > >> > > > > week). We also replaced ovn-northd and ovsdb-servers 
 >> >> > >> > > > > but the
 >> >> > >> > > > > ovs-vswitchd has been left untouched (still on 2.10). 
 >> >> > >> > > > > The expected
 >> >> > >> > > > > results were to get less ovn-controller CPU usage and 
 >> >> > >> > > > > also better
 >> >> > >> > > > > times due to the Incremental Processing feature 
 >> >> > >> > > > > introduced recently.
 >> >> > >> > > > > However, the results don't look very good:
 >> >> > >> > > > >
 >> >> > >> > > 

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-07-19 Thread Han Zhou
On Fri, Jul 19, 2019 at 12:37 PM Numan Siddique  wrote:

>
>
> On Fri, Jul 19, 2019 at 6:19 PM Numan Siddique 
> wrote:
>
>>
>>
>> On Fri, Jul 19, 2019 at 6:28 AM Han Zhou  wrote:
>>
>>>
>>>
>>> On Tue, Jul 9, 2019 at 12:13 AM Numan Siddique 
>>> wrote:
>>> >
>>> >
>>> >
>>> > On Tue, Jul 9, 2019 at 12:25 PM Daniel Alvarez Sanchez <
>>> dalva...@redhat.com> wrote:
>>> >>
>>> >> Thanks Numan for running these tests outside OpenStack!
>>> >>
>>> >> On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique 
>>> wrote:
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Tue, Jul 9, 2019 at 11:05 AM Han Zhou  wrote:
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou 
>>> wrote:
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique <
>>> nusid...@redhat.com> wrote:
>>> >> >> > >
>>> >> >> > >
>>> >> >> > >
>>> >> >> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou 
>>> wrote:
>>> >> >> > >>
>>> >> >> > >>
>>> >> >> > >>
>>> >> >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <
>>> dalva...@redhat.com> wrote:
>>> >> >> > >> >
>>> >> >> > >> > Thanks a lot Han for the answer!
>>> >> >> > >> >
>>> >> >> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou 
>>> wrote:
>>> >> >> > >> > >
>>> >> >> > >> > >
>>> >> >> > >> > >
>>> >> >> > >> > >
>>> >> >> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara <
>>> dce...@redhat.com> wrote:
>>> >> >> > >> > > >
>>> >> >> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
>>> >> >> > >> > > >  wrote:
>>> >> >> > >> > > > >
>>> >> >> > >> > > > > Hi Han, all,
>>> >> >> > >> > > > >
>>> >> >> > >> > > > > Lucas, Numan and I have been doing some 'scale'
>>> testing of OpenStack
>>> >> >> > >> > > > > using OVN and wanted to present some results and
>>> issues that we've
>>> >> >> > >> > > > > found with the Incremental Processing feature in
>>> ovn-controller. Below
>>> >> >> > >> > > > > is the scenario that we executed:
>>> >> >> > >> > > > >
>>> >> >> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running
>>> >> >> > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4
>>> compute nodes. OVS
>>> >> >> > >> > > > > 2.10.
>>> >> >> > >> > > > > * The test consists of:
>>> >> >> > >> > > > >   - Create openstack network (OVN LS), subnet and
>>> router
>>> >> >> > >> > > > >   - Attach subnet to the router and set gw to the
>>> external network
>>> >> >> > >> > > > >   - Create an OpenStack port and apply a Security
>>> Group (ACLs to allow
>>> >> >> > >> > > > > UDP, SSH and ICMP).
>>> >> >> > >> > > > >   - Bind the port to one of the 4 compute nodes
>>> (randomly) by
>>> >> >> > >> > > > > attaching it to a network namespace.
>>> >> >> > >> > > > >   - Wait for the port to be ACTIVE in Neutron ('up ==
>>> True' in NB)
>>> >> >> > >> > > > >   - Wait until the test can ping the port
>>> >> >> > >> > > > > * Running browbeat/rally with 16 simultaneous processes
>>> to execute the
>>> >> >> > >> > > > > test above 150 times.
>>> >> >> > >> > > > > * When all the 150 'fake VMs' are created, browbeat
>>> will delete all
>>> >> >> > >> > > > > the OpenStack/OVN resources.
>>> >> >> > >> > > > >
>>> >> >> > >> > > > > We first tried with OVS/OVN 2.10 and pulled some
>>> results which showed
>>> >> >> > >> > > > > 100% success but ovn-controller is quite loaded (as
>>> expected) in all
>>> >> >> > >> > > > > the nodes especially during the deletion phase:
>>> >> >> > >> > > > >
>>> >> >> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
>>> >> >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers):
>>> https://imgur.com/a/8ffKKYF
>>> >> >> > >> > > > >
>>> >> >> > >> > > > > After conducting the tests above, we replaced
>>> ovn-controller in all 7
>>> >> >> > >> > > > > nodes by the one with the current master branch
>>> (actually from last
>>> >> >> > >> > > > > week). We also replaced ovn-northd and ovsdb-servers
>>> but the
>>> >> >> > >> > > > > ovs-vswitchd has been left untouched (still on 2.10).
>>> The expected
>>> >> >> > >> > > > > results were to get less ovn-controller CPU usage and
>>> also better
>>> >> >> > >> > > > > times due to the Incremental Processing feature
>>> introduced recently.
>>> >> >> > >> > > > > However, the results don't look very good:
>>> >> >> > >> > > > >
>>> >> >> > >> > > > > - Compute node: https://imgur.com/a/wuq87F1
>>> >> >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers):
>>> https://imgur.com/a/99kiyDp
>>> >> >> > >> > > > >
>>> >> >> > >> > > > > One thing that we can tell from the ovs-vswitchd CPU
>>> consumption is
>>> >> >> > >> > > > > that it's much less in the Incremental Processing
>>> (IP) case which
>>> >> >> > >> > > > > apparently doesn't make much sense. This led us to
>>> think that perhaps
>>> >> >> > >> > > > > ovn-controller was not installing the necessary flows
>>> in the switch
>>> >> >> > >> > > > > and we confirmed this hypothesis by looking into the
>>> dataplane
>>> >> >> > >> > > > > results. Out 

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-07-19 Thread Numan Siddique
On Fri, Jul 19, 2019 at 6:19 PM Numan Siddique  wrote:

>
>
> On Fri, Jul 19, 2019 at 6:28 AM Han Zhou  wrote:
>
>>
>>
>> On Tue, Jul 9, 2019 at 12:13 AM Numan Siddique 
>> wrote:
>> >
>> >
>> >
>> > On Tue, Jul 9, 2019 at 12:25 PM Daniel Alvarez Sanchez <
>> dalva...@redhat.com> wrote:
>> >>
>> >> Thanks Numan for running these tests outside OpenStack!
>> >>
>> >> On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique 
>> wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Jul 9, 2019 at 11:05 AM Han Zhou  wrote:
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou 
>> wrote:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique <
>> nusid...@redhat.com> wrote:
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou 
>> wrote:
>> >> >> > >>
>> >> >> > >>
>> >> >> > >>
>> >> >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <
>> dalva...@redhat.com> wrote:
>> >> >> > >> >
>> >> >> > >> > Thanks a lot Han for the answer!
>> >> >> > >> >
>> >> >> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou 
>> wrote:
>> >> >> > >> > >
>> >> >> > >> > >
>> >> >> > >> > >
>> >> >> > >> > >
>> >> >> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara <
>> dce...@redhat.com> wrote:
>> >> >> > >> > > >
>> >> >> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
>> >> >> > >> > > >  wrote:
>> >> >> > >> > > > >
>> >> >> > >> > > > > Hi Han, all,
>> >> >> > >> > > > >
>> >> >> > >> > > > > Lucas, Numan and I have been doing some 'scale'
>> testing of OpenStack
>> >> >> > >> > > > > using OVN and wanted to present some results and
>> issues that we've
>> >> >> > >> > > > > found with the Incremental Processing feature in
>> ovn-controller. Below
>> >> >> > >> > > > > is the scenario that we executed:
>> >> >> > >> > > > >
>> >> >> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running
>> >> >> > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4
>> compute nodes. OVS
>> >> >> > >> > > > > 2.10.
>> >> >> > >> > > > > * The test consists of:
>> >> >> > >> > > > >   - Create openstack network (OVN LS), subnet and
>> router
>> >> >> > >> > > > >   - Attach subnet to the router and set gw to the
>> external network
>> >> >> > >> > > > >   - Create an OpenStack port and apply a Security
>> Group (ACLs to allow
>> >> >> > >> > > > > UDP, SSH and ICMP).
>> >> >> > >> > > > >   - Bind the port to one of the 4 compute nodes
>> (randomly) by
>> >> >> > >> > > > > attaching it to a network namespace.
>> >> >> > >> > > > >   - Wait for the port to be ACTIVE in Neutron ('up ==
>> True' in NB)
>> >> >> > >> > > > >   - Wait until the test can ping the port
>> >> >> > >> > > > > * Running browbeat/rally with 16 simultaneous processes
>> to execute the
>> >> >> > >> > > > > test above 150 times.
>> >> >> > >> > > > > * When all the 150 'fake VMs' are created, browbeat
>> will delete all
>> >> >> > >> > > > > the OpenStack/OVN resources.
>> >> >> > >> > > > >
>> >> >> > >> > > > > We first tried with OVS/OVN 2.10 and pulled some
>> results which showed
>> >> >> > >> > > > > 100% success but ovn-controller is quite loaded (as
>> expected) in all
>> >> >> > >> > > > > the nodes especially during the deletion phase:
>> >> >> > >> > > > >
>> >> >> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
>> >> >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers):
>> https://imgur.com/a/8ffKKYF
>> >> >> > >> > > > >
>> >> >> > >> > > > > After conducting the tests above, we replaced
>> ovn-controller in all 7
>> >> >> > >> > > > > nodes by the one with the current master branch
>> (actually from last
>> >> >> > >> > > > > week). We also replaced ovn-northd and ovsdb-servers
>> but the
>> >> >> > >> > > > > ovs-vswitchd has been left untouched (still on 2.10).
>> The expected
>> >> >> > >> > > > > results were to get less ovn-controller CPU usage and
>> also better
>> >> >> > >> > > > > times due to the Incremental Processing feature
>> introduced recently.
>> >> >> > >> > > > > However, the results don't look very good:
>> >> >> > >> > > > >
>> >> >> > >> > > > > - Compute node: https://imgur.com/a/wuq87F1
>> >> >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers):
>> https://imgur.com/a/99kiyDp
>> >> >> > >> > > > >
>> >> >> > >> > > > > One thing that we can tell from the ovs-vswitchd CPU
>> consumption is
>> >> >> > >> > > > > that it's much less in the Incremental Processing (IP)
>> case which
>> >> >> > >> > > > > apparently doesn't make much sense. This led us to
>> think that perhaps
>> >> >> > >> > > > > ovn-controller was not installing the necessary flows
>> in the switch
>> >> >> > >> > > > > and we confirmed this hypothesis by looking into the
>> dataplane
>> >> >> > >> > > > > results. Out of the 150 VMs, 10% of them were
>> unreachable via ping
>> >> >> > >> > > > > when using ovn-controller from master.
>> >> >> > >> > > > >
>> >> >> > >> > > > > @Han, others, do you have any 

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-07-19 Thread Numan Siddique
On Fri, Jul 19, 2019 at 6:28 AM Han Zhou  wrote:

>
>
> On Tue, Jul 9, 2019 at 12:13 AM Numan Siddique 
> wrote:
> >
> >
> >
> > On Tue, Jul 9, 2019 at 12:25 PM Daniel Alvarez Sanchez <
> dalva...@redhat.com> wrote:
> >>
> >> Thanks Numan for running these tests outside OpenStack!
> >>
> >> On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique 
> wrote:
> >> >
> >> >
> >> >
> >> > On Tue, Jul 9, 2019 at 11:05 AM Han Zhou  wrote:
> >> >>
> >> >>
> >> >>
> >> >> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou  wrote:
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique <
> nusid...@redhat.com> wrote:
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou 
> wrote:
> >> >> > >>
> >> >> > >>
> >> >> > >>
> >> >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <
> dalva...@redhat.com> wrote:
> >> >> > >> >
> >> >> > >> > Thanks a lot Han for the answer!
> >> >> > >> >
> >> >> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou 
> wrote:
> >> >> > >> > >
> >> >> > >> > >
> >> >> > >> > >
> >> >> > >> > >
> >> >> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara <
> dce...@redhat.com> wrote:
> >> >> > >> > > >
> >> >> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
> >> >> > >> > > >  wrote:
> >> >> > >> > > > >
> >> >> > >> > > > > Hi Han, all,
> >> >> > >> > > > >
> >> >> > >> > > > > Lucas, Numan and I have been doing some 'scale' testing
> of OpenStack
> >> >> > >> > > > > using OVN and wanted to present some results and issues
> that we've
> >> >> > >> > > > > found with the Incremental Processing feature in
> ovn-controller. Below
> >> >> > >> > > > > is the scenario that we executed:
> >> >> > >> > > > >
> >> >> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running
> >> >> > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4
> compute nodes. OVS
> >> >> > >> > > > > 2.10.
> >> >> > >> > > > > * The test consists of:
> >> >> > >> > > > >   - Create openstack network (OVN LS), subnet and router
> >> >> > >> > > > >   - Attach subnet to the router and set gw to the
> external network
> >> >> > >> > > > >   - Create an OpenStack port and apply a Security Group
> (ACLs to allow
> >> >> > >> > > > > UDP, SSH and ICMP).
> >> >> > >> > > > >   - Bind the port to one of the 4 compute nodes
> (randomly) by
> >> >> > >> > > > > attaching it to a network namespace.
> >> >> > >> > > > >   - Wait for the port to be ACTIVE in Neutron ('up ==
> True' in NB)
> >> >> > >> > > > >   - Wait until the test can ping the port
> >> >> > >> > > > > * Running browbeat/rally with 16 simultaneous processes
> to execute the
> >> >> > >> > > > > test above 150 times.
> >> >> > >> > > > > * When all the 150 'fake VMs' are created, browbeat
> will delete all
> >> >> > >> > > > > the OpenStack/OVN resources.
> >> >> > >> > > > >
> >> >> > >> > > > > We first tried with OVS/OVN 2.10 and pulled some
> results which showed
> >> >> > >> > > > > 100% success but ovn-controller is quite loaded (as
> expected) in all
> >> >> > >> > > > > the nodes especially during the deletion phase:
> >> >> > >> > > > >
> >> >> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
> >> >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers):
> https://imgur.com/a/8ffKKYF
> >> >> > >> > > > >
> >> >> > >> > > > > After conducting the tests above, we replaced
> ovn-controller in all 7
> >> >> > >> > > > > nodes by the one with the current master branch
> (actually from last
> >> >> > >> > > > > week). We also replaced ovn-northd and ovsdb-servers
> but the
> >> >> > >> > > > > ovs-vswitchd has been left untouched (still on 2.10).
> The expected
> >> >> > >> > > > > results were to get less ovn-controller CPU usage and
> also better
> >> >> > >> > > > > times due to the Incremental Processing feature
> introduced recently.
> >> >> > >> > > > > However, the results don't look very good:
> >> >> > >> > > > >
> >> >> > >> > > > > - Compute node: https://imgur.com/a/wuq87F1
> >> >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers):
> https://imgur.com/a/99kiyDp
> >> >> > >> > > > >
> >> >> > >> > > > > One thing that we can tell from the ovs-vswitchd CPU
> consumption is
> >> >> > >> > > > > that it's much less in the Incremental Processing (IP)
> case which
> >> >> > >> > > > > apparently doesn't make much sense. This led us to
> think that perhaps
> >> >> > >> > > > > ovn-controller was not installing the necessary flows
> in the switch
> >> >> > >> > > > > and we confirmed this hypothesis by looking into the
> dataplane
> >> >> > >> > > > > results. Out of the 150 VMs, 10% of them were
> unreachable via ping
> >> >> > >> > > > > when using ovn-controller from master.
> >> >> > >> > > > >
> >> >> > >> > > > > @Han, others, do you have any ideas as of what could be
> happening
> >> >> > >> > > > > here? We'll be able to use this setup for a few more
> days so let me
> >> >> > >> > > > > know if you want us to pull some other data/traces, 

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-07-09 Thread Numan Siddique
On Tue, Jul 9, 2019 at 12:25 PM Daniel Alvarez Sanchez 
wrote:

> Thanks Numan for running these tests outside OpenStack!
>
> On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique  wrote:
> >
> >
> >
> > On Tue, Jul 9, 2019 at 11:05 AM Han Zhou  wrote:
> >>
> >>
> >>
> >> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou  wrote:
> >> >
> >> >
> >> >
> >> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique 
> wrote:
> >> > >
> >> > >
> >> > >
> >> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou  wrote:
> >> > >>
> >> > >>
> >> > >>
> >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <
> dalva...@redhat.com> wrote:
> >> > >> >
> >> > >> > Thanks a lot Han for the answer!
> >> > >> >
> >> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou 
> wrote:
> >> > >> > >
> >> > >> > >
> >> > >> > >
> >> > >> > >
> >> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara <
> dce...@redhat.com> wrote:
> >> > >> > > >
> >> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
> >> > >> > > >  wrote:
> >> > >> > > > >
> >> > >> > > > > Hi Han, all,
> >> > >> > > > >
> >> > >> > > > > Lucas, Numan and I have been doing some 'scale' testing of
> OpenStack
> >> > >> > > > > using OVN and wanted to present some results and issues
> that we've
> >> > >> > > > > found with the Incremental Processing feature in
> ovn-controller. Below
> >> > >> > > > > is the scenario that we executed:
> >> > >> > > > >
> >> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running
> >> > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4
> compute nodes. OVS
> >> > >> > > > > 2.10.
> >> > >> > > > > * The test consists on:
> >> > >> > > > >   - Create openstack network (OVN LS), subnet and router
> >> > >> > > > >   - Attach subnet to the router and set gw to the external
> network
> >> > >> > > > >   - Create an OpenStack port and apply a Security Group
> (ACLs to allow
> >> > >> > > > > UDP, SSH and ICMP).
> >> > >> > > > >   - Bind the port to one of the 4 compute nodes (randomly)
> by
> >> > >> > > > > attaching it to a network namespace.
> >> > >> > > > >   - Wait for the port to be ACTIVE in Neutron ('up ==
> True' in NB)
> >> > >> > > > >   - Wait until the test can ping the port
> >> > >> > > > > * Running browbeat/rally with 16 simultaneous process to
> execute the
> >> > >> > > > > test above 150 times.
> >> > >> > > > > * When all the 150 'fake VMs' are created, browbeat will
> delete all
> >> > >> > > > > the OpenStack/OVN resources.
> >> > >> > > > >
> >> > >> > > > > We first tried with OVS/OVN 2.10 and pulled some results
> which showed
> >> > >> > > > > 100% success but ovn-controller is quite loaded (as
> expected) in all
> >> > >> > > > > the nodes especially during the deletion phase:
> >> > >> > > > >
> >> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
> >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers):
> https://imgur.com/a/8ffKKYF
> >> > >> > > > >
> >> > >> > > > > After conducting the tests above, we replaced
> ovn-controller in all 7
> >> > >> > > > > nodes by the one with the current master branch (actually
> from last
> >> > >> > > > > week). We also replaced ovn-northd and ovsdb-servers but
> the
> >> > >> > > > > ovs-vswitchd has been left untouched (still on 2.10). The
> expected
> >> > >> > > > > results were to get less ovn-controller CPU usage and also
> better
> >> > >> > > > > times due to the Incremental Processing feature introduced
> recently.
> >> > >> > > > > However, the results don't look very good:
> >> > >> > > > >
> >> > >> > > > > - Compute node: https://imgur.com/a/wuq87F1
> >> > >> > > > > - Controller node (ovn-northd and ovsdb-servers):
> https://imgur.com/a/99kiyDp
> >> > >> > > > >
> >> > >> > > > > One thing that we can tell from the ovs-vswitchd CPU
> consumption is
> >> > >> > > > > that it's much less in the Incremental Processing (IP)
> case which
> >> > >> > > > > apparently doesn't make much sense. This led us to think
> that perhaps
> >> > >> > > > > ovn-controller was not installing the necessary flows in
> the switch
> >> > >> > > > > and we confirmed this hypothesis by looking into the
> dataplane
> >> > >> > > > > results. Out of the 150 VMs, 10% of them were unreachable
> via ping
> >> > >> > > > > when using ovn-controller from master.
> >> > >> > > > >
> >> > >> > > > > @Han, others, do you have any ideas as of what could be
> happening
> >> > >> > > > > here? We'll be able to use this setup for a few more days
> so let me
> >> > >> > > > > know if you want us to pull some other data/traces, ...
> >> > >> > > > >
> >> > >> > > > > Some other interesting things:
> >> > >> > > > > On each of the compute nodes, (with an almost evenly
> distributed
> >> > >> > > > > number of logical ports bound to them), the max amount of
> logical
> >> > >> > > > > flows in br-int is ~90K (by the end of the test, right
> before deleting
> >> > >> > > > > the resources).
> >> > >> > > > >
> >> > >> > > > > It looks like with the IP version, 

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-07-09 Thread Daniel Alvarez Sanchez
Thanks Numan for running these tests outside OpenStack!

On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique  wrote:
>
>
>
> On Tue, Jul 9, 2019 at 11:05 AM Han Zhou  wrote:
>>
>>
>>
>> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou  wrote:
>> >
>> >
>> >
>> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique  
>> > wrote:
>> > >
>> > >
>> > >
>> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou  wrote:
>> > >>
>> > >>
>> > >>
>> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez 
>> > >>  wrote:
>> > >> >
>> > >> > Thanks a lot Han for the answer!
>> > >> >
>> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou  wrote:
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara  
>> > >> > > wrote:
>> > >> > > >
>> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
>> > >> > > >  wrote:
>> > >> > > > >
>> > >> > > > > Hi Han, all,
>> > >> > > > >
>> > >> > > > > Lucas, Numan and I have been doing some 'scale' testing of 
>> > >> > > > > OpenStack
>> > >> > > > > using OVN and wanted to present some results and issues that 
>> > >> > > > > we've
>> > >> > > > > found with the Incremental Processing feature in 
>> > >> > > > > ovn-controller. Below
>> > >> > > > > is the scenario that we executed:
>> > >> > > > >
>> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running
>> > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute 
>> > >> > > > > nodes. OVS
>> > >> > > > > 2.10.
>> > >> > > > > * The test consists on:
>> > >> > > > >   - Create openstack network (OVN LS), subnet and router
>> > >> > > > >   - Attach subnet to the router and set gw to the external 
>> > >> > > > > network
>> > >> > > > >   - Create an OpenStack port and apply a Security Group (ACLs 
>> > >> > > > > to allow
>> > >> > > > > UDP, SSH and ICMP).
>> > >> > > > >   - Bind the port to one of the 4 compute nodes (randomly) by
>> > >> > > > > attaching it to a network namespace.
>> > >> > > > >   - Wait for the port to be ACTIVE in Neutron ('up == True' in 
>> > >> > > > > NB)
>> > >> > > > >   - Wait until the test can ping the port
>> > >> > > > > * Running browbeat/rally with 16 simultaneous process to 
>> > >> > > > > execute the
>> > >> > > > > test above 150 times.
>> > >> > > > > * When all the 150 'fake VMs' are created, browbeat will delete 
>> > >> > > > > all
>> > >> > > > > the OpenStack/OVN resources.
>> > >> > > > >
>> > >> > > > > We first tried with OVS/OVN 2.10 and pulled some results which 
>> > >> > > > > showed
>> > >> > > > > 100% success but ovn-controller is quite loaded (as expected) 
>> > >> > > > > in all
>> > >> > > > > the nodes especially during the deletion phase:
>> > >> > > > >
>> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
>> > >> > > > > - Controller node (ovn-northd and ovsdb-servers): 
>> > >> > > > > https://imgur.com/a/8ffKKYF
>> > >> > > > >
>> > >> > > > > After conducting the tests above, we replaced ovn-controller in 
>> > >> > > > > all 7
>> > >> > > > > nodes by the one with the current master branch (actually from 
>> > >> > > > > last
>> > >> > > > > week). We also replaced ovn-northd and ovsdb-servers but the
>> > >> > > > > ovs-vswitchd has been left untouched (still on 2.10). The 
>> > >> > > > > expected
>> > >> > > > > results were to get less ovn-controller CPU usage and also 
>> > >> > > > > better
>> > >> > > > > times due to the Incremental Processing feature introduced 
>> > >> > > > > recently.
>> > >> > > > > However, the results don't look very good:
>> > >> > > > >
>> > >> > > > > - Compute node: https://imgur.com/a/wuq87F1
>> > >> > > > > - Controller node (ovn-northd and ovsdb-servers): 
>> > >> > > > > https://imgur.com/a/99kiyDp
>> > >> > > > >
>> > >> > > > > One thing that we can tell from the ovs-vswitchd CPU 
>> > >> > > > > consumption is
>> > >> > > > > that it's much less in the Incremental Processing (IP) case 
>> > >> > > > > which
>> > >> > > > > apparently doesn't make much sense. This led us to think that 
>> > >> > > > > perhaps
>> > >> > > > > ovn-controller was not installing the necessary flows in the 
>> > >> > > > > switch
>> > >> > > > > and we confirmed this hypothesis by looking into the dataplane
>> > >> > > > > results. Out of the 150 VMs, 10% of them were unreachable via 
>> > >> > > > > ping
>> > >> > > > > when using ovn-controller from master.
>> > >> > > > >
>> > >> > > > > @Han, others, do you have any ideas as of what could be 
>> > >> > > > > happening
>> > >> > > > > here? We'll be able to use this setup for a few more days so 
>> > >> > > > > let me
>> > >> > > > > know if you want us to pull some other data/traces, ...
>> > >> > > > >
>> > >> > > > > Some other interesting things:
>> > >> > > > > On each of the compute nodes, (with an almost evenly distributed
>> > >> > > > > number of logical ports bound to them), the max amount of 
>> > >> > > > > logical
>> > >> > > > > flows in br-int is ~90K (by the end of the test, right before 
>> > >> > > > > 

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-07-08 Thread Numan Siddique
On Tue, Jul 9, 2019 at 11:05 AM Han Zhou  wrote:

>
>
> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou  wrote:
> >
> >
> >
> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique 
> wrote:
> > >
> > >
> > >
> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou  wrote:
> > >>
> > >>
> > >>
> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <
> dalva...@redhat.com> wrote:
> > >> >
> > >> > Thanks a lot Han for the answer!
> > >> >
> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou  wrote:
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara 
> wrote:
> > >> > > >
> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
> > >> > > >  wrote:
> > >> > > > >
> > >> > > > > Hi Han, all,
> > >> > > > >
> > >> > > > > Lucas, Numan and I have been doing some 'scale' testing of
> OpenStack
> > >> > > > > using OVN and wanted to present some results and issues that
> we've
> > >> > > > > found with the Incremental Processing feature in
> ovn-controller. Below
> > >> > > > > is the scenario that we executed:
> > >> > > > >
> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running
> > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute
> nodes. OVS
> > >> > > > > 2.10.
> > >> > > > > * The test consists on:
> > >> > > > >   - Create openstack network (OVN LS), subnet and router
> > >> > > > >   - Attach subnet to the router and set gw to the external
> network
> > >> > > > >   - Create an OpenStack port and apply a Security Group (ACLs
> to allow
> > >> > > > > UDP, SSH and ICMP).
> > >> > > > >   - Bind the port to one of the 4 compute nodes (randomly) by
> > >> > > > > attaching it to a network namespace.
> > >> > > > >   - Wait for the port to be ACTIVE in Neutron ('up == True'
> in NB)
> > >> > > > >   - Wait until the test can ping the port
> > >> > > > > * Running browbeat/rally with 16 simultaneous process to
> execute the
> > >> > > > > test above 150 times.
> > >> > > > > * When all the 150 'fake VMs' are created, browbeat will
> delete all
> > >> > > > > the OpenStack/OVN resources.
> > >> > > > >
> > >> > > > > We first tried with OVS/OVN 2.10 and pulled some results
> which showed
> > >> > > > > 100% success but ovn-controller is quite loaded (as expected)
> in all
> > >> > > > > the nodes especially during the deletion phase:
> > >> > > > >
> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
> > >> > > > > - Controller node (ovn-northd and ovsdb-servers):
> https://imgur.com/a/8ffKKYF
> > >> > > > >
> > >> > > > > After conducting the tests above, we replaced ovn-controller
> in all 7
> > >> > > > > nodes by the one with the current master branch (actually
> from last
> > >> > > > > week). We also replaced ovn-northd and ovsdb-servers but the
> > >> > > > > ovs-vswitchd has been left untouched (still on 2.10). The
> expected
> > >> > > > > results were to get less ovn-controller CPU usage and also
> better
> > >> > > > > times due to the Incremental Processing feature introduced
> recently.
> > >> > > > > However, the results don't look very good:
> > >> > > > >
> > >> > > > > - Compute node: https://imgur.com/a/wuq87F1
> > >> > > > > - Controller node (ovn-northd and ovsdb-servers):
> https://imgur.com/a/99kiyDp
> > >> > > > >
> > >> > > > > One thing that we can tell from the ovs-vswitchd CPU
> consumption is
> > >> > > > > that it's much less in the Incremental Processing (IP) case
> which
> > >> > > > > apparently doesn't make much sense. This led us to think that
> perhaps
> > >> > > > > ovn-controller was not installing the necessary flows in the
> switch
> > >> > > > > and we confirmed this hypothesis by looking into the dataplane
> > >> > > > > results. Out of the 150 VMs, 10% of them were unreachable via
> ping
> > >> > > > > when using ovn-controller from master.
> > >> > > > >
> > >> > > > > @Han, others, do you have any ideas as of what could be
> happening
> > >> > > > > here? We'll be able to use this setup for a few more days so
> let me
> > >> > > > > know if you want us to pull some other data/traces, ...
> > >> > > > >
> > >> > > > > Some other interesting things:
> > >> > > > > On each of the compute nodes, (with an almost evenly
> distributed
> > >> > > > > number of logical ports bound to them), the max amount of
> logical
> > >> > > > > flows in br-int is ~90K (by the end of the test, right before
> deleting
> > >> > > > > the resources).
> > >> > > > >
> > >> > > > > It looks like with the IP version, ovn-controller leaks some
> memory:
> > >> > > > > https://imgur.com/a/trQrhWd
> > >> > > > > While with OVS 2.10, it remains pretty flat during the test:
> > >> > > > > https://imgur.com/a/KCkIT4O
> > >> > > >
> > >> > > > Hi Daniel, Han,
> > >> > > >
> > >> > > > I just sent a small patch for the ovn-controller memory leak:
> > >> > > > https://patchwork.ozlabs.org/patch/1113758/
> > >> > > >
> > >> > > > At least on my setup this is what valgrind was pointing at.
> > >> > > >
> > >> > > > Cheers,
> 

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-07-08 Thread Han Zhou
On Fri, Jun 21, 2019 at 12:31 AM Han Zhou  wrote:
>
>
>
> On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique 
wrote:
> >
> >
> >
> > On Fri, Jun 21, 2019, 11:47 AM Han Zhou  wrote:
> >>
> >>
> >>
> >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <
dalva...@redhat.com> wrote:
> >> >
> >> > Thanks a lot Han for the answer!
> >> >
> >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou  wrote:
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara 
wrote:
> >> > > >
> >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
> >> > > >  wrote:
> >> > > > >
> >> > > > > Hi Han, all,
> >> > > > >
> >> > > > > Lucas, Numan and I have been doing some 'scale' testing of
OpenStack
> >> > > > > using OVN and wanted to present some results and issues that
we've
> >> > > > > found with the Incremental Processing feature in
ovn-controller. Below
> >> > > > > is the scenario that we executed:
> >> > > > >
> >> > > > > * 7 baremetal nodes setup: 3 controllers (running
> >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute
nodes. OVS
> >> > > > > 2.10.
> >> > > > > * The test consists on:
> >> > > > >   - Create openstack network (OVN LS), subnet and router
> >> > > > >   - Attach subnet to the router and set gw to the external
network
> >> > > > >   - Create an OpenStack port and apply a Security Group (ACLs
to allow
> >> > > > > UDP, SSH and ICMP).
> >> > > > >   - Bind the port to one of the 4 compute nodes (randomly) by
> >> > > > > attaching it to a network namespace.
> >> > > > >   - Wait for the port to be ACTIVE in Neutron ('up == True' in
NB)
> >> > > > >   - Wait until the test can ping the port
> >> > > > > * Running browbeat/rally with 16 simultaneous process to
execute the
> >> > > > > test above 150 times.
> >> > > > > * When all the 150 'fake VMs' are created, browbeat will
delete all
> >> > > > > the OpenStack/OVN resources.
> >> > > > >
> >> > > > > We first tried with OVS/OVN 2.10 and pulled some results which
showed
> >> > > > > 100% success but ovn-controller is quite loaded (as expected)
in all
> >> > > > > the nodes especially during the deletion phase:
> >> > > > >
> >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
> >> > > > > - Controller node (ovn-northd and ovsdb-servers):
https://imgur.com/a/8ffKKYF
> >> > > > >
> >> > > > > After conducting the tests above, we replaced ovn-controller
in all 7
> >> > > > > nodes by the one with the current master branch (actually from
last
> >> > > > > week). We also replaced ovn-northd and ovsdb-servers but the
> >> > > > > ovs-vswitchd has been left untouched (still on 2.10). The
expected
> >> > > > > results were to get less ovn-controller CPU usage and also
better
> >> > > > > times due to the Incremental Processing feature introduced
recently.
> >> > > > > However, the results don't look very good:
> >> > > > >
> >> > > > > - Compute node: https://imgur.com/a/wuq87F1
> >> > > > > - Controller node (ovn-northd and ovsdb-servers):
https://imgur.com/a/99kiyDp
> >> > > > >
> >> > > > > One thing that we can tell from the ovs-vswitchd CPU
consumption is
> >> > > > > that it's much less in the Incremental Processing (IP) case
which
> >> > > > > apparently doesn't make much sense. This led us to think that
perhaps
> >> > > > > ovn-controller was not installing the necessary flows in the
switch
> >> > > > > and we confirmed this hypothesis by looking into the dataplane
> >> > > > > results. Out of the 150 VMs, 10% of them were unreachable via
ping
> >> > > > > when using ovn-controller from master.
> >> > > > >
> >> > > > > @Han, others, do you have any ideas as of what could be
happening
> >> > > > > here? We'll be able to use this setup for a few more days so
let me
> >> > > > > know if you want us to pull some other data/traces, ...
> >> > > > >
> >> > > > > Some other interesting things:
> >> > > > > On each of the compute nodes, (with an almost evenly
distributed
> >> > > > > number of logical ports bound to them), the max amount of
logical
> >> > > > > flows in br-int is ~90K (by the end of the test, right before
deleting
> >> > > > > the resources).
> >> > > > >
> >> > > > > It looks like with the IP version, ovn-controller leaks some
memory:
> >> > > > > https://imgur.com/a/trQrhWd
> >> > > > > While with OVS 2.10, it remains pretty flat during the test:
> >> > > > > https://imgur.com/a/KCkIT4O
> >> > > >
> >> > > > Hi Daniel, Han,
> >> > > >
> >> > > > I just sent a small patch for the ovn-controller memory leak:
> >> > > > https://patchwork.ozlabs.org/patch/1113758/
> >> > > >
> >> > > > At least on my setup this is what valgrind was pointing at.
> >> > > >
> >> > > > Cheers,
> >> > > > Dumitru
> >> > > >
> >> > > > >
> >> > > > > Looking forward to hearing back :)
> >> > > > > Daniel
> >> > > > >
> >> > > > > PS. Sorry for my previous email, I sent it by mistake without
the subject

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-06-24 Thread Numan Siddique
On Mon, Jun 24, 2019 at 1:51 PM aginwala  wrote:

> Hi:
> As per irc meeting discussion, some nice findings were already discussed
> by Numan (Thanks for sharing the details).  When changing external_ids for
> a claimed port e.g. ovn-nbctl set logical_switch_port sw0-port1
> external_ids:foo=bar triggers re-computation on local compute. I do see the
> same behavior. Numan is proposing a patch to skip computation for
> external_ids column for an already claimed port for port_binding table
> because of runtime_data, can't handle change for input SB_port_binding,
> fall back to recompute (
> https://github.com/openvswitch/ovs/blob/master/ovn/lib/inc-proc-eng.h#L77).
> However,  I don't see external_ids in port_binding table for the port being
> set explicitly when setting Interface table in the test code that Daniel
> posted [1] which could trigger extra re-computation in current test
> scenario.
>

ovn-northd just copies the external_ids of a logical switch port to the
external_ids of its port binding, and networking-ovn makes heavy use of
external_ids.


>
> Also ovs-vsctl add-br test will also trigger re-computation on local
> compute and yes I can see the same. Since we don't have any handlers for
> Ports and Interfaces table similar to port_binding and other handlers @
> https://github.com/openvswitch/ovs/blob/master/ovn/controller/ovn-controller.c#L1769,
> adding a new bridge also causes re-computation on the local compute. Not
> sure if its required immediately because as per the patch shared by Daniel
> [1], I don't see any new test bridges getting created  apart from br-int
> and hence wont be much impact. Or may be I missed to see if they are also
> creating test bridges during testing. Of course, any new ovs-vsctl command
> for attaching/detaching vif will sure trigger recompute on br-int as and
> when VIF(vm) gets added/deleted to program the flow on local compute.
>

It would impact how the CMS creates the ovs port.

Suppose I do something like the below:
---
ovs-vsctl add-port br-int foo
ovs-vsctl set interface foo type=internal
ovs-vsctl set Interface foo external_ids:iface-id=foo-id

and if ovn-controller gets 3 updates from ovsdb-server, this would result
in 3 recomputations.

However if I do
ovs-vsctl add-port br-int foo -- set interface foo type=internal \
    -- set interface foo external_ids:iface-id=foo-id

this could result in only 1 recomputation.
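
The arithmetic above can be sketched with a toy Python model. It assumes
that each ovs-vsctl invocation commits one transaction and that
ovn-controller does one full recompute per update it receives from
ovsdb-server. The function name count_recomputes and the list layout are
hypothetical and only for illustration; nothing here talks to OVS.

```python
# Toy model, not OVS code: each ovs-vsctl invocation is treated as one
# transaction, and we assume one recompute per transaction update that
# ovn-controller receives.

def count_recomputes(invocations):
    """Return the number of recomputes for a list of ovs-vsctl invocations.

    Each invocation is a list of sub-commands; sub-commands chained with
    '--' inside a single invocation share one transaction.
    """
    return sum(1 for subcommands in invocations if subcommands)


# Three separate ovs-vsctl invocations.
separate = [
    ["add-port br-int foo"],
    ["set interface foo type=internal"],
    ["set Interface foo external_ids:iface-id=foo-id"],
]

# One invocation with the sub-commands chained by '--'.
chained = [[
    "add-port br-int foo",
    "set interface foo type=internal",
    "set interface foo external_ids:iface-id=foo-id",
]]

print(count_recomputes(separate))  # 3
print(count_recomputes(chained))   # 1
```

In practice ovsdb-server may deliver updates differently, so the real
counts could vary; the model only shows why chaining is cheaper for the CMS.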

I think ovn-controller should handle local ovsdb changes for
   1. external_ids of the Open_vSwitch table
   2. updates to an OVS interface's external_ids:iface-id.

We should try to ignore any other changes to the local ovsdb.
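
As a rough illustration of the two cases proposed above, here is a minimal
Python sketch of such a filter. It is not ovn-controller code: the function
needs_handling and the (table, column, key) tuples are hypothetical, and a
real implementation would live in the C change handlers.

```python
# Hypothetical filter, not ovn-controller code: decide whether a local
# ovsdb change is one of the two cases proposed above.

def needs_handling(table, column, key=None):
    """Return True if a change to table.column (optionally a map key) matters."""
    # Case 1: external_ids of the Open_vSwitch table.
    if table == "Open_vSwitch" and column == "external_ids":
        return True
    # Case 2: an interface's external_ids:iface-id is updated.
    if table == "Interface" and column == "external_ids" and key == "iface-id":
        return True
    # Everything else in the local ovsdb would be ignored.
    return False


changes = [
    ("Open_vSwitch", "external_ids", "ovn-remote"),
    ("Interface", "external_ids", "iface-id"),
    ("Interface", "type", None),
    ("Bridge", "ports", None),
]

relevant = [change for change in changes if needs_handling(*change)]
print(relevant)
```

Only the first two sample changes pass the filter; the type and bridge
changes would be skipped under this proposal.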


> I didn't get a chance to verify when a chassisredirect port is claimed on
> a gateway chassis, it triggers computation on all computes registered with
> SB as per code
> https://github.com/openvswitch/ovs/blob/master/ovn/controller/binding.c#L722
> which was also raises further optimization for chassisredirect flow that
> Numan is suggesting.
>
> 1.
> https://github.com/danalsan/browbeat/commit/0ff72da52ddf17aa9f7269f191eebd890899bdad
>
>
I submitted the patches just now to address some of the issues -
https://patchwork.ozlabs.org/project/openvswitch/list/?series=115737

I also ran the test with these patches, but they didn't bring any
improvement. Although the patches I submitted avoid recomputation
in some of the scenarios, I think I still need to dig further to see
what's causing the performance impact when compared with the code
without the IP patches.

Thanks
Numan

On Fri, Jun 21, 2019 at 12:32 AM Han Zhou  wrote:
>
>>
>>
>> On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique 
>> wrote:
>> >
>> >
>> >
>> > On Fri, Jun 21, 2019, 11:47 AM Han Zhou  wrote:
>> >>
>> >>
>> >>
>> >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <
>> dalva...@redhat.com> wrote:
>> >> >
>> >> > Thanks a lot Han for the answer!
>> >> >
>> >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou  wrote:
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara 
>> wrote:
>> >> > > >
>> >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
>> >> > > >  wrote:
>> >> > > > >
>> >> > > > > Hi Han, all,
>> >> > > > >
>> >> > > > > Lucas, Numan and I have been doing some 'scale' testing of
>> OpenStack
>> >> > > > > using OVN and wanted to present some results and issues that
>> we've
>> >> > > > > found with the Incremental Processing feature in
>> ovn-controller. Below
>> >> > > > > is the scenario that we executed:
>> >> > > > >
>> >> > > > > * 7 baremetal nodes setup: 3 controllers (running
>> >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute
>> nodes. OVS
>> >> > > > > 2.10.
>> >> > > > > * The test consists on:
>> >> > > > >   - Create openstack network (OVN LS), subnet and router
>> >> > > > >   - Attach subnet to the router and set gw to the external
>> network
>> >> > > > >   - Create an OpenStack port and apply a Security Group (ACLs
>> to allow
>> >> > > > > UDP, SSH and ICMP).
>> 

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-06-24 Thread aginwala
Hi:
As per the IRC meeting discussion, Numan already shared some nice findings
(thanks for the details). Changing external_ids for a claimed port, e.g.
ovn-nbctl set logical_switch_port sw0-port1 external_ids:foo=bar,
triggers re-computation on the local compute. I do see the same behavior.
Numan is proposing a patch to skip computation for the external_ids column
of an already claimed port in the port_binding table, because of the
"runtime_data, can't handle change for input SB_port_binding, fall back
to recompute" case (
https://github.com/openvswitch/ovs/blob/master/ovn/lib/inc-proc-eng.h#L77).
However, in the test code that Daniel posted [1], I don't see external_ids
in the port_binding table being set explicitly for the port when setting
the Interface table, which is what could trigger extra re-computation in
the current test scenario.

Also, ovs-vsctl add-br test triggers re-computation on the local compute,
and yes, I can see the same. Since we don't have any handlers for the
Port and Interface tables similar to port_binding and the other handlers @
https://github.com/openvswitch/ovs/blob/master/ovn/controller/ovn-controller.c#L1769,
adding a new bridge also causes re-computation on the local compute. I'm
not sure a fix is required immediately because, as per the patch shared by
Daniel [1], I don't see any new test bridges being created apart from
br-int, so there won't be much impact. Or maybe I missed that they are
also creating test bridges during testing. Of course, any ovs-vsctl command
that attaches/detaches a VIF will surely trigger a recompute on br-int
whenever a VIF (VM) is added/deleted, to program the flows on the local
compute.
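
The handler/fallback behavior described above can be sketched as a toy
Python model. This is not the C engine from ovn/lib/inc-proc-eng.h: the
class EngineNode, the node name example_node and the input name
OVS_interface are hypothetical, and the only assumption taken from the
discussion is that an input without a change handler forces a full
recompute.

```python
# Toy model of an incremental-processing node, not the actual C engine:
# an input with a change handler is processed incrementally; an input
# without one (or whose handler gives up) forces a full recompute.

class EngineNode:
    def __init__(self, name):
        self.name = name
        self.handlers = {}    # input name -> change handler (or None)
        self.recomputes = 0   # number of full recomputes so far
        self.incremental = 0  # number of incrementally handled changes

    def add_input(self, input_name, handler=None):
        self.handlers[input_name] = handler

    def on_change(self, input_name, change):
        handler = self.handlers.get(input_name)
        # A handler returns True when it fully handled the change.
        if handler is not None and handler(change):
            self.incremental += 1
            return "incremental"
        # No handler, or the handler could not cope: fall back to recompute.
        self.recomputes += 1
        return "recompute"


node = EngineNode("example_node")
node.add_input("SB_port_binding", handler=lambda change: True)
node.add_input("OVS_interface")  # no handler registered

first = node.on_change("SB_port_binding", {"port": "sw0-port1"})
second = node.on_change("OVS_interface", {"name": "foo"})
print(first, second)
```

Under this model, adding handlers for more inputs is what turns recomputes
into incremental updates, which is the direction the patches above take.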

I didn't get a chance to verify that when a chassisredirect port is claimed
on a gateway chassis, it triggers computation on all computes registered
with SB, as per the code at
https://github.com/openvswitch/ovs/blob/master/ovn/controller/binding.c#L722
which also calls for the further optimization of the chassisredirect flow
that Numan is suggesting.

1.
https://github.com/danalsan/browbeat/commit/0ff72da52ddf17aa9f7269f191eebd890899bdad

On Fri, Jun 21, 2019 at 12:32 AM Han Zhou  wrote:

>
>
> On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique 
> wrote:
> >
> >
> >
> > On Fri, Jun 21, 2019, 11:47 AM Han Zhou  wrote:
> >>
> >>
> >>
> >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <
> dalva...@redhat.com> wrote:
> >> >
> >> > Thanks a lot Han for the answer!
> >> >
> >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou  wrote:
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara 
> wrote:
> >> > > >
> >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
> >> > > >  wrote:
> >> > > > >
> >> > > > > Hi Han, all,
> >> > > > >
> >> > > > > Lucas, Numan and I have been doing some 'scale' testing of
> OpenStack
> >> > > > > using OVN and wanted to present some results and issues that
> we've
> >> > > > > found with the Incremental Processing feature in
> ovn-controller. Below
> >> > > > > is the scenario that we executed:
> >> > > > >
> >> > > > > * 7 baremetal nodes setup: 3 controllers (running
> >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute
> nodes. OVS
> >> > > > > 2.10.
> >> > > > > * The test consists on:
> >> > > > >   - Create openstack network (OVN LS), subnet and router
> >> > > > >   - Attach subnet to the router and set gw to the external
> network
> >> > > > >   - Create an OpenStack port and apply a Security Group (ACLs
> to allow
> >> > > > > UDP, SSH and ICMP).
> >> > > > >   - Bind the port to one of the 4 compute nodes (randomly) by
> >> > > > > attaching it to a network namespace.
> >> > > > >   - Wait for the port to be ACTIVE in Neutron ('up == True' in
> NB)
> >> > > > >   - Wait until the test can ping the port
> >> > > > > * Running browbeat/rally with 16 simultaneous process to
> execute the
> >> > > > > test above 150 times.
> >> > > > > * When all the 150 'fake VMs' are created, browbeat will delete
> all
> >> > > > > the OpenStack/OVN resources.
> >> > > > >
> >> > > > > We first tried with OVS/OVN 2.10 and pulled some results which
> showed
> >> > > > > 100% success but ovn-controller is quite loaded (as expected)
> in all
> >> > > > > the nodes especially during the deletion phase:
> >> > > > >
> >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
> >> > > > > - Controller node (ovn-northd and ovsdb-servers):
> https://imgur.com/a/8ffKKYF
> >> > > > >
> >> > > > > After conducting the tests above, we replaced ovn-controller in
> all 7
> >> > > > > nodes by the one with the current master branch (actually from
> last
> >> > > > > week). We also replaced ovn-northd and ovsdb-servers but the
> >> > > > > ovs-vswitchd has been left untouched (still on 2.10). The
> expected
> >> > > > > results were to get less ovn-controller CPU usage and also
> better
> >> > > > > times due to the Incremental Processing feature introduced
> recently.
> >> > > > > However, the results don't look very good:
> >> > > > >
> >> > > > > 

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-06-21 Thread Han Zhou
On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique  wrote:
>
>
>
> On Fri, Jun 21, 2019, 11:47 AM Han Zhou  wrote:
>>
>>
>>
>> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <
dalva...@redhat.com> wrote:
>> >
>> > Thanks a lot Han for the answer!
>> >
>> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou  wrote:
>> > >
>> > >
>> > >
>> > >
>> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara 
wrote:
>> > > >
>> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
>> > > >  wrote:
>> > > > >
>> > > > > Hi Han, all,
>> > > > >
>> > > > > Lucas, Numan and I have been doing some 'scale' testing of
OpenStack
>> > > > > using OVN and wanted to present some results and issues that
we've
>> > > > > found with the Incremental Processing feature in ovn-controller.
Below
>> > > > > is the scenario that we executed:
>> > > > >
>> > > > > * 7 baremetal nodes setup: 3 controllers (running
>> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute
nodes. OVS
>> > > > > 2.10.
>> > > > > * The test consists on:
>> > > > >   - Create openstack network (OVN LS), subnet and router
>> > > > >   - Attach subnet to the router and set gw to the external
network
>> > > > >   - Create an OpenStack port and apply a Security Group (ACLs to
allow
>> > > > > UDP, SSH and ICMP).
>> > > > >   - Bind the port to one of the 4 compute nodes (randomly) by
>> > > > > attaching it to a network namespace.
>> > > > >   - Wait for the port to be ACTIVE in Neutron ('up == True' in
NB)
>> > > > >   - Wait until the test can ping the port
>> > > > > * Running browbeat/rally with 16 simultaneous process to execute
the
>> > > > > test above 150 times.
>> > > > > * When all the 150 'fake VMs' are created, browbeat will delete
all
>> > > > > the OpenStack/OVN resources.
>> > > > >
>> > > > > We first tried with OVS/OVN 2.10 and pulled some results which
showed
>> > > > > 100% success but ovn-controller is quite loaded (as expected) in
all
>> > > > > the nodes especially during the deletion phase:
>> > > > >
>> > > > > - Compute node: https://imgur.com/a/tzxfrIR
>> > > > > - Controller node (ovn-northd and ovsdb-servers):
https://imgur.com/a/8ffKKYF
>> > > > >
>> > > > > After conducting the tests above, we replaced ovn-controller in
all 7
>> > > > > nodes by the one with the current master branch (actually from
last
>> > > > > week). We also replaced ovn-northd and ovsdb-servers but the
>> > > > > ovs-vswitchd has been left untouched (still on 2.10). The
expected
>> > > > > results were to get less ovn-controller CPU usage and also better
>> > > > > times due to the Incremental Processing feature introduced
recently.
>> > > > > However, the results don't look very good:
>> > > > >
>> > > > > - Compute node: https://imgur.com/a/wuq87F1
>> > > > > - Controller node (ovn-northd and ovsdb-servers):
https://imgur.com/a/99kiyDp
>> > > > >
>> > > > > One thing that we can tell from the ovs-vswitchd CPU consumption
is
>> > > > > that it's much less in the Incremental Processing (IP) case which
>> > > > > apparently doesn't make much sense. This led us to think that
perhaps
>> > > > > ovn-controller was not installing the necessary flows in the
switch
>> > > > > and we confirmed this hypothesis by looking into the dataplane
>> > > > > results. Out of the 150 VMs, 10% of them were unreachable via
ping
>> > > > > when using ovn-controller from master.
>> > > > >
>> > > > > @Han, others, do you have any ideas as of what could be happening
>> > > > > here? We'll be able to use this setup for a few more days so let
me
>> > > > > know if you want us to pull some other data/traces, ...
>> > > > >
>> > > > > Some other interesting things:
>> > > > > On each of the compute nodes, (with an almost evenly distributed
>> > > > > number of logical ports bound to them), the max amount of logical
>> > > > > flows in br-int is ~90K (by the end of the test, right before
deleting
>> > > > > the resources).
>> > > > >
>> > > > > It looks like with the IP version, ovn-controller leaks some
memory:
>> > > > > https://imgur.com/a/trQrhWd
>> > > > > While with OVS 2.10, it remains pretty flat during the test:
>> > > > > https://imgur.com/a/KCkIT4O
>> > > >
>> > > > Hi Daniel, Han,
>> > > >
>> > > > I just sent a small patch for the ovn-controller memory leak:
>> > > > https://patchwork.ozlabs.org/patch/1113758/
>> > > >
>> > > > At least on my setup this is what valgrind was pointing at.
>> > > >
>> > > > Cheers,
>> > > > Dumitru
>> > > >
>> > > > >
>> > > > > Looking forward to hearing back :)
>> > > > > Daniel
>> > > > >
>> > > > > PS. Sorry for my previous email, I sent it by mistake without
the subject
>> > > > > ___
>> > > > > discuss mailing list
>> > > > > disc...@openvswitch.org
>> > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>> > >
>> > > Thanks Daniel for the testing and reporting, and thanks Dumitru for
fixing the memory leak.
>> > >
>> > > Currently ovn-controller incremental processing only 

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-06-21 Thread Daniel Alvarez
Hi Han,


> On 21 Jun 2019, at 08:16, Han Zhou  wrote:
> 
> 
> 
> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez  
> wrote:
> >
> > Thanks a lot Han for the answer!
> >
> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou  wrote:
> > >
> > >
> > >
> > >
> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara  wrote:
> > > >
> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
> > > >  wrote:
> > > > >
> > > > > Hi Han, all,
> > > > >
> > > > > Lucas, Numan and I have been doing some 'scale' testing of OpenStack
> > > > > using OVN and wanted to present some results and issues that we've
> > > > > found with the Incremental Processing feature in ovn-controller. Below
> > > > > is the scenario that we executed:
> > > > >
> > > > > * 7 baremetal nodes setup: 3 controllers (running
> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS
> > > > > 2.10.
> > > > > * The test consists on:
> > > > >   - Create openstack network (OVN LS), subnet and router
> > > > >   - Attach subnet to the router and set gw to the external network
> > > > >   - Create an OpenStack port and apply a Security Group (ACLs to allow
> > > > > UDP, SSH and ICMP).
> > > > >   - Bind the port to one of the 4 compute nodes (randomly) by
> > > > > attaching it to a network namespace.
> > > > >   - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
> > > > >   - Wait until the test can ping the port
> > > > > * Running browbeat/rally with 16 simultaneous process to execute the
> > > > > test above 150 times.
> > > > > * When all the 150 'fake VMs' are created, browbeat will delete all
> > > > > the OpenStack/OVN resources.
> > > > >
> > > > > We first tried with OVS/OVN 2.10 and pulled some results which showed
> > > > > 100% success but ovn-controller is quite loaded (as expected) in all
> > > > > the nodes especially during the deletion phase:
> > > > >
> > > > > - Compute node: https://imgur.com/a/tzxfrIR
> > > > > - Controller node (ovn-northd and ovsdb-servers): 
> > > > > https://imgur.com/a/8ffKKYF
> > > > >
> > > > > After conducting the tests above, we replaced ovn-controller in all 7
> > > > > nodes by the one with the current master branch (actually from last
> > > > > week). We also replaced ovn-northd and ovsdb-servers but the
> > > > > ovs-vswitchd has been left untouched (still on 2.10). The expected
> > > > > results were to get less ovn-controller CPU usage and also better
> > > > > times due to the Incremental Processing feature introduced recently.
> > > > > However, the results don't look very good:
> > > > >
> > > > > - Compute node: https://imgur.com/a/wuq87F1
> > > > > - Controller node (ovn-northd and ovsdb-servers): 
> > > > > https://imgur.com/a/99kiyDp
> > > > >
> > > > > One thing that we can tell from the ovs-vswitchd CPU consumption is
> > > > > that it's much less in the Incremental Processing (IP) case which
> > > > > apparently doesn't make much sense. This led us to think that perhaps
> > > > > ovn-controller was not installing the necessary flows in the switch
> > > > > and we confirmed this hypothesis by looking into the dataplane
> > > > > results. Out of the 150 VMs, 10% of them were unreachable via ping
> > > > > when using ovn-controller from master.
> > > > >
> > > > > @Han, others, do you have any ideas as of what could be happening
> > > > > here? We'll be able to use this setup for a few more days so let me
> > > > > know if you want us to pull some other data/traces, ...
> > > > >
> > > > > Some other interesting things:
> > > > > On each of the compute nodes, (with an almost evenly distributed
> > > > > number of logical ports bound to them), the max amount of logical
> > > > > flows in br-int is ~90K (by the end of the test, right before deleting
> > > > > the resources).
> > > > >
> > > > > It looks like with the IP version, ovn-controller leaks some memory:
> > > > > https://imgur.com/a/trQrhWd
> > > > > While with OVS 2.10, it remains pretty flat during the test:
> > > > > https://imgur.com/a/KCkIT4O
> > > >
> > > > Hi Daniel, Han,
> > > >
> > > > I just sent a small patch for the ovn-controller memory leak:
> > > > https://patchwork.ozlabs.org/patch/1113758/
> > > >
> > > > At least on my setup this is what valgrind was pointing at.
> > > >
> > > > Cheers,
> > > > Dumitru
> > > >
> > > > >
> > > > > Looking forward to hearing back :)
> > > > > Daniel
> > > > >
> > > > > PS. Sorry for my previous email, I sent it by mistake without the 
> > > > > subject
> > >
> > > Thanks Daniel for the testing and reporting, and thanks Dumitru for 
> > > fixing the memory leak.
> > >
> > > Currently ovn-controller incremental processing only handles below SB 
> > > changes incrementally:
> > > - logical_flow
> > > - port_binding (for regular VIF binding NOT on current chassis)

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-06-21 Thread Numan Siddique
On Fri, Jun 21, 2019, 11:47 AM Han Zhou  wrote:

>
>
> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <
> dalva...@redhat.com> wrote:
> >
> > Thanks a lot Han for the answer!
> >
> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou  wrote:
> > >
> > >
> > >
> > >
> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara 
> wrote:
> > > >
> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
> > > >  wrote:
> > > > >
> > > > > Hi Han, all,
> > > > >
> > > > > Lucas, Numan and I have been doing some 'scale' testing of
> OpenStack
> > > > > using OVN and wanted to present some results and issues that we've
> > > > > found with the Incremental Processing feature in ovn-controller.
> Below
> > > > > is the scenario that we executed:
> > > > >
> > > > > * 7 baremetal nodes setup: 3 controllers (running
> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes.
> OVS
> > > > > 2.10.
> > > > > * The test consists on:
> > > > >   - Create openstack network (OVN LS), subnet and router
> > > > >   - Attach subnet to the router and set gw to the external network
> > > > >   - Create an OpenStack port and apply a Security Group (ACLs to
> allow
> > > > > UDP, SSH and ICMP).
> > > > >   - Bind the port to one of the 4 compute nodes (randomly) by
> > > > > attaching it to a network namespace.
> > > > >   - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
> > > > >   - Wait until the test can ping the port
> > > > > * Running browbeat/rally with 16 simultaneous process to execute
> the
> > > > > test above 150 times.
> > > > > * When all the 150 'fake VMs' are created, browbeat will delete all
> > > > > the OpenStack/OVN resources.
> > > > >
> > > > > We first tried with OVS/OVN 2.10 and pulled some results which
> showed
> > > > > 100% success but ovn-controller is quite loaded (as expected) in
> all
> > > > > the nodes especially during the deletion phase:
> > > > >
> > > > > - Compute node: https://imgur.com/a/tzxfrIR
> > > > > - Controller node (ovn-northd and ovsdb-servers):
> https://imgur.com/a/8ffKKYF
> > > > >
> > > > > After conducting the tests above, we replaced ovn-controller in
> all 7
> > > > > nodes by the one with the current master branch (actually from last
> > > > > week). We also replaced ovn-northd and ovsdb-servers but the
> > > > > ovs-vswitchd has been left untouched (still on 2.10). The expected
> > > > > results were to get less ovn-controller CPU usage and also better
> > > > > times due to the Incremental Processing feature introduced
> recently.
> > > > > However, the results don't look very good:
> > > > >
> > > > > - Compute node: https://imgur.com/a/wuq87F1
> > > > > - Controller node (ovn-northd and ovsdb-servers):
> https://imgur.com/a/99kiyDp
> > > > >
> > > > > One thing that we can tell from the ovs-vswitchd CPU consumption is
> > > > > that it's much less in the Incremental Processing (IP) case which
> > > > > apparently doesn't make much sense. This led us to think that
> perhaps
> > > > > ovn-controller was not installing the necessary flows in the switch
> > > > > and we confirmed this hypothesis by looking into the dataplane
> > > > > results. Out of the 150 VMs, 10% of them were unreachable via ping
> > > > > when using ovn-controller from master.
> > > > >
> > > > > @Han, others, do you have any ideas as of what could be happening
> > > > > here? We'll be able to use this setup for a few more days so let me
> > > > > know if you want us to pull some other data/traces, ...
> > > > >
> > > > > Some other interesting things:
> > > > > On each of the compute nodes, (with an almost evenly distributed
> > > > > number of logical ports bound to them), the max amount of logical
> > > > > flows in br-int is ~90K (by the end of the test, right before
> deleting
> > > > > the resources).
> > > > >
> > > > > It looks like with the IP version, ovn-controller leaks some
> memory:
> > > > > https://imgur.com/a/trQrhWd
> > > > > While with OVS 2.10, it remains pretty flat during the test:
> > > > > https://imgur.com/a/KCkIT4O
> > > >
> > > > Hi Daniel, Han,
> > > >
> > > > I just sent a small patch for the ovn-controller memory leak:
> > > > https://patchwork.ozlabs.org/patch/1113758/
> > > >
> > > > At least on my setup this is what valgrind was pointing at.
> > > >
> > > > Cheers,
> > > > Dumitru
> > > >
> > > > >
> > > > > Looking forward to hearing back :)
> > > > > Daniel
> > > > >
> > > > > PS. Sorry for my previous email, I sent it by mistake without the
> subject
> > >
> > > Thanks Daniel for the testing and reporting, and thanks Dumitru for
> fixing the memory leak.
> > >
> > > Currently ovn-controller incremental processing only handles below SB
> changes incrementally:
> > > - logical_flow
> > > - port_binding (for regular VIF binding NOT on current chassis)

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-06-21 Thread Han Zhou
On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez 
wrote:
>
> Thanks a lot Han for the answer!
>
> On Tue, Jun 11, 2019 at 5:57 PM Han Zhou  wrote:
> >
> >
> >
> >
> > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara  wrote:
> > >
> > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
> > >  wrote:
> > > >
> > > > Hi Han, all,
> > > >
> > > > Lucas, Numan and I have been doing some 'scale' testing of OpenStack
> > > > using OVN and wanted to present some results and issues that we've
> > > > found with the Incremental Processing feature in ovn-controller. Below
> > > > is the scenario that we executed:
> > > >
> > > > * 7 baremetal nodes setup: 3 controllers (running
> > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS
> > > > 2.10.
> > > > * The test consists on:
> > > >   - Create openstack network (OVN LS), subnet and router
> > > >   - Attach subnet to the router and set gw to the external network
> > > >   - Create an OpenStack port and apply a Security Group (ACLs to allow
> > > > UDP, SSH and ICMP).
> > > >   - Bind the port to one of the 4 compute nodes (randomly) by
> > > > attaching it to a network namespace.
> > > >   - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
> > > >   - Wait until the test can ping the port
> > > > * Running browbeat/rally with 16 simultaneous process to execute the
> > > > test above 150 times.
> > > > * When all the 150 'fake VMs' are created, browbeat will delete all
> > > > the OpenStack/OVN resources.
> > > >
> > > > We first tried with OVS/OVN 2.10 and pulled some results which showed
> > > > 100% success but ovn-controller is quite loaded (as expected) in all
> > > > the nodes especially during the deletion phase:
> > > >
> > > > - Compute node: https://imgur.com/a/tzxfrIR
> > > > - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/8ffKKYF
> > > >
> > > > After conducting the tests above, we replaced ovn-controller in all 7
> > > > nodes by the one with the current master branch (actually from last
> > > > week). We also replaced ovn-northd and ovsdb-servers but the
> > > > ovs-vswitchd has been left untouched (still on 2.10). The expected
> > > > results were to get less ovn-controller CPU usage and also better
> > > > times due to the Incremental Processing feature introduced recently.
> > > > However, the results don't look very good:
> > > >
> > > > - Compute node: https://imgur.com/a/wuq87F1
> > > > - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/99kiyDp
> > > >
> > > > One thing that we can tell from the ovs-vswitchd CPU consumption is
> > > > that it's much less in the Incremental Processing (IP) case which
> > > > apparently doesn't make much sense. This led us to think that perhaps
> > > > ovn-controller was not installing the necessary flows in the switch
> > > > and we confirmed this hypothesis by looking into the dataplane
> > > > results. Out of the 150 VMs, 10% of them were unreachable via ping
> > > > when using ovn-controller from master.
> > > >
> > > > @Han, others, do you have any ideas as of what could be happening
> > > > here? We'll be able to use this setup for a few more days so let me
> > > > know if you want us to pull some other data/traces, ...
> > > >
> > > > Some other interesting things:
> > > > On each of the compute nodes, (with an almost evenly distributed
> > > > number of logical ports bound to them), the max amount of logical
> > > > flows in br-int is ~90K (by the end of the test, right before deleting
> > > > the resources).
> > > >
> > > > It looks like with the IP version, ovn-controller leaks some memory:
> > > > https://imgur.com/a/trQrhWd
> > > > While with OVS 2.10, it remains pretty flat during the test:
> > > > https://imgur.com/a/KCkIT4O
> > >
> > > Hi Daniel, Han,
> > >
> > > I just sent a small patch for the ovn-controller memory leak:
> > > https://patchwork.ozlabs.org/patch/1113758/
> > >
> > > At least on my setup this is what valgrind was pointing at.
> > >
> > > Cheers,
> > > Dumitru
> > >
> > > >
> > > > Looking forward to hearing back :)
> > > > Daniel
> > > >
> > > > PS. Sorry for my previous email, I sent it by mistake without the subject
> >
> > Thanks Daniel for the testing and reporting, and thanks Dumitru for fixing
> > the memory leak.
> >
> > Currently ovn-controller incremental processing only handles below SB
> > changes incrementally:
> > - logical_flow
> > - port_binding (for regular VIF binding NOT on current chassis)
> > - mc_group
> > - address_set
> > - port_group
> > - mac_binding
> >
> > So, in test scenario you described, since each iteration creates network
> > (SB datapath changes) and router ports (port_binding changes for non VIF),
> > the incremental processing would not help much, because most steps in your
> > test should trigger 

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-06-11 Thread Daniel Alvarez Sanchez
Thanks a lot Han for the answer!

On Tue, Jun 11, 2019 at 5:57 PM Han Zhou  wrote:
>
>
>
>
> On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara  wrote:
> >
> > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
> >  wrote:
> > >
> > > Hi Han, all,
> > >
> > > Lucas, Numan and I have been doing some 'scale' testing of OpenStack
> > > using OVN and wanted to present some results and issues that we've
> > > found with the Incremental Processing feature in ovn-controller. Below
> > > is the scenario that we executed:
> > >
> > > * 7 baremetal nodes setup: 3 controllers (running
> > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS
> > > 2.10.
> > > * The test consists on:
> > >   - Create openstack network (OVN LS), subnet and router
> > >   - Attach subnet to the router and set gw to the external network
> > >   - Create an OpenStack port and apply a Security Group (ACLs to allow
> > > UDP, SSH and ICMP).
> > >   - Bind the port to one of the 4 compute nodes (randomly) by
> > > attaching it to a network namespace.
> > >   - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
> > >   - Wait until the test can ping the port
> > > * Running browbeat/rally with 16 simultaneous process to execute the
> > > test above 150 times.
> > > * When all the 150 'fake VMs' are created, browbeat will delete all
> > > the OpenStack/OVN resources.
> > >
> > > We first tried with OVS/OVN 2.10 and pulled some results which showed
> > > 100% success but ovn-controller is quite loaded (as expected) in all
> > > the nodes especially during the deletion phase:
> > >
> > > - Compute node: https://imgur.com/a/tzxfrIR
> > > - Controller node (ovn-northd and ovsdb-servers): 
> > > https://imgur.com/a/8ffKKYF
> > >
> > > After conducting the tests above, we replaced ovn-controller in all 7
> > > nodes by the one with the current master branch (actually from last
> > > week). We also replaced ovn-northd and ovsdb-servers but the
> > > ovs-vswitchd has been left untouched (still on 2.10). The expected
> > > results were to get less ovn-controller CPU usage and also better
> > > times due to the Incremental Processing feature introduced recently.
> > > However, the results don't look very good:
> > >
> > > - Compute node: https://imgur.com/a/wuq87F1
> > > - Controller node (ovn-northd and ovsdb-servers): 
> > > https://imgur.com/a/99kiyDp
> > >
> > > One thing that we can tell from the ovs-vswitchd CPU consumption is
> > > that it's much less in the Incremental Processing (IP) case which
> > > apparently doesn't make much sense. This led us to think that perhaps
> > > ovn-controller was not installing the necessary flows in the switch
> > > and we confirmed this hypothesis by looking into the dataplane
> > > results. Out of the 150 VMs, 10% of them were unreachable via ping
> > > when using ovn-controller from master.
> > >
> > > @Han, others, do you have any ideas as of what could be happening
> > > here? We'll be able to use this setup for a few more days so let me
> > > know if you want us to pull some other data/traces, ...
> > >
> > > Some other interesting things:
> > > On each of the compute nodes, (with an almost evenly distributed
> > > number of logical ports bound to them), the max amount of logical
> > > flows in br-int is ~90K (by the end of the test, right before deleting
> > > the resources).
> > >
> > > It looks like with the IP version, ovn-controller leaks some memory:
> > > https://imgur.com/a/trQrhWd
> > > While with OVS 2.10, it remains pretty flat during the test:
> > > https://imgur.com/a/KCkIT4O
> >
> > Hi Daniel, Han,
> >
> > I just sent a small patch for the ovn-controller memory leak:
> > https://patchwork.ozlabs.org/patch/1113758/
> >
> > At least on my setup this is what valgrind was pointing at.
> >
> > Cheers,
> > Dumitru
> >
> > >
> > > Looking forward to hearing back :)
> > > Daniel
> > >
> > > PS. Sorry for my previous email, I sent it by mistake without the subject
>
> Thanks Daniel for the testing and reporting, and thanks Dumitru for fixing 
> the memory leak.
>
> Currently ovn-controller incremental processing only handles below SB changes 
> incrementally:
> - logical_flow
> - port_binding (for regular VIF binding NOT on current chassis)
> - mc_group
> - address_set
> - port_group
> - mac_binding
>
> So, in test scenario you described, since each iteration creates network (SB 
> datapath changes) and router ports (port_binding changes for non VIF), the 
> incremental processing would not help much, because most steps in your test 
> should trigger recompute. It would help if you create more Fake VMs in each 
> iteration, e.g. create 10 VMs or more on each LS. Secondly, when VIF 
> port-binding happens on current chassis, the ovn-controller will still do 
> re-compute, and because you have only 4 

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-06-11 Thread Han Zhou
On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara  wrote:
>
> On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
>  wrote:
> >
> > Hi Han, all,
> >
> > Lucas, Numan and I have been doing some 'scale' testing of OpenStack
> > using OVN and wanted to present some results and issues that we've
> > found with the Incremental Processing feature in ovn-controller. Below
> > is the scenario that we executed:
> >
> > * 7 baremetal nodes setup: 3 controllers (running
> > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS
> > 2.10.
> > * The test consists on:
> >   - Create openstack network (OVN LS), subnet and router
> >   - Attach subnet to the router and set gw to the external network
> >   - Create an OpenStack port and apply a Security Group (ACLs to allow
> > UDP, SSH and ICMP).
> >   - Bind the port to one of the 4 compute nodes (randomly) by
> > attaching it to a network namespace.
> >   - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
> >   - Wait until the test can ping the port
> > * Running browbeat/rally with 16 simultaneous process to execute the
> > test above 150 times.
> > * When all the 150 'fake VMs' are created, browbeat will delete all
> > the OpenStack/OVN resources.
> >
> > We first tried with OVS/OVN 2.10 and pulled some results which showed
> > 100% success but ovn-controller is quite loaded (as expected) in all
> > the nodes especially during the deletion phase:
> >
> > - Compute node: https://imgur.com/a/tzxfrIR
> > - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/8ffKKYF
> >
> > After conducting the tests above, we replaced ovn-controller in all 7
> > nodes by the one with the current master branch (actually from last
> > week). We also replaced ovn-northd and ovsdb-servers but the
> > ovs-vswitchd has been left untouched (still on 2.10). The expected
> > results were to get less ovn-controller CPU usage and also better
> > times due to the Incremental Processing feature introduced recently.
> > However, the results don't look very good:
> >
> > - Compute node: https://imgur.com/a/wuq87F1
> > - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/99kiyDp
> >
> > One thing that we can tell from the ovs-vswitchd CPU consumption is
> > that it's much less in the Incremental Processing (IP) case which
> > apparently doesn't make much sense. This led us to think that perhaps
> > ovn-controller was not installing the necessary flows in the switch
> > and we confirmed this hypothesis by looking into the dataplane
> > results. Out of the 150 VMs, 10% of them were unreachable via ping
> > when using ovn-controller from master.
> >
> > @Han, others, do you have any ideas as of what could be happening
> > here? We'll be able to use this setup for a few more days so let me
> > know if you want us to pull some other data/traces, ...
> >
> > Some other interesting things:
> > On each of the compute nodes, (with an almost evenly distributed
> > number of logical ports bound to them), the max amount of logical
> > flows in br-int is ~90K (by the end of the test, right before deleting
> > the resources).
> >
> > It looks like with the IP version, ovn-controller leaks some memory:
> > https://imgur.com/a/trQrhWd
> > While with OVS 2.10, it remains pretty flat during the test:
> > https://imgur.com/a/KCkIT4O
>
> Hi Daniel, Han,
>
> I just sent a small patch for the ovn-controller memory leak:
> https://patchwork.ozlabs.org/patch/1113758/
>
> At least on my setup this is what valgrind was pointing at.
>
> Cheers,
> Dumitru
>
> >
> > Looking forward to hearing back :)
> > Daniel
> >
> > PS. Sorry for my previous email, I sent it by mistake without the subject

Thanks Daniel for the testing and reporting, and thanks Dumitru for fixing
the memory leak.

Currently, ovn-controller incremental processing handles only the following
SB changes incrementally:
- logical_flow
- port_binding (for regular VIF binding NOT on current chassis)
- mc_group
- address_set
- port_group
- mac_binding
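
To make the rule concrete, here is a toy model in Python. It is only a sketch
of the list above, not ovn-controller's actual code (which is C). The
SbChange type, the triggers_recompute helper, the "datapath_binding" table
name and the convention that an empty port type means a regular VIF are all
assumptions made for illustration.

```python
from dataclasses import dataclass

# Tables named in the list above as being handled incrementally.
INCREMENTAL_TABLES = {
    "logical_flow",
    "port_binding",
    "mc_group",
    "address_set",
    "port_group",
    "mac_binding",
}


@dataclass(frozen=True)
class SbChange:
    """Hypothetical description of one SB change seen by a chassis."""
    table: str
    port_type: str = ""       # assumption: "" stands for a regular VIF
    bound_chassis: str = ""   # chassis the port is bound to, if any


def triggers_recompute(change: SbChange, local_chassis: str) -> bool:
    """Return True if this toy model falls back to a full recompute."""
    # Any table outside the list (e.g. datapath changes) forces a recompute.
    if change.table not in INCREMENTAL_TABLES:
        return True
    if change.table == "port_binding":
        # Only regular VIF bindings are handled incrementally ...
        if change.port_type != "":
            return True
        # ... and only when the binding is NOT on the current chassis.
        if change.bound_chassis == local_chassis:
            return True
    return False
```

For example, triggers_recompute(SbChange("port_binding", "", "compute-2"),
"compute-1") is False in this model, while the same binding seen from
"compute-2" itself returns True.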

So, in the test scenario you described, since each iteration creates a
network (SB datapath changes) and router ports (port_binding changes for
non-VIF ports), incremental processing would not help much, because most
steps in your test should trigger a recompute. It would help if you created
more fake VMs in each iteration, e.g. 10 VMs or more on each LS. Secondly,
when a VIF port binding happens on the current chassis, ovn-controller still
does a recompute, and because you have only 4 compute nodes, 1/4 of the
compute nodes will still recompute even when binding a regular VIF port.
With more compute nodes, you would see incremental processing become more
effective.
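
The 1/4 figure can be worked out with a short sketch. It assumes ports are
bound to chassis uniformly at random (the test binds them "randomly"), and it
counts only recomputes caused by local VIF bindings, not the ones caused by
datapath or router port changes. The function names are made up for this
example.

```python
from fractions import Fraction


def local_binding_fraction(num_chassis: int) -> Fraction:
    """Chance that a given VIF binding lands on one particular chassis."""
    if num_chassis < 1:
        raise ValueError("need at least one chassis")
    return Fraction(1, num_chassis)


def expected_local_recomputes(num_bindings: int, num_chassis: int) -> Fraction:
    """Expected recomputes per chassis caused by local VIF bindings only."""
    return num_bindings * local_binding_fraction(num_chassis)


# 4 compute nodes, 150 ports, as in the test described in this thread.
print(local_binding_fraction(4))                  # 1/4
print(float(expected_local_recomputes(150, 4)))   # 37.5
# A hypothetical larger deployment with 100 compute nodes.
print(float(expected_local_recomputes(150, 100))) # 1.5
```

Under these assumptions each of the 4 compute nodes would do about 37 full
recomputes for local bindings alone, against 1 or 2 per node with 100 nodes.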

However, what really worries me is that 10% of the VMs were unreachable. One
thing confuses me about the test steps. The last step you described was: - 

Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-06-11 Thread Dumitru Ceara
On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
 wrote:
>
> Hi Han, all,
>
> Lucas, Numan and I have been doing some 'scale' testing of OpenStack
> using OVN and wanted to present some results and issues that we've
> found with the Incremental Processing feature in ovn-controller. Below
> is the scenario that we executed:
>
> * 7 baremetal nodes setup: 3 controllers (running
> ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS
> 2.10.
> * The test consists on:
>   - Create openstack network (OVN LS), subnet and router
>   - Attach subnet to the router and set gw to the external network
>   - Create an OpenStack port and apply a Security Group (ACLs to allow
> UDP, SSH and ICMP).
>   - Bind the port to one of the 4 compute nodes (randomly) by
> attaching it to a network namespace.
>   - Wait for the port to be ACTIVE in Neutron ('up == True' in NB)
>   - Wait until the test can ping the port
> * Running browbeat/rally with 16 simultaneous processes to execute the
> test above 150 times.
> * When all the 150 'fake VMs' are created, browbeat will delete all
> the OpenStack/OVN resources.
>
> We first tried with OVS/OVN 2.10 and pulled some results which showed
> 100% success but ovn-controller is quite loaded (as expected) in all
> the nodes especially during the deletion phase:
>
> - Compute node: https://imgur.com/a/tzxfrIR
> - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/8ffKKYF
>
> After conducting the tests above, we replaced ovn-controller on all 7
> nodes with the one from the current master branch (actually from last
> week). We also replaced ovn-northd and ovsdb-servers but the
> ovs-vswitchd has been left untouched (still on 2.10). The expected
> results were to get less ovn-controller CPU usage and also better
> times due to the Incremental Processing feature introduced recently.
> However, the results don't look very good:
>
> - Compute node: https://imgur.com/a/wuq87F1
> - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/99kiyDp
>
> One thing that we can tell from the ovs-vswitchd CPU consumption is
> that it's much less in the Incremental Processing (IP) case which
> apparently doesn't make much sense. This led us to think that perhaps
> ovn-controller was not installing the necessary flows in the switch
> and we confirmed this hypothesis by looking into the dataplane
> results. Out of the 150 VMs, 10% of them were unreachable via ping
> when using ovn-controller from master.
>
> @Han, others, do you have any ideas as to what could be happening
> here? We'll be able to use this setup for a few more days so let me
> know if you want us to pull some other data/traces, ...
>
> Some other interesting things:
> On each of the compute nodes, (with an almost evenly distributed
> number of logical ports bound to them), the max amount of logical
> flows in br-int is ~90K (by the end of the test, right before deleting
> the resources).
>
> It looks like with the IP version, ovn-controller leaks some memory:
> https://imgur.com/a/trQrhWd
> While with OVS 2.10, it remains pretty flat during the test:
> https://imgur.com/a/KCkIT4O

Hi Daniel, Han,

I just sent a small patch for the ovn-controller memory leak:
https://patchwork.ozlabs.org/patch/1113758/

At least on my setup this is what valgrind was pointing at.

Cheers,
Dumitru

>
> Looking forward to hearing back :)
> Daniel
>
> PS. Sorry for my previous email, I sent it by mistake without the subject