Thanks for the updates here Salvatore, and for continuing to push on
this! This is all great work!

On Jan 2, 2014, at 6:57 AM, Salvatore Orlando <[email protected]> wrote:
> 
> Hi again,
> 
> I've now run the experimental job a good number of times, and I've filed bugs 
> for all the issues that came up.
> Most of them occurred no more than once across all test executions (about 30, 
> I think).
> 
> They're all tagged with neutron-parallel [1]. For ease of tracking, I've 
> associated all the bug reports with neutron, but some are probably more 
> tempest or nova issues.
> 
> Salvatore
> 
> [1] https://bugs.launchpad.net/neutron/+bugs?field.tag=neutron-parallel
> 
> 
> On 27 December 2013 11:09, Salvatore Orlando <[email protected]> wrote:
> Hi,
> 
> We now have several patches under review which greatly improve how neutron 
> handles parallel testing.
> In a nutshell, these patches try to ensure the ovs agent processes new, 
> removed, and updated interfaces as soon as possible.
> 
> These patches are:
> https://review.openstack.org/#/c/61105/
> https://review.openstack.org/#/c/61964/
> https://review.openstack.org/#/c/63100/
> https://review.openstack.org/#/c/63558/
> 
> There is still room for improvement. For instance, the calls from the agent 
> into the plugins might be considerably reduced.
> However, even if the above patches greatly shrink the time required for 
> processing a device, we are still hitting a hard limit with the execution of 
> ovs commands for setting local vlan tags and clearing flows (or adding the 
> flow rule for dropping all the traffic).
> In some instances these commands slow down a lot, requiring almost 10 seconds 
> to complete. This adds a delay in interface processing which in some cases 
> leads to the hideous SSH timeout error (the same we see with bug 1253896 in 
> normal testing).
> It is also worth noting that when this happens sysstat reveals CPU usage is 
> very close to 100%.
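
A minimal sketch of how the duration of such commands could be measured,
assuming a hypothetical helper named `timed_run` (it is not part of neutron
or of the ovs agent). The 1 second threshold is an arbitrary assumption, and
the usage example runs a trivial Python command rather than a real ovs
command:

```python
import subprocess
import sys
import time


def timed_run(argv, slow_threshold=1.0):
    """Run a command and return (elapsed_seconds, is_slow).

    Hypothetical helper: argv is the command as a list of strings, and
    slow_threshold (in seconds) is an arbitrary cut-off for "slow".
    """
    start = time.monotonic()
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(
        argv,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    elapsed = time.monotonic() - start
    return elapsed, elapsed > slow_threshold


if __name__ == "__main__":
    # Stand-in command; on a real node this would be the ovs command line.
    elapsed, is_slow = timed_run([sys.executable, "-c", "pass"])
    print("took %.3fs, slow=%s" % (elapsed, is_slow))
```

Logging the elapsed time next to the sysstat CPU figures might make it easier
to correlate slow commands with the load spikes described above.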
> 
> From the neutron side there is little we can do. Introducing parallel 
> processing for interfaces, as we do for the l3 agent, is not actually a 
> solution, since ovs-vswitchd v1.4.x, the one executed on gate tests, is not 
> multithreaded. If you think the situation might be improved by changing the 
> logic for handling local vlan tags and putting ports on the dead vlan, I 
> would be happy to talk about that.
> On my local machines I've seen a dramatic improvement in processing times by 
> installing ovs 2.0.0, which has a multi-threaded vswitchd. Is this something 
> we might consider for gate tests? Also, in order to reduce CPU usage on the 
> gate (and make tests a bit faster), there is a tempest patch which stops 
> creating and wiring neutron routers when they're not needed: 
> https://review.openstack.org/#/c/62962/
> 
> Even in my local setup, which succeeds about 85% of the time, I'm still 
> seeing some occurrences of the issue described in [1], which at the end of 
> the day seems to be a dnsmasq issue.
> 
> Beyond the 'big' structural problem discussed above, there are some minor 
> problems with a few tests:
> 
> 1) test_network_quotas.test_create_ports_until_quota_hit fails about 90% of 
> the time. I think this is because the test itself should be made aware of 
> parallel execution and asynchronous events, and there is a patch for this 
> already: https://review.openstack.org/#/c/64217
> 
> 2) test_attach_interfaces.test_create_list_show_delete_interfaces fails about 
> 66% of the time. The failure is always on an assertion made after deletion of 
> interfaces, which probably means the interface is not deleted within 5 
> seconds. I think this might be a consequence of the higher load on the 
> neutron service, and we might try to enable multiple workers on the gate to 
> address this, or just increase the tempest timeout. On a slightly different 
> note, allow me to say that the way assertions are made in this test might be 
> improved a bit. As it stands, one has to go through the code to see why the 
> test failed.
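
To illustrate the point about assertions, here is a rough sketch of a polling
wait whose failure message says what was still present and for how long. The
helper name `wait_until_gone` and its parameters are hypothetical and are not
taken from tempest. The 5 second default mirrors the figure mentioned above,
and the polling interval is an assumption:

```python
import time


def wait_until_gone(list_ids, target_id, timeout=5.0, interval=0.5):
    """Poll list_ids() until target_id disappears, or fail with details.

    Hypothetical helper: list_ids is any callable returning the IDs that
    currently exist (for example, the interfaces attached to a server).
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = list_ids()
        if target_id not in remaining:
            return
        if time.monotonic() >= deadline:
            # The message carries enough context to diagnose the failure
            # without reading the test code.
            raise AssertionError(
                "interface %s still present after %.1fs; remaining: %s"
                % (target_id, timeout, sorted(remaining))
            )
        time.sleep(interval)
```

With a message like this, a failed run would show directly in the log that
the deletion did not complete within the timeout.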
> 
> Thanks for reading this rather long message.
> Regards,
> Salvatore
> 
> [1] https://lists.launchpad.net/openstack/msg23817.html
> 
> 
> 
> 
> On 2 December 2013 22:01, Kyle Mestery (kmestery) <[email protected]> wrote:
> Yes, this is all great Salvatore and Armando! Thank you for all of this work
> and the explanation behind it all.
> 
> Kyle
> 
> On Dec 2, 2013, at 2:24 PM, Eugene Nikanorov <[email protected]> wrote:
> 
> > Salvatore and Armando, thanks for your great work and detailed explanation!
> >
> > Eugene.
> >
> >
> > On Mon, Dec 2, 2013 at 11:48 PM, Joe Gordon <[email protected]> wrote:
> >
> > On Dec 2, 2013 9:04 PM, "Salvatore Orlando" <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > As you might have noticed, there has been some progress on parallel tests 
> > > for neutron.
> > > In a nutshell:
> > > * Armando fixed the issue with IP address exhaustion on the public 
> > > network [1]
> > > * Salvatore now has a patch which has a 50% success rate (the last 
> > > failures are because of me playing with it) [2]
> > > * Salvatore is looking at putting back on track full isolation [3]
> > > * All the bugs affecting parallel tests can be queried here [10]
> > > * This blueprint tracks progress made towards enabling parallel testing 
> > > [11]
> > >




_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
