Thanks for the updates here, Salvatore, and for continuing to push on this! This is all great work!
On Jan 2, 2014, at 6:57 AM, Salvatore Orlando <[email protected]> wrote:

> Hi again,
>
> I've now run the experimental job a good number of times, and I've filed bugs for all the issues that came out. Most of them occurred no more than once across all test executions (I think about 30).
>
> They're all tagged with neutron-parallel [1]. For ease of tracking, I've associated all the bug reports with neutron, but some are probably more tempest or nova issues.
>
> Salvatore
>
> [1] https://bugs.launchpad.net/neutron/+bugs?field.tag=neutron-parallel
>
>
> On 27 December 2013 11:09, Salvatore Orlando <[email protected]> wrote:
> Hi,
>
> We now have several patches under review which considerably improve how neutron handles parallel testing. In a nutshell, these patches try to ensure the ovs agent processes new, removed, and updated interfaces as soon as possible.
>
> These patches are:
> https://review.openstack.org/#/c/61105/
> https://review.openstack.org/#/c/61964/
> https://review.openstack.org/#/c/63100/
> https://review.openstack.org/#/c/63558/
>
> There is still room for improvement. For instance, the calls from the agent into the plugins might be considerably reduced. However, even though the above patches considerably shrink the time required to process a device, we are still hitting a hard limit with the execution of the ovs commands for setting local vlan tags and clearing flows (or adding the flow rule for dropping all the traffic). In some instances these commands slow down a lot, requiring almost 10 seconds to complete. This adds a delay to interface processing which in some cases leads to the hideous SSH timeout error (the same one we see with bug 1253896 in normal testing). It is also worth noting that when this happens, sysstat reveals CPU usage very close to 100%.
>
> From the neutron side there is little we can do. Introducing parallel processing for interfaces, as we do for the l3 agent, is not actually a solution, since ovs-vswitchd v1.4.x, the version executed on gate tests, is not multithreaded. If you think the situation might be improved by changing the logic for handling local vlan tags and putting ports on the dead vlan, I would be happy to talk about that. On my local machines I've seen a dramatic improvement in processing times by installing ovs 2.0.0, which has a multi-threaded vswitchd. Is this something we might consider for gate tests? Also, in order to reduce CPU usage on the gate (and make tests a bit faster), there is a tempest patch which stops creating and wiring neutron routers when they're not needed:
> https://review.openstack.org/#/c/62962/
>
> Even in my local setup, which succeeds about 85% of the time, I'm still seeing some occurrences of the issue described in [1], which at the end of the day seems to be a dnsmasq issue.
>
> Beyond the 'big' structural problem discussed above, there are some minor problems with a few tests:
>
> 1) test_network_quotas.test_create_ports_until_quota_hit fails about 90% of the time. I think this is because the test itself should be made aware of parallel execution and asynchronous events, and there is a patch for this already: https://review.openstack.org/#/c/64217
>
> 2) test_attach_interfaces.test_create_list_show_delete_interfaces fails about 66% of the time. The failure is always on an assertion made after deletion of interfaces, which probably means the interface is not deleted within 5 seconds.
> I think this might be a consequence of the higher load on the neutron service, and we might try enabling multiple workers on the gate to that end, or just increase the tempest timeout. On a slightly different note, allow me to say that the way assertions are made in this test could be improved a bit: at the moment one has to go through the code to see why the test failed.
>
> Thanks for reading this rather long message.
> Regards,
> Salvatore
>
> [1] https://lists.launchpad.net/openstack/msg23817.html
>
>
> On 2 December 2013 22:01, Kyle Mestery (kmestery) <[email protected]> wrote:
> Yes, this is all great Salvatore and Armando! Thank you for all of this work and the explanation behind it all.
>
> Kyle
>
> On Dec 2, 2013, at 2:24 PM, Eugene Nikanorov <[email protected]> wrote:
> >
> > Salvatore and Armando, thanks for your great work and detailed explanation!
> >
> > Eugene.
> >
> >
> > On Mon, Dec 2, 2013 at 11:48 PM, Joe Gordon <[email protected]> wrote:
> >
> > On Dec 2, 2013 9:04 PM, "Salvatore Orlando" <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > As you might have noticed, there has been some progress on parallel tests for neutron. In a nutshell:
> > > * Armando fixed the issue with IP address exhaustion on the public network [1]
> > > * Salvatore has now a patch which has a 50% success rate (the last failures are because of me playing with it) [2]
> > > * Salvatore is looking at putting back on track full isolation [3]
> > > * All the bugs affecting parallel tests can be queried here [10]
> > > * This blueprint tracks progress made towards enabling parallel testing [11]
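
On the test_attach_interfaces failure specifically: rather than relying on an implicit 5 second window, an explicit wait loop with a tunable timeout would make it obvious when slow deletion (rather than a real bug) is the culprit, and would let us bump the limit on the gate without touching the assertion itself. A rough sketch of what I mean, with hypothetical helper names rather than the actual tempest code:

import time


def wait_for_interface_detach(show_interface, port_id, not_found_exc,
                              timeout=30, interval=2):
    # Poll until the lookup raises the client's "not found" exception,
    # instead of assuming deletion always completes within a fixed delay.
    start = time.time()
    while time.time() - start < timeout:
        try:
            show_interface(port_id)
        except not_found_exc:
            return
        time.sleep(interval)
    raise AssertionError("interface %s still present after %s seconds"
                         % (port_id, timeout))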

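And on the point about the assertions in test_create_list_show_delete_interfaces: even just attaching a message to the final check would let whoever reads the gate log see what was still attached and where, without opening the test source. A toy sketch of the shape of the assertion (not the real test, the names are made up):

import unittest


class InterfaceCleanupExample(unittest.TestCase):

    def _list_attached_port_ids(self, server_id):
        # Stand-in for the real API call the tempest test makes.
        return []

    def test_interface_detached(self):
        server_id = 'fake-server'
        port_id = 'fake-port'
        attached = self._list_attached_port_ids(server_id)
        # The message is the whole point: on failure, the gate log says
        # which port was still attached to which server.
        self.assertNotIn(
            port_id, attached,
            "port %s still attached to server %s after detach; attached: %s"
            % (port_id, server_id, attached))


if __name__ == '__main__':
    unittest.main()

Combined with an explicit wait like the sketch above, that should make these failures much easier to triage.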