Clayton,

This is really good information.
I’m wondering how we can help support you and get the necessary dev support to get this resolved sooner rather than later. I totally agree with you that this should be backported to at least Liberty. Please let me know how others and I can help!

-Joe

On 2/10/16, 8:55 AM, "Clayton O'Neill" <[email protected]> wrote:

>Summary: Liberty OVS agent restarts are better, but still need work.
>See: https://bugs.launchpad.net/neutron/+bug/1514056
>
>As many of you know, Liberty has a fix for OVS agent restarts such
>that it no longer dumps all flows when starting, which used to result
>in a loss of traffic. Unfortunately, Liberty neutron still has issues
>with OVS agent restarts. The fix that went into Liberty prevents it
>from dropping flows on the br-tun and br-int bridges and that helps
>greatly, but the br-ex bridge still has its flows cleared on startup.
>
>You may be thinking: Wait, br-ex only has like 3 flows on it, how can
>that be a problem? The issue appears to be that the br-ex flows are
>cleared early and not set up again until late in the process. This
>means that routers on the node where the OVS agent is restarted lose
>network connectivity for the majority of the restart time.
>
>I did some testing with this yesterday, comparing a few scenarios with
>100 FIPs, 100 instances and various scenarios for routers. You can
>find the complete data here:
>https://docs.google.com/spreadsheets/d/1ZGra_MszBlL0fNsFqd4nOvh1PsgWu58-GxEeh1m1BPw/edit?usp=sharing
>
>The summary looks like this:
>100 routers, 100 networks, 100 floating ips, 100 instances, single node test:
>Kilo average outage time: 47 seconds
>Liberty average outage time: 37 seconds
>
>1 router, 1 network, 100 floating ips, 100 instances, single node test:
>Kilo average outage time: 46 seconds
>Liberty average outage time: 13 seconds
>
>1 router, 1 network, 100 floating ips, 100 instances, router on a
>separate node, all instances on a single node, OVS restart on compute
>node:
>Kilo average outage time: 25 seconds
>Liberty average outage time: 0 to 1 seconds
>
>I did my testing using 1 second pings using fping to all of the
>floating IPs. With the last test, it frequently lost no packets, and
>as a result I was not really able to test the scenario other than to
>qualify it as good.
>
>This is a huge operational issue for us and I suspect for many of the
>rest of you using OVS. I’d encourage everyone that is using OVS to
>register interest in having this fixed in the LP bug
>(https://bugs.launchpad.net/neutron/+bug/1514056). Right now this bug
>is marked as low priority.
>
>_______________________________________________
>OpenStack-operators mailing list
>[email protected]
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
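
For anyone who wants to reproduce the numbers in the quoted message, here is a rough sketch of how an "average outage time" could be derived from 1 second ping results. The quoted message does not say exactly how outage time was calculated, so this assumes it is the longest run of consecutive lost probes per floating IP, averaged over all IPs. The function names and the input layout (one list of True/False replies per IP) are made up for illustration and are not fping's output format.

```python
from typing import Dict, List


def longest_outage_seconds(replies: List[bool], interval_s: float = 1.0) -> float:
    """Return the longest run of consecutive lost probes, in seconds.

    replies holds one entry per probe: True if a reply came back,
    False if the probe was lost. This layout is an assumption.
    """
    longest = 0
    current = 0
    for ok in replies:
        # A reply ends the current loss run; a loss extends it.
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest * interval_s


def average_outage_seconds(
    results: Dict[str, List[bool]], interval_s: float = 1.0
) -> float:
    """Average the per-IP longest outage across all probed floating IPs."""
    if not results:
        return 0.0
    outages = [longest_outage_seconds(r, interval_s) for r in results.values()]
    return sum(outages) / len(outages)
```

With two hypothetical floating IPs, one losing three probes in a row and one losing a single probe, this gives an average of 2 seconds. Counting total lost probes instead of the longest run would be an equally reasonable definition and would give different numbers when there are several short outages.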
