Clayton, 

This is really good information. 

I’m wondering how we can help support you and get the necessary dev support to
get this resolved sooner rather than later. I totally agree with you that this
should be backported to at least Liberty.

Please let me know how I and others can help!

—Joe

On 2/10/16, 8:55 AM, "Clayton O'Neill" <[email protected]> wrote:

>Summary: Liberty OVS agent restarts are better, but still need work.
>See: https://bugs.launchpad.net/neutron/+bug/1514056
>
>As many of you know, Liberty has a fix for OVS agent restarts so that
>the agent no longer dumps all flows on startup, which used to cause a
>loss of traffic.  Unfortunately, Liberty neutron still has issues with
>OVS agent restarts.  The fix that went into Liberty prevents it from
>dropping flows on the br-tun and br-int bridges and that helps
>greatly, but the br-ex bridge still has its flows cleared on startup.
>
>You may be thinking: Wait, br-ex only has like 3 flows on it, how can
>that be a problem?  The issue appears to be that the br-ex flows are
>cleared early and not set up again until late in the process.  This
>means that routers on the node where the OVS agent is restarted lose
>network connectivity for the majority of the restart time.
>
>I did some testing with this yesterday, comparing a few scenarios with
>100 floating IPs, 100 instances and various router layouts.  You can
>find the complete data here:
>https://docs.google.com/spreadsheets/d/1ZGra_MszBlL0fNsFqd4nOvh1PsgWu58-GxEeh1m1BPw/edit?usp=sharing
>
>The summary looks like this:
>100 routers, 100 networks, 100 floating ips, 100 instances, single node test:
>Kilo average outage time: 47 seconds
>Liberty average outage time: 37 seconds
>
>1 router, 1 network, 100 floating ips, 100 instances, single node test:
>Kilo average outage time: 46 seconds
>Liberty average outage time: 13 seconds
>
>1 router, 1 network, 100 floating ips, 100 instances, router on a
>separate node, all instances on a single node, OVS restart on compute
>node:
>Kilo average outage time: 25 seconds
>Liberty average outage time: 0 to 1 seconds
>
>I did my testing using 1 second pings using fping to all of the
>floating IPs.  With the last test, it frequently lost no packets, and
>as a result I was not really able to test the scenario other than to
>qualify it as good.
>
>This is a huge operational issue for us and I suspect for many of the
>rest of you using OVS.  I’d encourage everyone that is using OVS to
>register interest in having this fixed in the LP bug
>(https://bugs.launchpad.net/neutron/+bug/1514056).  Right now this bug
>is marked as low priority.
>
>_______________________________________________
>OpenStack-operators mailing list
>[email protected]
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
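
For anyone who wants to reproduce the comparison on their own nodes, here is a
rough sketch of how per-second ping results could be reduced to an "average
outage time".  It assumes the outage for one floating IP is the longest run of
consecutive lost pings; the thread does not say how the numbers in the
spreadsheet were actually computed, so treat that definition, the function
names and the sample addresses as hypothetical.  It does not call fping
itself; it only takes one True/False per ping, however those were collected.

```python
from statistics import mean


def longest_outage(replies, interval=1.0):
    """Return the longest run of consecutive lost pings, in seconds.

    replies  -- iterable of booleans, True if that ping was answered
    interval -- seconds between pings (1 second in the test described above)
    """
    longest = current = 0
    for ok in replies:
        # Reset the counter on a reply, otherwise extend the current loss run.
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest * interval


def average_outage(results, interval=1.0):
    """Average the per-IP longest outage over all floating IPs."""
    return mean(longest_outage(r, interval) for r in results.values())


# Hypothetical sample: two floating IPs, five pings each.
sample = {
    "198.51.100.10": [True, False, False, False, True],
    "198.51.100.11": [True, True, False, True, True],
}
print(average_outage(sample))
```

Using the longest loss run rather than the total number of lost pings keeps a
single stray dropped packet from inflating the figure; if total loss is what
you care about, summing the False entries instead would be the obvious change.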
