On 6/10/2014 5:36 AM, Michael Still wrote:
https://review.openstack.org/99002 adds more logging to
nova/network/manager.py, but I think you're not going to love the
debug log level. Was this the sort of thing you were looking for
though?

Michael

On Mon, Jun 9, 2014 at 11:45 PM, Sean Dague <s...@dague.net> wrote:
Based on some back of envelope math the gate is basically processing 2
changes an hour, failing one of them. So if you want to know how long
the gate is, take the length / 2 in hours.
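As a rough illustration of that estimate, here is a minimal sketch. The function name and the example queue length of 40 are made up for illustration; only the rate of 2 changes an hour comes from the math above.

```python
def gate_wait_hours(queue_length, changes_per_hour=2.0):
    """Estimate hours for a change at the back of the gate queue.

    Assumes the gate keeps processing at a steady rate; the default of
    2 changes an hour is the back-of-envelope figure quoted above.
    """
    return queue_length / changes_per_hour


# Hypothetical queue of 40 changes.
print(gate_wait_hours(40))
```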

Right now we're doing a lot of revert roulette, trying to revert things
that we think landed about the time things went bad. I call this
roulette because in many cases the actual issue isn't well understood. A
key reason for this is:

*nova network is a blackhole*

There is no work unit logging in nova-network, and no attempted
verification that the commands it ran did a thing. Most of these
failures that we don't have good understanding of are the network not
working under nova-network.
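
To make "work unit logging" and "verification" concrete, here is a minimal sketch of the general pattern. It is not nova code: the logger name, the run_logged helper and its verify callback are all hypothetical, and nova-network has its own command execution utilities that a real patch would use instead of subprocess.

```python
import logging
import subprocess
import sys
import uuid

LOG = logging.getLogger("network_work_unit")  # hypothetical logger name


def run_logged(cmd, verify=None):
    """Run cmd, logging its start, its result, and an optional post-check.

    Hypothetical helper, not part of nova. `verify` is a callable that
    returns True when the command had the intended effect.
    """
    unit = uuid.uuid4().hex[:8]  # short id to correlate the log lines
    LOG.info("[%s] start: %s", unit, " ".join(cmd))
    proc = subprocess.run(cmd, capture_output=True, text=True)
    LOG.info("[%s] exit=%d stderr=%r", unit, proc.returncode,
             proc.stderr.strip())
    if proc.returncode != 0:
        return False
    if verify is not None and not verify():
        # The command reported success but did not do the thing.
        LOG.error("[%s] command succeeded but verification failed", unit)
        return False
    LOG.info("[%s] done", unit)
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Stand-in command; a real caller would pass the network command and
    # a check that inspects the resulting state.
    run_logged([sys.executable, "-c", "pass"], verify=lambda: True)
```

The point of the shared id is that every log line for one unit of work can be pulled out together when a gate run fails.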

So we could *really* use a volunteer or two to prioritize getting that
into nova-network. Without it we might manage to turn down the failure
rate by reverting things (or we might not) but we won't really know why,
and we'll likely be here again soon.

         -Sean

--
Sean Dague
http://dague.net


_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev





I also mentioned this in the nova meeting today, but the associated bug for the nova-network ssh timeout issue is bug 1298472 [1].

My latest theory on that one is that there could be a race or network leak in the ec2 third-party tests in Tempest, or something in the ec2 API in nova, because I saw this [2] showing up in the n-net logs. My thinking is that the tests or the API are not tearing down cleanly, so network resources eventually leak and we start hitting those timeouts. It's just a theory at this point, but the ec2 third-party tests do run concurrently with the scenario tests, so things could be colliding there. I haven't had time to dig into it, and I have very little experience with those tests or the ec2 API in nova.

[1] https://bugs.launchpad.net/tempest/+bug/1298472
[2] http://goo.gl/6f1dfw

--

Thanks,

Matt Riedemann

