On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
> On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
>> Just an observation from the last week or so...
>>
>> The biggest problem nova faces at the moment isn't code review latency. Our
>> biggest problem is failing to fix our bugs so that the gate is reliable.
>> The number of rechecks we've done in the last week to try and land code is
>> truly startling.
>
> I consider both problems to be pretty much equally important. I don't
> think solving review latency or test reliability in isolation is enough to
> save Nova. We need to tackle both problems as a priority. I tried to avoid
> getting into my concerns about testing in my mail on review team bottlenecks
> since I think we should address the problems independently / in parallel.
>
>> I know that some people are focused by their employers on feature work, but
>> those features aren't going to land in a world in which we have to hand
>> walk everything through the gate.
>
> Unfortunately the reliability of the gate systems has the highest negative
> impact on productivity right at the point in the dev cycle where we need
> it to have the least impact too.
>
> If we're going to continue to raise the bar in terms of testing coverage
> then we need to have a serious look at the overall approach we use for
> testing because what we do today isn't going to scale, even if it is
> 100% reliable. We can't keep adding new CI jobs for each new nova.conf
> setting that introduces a new code path, because each job has major
> implications for resource consumption (number of test nodes, log storage),
> not to mention reliability. I think we need to figure out a way to get
> more targeted testing of features, so we can keep the overall number
> of jobs lower and the tests shorter.
>
> Instead of having a single tempest run that exercises all the Nova
> functionality in one run, we need to figure out how to split it up
> into independent functional areas. For example if we could isolate
> tests which are affected by choice of cinder storage backend, then
> we could run that subset of tests multiple times, once for each
> supported cinder backend. Without this, the combinatorial explosion
> of test jobs is going to kill us.
One of the top issues killing Nova patches last week was a unit test race
(the wsgi worker one). There is no one to blame but Nova for that. Jay was
really the only team member digging into it.

I don't disagree on the disaggregation problem; however, as lots of Nova
devs are ignoring unit test failures at this point, no disaggregation is
going to make anything better unless that changes.

	-Sean

-- 
Sean Dague
http://dague.net

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev