On 17 August 2013 23:49, Salvatore Orlando <sorla...@nicira.com> wrote:
> I tend to agree that when the gate for a project is broken, nothing should
> be merged for that project until the gate jobs are green again.
> In the case of Neutron, making the job non-voting only caused more bugs to
> slip through, and that meant more work for the developers themselves, and
> more headaches for developers of other projects relying on it.
> When dealing with intermittent failures, like the bug which probably started
> the issues we've been witnessing in the past 3 weeks, I think it might be a
> sensible idea to make the job non-voting only for projects which surely
> can't be the cause of the gate failure; or perhaps skip the offending test
> only.
>
> This means however asymmetrical gating, and from Monty's post it seems
> there's something quite wrong with it. However, due to my lack of expertise
> on the subject, I am unable to see the issue with it.
>
> Salvatore

The asymmetry we should fear is when project A can land something which will break project B. In this case the proposal is to say 'B is broken already, permit A to land things without remorse until B is unbroken'. The problem is, if A makes the breakage of B worse, B ends up in catchup mode, which is most unfun.

Concretely, take heat for A and neutron for B. Tempest d-g jobs start failing in neutron, so they are made skips. Now heat could make neutron tests in tempest worse, and we won't know - or if we do know, the changes will still land.

Previous discussion here has endorsed 'revert problematic commits, it's not blame on the developer, just do it', so I'm not going to mention that.

What I will suggest we do is start running some number - let's say 20 - of midnight-state jobs, all identical. Ignoring datetime-sensitive tests, which are fortunately rare, this should identify tests that fail around 5% of the time or more, independent of incoming commits. We can use this to generate a baseline reference for which tests fail intermittently in trunk, and when something breaks intermittently outside of that set, we can be pretty *sure* it's in the last day's commits.

Secondly, in principle it should be straightforward to do this for any point in time, so when a new problem shows its head, we can start up a bisection programmatically - independent of the dev analysis - to find where it was introduced.
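To put rough numbers on the baseline idea, here is a minimal sketch. It assumes each run of a flaky test fails independently with a fixed probability, and that each job's results are available as a mapping from test name to pass/fail. The function names and the data layout are made up for illustration; they are not part of any existing gate tooling.

```python
from collections import Counter


def detection_probability(failure_rate, runs):
    """Chance that a test failing independently at `failure_rate`
    fails at least once across `runs` identical jobs."""
    return 1.0 - (1.0 - failure_rate) ** runs


def flaky_baseline(run_results):
    """Build the intermittent-failure baseline from identical jobs.

    `run_results` holds one dict per job, mapping test name -> True if
    the test passed (hypothetical layout). Returns the observed failure
    rate for every test that failed in some runs but not in all of them;
    a test that fails every time is broken, not intermittent.
    """
    failures = Counter()
    for result in run_results:
        for test, passed in result.items():
            if not passed:
                failures[test] += 1
    total = len(run_results)
    return {test: count / total
            for test, count in failures.items()
            if 0 < count < total}
```

Under that independence assumption, detection_probability(0.05, 20) comes out at about 0.64, so one night of 20 jobs catches a given 5% test roughly two times in three. Accumulating results over several nights helps: about 59 runs are needed before the chance of seeing at least one failure passes 95%.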
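The programmatic bisection could look something like the sketch below. It assumes the commits are ordered oldest to newest, that the newest one is known to show the problem, and that once the problem is introduced it stays. run_suite is a hypothetical callable that runs the relevant tests at a commit and returns True on a pass; because the failure is intermittent, each commit is tried several times before it is judged good.

```python
def make_is_bad(run_suite, runs):
    """Wrap a flaky check: a commit counts as bad if any of `runs`
    attempts fails. `run_suite(commit)` is a hypothetical callable
    returning True when the suite passes at that commit."""
    def is_bad(commit):
        return any(not run_suite(commit) for _ in range(runs))
    return is_bad


def first_bad_commit(commits, is_bad):
    """Binary search for the earliest bad commit.

    Assumes `commits` is ordered oldest to newest, the last entry is
    bad, and every commit after the first bad one is also bad.
    """
    if not commits:
        raise ValueError("need at least one commit")
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # first bad commit is at mid or earlier
        else:
            lo = mid + 1    # first bad commit is after mid
    return commits[lo]
```

The weak point is a false 'good': if a bad commit happens to pass all of its attempts, the search heads off in the wrong direction. The number of attempts per commit therefore has to be chosen with the failure rate in mind.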
If we have resources we could even do N-section rather than bisection.

Killing all intermittent issues in test suites is /hard/, so I think we need to have a belt-and-braces approach and engineer a rapid response system to spikes in intermittent failures, in addition to working on the failures themselves.

-Rob

--
Robert Collins <rbtcoll...@hp.com>
Distinguished Technologist
HP Converged Cloud