My concern was with the speed of response. I'm happy to have a QA-team switch that must be flipped (with an associated email to juju-dev so everyone knows why their patch won't land). I *would* like us to be tracking things like how long we spend in regression mode, etc.
I think ideally the process would be automated, but our current CI seems to need a fair amount of manual filtering.

On Jul 15, 2014 6:35 PM, "Curtis Hovey-Canonical" <cur...@canonical.com> wrote:

> We are doing some combinatorial testing because we need to ensure
> every series+arch combination works. At the Vegas sprint, we settled on
> unit tests and lxc tests as the best way to identify issues with arch
> or series. We test:
>
>   precise + amd64
>   utopic + amd64
>   trusty + amd64
>   trusty + i386
>   trusty + ppc64
>   trusty + arm64

That looks like M+N to me (all series on amd64, plus trusty for all arches). MxN would be all series x all arches.

...

> > I have the feeling, though, that "better CI" might be making some
> > developers a bit more lax and doing less direct testing themselves,
> > because they expect that CI will catch things they don't.
>
> I don't feel this. I think the problem is the complexity of Juju.
> The Mongo changes for HA broke the backup-restore feature; I think these
> are different areas of expertise that needed better coordination.

I think there were also some auth changes that meant we couldn't bootstrap at all. I really like that CI caught it. I wonder whether it had to get that far.

> > I like stop-the-line-when-CI-is-broken, as long as we have reliable ways
> > to stop it. Given the timescales we're working on, I'd probably be ok with
> > having it be a manual thing, so that when Azure decides to rev their API and
> > break everything that used to work, we aren't immediately crippled. Maybe we
> > can identify a subset of CI that is reliable (or high-priority) enough that
> > it really is automatically stop-the-line worthy. (Trusty unit tests, PPC
> > unit tests, local provider, ec2 tests come to mind.)
>
> Cloud failures are not regressions in juju code. I spend a day or more
> a week tweaking CI to give Juju the best chance of success. I might
> change a test, or write a script that cleans up the resources in the
> cloud/host.
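[Editorial aside: the M+N observation above can be sketched concretely. This is a minimal illustration assuming the six series/arch pairs listed in the quote; the variable names are illustrative, not part of any Juju tooling.]

```python
# Sketch of the M+N vs MxN point: the CI matrix quoted above is not the
# full cross product of series and architectures.
series = ["precise", "utopic", "trusty"]
arches = ["amd64", "i386", "ppc64", "arm64"]

# MxN: every series on every arch.
m_by_n = [(s, a) for s in series for a in arches]

# M+N, as CI actually runs it: all series on amd64,
# plus trusty on every remaining arch.
m_plus_n = [(s, "amd64") for s in series] + \
           [("trusty", a) for a in arches if a != "amd64"]

print(len(m_by_n))    # 12 combinations
print(len(m_plus_n))  # 6 combinations, matching the quoted list
```

The M+N subset still gives every series and every arch at least one run, which is the coverage the quoted rationale (catching arch- or series-specific issues) needs.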
> Since I am taking time to give juju more chances to pass, I delay
> reporting the bugs. 5 revisions might merge while I prove that juju is
> really broken. Since the defect can mutate with the extra commits, it
> isn't easy to identify the 1 or more revisions that are at fault.
>
> When we report a "ci regression", it is something we genuinely verified
> to work when we retested an old revision. I do provide a list of commits
> that can be investigated.
>
> As for automating a stop-the-line policy, we might be fine with a
> small hack to the git-merge-juju job to check for commits that claim
> to fix a regression; when that is not the case, the job fails early with
> the reason that we are waiting for a specific fix. Rollback is always an
> option.

I absolutely support trying to find ways to help keep CI blue (green). It's definitely the background I come from and a culture I want us to have.

I think a difficulty is figuring out who/what is responsible, and the slow turnaround to unblocking everything. If we make what we think is the fix, even if it is just reverting a change, doesn't it take hours to run CI again? And even then some bits may fail spuriously or for a different reason. If we need manual intervention on both ends, that means a stop-the-line takes us out of working order for 24 hours. I'm just trying to explore the consequences. I really do think we need good feedback into keeping CI happy.

John =:->

> --
> Curtis Hovey
> Canonical Cloud Development and Operations
> http://launchpad.net/~sinzui
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
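[Editorial aside: the "small hack to the git-merge-juju job" proposed above could look roughly like this. Everything here is a hypothetical sketch: the blocking-bug number, the `Fixes:` commit-message convention, and the function names are assumptions, not the real job's interface.]

```python
import re
from typing import Optional

# Hypothetical: the bug number CI is currently blocked on, published
# somewhere the merge job can read (e.g. a file the QA team updates).
BLOCKING_BUG = "1341234"

def commit_unblocks(commit_message: str, blocking_bug: str) -> bool:
    """Return True if the commit claims to fix the regression that is
    currently blocking the line, e.g. 'Fixes: lp:1341234'."""
    claimed = re.findall(r"(?i)fixes[:\s]+(?:lp:)?#?(\d+)", commit_message)
    return blocking_bug in claimed

def should_merge(commit_message: str, blocking_bug: Optional[str]) -> bool:
    # No open regression: merge normally.
    if blocking_bug is None:
        return True
    # Line is stopped: only a commit claiming to fix the blocker may land;
    # everything else fails early, stating which fix CI is waiting for.
    return commit_unblocks(commit_message, blocking_bug)

print(should_merge("Add storage feature", BLOCKING_BUG))             # False
print(should_merge("Fixes: lp:1341234 restore auth", BLOCKING_BUG))  # True
```

This matches the proposal's shape: the job fails early for unrelated patches while the line is stopped, and rollback commits can land simply by claiming the blocking bug.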