My concern was with the speed of response. I'm happy to have a QA-team switch that must be flipped (with an associated email to juju-dev so everyone knows why their patch won't land). I *would* like us to be tracking things like how long we spend in regression mode, etc.
I think ideally the process would be automated, but our current CI seems to need a fair amount of manual filtering.

On Jul 15, 2014 6:35 PM, "Curtis Hovey-Canonical" <cur...@canonical.com> wrote:

> We are doing some combinatorial testing because we need to ensure
> every series+arch combination works. At the Vegas sprint, we settled on
> unit tests and lxc tests as the best way to identify issues with arch
> or series. We test:
>
>   precise + amd64
>   utopic + amd64
>   trusty + amd64
>   trusty + i386
>   trusty + ppc64
>   trusty + arm64

That looks like M+N to me (all series on amd64, plus trusty for all arches). MxN would be all series x all arches.

...

> > I have the feeling, though, that "better CI" might be making some
> > developers a bit more lax and doing less direct testing themselves,
> > because they expect that CI will catch things they don't.
>
> I don't feel this. I think the problem is the complexity of Juju.
> The Mongo changes for HA broke the backup-restore feature; I think these
> are different areas of expertise that needed better coordination.

I think there were also some auth changes that meant we couldn't bootstrap at all. I really like that CI caught it. I wonder whether it had to get that far.

> > I like stop-the-line-when-CI-is-broken, as long as we have reliable ways
> > to stop it. Given the timescales we're working on, I'd probably be ok with
> > having it be a manual thing, so that when Azure decides to rev their API and
> > break everything that used to work, we aren't immediately crippled. Maybe we
> > can identify a subset of CI that is reliable (or high-priority) enough that
> > it really is automatically stop-the-line worthy. (Trusty unit tests, PPC
> > unit tests, local provider, ec2 tests come to mind.)
>
> Cloud failures are not regressions in juju code. I spend a day or more
> a week tweaking CI to give Juju the best chance of success. I might
> change a test, or write a script that cleans up the resources in the
> cloud/host.
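[Editorial aside: the M+N observation above can be sketched concretely. This is a minimal illustration assuming the six series/arch pairs listed in the quote; the variable names are illustrative, not part of any Juju tooling.]

```python
# Sketch of the M+N vs MxN point: the CI matrix quoted above is not the
# full cross product of series and architectures.
series = ["precise", "utopic", "trusty"]
arches = ["amd64", "i386", "ppc64", "arm64"]

# MxN: every series on every arch.
m_by_n = [(s, a) for s in series for a in arches]

# M+N, as CI actually runs it: all series on amd64,
# plus trusty on every remaining arch.
m_plus_n = [(s, "amd64") for s in series] + \
           [("trusty", a) for a in arches if a != "amd64"]

print(len(m_by_n))    # 12 combinations
print(len(m_plus_n))  # 6 combinations, matching the quoted list
```

The M+N subset still gives every series and every arch at least one run, which is the coverage the quoted rationale (catching arch- or series-specific issues) needs.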
> Since I am taking time to give juju more chances to pass, I delay
> reporting the bugs. 5 revisions might merge while I prove that juju is
> really broken. Since the defect can mutate with the extra commits, it
> isn't easy to identify the 1 or more revisions that are at fault.
>
> When we report a "ci regression", it is something we genuinely verified
> to work when we retested an old revision. I do provide a list of commits
> that can be investigated.
>
> As for automating a stop-the-line policy, we might be fine with a
> small hack to the git-merge-juju job to check for commits that claim
> to fix a regression; when that is not the case, the job fails early with
> the reason that we are waiting for a specific fix. Rollback is always an
> option.

I absolutely support trying to find ways to help keep CI blue (green). It's definitely the background I come from and a culture I want us to have.

I think a difficulty is figuring out who/what is responsible, and the slow turnaround to unblocking everything. If we make what we think is the fix, even if it is just reverting a change, doesn't it take hours to run CI again? And even then some bits may fail spuriously or for a different reason. If we need manual intervention on both ends, that means a stop-the-line takes us out of working order for 24 hours. I'm just trying to explore the consequences. I really do think we need good feedback into keeping CI happy.

John =:->

> --
> Curtis Hovey
> Canonical Cloud Development and Operations
> http://launchpad.net/~sinzui
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
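[Editorial aside: the "small hack to the git-merge-juju job" proposed above could look roughly like this. Everything here is a hypothetical sketch: the blocking-bug number, the `Fixes:` commit-message convention, and the function names are assumptions, not the real job's interface.]

```python
import re
from typing import Optional

# Hypothetical: the bug number CI is currently blocked on, published
# somewhere the merge job can read (e.g. a file the QA team updates).
BLOCKING_BUG = "1341234"

def commit_unblocks(commit_message: str, blocking_bug: str) -> bool:
    """Return True if the commit claims to fix the regression that is
    currently blocking the line, e.g. 'Fixes: lp:1341234'."""
    claimed = re.findall(r"(?i)fixes[:\s]+(?:lp:)?#?(\d+)", commit_message)
    return blocking_bug in claimed

def should_merge(commit_message: str, blocking_bug: Optional[str]) -> bool:
    # No open regression: merge normally.
    if blocking_bug is None:
        return True
    # Line is stopped: only a commit claiming to fix the blocker may land;
    # everything else fails early, stating which fix CI is waiting for.
    return commit_unblocks(commit_message, blocking_bug)

print(should_merge("Add storage feature", BLOCKING_BUG))             # False
print(should_merge("Fixes: lp:1341234 restore auth", BLOCKING_BUG))  # True
```

This matches the proposal's shape: the job fails early for unrelated patches while the line is stopped, and rollback commits can land simply by claiming the blocking bug.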