On 21/02/2014 13:55, Gareth Aye wrote:
> Idea #1: The Build Needn't Always Be 100% Green
>
> For a long time at the end of 2013, the build was mostly broken. We made a
> habit of reading the test results as
>
> {
>   green: "Build works!",
>   grey: "Build needs to be run again!",
>   red: "Build might not work!"
> }
>
> Our tests were less informative, but the tree stayed open. Nowadays, we
> judge greys and reds more harshly which keeps our build much greener but
> also pushes us to more extreme measures when tests fail inexplicably and/or
> intermittently.

A few weeks/months ago I would have preferred that idea, but over time I
have come to recognize that a green build is worth more than anything else
when you submit a PR.

So no. :)

> Idea #2: Instead of Closing the Tree, Disable Tests and File Bugs
>
> Why do we close the tree when we find regressions? Because the broken tests
> can no longer keep the corresponding features' functionality from
> regressing even further. Closing the tree is like wearing an astronaut's
> suit when you have an autoimmune disease. You simply *cannot* risk being
> exposed to more bugs.
>
> But maybe we can risk being exposed to more bugs. One idea is that,
> whenever a regression pops up, we can simply disable the broken test and
> file a high priority bug to diagnose the regression, revert an offending
> patch or submit a fix, and re-enable the test.

I have found that if there is no incentive to work on the disabled tests,
they never get re-enabled. We already have a lot of open bugs along the
lines of "re-enable disabled test XXX".

So no.

> Idea #3: Always Throttle the Tests to Proactively Discover Intermittent
> Failures
>
> Why do we have regressions anyway? Who lands broken code in our tree? Can't
> we scold them and get on with our lives? Well, it's not that simple for the
> following two reasons.
>
>    1. Some tests only fail some of the time.
>    2. Projects which we're downstream from don't pay attention when they
>    break us.
>
> I'll address the first issue here and the second one in the next section.
> One idea that I've been championing on the mailing lists (and :evanxd has
> recently introduced a patch to automate) is that we throttle tests on
> checkins.
>
> Suppose Bob wrote the following contrived test using our most favorite test
> harness <http://visionmedia.github.io/mocha/>:
>
> var assert = require('assert');
>
> test('should work', function() {
>   assert.ok(Math.random() > 0.5);
> });
>
> Then suppose further that Bob submitted a patch with his test, saw his pull
> request pass on CI, and merged his code. Then, all of a sudden, his test
> started burning when Alice checked in a completely unrelated patch 10
> minutes later. *Oh noes!* If we had set up our CI to run Bob's test enough
> times to tell with statistical significance that it was passing, the whole
> debacle could have been avoided!
>
> I've encouraged people to do this on an ad hoc basis, but maybe if (Travis
> and/or :lightsofapollo) ever solve our testing capacity problems, we should
> make throttling tests a normal practice. On the one hand, machine time is
> expensive. On the other hand, so is developer time.

How many is enough? :)
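
A back-of-envelope way to think about it, assuming runs are independent and
we pick the per-run failure rate p we still want to catch with confidence c:
all n runs pass with probability (1 - p)^n, so we need
n >= ln(1 - c) / ln(1 - p). A rough sketch:

// Number of runs needed to catch an intermittent test that fails with
// per-run probability p, with confidence c, assuming independent runs.
function runsNeeded(p, c) {
  return Math.ceil(Math.log(1 - c) / Math.log(1 - p));
}

console.log(runsNeeded(0.5, 0.95));  // Bob's coin flip: 5 runs
console.log(runsNeeded(0.1, 0.95));  // a 10% intermittent: 29 runs
console.log(runsNeeded(0.01, 0.95)); // a 1% intermittent: 299 runs

So catching a 1% intermittent means hundreds of extra runs per checkin,
which is exactly where the machine-time vs. developer-time trade-off bites.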

> Idea #4: Make our JS UI Tests Visible on TBPL
>
> My next idea, which I alluded to earlier, is that we prevent gecko patches
> which break gaia from landing. We have set up our js marionette tests to
> run downstream from b2g-desktop builds on TBPL to do exactly this; however,
> our tests are currently hidden on TBPL. In my opinion, things are relatively
> stable, but we have sheriffs who may strongly disagree
> <https://bugzilla.mozilla.org/show_bug.cgi?id=960072#c8>.
> I would not be surprised if we are stable enough to be a visible test suite
> at this point, and I strongly recommend we look into unhiding these tests.

Note that the Gaia Python-based tests and the Gaia Unit Tests are
running successfully on TBPL and are not hidden. So we have this
already, which is not bad (but not enough).

> Idea #5: Invest in Infrastructure Which Helps Fix Regressions
>
> No matter how good our process and automation gets, we will have
> regressions. We have a *very* complex project with lots of dependencies and
> sometimes things just fall through the cracks. There are several tools that
> might have helped us recover faster this week which either haven't been
> built or haven't yet been used heavily in gaia. Some examples are:
>
>    - Tools for gaia/gecko bisection (meaning to look into :jhford's project)
>    - Crash reporting for js ui tests
>    - Tools uploading, sorting, and grouping screenshots taken via marionette
>
>
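
On the screenshot point, a minimal sketch of what capturing one after a
failing test could look like in our gaia JS marionette harness, assuming
the client exposes a screenshot() call returning base64 PNG data (treat
that API name and the file naming as assumptions on my part):

var fs = require('fs');

marionette('capture evidence on failure', function() {
  var client = marionette.client();

  // ...the suite's actual tests go here...

  afterEach(function() {
    if (this.currentTest.state !== 'failed') {
      return;
    }
    // Grab a screenshot and drop it next to the test output so a later
    // tool can upload, sort, and group them.
    var shot = client.screenshot();
    var png = shot.replace(/^data:image\/png;base64,/, '');
    fs.writeFileSync(this.currentTest.title + '.png', png, 'base64');
  });
});

The uploading/sorting/grouping part is the piece we would actually have to
build.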

For a start, we need a Linux Debug B2G build on TBPL, so that we can use
it on Travis and more easily locally, and get core dumps. This is one of
the things that blocked me yesterday...

-- 
Julien
