Orange is the new Bad (Gij)

Michael Henretty Wed, 04 Nov 2015 07:41:42 -0800

Hi Gaia Folk,

If you've been doing Gaia core work for any length of time, you are
probably aware that we have *many* intermittent Gij test failures on
Treeherder [1]. But the problem is even worse than you may know! You see,
each Gij test is run 5 times within a test chunk (e.g. Gij4) before it is
marked as failing. Then that chunk itself is retried up to 5 times before
the whole thing is marked as failing. This means that for a test to be
marked as "passing," it only has to run successfully once in *25* times.
I'm not kidding. Our retry logic, especially the retries inside the test
chunk, makes it hard to know which intermittent tests are our worst
offenders. This is bad.
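
To put a rough number on the 5 x 5 retry arithmetic above, here is a small
sketch. It assumes every run of a test passes independently with the same
probability, which is a simplification (real intermittents are often
correlated), and the function and parameter names are made up for
illustration; they are not part of the Gij harness.

```python
def pass_probability(p_single: float,
                     runs_per_chunk: int = 5,
                     chunk_retries: int = 5) -> float:
    """Chance a test is reported green when one success in
    runs_per_chunk * chunk_retries attempts is enough.

    Assumes independent runs with a fixed per-run pass rate (hypothetical model).
    """
    total_attempts = runs_per_chunk * chunk_retries
    # The test is only reported as failing if every single attempt fails.
    return 1.0 - (1.0 - p_single) ** total_attempts


if __name__ == "__main__":
    for p in (0.5, 0.2, 0.1, 0.05):
        current = pass_probability(p)
        # Same model with the in-chunk retries removed (one run per chunk).
        no_inner = pass_probability(p, runs_per_chunk=1)
        print(f"per-run pass rate {p:.0%}: "
              f"green {current:.1%} with 25 attempts, "
              f"{no_inner:.1%} with 5 attempts")
```

Under that model, a test that passes only 10% of the time still comes out
green roughly 93% of the time with 25 attempts, versus roughly 41% if the
in-chunk retries are dropped and only the 5 chunk retries remain.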


My suggestion is to stop doing the retries inside the chunks. That way, the
failures will at least surface on Treeherder, which means we can star more
tests, which means we'll have a lot more visibility on the bad
intermittents. Sheriffs will complain a lot, so we have to be ready to act
on these bugs. But the alternative is that we continue to write tests with
a low "raciness" bar which, IMO, have a much lower chance of catching
regressions. The longer we wait, the worse this problem becomes.

Thoughts?

Thanks,
Michael

[1]
https://bugzilla.mozilla.org/buglist.cgi?keywords=intermittent-failure&keywords_type=allwords&list_id=12657856&resolution=---&query_format=advanced&product=Firefox%20OS

_______________________________________________
dev-fxos mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-fxos
