Hey everyone,

I know we had quite a long period of flaky tests and accepting the
fact that we merge PRs with some tests failing because of the
flakiness.

However I think over a couple of months or so we have invested heavily
into fixing it  - a number of people tracked and fixed a big number of
flaky tests and what we have now is mostly "Green".

Yeah - sometimes it happens we - by mistake merge a change that causes
"main" failure (for example because our test harness is not perfect)
but we should fix those cases quickly (mostly by reverting the
offending commit and redoing it).

But I think we should (and I am talking about committers) stop the
case of merging "failed" PRs if we are not absolutely sure that the
failure is already fixed in PR (or being fixed) .

We had some changes merged recently (and I was as guilty as others)
where we merged a "real" failure without properly investigating the
root cause. The effect of that is the "broken window" effect - once
such PR gets merged, it fails other PRs (until fixed) and it makes
people impatient to merge PR with the failure because this is
"normal". It should be normal to only merge "green" PRs.

I propose that we change our approach and whenever we see a "red"
build every committer's approach should be :

* investigate the root cause
* if it's main - attempt to fix it in main first before merging (could
be by reverting the failed commit)
* or discuss it in #development /devlist if it is not easy to find
* and generally only merge a failed PR if you are absolutely sure the
failure has already been fixed (or you know someone works on fixing
it) - and ALWAYS comment about it in the PR explaining why you merge
failed PR

This is a proposal, happy to discuss it if others think differently. WDYT ?

J.

Reply via email to