Thanks a lot for our input, Thomas! You are right, 3h are only hit if
somebody makes changes in their Dockerfiles and thus every node has to
rebuild their containers - but this is expected and inevitable.

So far there have not been any big attempts to resolve the number of flaky
tests. We had a few people fixing some tests (and that's very
appreciated!!!), but it feels like we're introducing more than we can fix.
I'd definitely love a flaky-test-bash-week and proposed it a few times, but
there was no success so far unfortunately.

We will definitely not drop any platforms. What we will probably do after
the nightly tests are in place, is move some things like CentOS or
overlapping Python2/Python3 tests to nightly. We don't need to test python2
and python3 compatibility half a dozen times on different platforms for
every commit.

I've been thinking about merging the integration and unit test state and
I'm pretty tempted to do it. My only concern so far was the increased cost.
I expect the nightly tests to be in place in about 2 weeks. I'd propose we
wait until then and then revisit which runs are actually required for every
PR and which ones can be moved. During that process, we will probably
consolidate a lot of tests and put them into one stage.

But I agree, past has shown that disabling tests did only mask the problem
and won't get them fixed. Also, quite a lot of failures have proven to be
actual bugs in our code. So from a customer perspective, we should actually
give these failures a high priority. I hope they will get into the
spotlight after I provide the proper statistics.

-Marco


On Thu, Jun 7, 2018 at 6:35 PM Thomas DELTEIL <thomas.delte...@gmail.com>
wrote:

> Thanks for bringing the issue of CI stability!
>
> However I disagree with some points in this thread:
>
> - "We are at approximately 3h for a full successful run."
> => Looking at Jenkins I see the last successful runs oscillating between
> 1h53 and 2h42 with a mean that seems to be at 2h20. Or are you talking
> about something different than the jenkins CI run?
>
> - "For this I propose working towards moving tests from CI to nightly,
> specially
> the ones that take most time or do black box testing with full training of
> models. And addressing flaky tests by either fixing them or *disabling
> them." *
> => Is there any evidence that some serious effort has been spent trying to
> fix the flaky tests? I know Sheng and Marco have worked to consolidating a
> list of Flaky tests, but I think simply disabling tests will just make the
> platform weaker. Let's organize a flaky test week where we each take on a
> couple of these flaky tests and hopefully we should make good progress
> towards stabilizing the CI.
>
> -"I'd like to disable flaky tests until they're fixed."
> => Wishful thinking IMO, we know this never happens, if we can't make time
> now to fix them, we'll never go back and fix them.
>
> "I would want a turnaround time of less than 30 minutes and 0% failure rate
> on master."
>  => With current timing, this means barely finishing the build step. Do we
> propose dropping some platforms for building?
>
> I agree with some points:
>
> "Won't we end up in the same situation with so many flaky tests?" => pretty
> sure it will
> "This could be set to 100% for nightly, for example."[for the release] =>
> That would be a given to me
> "I'm also currently working on a system that tracks all test failures, so
> this will also cover nightly tests. This will give us actionable data " =>
> Awesome, that would be great to have data on that to help prioritize what
> to fix!
>
> I personally think if we disable most tests and move them to nightly tests,
> we will decrease the trust and stability of the platform and it leaves the
> door open to conflicting changes creating hard to debug failures. I think
> the biggest potential win here is reducing test flakiness. That's the one
> that is killing the productivity, we can redesign the test pipeline to run
> integration and unit test in parallel and that would give us straight away
> a 30 minutes reduced time in the CI run. Then we'd be always at <2h for a
> build, which seems reasonable if it never fails for no reason.
>
> Thomas
>
> 2018-06-07 8:27 GMT-07:00 Marco de Abreu <marco.g.ab...@googlemail.com>:
>
> > Yeah, I think we are at the point at which we have to disable tests..
> >
> > If a test fails in nightly, the commit would not be reverted since it's
> > hard to pin a failure to a specific PR. We will have reporting for
> failures
> > on nightly (they have proven to be stable, so we can enable it right from
> > the beginning). I'm also currently working on a system that tracks all
> test
> > failures, so this will also cover nightly tests. This will give us
> > actionable data which allows us to define acceptance criteria for a
> > release. E.g. if the test success rate is below X%, a release can not be
> > made. This could be set to 100% for nightly, for example.
> >
> > It would definitely be good if we could determine which tests are
> required
> > to run and which ones are unnecessary. I don't really like the flag in
> the
> > comment (and also it's hard to integrate). A good idea would be some
> > analytics on the changed file content. If we have this data, we could
> > easily enable and disable different jobs. Since this behaviour is
> entirely
> > defined in GitHub, I'd like to invite everybody to submit a PR.
> >
> > -Marco
> >
> >
> >
> > On Thu, Jun 7, 2018 at 5:20 PM Aaron Markham <aaron.s.mark...@gmail.com>
> > wrote:
> >
> > > I'd like to disable flaky tests until they're fixed.
> > > What would the process be for fixing a failure if the tests are done
> > > nightly? Would the commit be reverted? Won't we end up in the same
> > > situation with so many flaky tests?
> > >
> > > I'd like to see if we can separate the test pipelines based on the
> > content
> > > of the commit. I think that md, html, and js updates should fly through
> > and
> > > not have to go through GPU tests.
> > >
> > > Maybe some special flag added to the comment?
> > > Is this possible?
> > >
> > >
> > > On Wed, Jun 6, 2018 at 10:37 PM, Pedro Larroy <
> > > pedro.larroy.li...@gmail.com>
> > > wrote:
> > >
> > > > Hi Team
> > > >
> > > > The time to validate a PR is growing, due to our number of supported
> > > > platforms and increased time spent in testing and running models.  We
> > are
> > > > at approximately 3h for a full successful run.
> > > >
> > > > This is compounded with the failure rate of builds due to flaky tests
> > of
> > > > more than 50% which is a big drag in developer productivity if you
> can
> > > only
> > > > get one or two CI runs to a change per day.
> > > >
> > > > I would want a turnaround time of less than 30 minutes and 0% failure
> > > rate
> > > > on master.
> > > >
> > > > For this I propose working towards moving tests from CI to nightly,
> > > > specially the ones that take most time or do black box testing with
> > full
> > > > training of models. And addressing flaky tests by either fixing them
> or
> > > > disabling them.
> > > >
> > > > I would like to check if there's consensus on this previous plan so
> we
> > > are
> > > > aligned on pursuing this common goal as a shared effort.
> > > >
> > > > Pedro.
> > > >
> > >
> >
>

Reply via email to