Sorry, I missed that Pedro was asking to move the tests that run full
training. I agree with that.

Additionally, we should make the CI smarter, as I mentioned above.
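The "smart CI" idea could be as small as mapping changed file paths to the
pipeline stages that actually need to run. A minimal sketch, assuming
hypothetical glob patterns and stage names (not MXNet's real Jenkins
configuration):

```python
# Hypothetical sketch of "smart CI": select pipeline stages from the
# paths a PR touches. The patterns and stage names are illustrative,
# not the project's actual Jenkins setup.
import fnmatch

# Map glob patterns to the CI stages a change to that path requires.
STAGE_RULES = [
    ("docs/*",          {"website"}),
    ("*.md",            {"website"}),
    ("scala-package/*", {"build-scala", "test-scala"}),
    ("python/*",        {"build-python", "test-python"}),
    ("src/*",           {"build-all", "test-all"}),  # core C++ touches everything
]

def stages_for(changed_files):
    """Return the union of CI stages required by the changed files."""
    stages = set()
    for path in changed_files:
        for pattern, needed in STAGE_RULES:
            if fnmatch.fnmatch(path, pattern):
                stages |= needed
    return stages

# A docs-only change would then skip every build and GPU test stage:
print(stages_for(["docs/install.md"]))          # only the website stage
print(stages_for(["src/operator/nn/conv.cc"]))  # full build and tests
```

The same decision could live in the Jenkinsfile itself (declarative
pipelines support changeset-based `when` conditions), but keeping the
mapping in one table makes it easy to review and extend.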

-Naveen


On Thu, Jun 7, 2018 at 3:59 PM, Naveen Swamy <mnnav...@gmail.com> wrote:

> -1 for moving to nightly. I think that would be detrimental.
>
> We have to make our CI a little smarter so that it builds only the
> required components rather than everything, reducing both cost and the
> time it takes to run CI. A Scala build need not build everything and run
> Python-related tests, etc.
>
> Thanks, Naveen
>
> On Thu, Jun 7, 2018 at 9:57 AM, Marco de Abreu <
> marco.g.ab...@googlemail.com> wrote:
>
>> Thanks a lot for your input, Thomas! You are right, 3h is only hit if
>> somebody makes changes to their Dockerfiles and thus every node has to
>> rebuild its containers - but this is expected and inevitable.
>>
>> So far there have not been any big attempts to reduce the number of
>> flaky tests. We had a few people fixing some tests (and that's very
>> appreciated!!!), but it feels like we're introducing more than we can
>> fix. I'd definitely love a flaky-test-bash week and have proposed it a
>> few times, but unfortunately there has been no success so far.
>>
>> We will definitely not drop any platforms. What we will probably do
>> after the nightly tests are in place is move some things like CentOS or
>> overlapping Python2/Python3 tests to nightly. We don't need to test
>> Python2 and Python3 compatibility half a dozen times on different
>> platforms for every commit.
>>
>> I've been thinking about merging the integration and unit test stages,
>> and I'm pretty tempted to do it. My only concern so far has been the
>> increased cost. I expect the nightly tests to be in place in about two
>> weeks. I'd propose we wait until then and then revisit which runs are
>> actually required for every PR and which ones can be moved. During that
>> process, we will probably consolidate a lot of tests and put them into
>> one stage.
>>
>> But I agree, the past has shown that disabling tests only masks the
>> problem and doesn't get them fixed. Also, quite a lot of failures have
>> proven to be actual bugs in our code. So from a customer perspective,
>> we should actually give these failures a high priority. I hope they
>> will get into the spotlight once I provide the proper statistics.
>>
>> -Marco
>>
>>
>> On Thu, Jun 7, 2018 at 6:35 PM Thomas DELTEIL <thomas.delte...@gmail.com>
>> wrote:
>>
>> > Thanks for bringing up the issue of CI stability!
>> >
>> > However I disagree with some points in this thread:
>> >
>> > - "We are at approximately 3h for a full successful run."
>> > => Looking at Jenkins, I see the last successful runs oscillating
>> > between 1h53 and 2h42, with a mean that seems to be around 2h20. Or
>> > are you talking about something different from the Jenkins CI run?
>> >
>> > - "For this I propose working towards moving tests from CI to
>> > nightly, especially the ones that take the most time or do black-box
>> > testing with full training of models. And addressing flaky tests by
>> > either fixing them or disabling them."
>> > => Is there any evidence that serious effort has been spent trying
>> > to fix the flaky tests? I know Sheng and Marco have worked on
>> > consolidating a list of flaky tests, but I think simply disabling
>> > tests will just make the platform weaker. Let's organize a flaky-test
>> > week where we each take on a couple of these flaky tests; hopefully
>> > we will make good progress towards stabilizing the CI.
>> >
>> > - "I'd like to disable flaky tests until they're fixed."
>> > => Wishful thinking IMO; we know this never happens. If we can't make
>> > time now to fix them, we'll never go back and fix them.
>> >
>> > - "I would want a turnaround time of less than 30 minutes and 0%
>> > failure rate on master."
>> > => With the current timing, this means barely finishing the build
>> > step. Are we proposing to drop some platforms from the build?
>> >
>> > I agree with some points:
>> >
>> > - "Won't we end up in the same situation with so many flaky tests?"
>> > => Pretty sure we will.
>> > - "This could be set to 100% for nightly, for example." [for the
>> > release] => That would be a given to me.
>> > - "I'm also currently working on a system that tracks all test
>> > failures, so this will also cover nightly tests. This will give us
>> > actionable data." => Awesome, it would be great to have that data to
>> > help prioritize what to fix!
>> >
>> > I personally think that if we disable most tests and move them to
>> > nightly, we will decrease the trust in and stability of the platform,
>> > and it leaves the door open to conflicting changes creating
>> > hard-to-debug failures. I think the biggest potential win here is
>> > reducing test flakiness; that's what is killing productivity. We can
>> > redesign the test pipeline to run integration and unit tests in
>> > parallel, which would straight away cut 30 minutes from the CI run.
>> > Then we'd always be at under 2h for a build, which seems reasonable
>> > if it never fails for no reason.
>> >
>> > Thomas
>> >
>> > 2018-06-07 8:27 GMT-07:00 Marco de Abreu <marco.g.ab...@googlemail.com>
>> :
>> >
>> > > Yeah, I think we are at the point where we have to disable tests.
>> > >
>> > > If a test fails in nightly, the commit would not be reverted, since
>> > > it's hard to pin a failure to a specific PR. We will have failure
>> > > reporting for nightly (the nightly tests have proven to be stable,
>> > > so we can enable it right from the beginning). I'm also currently
>> > > working on a system that tracks all test failures, so it will also
>> > > cover nightly tests. This will give us actionable data which allows
>> > > us to define acceptance criteria for a release, e.g. if the test
>> > > success rate is below X%, a release cannot be made. This could be
>> > > set to 100% for nightly, for example.
>> > >
>> > > It would definitely be good if we could determine which tests are
>> > > required to run and which ones are unnecessary. I don't really like
>> > > the flag in the comment (and it's also hard to integrate). A better
>> > > idea would be some analysis of the changed files. If we have this
>> > > data, we could easily enable and disable different jobs. Since this
>> > > behaviour is entirely defined in GitHub, I'd like to invite
>> > > everybody to submit a PR.
>> > >
>> > > -Marco
>> > >
>> > >
>> > >
>> > > On Thu, Jun 7, 2018 at 5:20 PM Aaron Markham <
>> aaron.s.mark...@gmail.com>
>> > > wrote:
>> > >
>> > > > I'd like to disable flaky tests until they're fixed.
>> > > > What would the process be for fixing a failure if the tests are done
>> > > > nightly? Would the commit be reverted? Won't we end up in the same
>> > > > situation with so many flaky tests?
>> > > >
>> > > > I'd like to see if we can separate the test pipelines based on the
>> > > content
>> > > > of the commit. I think that md, html, and js updates should fly
>> through
>> > > and
>> > > > not have to go through GPU tests.
>> > > >
>> > > > Maybe some special flag added to the comment?
>> > > > Is this possible?
>> > > >
>> > > >
>> > > > On Wed, Jun 6, 2018 at 10:37 PM, Pedro Larroy <
>> > > > pedro.larroy.li...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Hi Team
>> > > > >
>> > > > > The time to validate a PR is growing due to our number of
>> > > > > supported platforms and the increased time spent testing and
>> > > > > running models. We are at approximately 3h for a full successful
>> > > > > run.
>> > > > >
>> > > > > This is compounded by a build failure rate of more than 50%
>> > > > > due to flaky tests, which is a big drag on developer
>> > > > > productivity when you can only get one or two CI runs per
>> > > > > change per day.
>> > > > >
>> > > > > I would want a turnaround time of less than 30 minutes and 0%
>> failure
>> > > > rate
>> > > > > on master.
>> > > > >
>> > > > > For this I propose working towards moving tests from CI to
>> > > > > nightly, especially the ones that take the most time or do
>> > > > > black-box testing with full training of models. And addressing
>> > > > > flaky tests by either fixing them or disabling them.
>> > > > >
>> > > > > I would like to check whether there's consensus on this plan,
>> > > > > so that we are aligned on pursuing this common goal as a shared
>> > > > > effort.
>> > > > >
>> > > > > Pedro.
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
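Several messages above debate disabling flaky tests outright versus
fixing them. A middle ground is to re-run a known-flaky test a bounded
number of times while it awaits a real fix, so it still runs on every PR
without blocking unrelated work. A hedged sketch only: `retry_flaky` is a
hypothetical helper, not something in the MXNet test suite; plugins such
as pytest-rerunfailures offer similar behaviour off the shelf.

```python
# Hypothetical helper: re-run a known-flaky test up to max_runs times.
# This sketches the idea only; it is not part of the MXNet test suite.
import functools

def retry_flaky(max_runs=3):
    """Decorator that retries a test on failure, up to max_runs attempts."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            last_exc = None
            for _ in range(max_runs):
                try:
                    return test_fn(*args, **kwargs)
                except AssertionError as exc:
                    last_exc = exc  # remember the failure and retry
            raise last_exc  # still failing after max_runs: report it
        return wrapper
    return decorator
```

The important part is bounding the retries and keeping the test in the
per-PR run, so its flakiness stays visible in the failure statistics
instead of disappearing into a disabled list.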
