Sorry, I missed reading that Pedro was asking to move the tests that run training. I agree with that.
Additionally we should make the CI smart as I mentioned above. -Naveen On Thu, Jun 7, 2018 at 3:59 PM, Naveen Swamy <mnnav...@gmail.com> wrote: > -1 for moving to nightly. I think that would be detrimental. > > We have to make our CI a little more smart and only build required > components and not build all components to reduce cost and the time it > takes to run CI. A Scala build need not build everything and run tests > related to Python, etc., > > Thanks, Naveen > > On Thu, Jun 7, 2018 at 9:57 AM, Marco de Abreu < > marco.g.ab...@googlemail.com> wrote: > >> Thanks a lot for our input, Thomas! You are right, 3h are only hit if >> somebody makes changes in their Dockerfiles and thus every node has to >> rebuild their containers - but this is expected and inevitable. >> >> So far there have not been any big attempts to resolve the number of flaky >> tests. We had a few people fixing some tests (and that's very >> appreciated!!!), but it feels like we're introducing more than we can fix. >> I'd definitely love a flaky-test-bash-week and proposed it a few times, >> but >> there was no success so far unfortunately. >> >> We will definitely not drop any platforms. What we will probably do after >> the nightly tests are in place, is move some things like CentOS or >> overlapping Python2/Python3 tests to nightly. We don't need to test >> python2 >> and python3 compatibility half a dozen times on different platforms for >> every commit. >> >> I've been thinking about merging the integration and unit test state and >> I'm pretty tempted to do it. My only concern so far was the increased >> cost. >> I expect the nightly tests to be in place in about 2 weeks. I'd propose we >> wait until then and then revisit which runs are actually required for >> every >> PR and which ones can be moved. During that process, we will probably >> consolidate a lot of tests and put them into one stage. >> >> But I agree, past has shown that disabling tests did only mask the problem >> and won't get them fixed. Also, quite a lot of failures have proven to be >> actual bugs in our code. So from a customer perspective, we should >> actually >> give these failures a high priority. I hope they will get into the >> spotlight after I provide the proper statistics. >> >> -Marco >> >> >> On Thu, Jun 7, 2018 at 6:35 PM Thomas DELTEIL <thomas.delte...@gmail.com> >> wrote: >> >> > Thanks for bringing the issue of CI stability! >> > >> > However I disagree with some points in this thread: >> > >> > - "We are at approximately 3h for a full successful run." >> > => Looking at Jenkins I see the last successful runs oscillating between >> > 1h53 and 2h42 with a mean that seems to be at 2h20. Or are you talking >> > about something different than the jenkins CI run? >> > >> > - "For this I propose working towards moving tests from CI to nightly, >> > specially >> > the ones that take most time or do black box testing with full training >> of >> > models. And addressing flaky tests by either fixing them or *disabling >> > them." * >> > => Is there any evidence that some serious effort has been spent trying >> to >> > fix the flaky tests? I know Sheng and Marco have worked to >> consolidating a >> > list of Flaky tests, but I think simply disabling tests will just make >> the >> > platform weaker. Let's organize a flaky test week where we each take on >> a >> > couple of these flaky tests and hopefully we should make good progress >> > towards stabilizing the CI. >> > >> > -"I'd like to disable flaky tests until they're fixed." >> > => Wishful thinking IMO, we know this never happens, if we can't make >> time >> > now to fix them, we'll never go back and fix them. >> > >> > "I would want a turnaround time of less than 30 minutes and 0% failure >> rate >> > on master." >> > => With current timing, this means barely finishing the build step. Do >> we >> > propose dropping some platforms for building? >> > >> > I agree with some points: >> > >> > "Won't we end up in the same situation with so many flaky tests?" => >> pretty >> > sure it will >> > "This could be set to 100% for nightly, for example."[for the release] >> => >> > That would be a given to me >> > "I'm also currently working on a system that tracks all test failures, >> so >> > this will also cover nightly tests. This will give us actionable data " >> => >> > Awesome, that would be great to have data on that to help prioritize >> what >> > to fix! >> > >> > I personally think if we disable most tests and move them to nightly >> tests, >> > we will decrease the trust and stability of the platform and it leaves >> the >> > door open to conflicting changes creating hard to debug failures. I >> think >> > the biggest potential win here is reducing test flakiness. That's the >> one >> > that is killing the productivity, we can redesign the test pipeline to >> run >> > integration and unit test in parallel and that would give us straight >> away >> > a 30 minutes reduced time in the CI run. Then we'd be always at <2h for >> a >> > build, which seems reasonable if it never fails for no reason. >> > >> > Thomas >> > >> > 2018-06-07 8:27 GMT-07:00 Marco de Abreu <marco.g.ab...@googlemail.com> >> : >> > >> > > Yeah, I think we are at the point at which we have to disable tests.. >> > > >> > > If a test fails in nightly, the commit would not be reverted since >> it's >> > > hard to pin a failure to a specific PR. We will have reporting for >> > failures >> > > on nightly (they have proven to be stable, so we can enable it right >> from >> > > the beginning). I'm also currently working on a system that tracks all >> > test >> > > failures, so this will also cover nightly tests. This will give us >> > > actionable data which allows us to define acceptance criteria for a >> > > release. E.g. if the test success rate is below X%, a release can not >> be >> > > made. This could be set to 100% for nightly, for example. >> > > >> > > It would definitely be good if we could determine which tests are >> > required >> > > to run and which ones are unnecessary. I don't really like the flag in >> > the >> > > comment (and also it's hard to integrate). A good idea would be some >> > > analytics on the changed file content. If we have this data, we could >> > > easily enable and disable different jobs. Since this behaviour is >> > entirely >> > > defined in GitHub, I'd like to invite everybody to submit a PR. >> > > >> > > -Marco >> > > >> > > >> > > >> > > On Thu, Jun 7, 2018 at 5:20 PM Aaron Markham < >> aaron.s.mark...@gmail.com> >> > > wrote: >> > > >> > > > I'd like to disable flaky tests until they're fixed. >> > > > What would the process be for fixing a failure if the tests are done >> > > > nightly? Would the commit be reverted? Won't we end up in the same >> > > > situation with so many flaky tests? >> > > > >> > > > I'd like to see if we can separate the test pipelines based on the >> > > content >> > > > of the commit. I think that md, html, and js updates should fly >> through >> > > and >> > > > not have to go through GPU tests. >> > > > >> > > > Maybe some special flag added to the comment? >> > > > Is this possible? >> > > > >> > > > >> > > > On Wed, Jun 6, 2018 at 10:37 PM, Pedro Larroy < >> > > > pedro.larroy.li...@gmail.com> >> > > > wrote: >> > > > >> > > > > Hi Team >> > > > > >> > > > > The time to validate a PR is growing, due to our number of >> supported >> > > > > platforms and increased time spent in testing and running >> models. We >> > > are >> > > > > at approximately 3h for a full successful run. >> > > > > >> > > > > This is compounded with the failure rate of builds due to flaky >> tests >> > > of >> > > > > more than 50% which is a big drag in developer productivity if you >> > can >> > > > only >> > > > > get one or two CI runs to a change per day. >> > > > > >> > > > > I would want a turnaround time of less than 30 minutes and 0% >> failure >> > > > rate >> > > > > on master. >> > > > > >> > > > > For this I propose working towards moving tests from CI to >> nightly, >> > > > > specially the ones that take most time or do black box testing >> with >> > > full >> > > > > training of models. And addressing flaky tests by either fixing >> them >> > or >> > > > > disabling them. >> > > > > >> > > > > I would like to check if there's consensus on this previous plan >> so >> > we >> > > > are >> > > > > aligned on pursuing this common goal as a shared effort. >> > > > > >> > > > > Pedro. >> > > > > >> > > > >> > > >> > >> > >