We already have GitHub issues for most of the flaky tests. Even for some that have been disabled for almost half a year - they have never been re-enabled and thus we still lack coverage.
I think I have an idea how to do it. I will check if it actually works and then provide a small POC. Basically a mapping from test to folders - if a specific folder has been updated, run the test. I have checked if there are some ready made solutions but haven't found anything useful. -Marco Steffen Rochel <steffenroc...@gmail.com> schrieb am Fr., 8. Juni 2018, 03:43: > I support to create github/Jira for flaky tests and disable for now. > However, we need to get serious and prioritize fixing the disabled tests. > Making PR checks smart and test only code impacted by change is a good > idea, anybody has experience with tools enabling smart validation? > > I'm concerned about moving tests from PR check to nightly. We should review > which are long running tests and evaluate options to reduce runtime. We > must prevent bad code to get into the repo through automated process as it > will be cumbersome to find out which PR cause failures or performance > degradation in nighty regression. Unless we get code coverage measurements > in place it is difficult to decide which tests are redundant and can either > be removed or moved to nightly. > Is the PR validation turn around time the bottleneck? I see a number of PR > waiting for reviews. > Steffen > > On Thu, Jun 7, 2018 at 6:00 PM Naveen Swamy <mnnav...@gmail.com> wrote: > > > Sorry, I missed reading that Pedro was asking to move the tests that run > > training. I agree with that. > > > > Additionally we should make the CI smart as I mentioned above. > > > > -Naveen > > > > > > On Thu, Jun 7, 2018 at 3:59 PM, Naveen Swamy <mnnav...@gmail.com> wrote: > > > > > -1 for moving to nightly. I think that would be detrimental. > > > > > > We have to make our CI a little more smart and only build required > > > components and not build all components to reduce cost and the time it > > > takes to run CI. A Scala build need not build everything and run tests > > > related to Python, etc., > > > > > > Thanks, Naveen > > > > > > On Thu, Jun 7, 2018 at 9:57 AM, Marco de Abreu < > > > marco.g.ab...@googlemail.com> wrote: > > > > > >> Thanks a lot for our input, Thomas! You are right, 3h are only hit if > > >> somebody makes changes in their Dockerfiles and thus every node has to > > >> rebuild their containers - but this is expected and inevitable. > > >> > > >> So far there have not been any big attempts to resolve the number of > > flaky > > >> tests. We had a few people fixing some tests (and that's very > > >> appreciated!!!), but it feels like we're introducing more than we can > > fix. > > >> I'd definitely love a flaky-test-bash-week and proposed it a few > times, > > >> but > > >> there was no success so far unfortunately. > > >> > > >> We will definitely not drop any platforms. What we will probably do > > after > > >> the nightly tests are in place, is move some things like CentOS or > > >> overlapping Python2/Python3 tests to nightly. We don't need to test > > >> python2 > > >> and python3 compatibility half a dozen times on different platforms > for > > >> every commit. > > >> > > >> I've been thinking about merging the integration and unit test state > and > > >> I'm pretty tempted to do it. My only concern so far was the increased > > >> cost. > > >> I expect the nightly tests to be in place in about 2 weeks. I'd > propose > > we > > >> wait until then and then revisit which runs are actually required for > > >> every > > >> PR and which ones can be moved. During that process, we will probably > > >> consolidate a lot of tests and put them into one stage. > > >> > > >> But I agree, past has shown that disabling tests did only mask the > > problem > > >> and won't get them fixed. Also, quite a lot of failures have proven to > > be > > >> actual bugs in our code. So from a customer perspective, we should > > >> actually > > >> give these failures a high priority. I hope they will get into the > > >> spotlight after I provide the proper statistics. > > >> > > >> -Marco > > >> > > >> > > >> On Thu, Jun 7, 2018 at 6:35 PM Thomas DELTEIL < > > thomas.delte...@gmail.com> > > >> wrote: > > >> > > >> > Thanks for bringing the issue of CI stability! > > >> > > > >> > However I disagree with some points in this thread: > > >> > > > >> > - "We are at approximately 3h for a full successful run." > > >> > => Looking at Jenkins I see the last successful runs oscillating > > between > > >> > 1h53 and 2h42 with a mean that seems to be at 2h20. Or are you > talking > > >> > about something different than the jenkins CI run? > > >> > > > >> > - "For this I propose working towards moving tests from CI to > nightly, > > >> > specially > > >> > the ones that take most time or do black box testing with full > > training > > >> of > > >> > models. And addressing flaky tests by either fixing them or > *disabling > > >> > them." * > > >> > => Is there any evidence that some serious effort has been spent > > trying > > >> to > > >> > fix the flaky tests? I know Sheng and Marco have worked to > > >> consolidating a > > >> > list of Flaky tests, but I think simply disabling tests will just > make > > >> the > > >> > platform weaker. Let's organize a flaky test week where we each take > > on > > >> a > > >> > couple of these flaky tests and hopefully we should make good > progress > > >> > towards stabilizing the CI. > > >> > > > >> > -"I'd like to disable flaky tests until they're fixed." > > >> > => Wishful thinking IMO, we know this never happens, if we can't > make > > >> time > > >> > now to fix them, we'll never go back and fix them. > > >> > > > >> > "I would want a turnaround time of less than 30 minutes and 0% > failure > > >> rate > > >> > on master." > > >> > => With current timing, this means barely finishing the build step. > > Do > > >> we > > >> > propose dropping some platforms for building? > > >> > > > >> > I agree with some points: > > >> > > > >> > "Won't we end up in the same situation with so many flaky tests?" => > > >> pretty > > >> > sure it will > > >> > "This could be set to 100% for nightly, for example."[for the > release] > > >> => > > >> > That would be a given to me > > >> > "I'm also currently working on a system that tracks all test > failures, > > >> so > > >> > this will also cover nightly tests. This will give us actionable > data > > " > > >> => > > >> > Awesome, that would be great to have data on that to help prioritize > > >> what > > >> > to fix! > > >> > > > >> > I personally think if we disable most tests and move them to nightly > > >> tests, > > >> > we will decrease the trust and stability of the platform and it > leaves > > >> the > > >> > door open to conflicting changes creating hard to debug failures. I > > >> think > > >> > the biggest potential win here is reducing test flakiness. That's > the > > >> one > > >> > that is killing the productivity, we can redesign the test pipeline > to > > >> run > > >> > integration and unit test in parallel and that would give us > straight > > >> away > > >> > a 30 minutes reduced time in the CI run. Then we'd be always at <2h > > for > > >> a > > >> > build, which seems reasonable if it never fails for no reason. > > >> > > > >> > Thomas > > >> > > > >> > 2018-06-07 8:27 GMT-07:00 Marco de Abreu < > > marco.g.ab...@googlemail.com> > > >> : > > >> > > > >> > > Yeah, I think we are at the point at which we have to disable > > tests.. > > >> > > > > >> > > If a test fails in nightly, the commit would not be reverted since > > >> it's > > >> > > hard to pin a failure to a specific PR. We will have reporting for > > >> > failures > > >> > > on nightly (they have proven to be stable, so we can enable it > right > > >> from > > >> > > the beginning). I'm also currently working on a system that tracks > > all > > >> > test > > >> > > failures, so this will also cover nightly tests. This will give us > > >> > > actionable data which allows us to define acceptance criteria for > a > > >> > > release. E.g. if the test success rate is below X%, a release can > > not > > >> be > > >> > > made. This could be set to 100% for nightly, for example. > > >> > > > > >> > > It would definitely be good if we could determine which tests are > > >> > required > > >> > > to run and which ones are unnecessary. I don't really like the > flag > > in > > >> > the > > >> > > comment (and also it's hard to integrate). A good idea would be > some > > >> > > analytics on the changed file content. If we have this data, we > > could > > >> > > easily enable and disable different jobs. Since this behaviour is > > >> > entirely > > >> > > defined in GitHub, I'd like to invite everybody to submit a PR. > > >> > > > > >> > > -Marco > > >> > > > > >> > > > > >> > > > > >> > > On Thu, Jun 7, 2018 at 5:20 PM Aaron Markham < > > >> aaron.s.mark...@gmail.com> > > >> > > wrote: > > >> > > > > >> > > > I'd like to disable flaky tests until they're fixed. > > >> > > > What would the process be for fixing a failure if the tests are > > done > > >> > > > nightly? Would the commit be reverted? Won't we end up in the > same > > >> > > > situation with so many flaky tests? > > >> > > > > > >> > > > I'd like to see if we can separate the test pipelines based on > the > > >> > > content > > >> > > > of the commit. I think that md, html, and js updates should fly > > >> through > > >> > > and > > >> > > > not have to go through GPU tests. > > >> > > > > > >> > > > Maybe some special flag added to the comment? > > >> > > > Is this possible? > > >> > > > > > >> > > > > > >> > > > On Wed, Jun 6, 2018 at 10:37 PM, Pedro Larroy < > > >> > > > pedro.larroy.li...@gmail.com> > > >> > > > wrote: > > >> > > > > > >> > > > > Hi Team > > >> > > > > > > >> > > > > The time to validate a PR is growing, due to our number of > > >> supported > > >> > > > > platforms and increased time spent in testing and running > > >> models. We > > >> > > are > > >> > > > > at approximately 3h for a full successful run. > > >> > > > > > > >> > > > > This is compounded with the failure rate of builds due to > flaky > > >> tests > > >> > > of > > >> > > > > more than 50% which is a big drag in developer productivity if > > you > > >> > can > > >> > > > only > > >> > > > > get one or two CI runs to a change per day. > > >> > > > > > > >> > > > > I would want a turnaround time of less than 30 minutes and 0% > > >> failure > > >> > > > rate > > >> > > > > on master. > > >> > > > > > > >> > > > > For this I propose working towards moving tests from CI to > > >> nightly, > > >> > > > > specially the ones that take most time or do black box testing > > >> with > > >> > > full > > >> > > > > training of models. And addressing flaky tests by either > fixing > > >> them > > >> > or > > >> > > > > disabling them. > > >> > > > > > > >> > > > > I would like to check if there's consensus on this previous > plan > > >> so > > >> > we > > >> > > > are > > >> > > > > aligned on pursuing this common goal as a shared effort. > > >> > > > > > > >> > > > > Pedro. > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > > > > > > > >