Thanks for sending this, Benno! I, for one, would love to see more regular communication about the state of CI, especially so that I know how I can help fix tests (right now I don't know which flaky tests are in areas I maintain).

Is there any reason the first portion of the test name is being truncated? For example, ResourceStatistics matches several tests:

$ grep -R ' ResourceStatistics)' src/tests
src/tests/containerizer/xfs_quota_tests.cpp:TEST_F(ROOT_XFS_QuotaTest, ResourceStatistics)
src/tests/slave_recovery_tests.cpp:TEST_F(MesosContainerizerSlaveRecoveryTest, ResourceStatistics)
src/tests/disk_quota_tests.cpp:TEST_F(DiskQuotaTest, ResourceStatistics)

Did we actually fix the flaky tests or did we disable them? I see only 22 disabled tests, which is better than I expected, but I hope there's good tracking on getting these un-disabled again:

$ grep -R DISABLED src/tests | grep -v DISABLED_ON_WINDOWS | grep -v NestedQuota | grep -v ChildRole | grep -v NestedRoles | grep -v environment.cpp | wc -l
22

On Fri, Oct 12, 2018 at 7:38 AM Benno Evers <bev...@mesosphere.com> wrote:

> Hey all,
>
> as you might know, we've set up an internal CI system that is running `make
> check` on a variety of different platforms and configurations, 16 in total.
>
> As we've experienced more and more pain maintaining a green master, I've
> compiled some statistics about which tests are most flaky. I thought other
> people might also be interested to have a look at that data:
>
> Last Week:
>
> # CI Statistics since 2018-10-05 14:22:35.422882 for branches
> containing 'asf/master'
> Total: 41 failing tests, 28 unique. (avg 0.142361111111 failing tests
> per build)
>
> Top 5 failing tests:
> 6x: [empty]
> 4x: ResourceStatistics
> 2x: CreateDestroyDiskRecovery
> 2x: INTERNET_CURL_InvokeFetchByName
> 2x: RecoverNestedContainer
>
> Last Month:
>
> # CI Statistics since 2018-09-12 14:23:36.272031 for branches
> containing 'asf/master'
> Total: 320 failing tests, 75 unique. (avg 0.285714285714 failing tests
> per build)
>
> Top 5 failing tests:
> 57x: Used
> 32x: LongLivedDefaultExecutorRestart
> 27x: PythonFramework
> 23x: ROOT_CGROUPS_LaunchNestedContainerSessionsInParallel
> 22x: ResourceStatistics
>
> Last year:
>
> # CI Statistics since 2017-10-12 14:24:31.639792 for branches
> containing 'asf/master'
> Total: 3045 failing tests, 225 unique. (avg 0.184054642166 failing
> tests per build)
>
> Top 5 failing tests:
> 292x: [empty]
> 272x: ROOT_LOGROTATE_UNPRIVILEGED_USER_RotateWithSwitchUserTrueOrFalse
> 136x: LOGROTATE_RotateInSandbox
> 136x: LOGROTATE_CustomRotateOptions
> 131x: ResourceStatistics
>
>
> I don't really have a point with all of this, but some observations:
> - [empty] means that the `mesos-tests` binary crashed
> - The data also includes "real", i.e. non-flaky test failures, but they
> should not appear in the top 5 lists because we would hopefully either
> revert or fix them before they can accumulate dozens of failures
> - Over the whole year, we seem to be pretty good at fixing the nastiest
> flakes, with only one of the top 5 still appearing in this weeks test
> results
> - Sadly, the fail percentage isn't as different between now and then as we
> might have hoped.
>
> Hope this was interesting, and best regards,
> --
> Benno Evers
> Software Engineer, Mesosphere
>