To recap, the original question was: should we change StressNewTest to pass ONLY if it is able to successfully run all new/changed tests 50 times (eliminating the existing loophole exempting PRs that touch 25 or more files)?
So far a handful of people have expressed support, but many have questions/concerns: - One concern was mis-counting of test files touched. As of today this is now fixed. - A big concern was that it might become more difficult to pass all required PR checks. - Another concern was that the current timeout of 6 hours might be too high / too low. - An alternative was suggested to keep the loophole but maybe increase the threshold required to get the free pass. - If we’re going to raise the bar on any required PR check, maybe we should consider making it non-required. Let’s extend the comment period until the end of next week (Friday Mar 20) in hopes of converging on a solution everybody is happy with (even if it isn’t what was originally proposed). > On Mar 6, 2020, at 4:51 PM, Donal Evans <doev...@pivotal.io> wrote: > > With fairly apt timing, we've recently had a PR merged ( > https://github.com/apache/geode/pull/4745) that introduced a test that has > resulted in fairly consistently flaky failures in the pipeline (JIRA > ticket: https://issues.apache.org/jira/projects/GEODE/issues/GEODE-7843). > The PR was quite large and touched/created a lot of tests, so StressNewTest > never ran on it: > https://concourse.apachegeode-ci.info/builds/136784#L5e3ac3fc:2. Given how > frequently the test is failing in the pipeline (11 failures on various > IntegrationTest jobs over the past 6 commits), it seems very likely that > had StressNewTest been allowed to run, this issue would have been caught at > the PR stage and could have been remedied then, instead of resulting in a > nearly constantly red pipeline. > > I've already given my +1 to this proposal, but this situation has prompted > me to make it a +2. > > On Fri, Mar 6, 2020 at 1:46 PM Donal Evans <doev...@pivotal.io> wrote: > >> With fairly apt timing, we've recently had a PR merged ( >> https://github.com/apache/geode/pull/4745) that introduced a test that >> has resulted in fairly consistently flaky failures in the pipeline (JIRA >> ticket: https://issues.apache.org/jira/projects/GEODE/issues/GEODE-7843). >> The PR was quite large and touched/created a lot of tests, so StressNewTest >> never ran on it: >> https://concourse.apachegeode-ci.info/builds/136784#L5e3ac3fc:2. Given >> how frequently the test is failing in the pipeline (11 failures on various >> IntegrationTest jobs over the past 6 commits), it seems very likely that >> had StressNewTest been allowed to run, this issue would have been caught at >> the PR stage and could have been remedied then, instead of resulting in a >> nearly constantly red pipeline. >> >> I've already given my +1 to this proposal, but this situation has prompted >> me to make it a *+1*. >> >> On Tue, Mar 3, 2020 at 2:08 PM Anilkumar Gingade <aging...@pivotal.io> >> wrote: >> >>> The stress test is to identify the flaky-ness within the test; and >>> assuming >>> any changes to the test may have introduced the flaky-ness. >>> It's about paying the cost upfront or later when the test is determined to >>> be flaky. >>> If 25+ tests have been changed in a PR, the cost of running stress test >>> for >>> those; and gating the PR for so long. >>> Knowing how much pain it causes to fix a flaky test after a certain/long >>> duration of time; I am +1 for doing this change. >>> >>> On Tue, Mar 3, 2020 at 10:06 AM Dan Smith <dsm...@pivotal.io> wrote: >>> >>>> What's the current timeout for StressNewTest? Maybe if we just up the >>>> threshold to 100 tests or so and up the timeout to match we can catch >>>> pretty much all PRs. >>>> >>>> I'm not sure why the job is flagging more tests than it should. It looks >>>> like at some point @rhoughon changed it to read the merge base from some >>>> file created by concourse as an optimization [1] - I suspect maybe that >>>> file is inaccurate? >>>> >>>> I originally wrote this job. It's definitely not a panacea, it will only >>>> catch a new flaky test if >>>> - the test is really flaky (likely to fail more than 1/50 times) >>>> - the change actually happened in the test file itself, and not the >>>> product or some other test file. >>>> >>>> [1] >>>> >>>> >>> https://github.com/apache/geode/commit/4c06ba4625e69d44a5165aa9f2fccddfc064de87 >>>> >>>> -Dan >>>> >>>> On Sun, Mar 1, 2020 at 9:00 PM Owen Nichols <onich...@pivotal.io> >>> wrote: >>>> >>>>> We don’t tend to look too closely at successful PR checks to see >>> whether >>>>> they actually checked anything at all. >>>>> >>>>> One example I found is >>>>> >>>> >>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-pr/jobs/StressNewTestOpenJDK11/builds/5957 >>>>> < >>>>> >>>> >>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-pr/jobs/StressNewTestOpenJDK11/builds/5957 >>>>>> : >>>>> 32 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. >>>>> >>>>> Here are 92 more examples (url’s omitted for brevity — use the example >>>>> above as a template and just replace the last 4 digits): >>>>> 26 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6243) >>>>> 26 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6249) >>>>> 26 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6402) >>>>> 27 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6262) >>>>> 27 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6430) >>>>> 27 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6439) >>>>> 27 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6449) >>>>> 27 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6454) >>>>> 27 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6458) >>>>> 27 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6459) >>>>> 28 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6224) >>>>> 28 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6441) >>>>> 28 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6448) >>>>> 28 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6452) >>>>> 29 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6102) >>>>> 29 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6177) >>>>> 30 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 5939) >>>>> 30 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 5940) >>>>> 30 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 5949) >>>>> 30 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6473) >>>>> 31 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 5953) >>>>> 31 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6187) >>>>> 31 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6470) >>>>> 31 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6471) >>>>> 31 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6474) >>>>> 31 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6475) >>>>> 32 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 5958) >>>>> 32 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6173) >>>>> 32 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6236) >>>>> 32 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6237) >>>>> 32 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6242) >>>>> 33 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6246) >>>>> 33 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6248) >>>>> 33 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6250) >>>>> 33 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6251) >>>>> 33 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6254) >>>>> 33 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6255) >>>>> 34 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6139) >>>>> 34 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6141) >>>>> 34 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6264) >>>>> 34 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6266) >>>>> 34 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6271) >>>>> 35 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6142) >>>>> 35 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6143) >>>>> 35 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6277) >>>>> 35 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6309) >>>>> 35 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6413) >>>>> 35 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6414) >>>>> 35 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6419) >>>>> 35 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6420) >>>>> 36 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6144) >>>>> 36 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6146) >>>>> 36 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6147) >>>>> 36 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6310) >>>>> 36 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6326) >>>>> 36 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6329) >>>>> 36 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6330) >>>>> 40 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6159) >>>>> 40 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6160) >>>>> 40 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6188) >>>>> 41 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6440) >>>>> 41 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6480) >>>>> 41 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6483) >>>>> 43 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6190) >>>>> 44 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 5768) >>>>> 44 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 5772) >>>>> 46 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6191) >>>>> 46 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6218) >>>>> 46 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6219) >>>>> 46 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6225) >>>>> 46 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6244) >>>>> 46 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6245) >>>>> 48 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6129) >>>>> 49 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 5804) >>>>> 51 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 5877) >>>>> 53 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 5742) >>>>> 54 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 5757) >>>>> 66 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6232) >>>>> 66 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6233) >>>>> 66 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6234) >>>>> 66 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6259) >>>>> 66 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6260) >>>>> 66 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6278) >>>>> 79 is too many changed tests to stress test. Allowing this job to pass >>>>> without stress testing. (build 6481) >>>>> 103 is too many changed tests to stress test. Allowing this job to >>> pass >>>>> without stress testing. (build 6493) >>>>> 106 is too many changed tests to stress test. Allowing this job to >>> pass >>>>> without stress testing. (build 6206) >>>>> 106 is too many changed tests to stress test. Allowing this job to >>> pass >>>>> without stress testing. (build 6207) >>>>> 115 is too many changed tests to stress test. Allowing this job to >>> pass >>>>> without stress testing. (build 5769) >>>>> 118 is too many changed tests to stress test. Allowing this job to >>> pass >>>>> without stress testing. (build 6226) >>>>> 721 is too many changed tests to stress test. Allowing this job to >>> pass >>>>> without stress testing. (build 6197) >>>>> 722 is too many changed tests to stress test. Allowing this job to >>> pass >>>>> without stress testing. (build 6217) >>>>> 722 is too many changed tests to stress test. Allowing this job to >>> pass >>>>> without stress testing. (build 6221) >>>>> >>>>> Please note, sometimes the number of files reported seems to be way >>>> higher >>>>> than it should be. For example >>>> https://github.com/apache/geode/pull/4691 >>>>> <https://github.com/apache/geode/pull/4691> shows exactly 1 test file >>>>> touched, but >>>>> >>>> >>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-pr/jobs/StressNewTestOpenJDK11/builds/6278 >>>>> < >>>>> >>>> >>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-pr/jobs/StressNewTestOpenJDK11/builds/6278 >>>>> >>>>> thinks it touched 66 test files. >>>>> >>>>> There’s currently no good data to predict how long StressNew will >>> take, >>>> it >>>>> just depends on the mix of tests touched. I am aware of at least 4 >>> cases >>>>> (with less than 25 files) where the timeout was hit. In two of those >>>> cases >>>>> we re-ran with a longer timeout, and in two cases the PR author split >>> up >>>>> the PR into half a dozen smaller PRs to get around it. >>>>> >>>>> >>>>>> On Mar 1, 2020, at 7:52 PM, Anthony Baker <aba...@pivotal.io> >>> wrote: >>>>>> >>>>>> What percentage of PR’s are currently subject to the 25 test file >>> rule? >>>>> How many would be subject to the concourse timeout? >>>>>> >>>>>> I’d like to understand the scope of the impact for this change. >>>>>> >>>>>> Anthony >>>>>> >>>>>> >>>>>>> On Mar 1, 2020, at 8:58 AM, Owen Nichols <onich...@pivotal.io> >>> wrote: >>>>>>> >>>>>>> Impossible, no. Inconvenient, perhaps, but a small price to pay for >>>>> being >>>>>>> able to trust that green means green. >>>>>>> >>>>>>> With or without this proposed change, if anyone is having trouble >>>>> getting >>>>>>> their PR to pass StressNew, please bring it up on the dev list and >>> we >>>>> can >>>>>>> discuss the appropriate solution on a case-by-case basis (e.g. >>>>> increasing >>>>>>> timeout, fixing the logic that identifies changed test files, >>>> splitting >>>>>>> into multiple PRs, authorizing an override, etc). >>>>>>> >>>>>>> On Sun, Mar 1, 2020 at 8:56 AM Dan Smith <dsm...@pivotal.io> >>> wrote: >>>>>>> >>>>>>>> Won't this make it impossible to merge refactoring changes that >>> touch >>>>> a lot >>>>>>>> of tests? >>>>>>>> >>>>>>>> -Dan >>>>>>>> >>>>>>>> On Sat, Feb 29, 2020, 12:37 PM Robert Houghton < >>> rhough...@pivotal.io >>>>> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Yes, as it should >>>>>>>>> >>>>>>>>> On Sat, Feb 29, 2020, 12:25 Dan Smith <dsm...@pivotal.io> wrote: >>>>>>>>> >>>>>>>>>> Doesn't the build fail when concourse times out? >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>