To recap, the original question was: should we change StressNewTest to pass 
ONLY if it is able to successfully run all new/changed tests 50 times 
(eliminating the existing loophole exempting PRs that touch 25 or more files)?

So far a handful of people have expressed support, but many have 
questions/concerns:
- One concern was mis-counting of test files touched.  As of today this is now 
fixed.
- A big concern was that it might become more difficult to pass all required PR 
checks.
- Another concern was that the current timeout of 6 hours might be too high / 
too low.
- An alternative was suggested to keep the loophole but maybe increase the 
threshold required to get the free pass.
- If we’re going to raise the bar on any required PR check, maybe we should 
consider making it non-required.

Let’s extend the comment period until the end of next week (Friday Mar 20) in 
hopes of converging on a solution everybody is happy with (even if it isn’t 
what was originally proposed).


> On Mar 6, 2020, at 4:51 PM, Donal Evans <doev...@pivotal.io> wrote:
> 
> With fairly apt timing, we've recently had a PR merged (
> https://github.com/apache/geode/pull/4745) that introduced a test that has
> resulted in fairly consistently flaky failures in the pipeline (JIRA
> ticket: https://issues.apache.org/jira/projects/GEODE/issues/GEODE-7843).
> The PR was quite large and touched/created a lot of tests, so StressNewTest
> never ran on it:
> https://concourse.apachegeode-ci.info/builds/136784#L5e3ac3fc:2. Given how
> frequently the test is failing in the pipeline (11 failures on various
> IntegrationTest jobs over the past 6 commits), it seems very likely that
> had StressNewTest been allowed to run, this issue would have been caught at
> the PR stage and could have been remedied then, instead of resulting in a
> nearly constantly red pipeline.
> 
> I've already given my +1 to this proposal, but this situation has prompted
> me to make it a +2.
> 
> On Fri, Mar 6, 2020 at 1:46 PM Donal Evans <doev...@pivotal.io> wrote:
> 
>> With fairly apt timing, we've recently had a PR merged (
>> https://github.com/apache/geode/pull/4745) that introduced a test that
>> has resulted in fairly consistently flaky failures in the pipeline (JIRA
>> ticket: https://issues.apache.org/jira/projects/GEODE/issues/GEODE-7843).
>> The PR was quite large and touched/created a lot of tests, so StressNewTest
>> never ran on it:
>> https://concourse.apachegeode-ci.info/builds/136784#L5e3ac3fc:2. Given
>> how frequently the test is failing in the pipeline (11 failures on various
>> IntegrationTest jobs over the past 6 commits), it seems very likely that
>> had StressNewTest been allowed to run, this issue would have been caught at
>> the PR stage and could have been remedied then, instead of resulting in a
>> nearly constantly red pipeline.
>> 
>> I've already given my +1 to this proposal, but this situation has prompted
>> me to make it a *+1*.
>> 
>> On Tue, Mar 3, 2020 at 2:08 PM Anilkumar Gingade <aging...@pivotal.io>
>> wrote:
>> 
>>> The stress test is to identify the flaky-ness within the test; and
>>> assuming
>>> any changes to the test may have introduced the flaky-ness.
>>> It's about paying the cost upfront or later when the test is determined to
>>> be flaky.
>>> If 25+ tests have been changed in a PR, the cost of running stress test
>>> for
>>> those;  and gating the PR for so long.
>>> Knowing how much pain it causes to fix a flaky test after a certain/long
>>> duration of time; I am +1 for doing this change.
>>> 
>>> On Tue, Mar 3, 2020 at 10:06 AM Dan Smith <dsm...@pivotal.io> wrote:
>>> 
>>>> What's the current timeout for StressNewTest? Maybe if we just up the
>>>> threshold to 100 tests or so and up the timeout to match we can catch
>>>> pretty much all PRs.
>>>> 
>>>> I'm not sure why the job is flagging more tests than it should. It looks
>>>> like at some point @rhoughon changed it to read the merge base from some
>>>> file created by concourse as an optimization [1] - I suspect maybe that
>>>> file is inaccurate?
>>>> 
>>>> I originally wrote this job. It's definitely not a panacea, it will only
>>>> catch a new flaky test if
>>>> - the test is really flaky (likely to fail more than 1/50 times)
>>>> - the change actually happened in the test file itself, and not the
>>>> product or some other test file.
>>>> 
>>>> [1]
>>>> 
>>>> 
>>> https://github.com/apache/geode/commit/4c06ba4625e69d44a5165aa9f2fccddfc064de87
>>>> 
>>>> -Dan
>>>> 
>>>> On Sun, Mar 1, 2020 at 9:00 PM Owen Nichols <onich...@pivotal.io>
>>> wrote:
>>>> 
>>>>> We don’t tend to look too closely at successful PR checks to see
>>> whether
>>>>> they actually checked anything at all.
>>>>> 
>>>>> One example I found is
>>>>> 
>>>> 
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-pr/jobs/StressNewTestOpenJDK11/builds/5957
>>>>> <
>>>>> 
>>>> 
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-pr/jobs/StressNewTestOpenJDK11/builds/5957
>>>>>> :
>>>>> 32 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.
>>>>> 
>>>>> Here are 92 more examples (url’s omitted for brevity — use the example
>>>>> above as a template and just replace the last 4 digits):
>>>>> 26 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6243)
>>>>> 26 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6249)
>>>>> 26 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6402)
>>>>> 27 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6262)
>>>>> 27 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6430)
>>>>> 27 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6439)
>>>>> 27 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6449)
>>>>> 27 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6454)
>>>>> 27 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6458)
>>>>> 27 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6459)
>>>>> 28 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6224)
>>>>> 28 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6441)
>>>>> 28 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6448)
>>>>> 28 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6452)
>>>>> 29 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6102)
>>>>> 29 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6177)
>>>>> 30 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 5939)
>>>>> 30 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 5940)
>>>>> 30 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 5949)
>>>>> 30 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6473)
>>>>> 31 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 5953)
>>>>> 31 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6187)
>>>>> 31 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6470)
>>>>> 31 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6471)
>>>>> 31 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6474)
>>>>> 31 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6475)
>>>>> 32 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 5958)
>>>>> 32 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6173)
>>>>> 32 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6236)
>>>>> 32 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6237)
>>>>> 32 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6242)
>>>>> 33 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6246)
>>>>> 33 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6248)
>>>>> 33 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6250)
>>>>> 33 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6251)
>>>>> 33 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6254)
>>>>> 33 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6255)
>>>>> 34 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6139)
>>>>> 34 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6141)
>>>>> 34 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6264)
>>>>> 34 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6266)
>>>>> 34 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6271)
>>>>> 35 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6142)
>>>>> 35 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6143)
>>>>> 35 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6277)
>>>>> 35 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6309)
>>>>> 35 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6413)
>>>>> 35 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6414)
>>>>> 35 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6419)
>>>>> 35 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6420)
>>>>> 36 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6144)
>>>>> 36 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6146)
>>>>> 36 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6147)
>>>>> 36 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6310)
>>>>> 36 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6326)
>>>>> 36 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6329)
>>>>> 36 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6330)
>>>>> 40 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6159)
>>>>> 40 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6160)
>>>>> 40 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6188)
>>>>> 41 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6440)
>>>>> 41 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6480)
>>>>> 41 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6483)
>>>>> 43 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6190)
>>>>> 44 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 5768)
>>>>> 44 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 5772)
>>>>> 46 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6191)
>>>>> 46 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6218)
>>>>> 46 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6219)
>>>>> 46 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6225)
>>>>> 46 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6244)
>>>>> 46 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6245)
>>>>> 48 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6129)
>>>>> 49 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 5804)
>>>>> 51 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 5877)
>>>>> 53 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 5742)
>>>>> 54 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 5757)
>>>>> 66 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6232)
>>>>> 66 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6233)
>>>>> 66 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6234)
>>>>> 66 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6259)
>>>>> 66 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6260)
>>>>> 66 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6278)
>>>>> 79 is too many changed tests to stress test. Allowing this job to pass
>>>>> without stress testing.  (build 6481)
>>>>> 103 is too many changed tests to stress test. Allowing this job to
>>> pass
>>>>> without stress testing.  (build 6493)
>>>>> 106 is too many changed tests to stress test. Allowing this job to
>>> pass
>>>>> without stress testing.  (build 6206)
>>>>> 106 is too many changed tests to stress test. Allowing this job to
>>> pass
>>>>> without stress testing.  (build 6207)
>>>>> 115 is too many changed tests to stress test. Allowing this job to
>>> pass
>>>>> without stress testing.  (build 5769)
>>>>> 118 is too many changed tests to stress test. Allowing this job to
>>> pass
>>>>> without stress testing.  (build 6226)
>>>>> 721 is too many changed tests to stress test. Allowing this job to
>>> pass
>>>>> without stress testing.  (build 6197)
>>>>> 722 is too many changed tests to stress test. Allowing this job to
>>> pass
>>>>> without stress testing.  (build 6217)
>>>>> 722 is too many changed tests to stress test. Allowing this job to
>>> pass
>>>>> without stress testing.  (build 6221)
>>>>> 
>>>>> Please note, sometimes the number of files reported seems to be way
>>>> higher
>>>>> than it should be.  For example
>>>> https://github.com/apache/geode/pull/4691
>>>>> <https://github.com/apache/geode/pull/4691> shows exactly 1 test file
>>>>> touched, but
>>>>> 
>>>> 
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-pr/jobs/StressNewTestOpenJDK11/builds/6278
>>>>> <
>>>>> 
>>>> 
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-pr/jobs/StressNewTestOpenJDK11/builds/6278
>>>>> 
>>>>> thinks it touched 66 test files.
>>>>> 
>>>>> There’s currently no good data to predict how long StressNew will
>>> take,
>>>> it
>>>>> just depends on the mix of tests touched.  I am aware of at least 4
>>> cases
>>>>> (with less than 25 files) where the timeout was hit.  In two of those
>>>> cases
>>>>> we re-ran with a longer timeout, and in two cases the PR author split
>>> up
>>>>> the PR into half a dozen smaller PRs to get around it.
>>>>> 
>>>>> 
>>>>>> On Mar 1, 2020, at 7:52 PM, Anthony Baker <aba...@pivotal.io>
>>> wrote:
>>>>>> 
>>>>>> What percentage of PR’s are currently subject to the 25 test file
>>> rule?
>>>>> How many would be subject to the concourse timeout?
>>>>>> 
>>>>>> I’d like to understand the scope of the impact for this change.
>>>>>> 
>>>>>> Anthony
>>>>>> 
>>>>>> 
>>>>>>> On Mar 1, 2020, at 8:58 AM, Owen Nichols <onich...@pivotal.io>
>>> wrote:
>>>>>>> 
>>>>>>> Impossible, no. Inconvenient, perhaps, but a small price to pay for
>>>>> being
>>>>>>> able to trust that green means green.
>>>>>>> 
>>>>>>> With or without this proposed change, if anyone is having trouble
>>>>> getting
>>>>>>> their PR to pass StressNew, please bring it up on the dev list and
>>> we
>>>>> can
>>>>>>> discuss the appropriate solution on a case-by-case basis (e.g.
>>>>> increasing
>>>>>>> timeout, fixing the logic that identifies changed test files,
>>>> splitting
>>>>>>> into multiple PRs, authorizing an override, etc).
>>>>>>> 
>>>>>>> On Sun, Mar 1, 2020 at 8:56 AM Dan Smith <dsm...@pivotal.io>
>>> wrote:
>>>>>>> 
>>>>>>>> Won't this make it impossible to merge refactoring changes that
>>> touch
>>>>> a lot
>>>>>>>> of tests?
>>>>>>>> 
>>>>>>>> -Dan
>>>>>>>> 
>>>>>>>> On Sat, Feb 29, 2020, 12:37 PM Robert Houghton <
>>> rhough...@pivotal.io
>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Yes, as it should
>>>>>>>>> 
>>>>>>>>> On Sat, Feb 29, 2020, 12:25 Dan Smith <dsm...@pivotal.io> wrote:
>>>>>>>>> 
>>>>>>>>>> Doesn't the build fail when concourse times out?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 

Reply via email to