Regarding Owen's comments, there has been discussion in past CIO Retros about changing how stressNewTest runs things. From that, I believe the current behavior is that when the number of changed tests exceeds some arbitrary threshold, it doesn't run any of them (please correct me if my knowledge is out of date on this). My suggestion is that we pick some subset of the changed tests to run. Ideally we would run all of them, but I agree that we shouldn't go above a 2-hour run or pre-checkin will take even longer. Since we have a finite amount of time to complete the tests and a finite amount of computing resources, we should limit the number of tests that we run. I suggest this rather than decreasing the number of repetitions because I think we are more likely to miss failures if we go that route.
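To make the time budget concrete, here is a rough sketch of the arithmetic in Python. It is only an illustration, not how the StressNew job is actually implemented: the function names are made up, it assumes the runs execute serially (which Owen was not sure about), and the 30-second average run time is a placeholder rather than a measured number. The 2h15m timeout, 17 changed tests, and 50 repeats are the figures from Owen's message.

```python
import math


def seconds_per_run_allowed(timeout_seconds, changed_tests, repeats, parallelism=1):
    """Average time each single test run may take if every changed test is stressed.

    Assumes runs are spread evenly over `parallelism` workers (1 = fully serial).
    """
    total_runs = changed_tests * repeats
    return timeout_seconds * parallelism / total_runs


def max_tests_within_budget(timeout_seconds, repeats, avg_seconds_per_run, parallelism=1):
    """How many changed tests can be stressed `repeats` times before the timeout."""
    seconds_per_test = avg_seconds_per_run * repeats
    return math.floor(timeout_seconds * parallelism / seconds_per_test)


# Figures from the thread: execute_test_timeout of 2h15m, 17 changed tests, 50 repeats.
TIMEOUT_SECONDS = 2 * 3600 + 15 * 60  # 8100 seconds

# Per-run allowance if all 850 runs execute serially (a bit under 10 seconds).
allowance = seconds_per_run_allowed(TIMEOUT_SECONDS, changed_tests=17, repeats=50)
print(f"Allowed seconds per run: {allowance:.2f}")

# Hypothetical: if a run averaged 30 seconds, how many tests fit in the budget?
subset_size = max_tests_within_budget(TIMEOUT_SECONDS, repeats=50, avg_seconds_per_run=30)
print(f"Tests that fit at 30 s per run: {subset_size}")
```

Under those assumptions, keeping the repeat count fixed means the size of the subset is what has to shrink as individual tests get slower, which is the trade-off I am suggesting we make.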
If there isn't a team that can prioritize this, I would be interested in working on it on the next Just Do It day.

~Helena

On Thu, Dec 13, 2018 at 5:14 PM Owen Nichols <onich...@pivotal.io> wrote:

> This PR changes 17 tests. At 50 repeats each, that's 850 tests. I'm not
> sure if StressNew does all 850 serially -- if it does, they would have to
> complete in under 10 seconds per test to duck the concourse timeout
> (currently set a little over 2 hours).
>
> Approximately how long do you expect each of these tests to take?
> Would it make sense to break up the PR into 2 PRs with fewer changed tests
> in each?
> Or do we just need to increase the timeout for StressNew to a much bigger
> value (what is reasonable, anyway? 6 hours? 12 hours?).
>
> If you'd like to change the timeout for this job, the following lines may
> be relevant:
>
> ci/pipelines/shared/jinja.variables.yml lines 111-118:
> - name: "StressNew"
>   ...
>   CALL_STACK_TIMEOUT: "7200"
>   execute_test_timeout: 2h15m
>
> If we think there are good reasons to keep the 2-hour time limit in place,
> and there are extenuating reasons why your changes cannot be stressed
> within this time period, I propose we should be able to substitute
> additional manual reviewers in place of the imperfect automatic check here
> to get this approved.
>
> -Owen
>
>
> On Dec 13, 2018, at 2:20 PM, Galen O'Sullivan <gosulli...@pivotal.io> wrote:
> >
> > On the PR for https://github.com/apache/geode/pull/2938, the StressNewTest
> > and (in two different runs of the same code) other jobs fail occasionally.
> >
> > I'm inclined to think that the Upgrade and Acceptance test failures were
> > caused by flaky tests, and I can keep rerunning the PR build.
> >
> > The StressNewTest issue is probably because the tests take a long time to
> > run. We can't see what tests run or how long they took because the
> > StressNewTests have an archive but no results page.
> >
> > If StressTest is going to take too long, should I ignore it and push, or is
> > there a way to disable it or dial it down for this PR? I know we've had
> > discussions on the list about not merging if the pipeline is, but I think
> > the StressTest failures are due to the pipeline not allowing enough time,
> > and it's meant to fix a test issue.
> >
> > StressTest jobs:
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__concourse.apachegeode-2Dci.info_builds_24422&d=DwIBaQ&c=lnl9vOaLMzsy2niBC8-h_K-7QJuNJEsFrzdndhuJ3Sw&r=5pwPNRvtAJAFP7w9SGYR-NUqYcl8RSrvSLXHd5dKU-o&m=q8-azNJP7a-Vb_gLuTJaHlC9VFqSd-uIaW85r2xRriY&s=KbF1dF4tF8e-rmTrJEVwrFjCuqNHZAhtphTogHN4U5k&e=
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__concourse.apachegeode-2Dci.info_builds_24423&d=DwIBaQ&c=lnl9vOaLMzsy2niBC8-h_K-7QJuNJEsFrzdndhuJ3Sw&r=5pwPNRvtAJAFP7w9SGYR-NUqYcl8RSrvSLXHd5dKU-o&m=q8-azNJP7a-Vb_gLuTJaHlC9VFqSd-uIaW85r2xRriY&s=SGe4aOZYEEVTrEMoMK73p5EjzNANF7wtenF7w7vxtAQ&e=
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__concourse.apachegeode-2Dci.info_builds_24061&d=DwIBaQ&c=lnl9vOaLMzsy2niBC8-h_K-7QJuNJEsFrzdndhuJ3Sw&r=5pwPNRvtAJAFP7w9SGYR-NUqYcl8RSrvSLXHd5dKU-o&m=q8-azNJP7a-Vb_gLuTJaHlC9VFqSd-uIaW85r2xRriY&s=T_8qb6cjL_qdcEbUsuITFtIXZnVoH6mKgXF22sYTL-Q&e=
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__concourse.apachegeode-2Dci.info_builds_24062&d=DwIBaQ&c=lnl9vOaLMzsy2niBC8-h_K-7QJuNJEsFrzdndhuJ3Sw&r=5pwPNRvtAJAFP7w9SGYR-NUqYcl8RSrvSLXHd5dKU-o&m=q8-azNJP7a-Vb_gLuTJaHlC9VFqSd-uIaW85r2xRriY&s=bcNQQeaLkfe_i5FL5MXu4eWNkhwjFfxksDLAkXkjV0c&e=
> >
> > UpgradeTest:
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__concourse.apachegeode-2Dci.info_builds_24060&d=DwIBaQ&c=lnl9vOaLMzsy2niBC8-h_K-7QJuNJEsFrzdndhuJ3Sw&r=5pwPNRvtAJAFP7w9SGYR-NUqYcl8RSrvSLXHd5dKU-o&m=q8-azNJP7a-Vb_gLuTJaHlC9VFqSd-uIaW85r2xRriY&s=IcWIfcx5y0MagkEI7xC9hgt9RTpYj9k881orD5lPau0&e=
> > AcceptanceTest:
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__concourse.apachegeode-2Dci.info_builds_24415&d=DwIBaQ&c=lnl9vOaLMzsy2niBC8-h_K-7QJuNJEsFrzdndhuJ3Sw&r=5pwPNRvtAJAFP7w9SGYR-NUqYcl8RSrvSLXHd5dKU-o&m=q8-azNJP7a-Vb_gLuTJaHlC9VFqSd-uIaW85r2xRriY&s=_2_nAWdcjyyqKQjP9o6riCgrZb3HnQJ8nDog3Lzdoj4&e=
> >
> > Thanks,
> > Galen