I agree, I am willing to concede this discussion, as long as we are judicious 
and specifically only use it when it is not our test that is failing. It is a 
real problem. 

Thanks,
Mark
On 6/9/21, 9:46 AM, "Kirk Lund" <kl...@apache.org> wrote:

    I did think about splitting up dunit tests, but I believe
    testEventIdOutOfOrderInPartitionRegionSingleHop will remain flaky even if I
    move it to a new dunit test. No matter how you dice it up, we end up with a
    PR that cannot be merged to develop unless you get lucky after running
    stress-new-test many times.

    One could try being tricky by marking it with @ignore or deleting the flaky
    test in one PR, and then re-add it in a 2nd PR. But, even then that 2nd PR
    is very unlikely to pass stress-new-test unless someone fixes the cause of
    that test's flakiness.

    As it stands, stress-new-test just ends up being a dead-end or an endless
    time-sync for fixing multiple flaky tests in one dunit.

    On Tue, Jun 8, 2021 at 12:09 PM Dan Smith <dasm...@vmware.com> wrote:

    > Would it be possible to just split that test up into multiple classes? It
    > sounds like the issue is that there is so many flaky tests in that class
    > that you can't fix them all in one PR, which might indicate it's too big.
    >
    > If we can't get StressNewTest to pass - that means our builds are failing
    > more than 2% of the time due to this one test failure. Yikes!
    >
    > -Dan
    > ________________________________
    > From: Kirk Lund <kl...@apache.org>
    > Sent: Tuesday, June 8, 2021 9:33 AM
    > To: dev@geode.apache.org <dev@geode.apache.org>
    > Subject: [DISCUSS] Remove stress-new-test-openjdk11 requirement from PRs
    >
    > Our requirement for stress-new-test-openjdk11 to pass before allowing 
merge
    > doesn't really work as intended for fixing distributed tests that contain
    > multiple flaky test methods. In fact, I think it causes contributors to
    > avoid tackling flaky tests.
    >
    > I've been working on GEODE-9103: CI Failure:
    > PutAllClientServerDistributedTest.testPutAllReturnsExceptions FAILED
    > <
    > 
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FGEODE-9103&amp;data=04%7C01%7Chansonm%40vmware.com%7C27b2978d53d64c9c886708d92b661269%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637588539667053065%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=XNyeNQmLHIIyRgpnXRwbHn%2F2lph6S%2BhiO%2BdcOOgeoSc%3D&amp;reserved=0>
    > and was able to fix it.
    >
    > However, stress-new-test-openjdk11 then continued to fail for other flaky
    > tests in PutAllClientServerDistributedTest. I managed to fix GEODE-9296 
and
    > GEODE-8528 as well. I also tried but have not been able to fix GEODE-9242
    > which remains flaky.
    >
    > Unfortunately, I cannot merge any of my fixes for
    > PutAllClientServerDistributedTest unless every single flaky test in it is
    > fixed by my PR. I think this is undesirable because it would be better to
    > merge the fix for 3 flaky test methods than none.
    >
    > UPDATE: After running my precheckin more times, I did get
    > stress-new-test-openjdk11 to pass once so I can merge, but that's more of 
a
    > loophole than anything because I didn't manage to fix GEODE-9242.
    >
    > Despite having PR #6542 eventually pass, I would like to discuss removing
    > or relaxing our requirement that stress-new-test-openjdk11 must pass in
    > order to merge a PR...
    >
    > PR #6542: GEODE-9103: Fix ServerConnectivityExceptions in
    > PutAllClientServerDistributedTest
    > <
    > 
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fgeode%2Fpull%2F6542&amp;data=04%7C01%7Chansonm%40vmware.com%7C27b2978d53d64c9c886708d92b661269%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637588539667053065%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=gRAwmAsrHsH4p9jkKTuC63nJPRGBW6TGsYG41xpLF4w%3D&amp;reserved=0
    > >
    >
    > Fixed in PR #6542:
    > * GEODE-9296: CI Failure: PutAllClientServerDistributedTest >
    > testPartialKeyInPRSingleHopWithRedundancy
    > <
    > 
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FGEODE-9296&amp;data=04%7C01%7Chansonm%40vmware.com%7C27b2978d53d64c9c886708d92b661269%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637588539667053065%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=Q3ogdEG4omEyoc7Wrg2yMZEC0HWDTKhVqKMCJ8ZqTzc%3D&amp;reserved=0
    > >
    > * GEODE-9103: CI Failure:
    > PutAllClientServerDistributedTest.testPutAllReturnsExceptions FAILED
    > <
    > 
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FGEODE-9103&amp;data=04%7C01%7Chansonm%40vmware.com%7C27b2978d53d64c9c886708d92b661269%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637588539667053065%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=XNyeNQmLHIIyRgpnXRwbHn%2F2lph6S%2BhiO%2BdcOOgeoSc%3D&amp;reserved=0
    > >
    > * GEODE-8528: 
PutAllClientServerDistributedTest.testPartialKeyInPRSingleHop
    > fails due to missing disk store after server restart
    > <
    > 
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FGEODE-8528&amp;data=04%7C01%7Chansonm%40vmware.com%7C27b2978d53d64c9c886708d92b661269%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637588539667053065%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=2zkPkNsCFNbXHre2ekltWWrckA74YKD2SqUjp5Tehbg%3D&amp;reserved=0
    > >
    >
    > Still flaky:
    > * GEODE-9242: CI failure in PutAllClientServerDistributedTest >
    > testEventIdOutOfOrderInPartitionRegionSingleHop
    > <
    > 
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FGEODE-9242&amp;data=04%7C01%7Chansonm%40vmware.com%7C27b2978d53d64c9c886708d92b661269%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637588539667053065%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=fINGTY1FLXuJkr14LZNasMed6%2B1aJw9nIldCzzSf6rM%3D&amp;reserved=0
    > >
    >
    > Thanks,
    > Kirk
    >

Reply via email to