Hello,

We have had a solution for moving flaky tests to the flaky test group for quite 
some time. However, this solution did not work for individual test methods. 
This issue has now been resolved in the CI build. I have also moved the most 
flaky tests to the flaky test group, because a single CI build often needed 
several retries before all of them passed. This problem has worsened over the 
past month.

The PR containing the changes is available at 
https://github.com/apache/pulsar/pull/22433.

All tests that were moved have been reported as flaky and are listed in the 
description of that PR.

The flaky tests build job will fail when a flaky test fails, but this will not 
prevent a PR from being merged. If a test is more or less useless, it can 
either be removed or moved to the quarantine group. Failures in the quarantine 
group are ignored, although the test report will still include them. The 
quarantine solution has been available for a long time, so it was not added in 
this PR.

When a test method is moved to the flaky test group, the TestNG annotation 
should be something like @Test(groups = "flaky"). Within the test class, it is 
crucial to ensure that all @BeforeClass, @BeforeMethod, @AfterMethod, and 
@AfterClass annotations contain "(alwaysRun = true)", for example, 
@BeforeClass(alwaysRun = true). Without this change, the before and after 
methods will not execute when the flaky test method is run as part of the flaky 
test group, which could lead to NPEs or other odd issues during the execution 
of the test method.
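
As a rough illustration of the annotations described above, here is a minimal 
sketch of a test class with one method moved to the flaky test group. The class 
name, field, and method names are hypothetical and are not taken from the PR; 
only the TestNG annotations and their attributes are real.

```java
import org.testng.Assert;
import org.testng.annotations.AfterMethod;
import org.testng.annotations.BeforeMethod;
import org.testng.annotations.Test;

// Hypothetical test class, for illustration only.
public class ExampleServiceTest {
    private StringBuilder resource;

    // alwaysRun = true keeps this setup method running even when the run
    // is restricted to a group (such as "flaky") that it does not belong to.
    @BeforeMethod(alwaysRun = true)
    public void setup() {
        resource = new StringBuilder("ready");
    }

    // Cleanup needs alwaysRun = true for the same reason.
    @AfterMethod(alwaysRun = true)
    public void cleanup() {
        resource = null;
    }

    // Not in the flaky group.
    @Test
    public void testStableBehavior() {
        Assert.assertEquals(resource.toString(), "ready");
    }

    // Moved to the flaky test group.
    @Test(groups = "flaky")
    public void testFlakyBehavior() {
        // If setup() lacked alwaysRun = true, resource would still be null
        // here during a flaky-group run and this line would throw an NPE.
        Assert.assertEquals(resource.length(), 5);
    }
}
```

The same alwaysRun = true attribute applies to @BeforeClass and @AfterClass 
methods if the class has them.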

Please let me know if you have any concerns about this change. I hope that the 
flaky tests will eventually be resolved. However, we should be more rigorous 
about moving flaky tests to the flaky test group in the future. We are spending 
a lot of time and build resources when we allow flaky tests to be part of the 
regular test runs.

Looking forward to more contributions in this area,

-Lari
