[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 ) Change subject: build: adapt new Java flaky test infrastructure to existing controls .. Patch Set 4: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/12917/4/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java File java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java: http://gerrit.cloudera.org:8080/#/c/12917/4/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@61 PS4, Line 61: il will -- To view, visit http://gerrit.cloudera.org:8080/12917 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae Gerrit-Change-Number: 12917 Gerrit-PatchSet: 4 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Wed, 03 Apr 2019 21:16:27 + Gerrit-HasComments: Yes
[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12917 ) Change subject: build: adapt new Java flaky test infrastructure to existing controls .. build: adapt new Java flaky test infrastructure to existing controls Now that Java tests are reporting success/failure, we can use the existing flaky test controls to drive it. As a refresher, the C++ tests rely on these environment variables: - RUN_FLAKY_ONLY: whether to run just flaky tests or all tests - KUDU_FLAKY_TEST_ATTEMPTS: number of attempts for flaky tests - KUDU_FLAKY_TEST_LIST: path to list of flaky tests, one on each line - KUDU_RETRY_ALL_FAILED_TESTS: whether to retry all tests or just the ones in the flaky test list The algorithm is roughly: if RUN_FLAKY_ONLY or KUDU_FLAKY_TEST_ATTEMPTS > 1: populate KUDU_FLAKY_TEST_LIST from test result server if RUN_FLAKY_ONLY: testset = tests listed in KUDU_FLAKY_TEST_LIST else: testset = all tests for t in testset: if KUDU_RETRY_ALL_FAILED_TESTS or (KUDU_FLAKY_TEST_LIST and t in KUDU_FLAKY_TEST_LIST): num_attempts = KUDU_FLAKY_TEST_ATTEMPTS (or 1 if unset) else: num_attempts = 1 run t up to num_attempts times You can see it at work in build-and-test.sh/run-test.sh. You can also see it in dist-test.py though notably, it doesn't care about RUN_FLAKY_ONLY because we never used that particular combination (presumably the list of flaky tests is short enough that it wouldn't benefit from distributed testing). This patch attempts to mirror these exact semantics for Java tests. Here are the interesting changes: - In RetryRule, rerunFailingTestsCount is gone. The behavior is informed via the aforementioned environment variables instead. - In build-and-test.sh, if RUN_FLAKY_ONLY is set, parse the flaky test list into a series of --tests gradle command line arguments. - In dist-test.py, opt into the C++ flaky test handling (which reflects the above algorithm). There are also some small changes to flaky handling to accommodate Java's per-method flaky test tracking. Note: all of this assumes that there's no overlap between the names of any C++ or Java tests, which is currently true as all C++ tests have names like "tablet-test" or "master_cert_authority-itest" while all Java tests are prefixed with "org.apache.kudu...". If this were to change, we'd need to properly "namespace" the test results in the reporting infrastructure and fetch the flaky test lists separately for C++ and Java tests. For now there's just one flaky test list, and both ctest and gradle are OK with being asked to run irrelevant tests (they'll just be ignored). Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae Reviewed-on: http://gerrit.cloudera.org:8080/12917 Reviewed-by: Grant Henke Tested-by: Adar Dembo --- M build-support/dist_test.py M build-support/jenkins/build-and-test.sh M java/gradle/tests.gradle M java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java 4 files changed, 145 insertions(+), 31 deletions(-) Approvals: Grant Henke: Looks good to me, approved Adar Dembo: Verified -- To view, visit http://gerrit.cloudera.org:8080/12917 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae Gerrit-Change-Number: 12917 Gerrit-PatchSet: 4 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Mike Percy
[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 ) Change subject: build: adapt new Java flaky test infrastructure to existing controls .. Patch Set 3: Verified+1 More unknown Java flakes. -- To view, visit http://gerrit.cloudera.org:8080/12917 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae Gerrit-Change-Number: 12917 Gerrit-PatchSet: 3 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Wed, 03 Apr 2019 18:59:00 + Gerrit-HasComments: No
[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls
Adar Dembo has removed Kudu Jenkins from this change. ( http://gerrit.cloudera.org:8080/12917 ) Change subject: build: adapt new Java flaky test infrastructure to existing controls .. Removed reviewer Kudu Jenkins with the following votes: * Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/12917 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae Gerrit-Change-Number: 12917 Gerrit-PatchSet: 3 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Mike Percy
[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 ) Change subject: build: adapt new Java flaky test infrastructure to existing controls .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12917 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae Gerrit-Change-Number: 12917 Gerrit-PatchSet: 3 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Wed, 03 Apr 2019 18:38:05 + Gerrit-HasComments: No
[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls
Hello Mike Percy, Andrew Wong, Grant Henke, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12917 to look at the new patch set (#3). Change subject: build: adapt new Java flaky test infrastructure to existing controls .. build: adapt new Java flaky test infrastructure to existing controls Now that Java tests are reporting success/failure, we can use the existing flaky test controls to drive it. As a refresher, the C++ tests rely on these environment variables: - RUN_FLAKY_ONLY: whether to run just flaky tests or all tests - KUDU_FLAKY_TEST_ATTEMPTS: number of attempts for flaky tests - KUDU_FLAKY_TEST_LIST: path to list of flaky tests, one on each line - KUDU_RETRY_ALL_FAILED_TESTS: whether to retry all tests or just the ones in the flaky test list The algorithm is roughly: if RUN_FLAKY_ONLY or KUDU_FLAKY_TEST_ATTEMPTS > 1: populate KUDU_FLAKY_TEST_LIST from test result server if RUN_FLAKY_ONLY: testset = tests listed in KUDU_FLAKY_TEST_LIST else: testset = all tests for t in testset: if KUDU_RETRY_ALL_FAILED_TESTS or (KUDU_FLAKY_TEST_LIST and t in KUDU_FLAKY_TEST_LIST): num_attempts = KUDU_FLAKY_TEST_ATTEMPTS (or 1 if unset) else: num_attempts = 1 run t up to num_attempts times You can see it at work in build-and-test.sh/run-test.sh. You can also see it in dist-test.py though notably, it doesn't care about RUN_FLAKY_ONLY because we never used that particular combination (presumably the list of flaky tests is short enough that it wouldn't benefit from distributed testing). This patch attempts to mirror these exact semantics for Java tests. Here are the interesting changes: - In RetryRule, rerunFailingTestsCount is gone. The behavior is informed via the aforementioned environment variables instead. - In build-and-test.sh, if RUN_FLAKY_ONLY is set, parse the flaky test list into a series of --tests gradle command line arguments. - In dist-test.py, opt into the C++ flaky test handling (which reflects the above algorithm). There are also some small changes to flaky handling to accommodate Java's per-method flaky test tracking. Note: all of this assumes that there's no overlap between the names of any C++ or Java tests, which is currently true as all C++ tests have names like "tablet-test" or "master_cert_authority-itest" while all Java tests are prefixed with "org.apache.kudu...". If this were to change, we'd need to properly "namespace" the test results in the reporting infrastructure and fetch the flaky test lists separately for C++ and Java tests. For now there's just one flaky test list, and both ctest and gradle are OK with being asked to run irrelevant tests (they'll just be ignored). Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae --- M build-support/dist_test.py M build-support/jenkins/build-and-test.sh M java/gradle/tests.gradle M java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java 4 files changed, 145 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/12917/3 -- To view, visit http://gerrit.cloudera.org:8080/12917 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae Gerrit-Change-Number: 12917 Gerrit-PatchSet: 3 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Mike Percy
[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 ) Change subject: build: adapt new Java flaky test infrastructure to existing controls .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh File build-support/jenkins/build-and-test.sh: http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh@246 PS1, Line 246: EXTRA_GRADLE_TEST_FLAGS="--tests $t $EXTRA_GRADLE_TEST_FLAGS" > Not accidentally, but yes. :) Mind a comment here that acknowledges we intend to pass them to gradle? A mostly copy past of this is okay. I feel like it's something we would second guess later otherwise. -- To view, visit http://gerrit.cloudera.org:8080/12917 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae Gerrit-Change-Number: 12917 Gerrit-PatchSet: 1 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Wed, 03 Apr 2019 18:30:30 + Gerrit-HasComments: Yes
[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 ) Change subject: build: adapt new Java flaky test infrastructure to existing controls .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh File build-support/jenkins/build-and-test.sh: http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh@246 PS1, Line 246: # 1. There are no test name collisions between C++ and Java tests. > Mind a comment here that acknowledges we intend to pass them to gradle? A m Done -- To view, visit http://gerrit.cloudera.org:8080/12917 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae Gerrit-Change-Number: 12917 Gerrit-PatchSet: 3 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Wed, 03 Apr 2019 18:36:25 + Gerrit-HasComments: Yes
[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 ) Change subject: build: adapt new Java flaky test infrastructure to existing controls .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12917 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae Gerrit-Change-Number: 12917 Gerrit-PatchSet: 2 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Wed, 03 Apr 2019 18:30:39 + Gerrit-HasComments: No
[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 ) Change subject: build: adapt new Java flaky test infrastructure to existing controls .. Patch Set 2: Verified+1 A Java test failed, but as before, I think it was an unknown flake that was previously retried because we retried all tests three times. Going forward we should start accumulating the failures in the flaky test dashboard. -- To view, visit http://gerrit.cloudera.org:8080/12917 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae Gerrit-Change-Number: 12917 Gerrit-PatchSet: 2 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Wed, 03 Apr 2019 18:25:50 + Gerrit-HasComments: No
[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls
Adar Dembo has removed Kudu Jenkins from this change. ( http://gerrit.cloudera.org:8080/12917 ) Change subject: build: adapt new Java flaky test infrastructure to existing controls .. Removed reviewer Kudu Jenkins with the following votes: * Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/12917 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae Gerrit-Change-Number: 12917 Gerrit-PatchSet: 2 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Mike Percy
[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls
Hello Mike Percy, Andrew Wong, Grant Henke, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12917 to look at the new patch set (#2). Change subject: build: adapt new Java flaky test infrastructure to existing controls .. build: adapt new Java flaky test infrastructure to existing controls Now that Java tests are reporting success/failure, we can use the existing flaky test controls to drive it. As a refresher, the C++ tests rely on these environment variables: - RUN_FLAKY_ONLY: whether to run just flaky tests or all tests - KUDU_FLAKY_TEST_ATTEMPTS: number of attempts for flaky tests - KUDU_FLAKY_TEST_LIST: path to list of flaky tests, one on each line - KUDU_RETRY_ALL_FAILED_TESTS: whether to retry all tests or just the ones in the flaky test list The algorithm is roughly: if RUN_FLAKY_ONLY or KUDU_FLAKY_TEST_ATTEMPTS > 1: populate KUDU_FLAKY_TEST_LIST from test result server if RUN_FLAKY_ONLY: testset = tests listed in KUDU_FLAKY_TEST_LIST else: testset = all tests for t in testset: if KUDU_RETRY_ALL_FAILED_TESTS or (KUDU_FLAKY_TEST_LIST and t in KUDU_FLAKY_TEST_LIST): num_attempts = KUDU_FLAKY_TEST_ATTEMPTS (or 1 if unset) else: num_attempts = 1 run t up to num_attempts times You can see it at work in build-and-test.sh/run-test.sh. You can also see it in dist-test.py though notably, it doesn't care about RUN_FLAKY_ONLY because we never used that particular combination (presumably the list of flaky tests is short enough that it wouldn't benefit from distributed testing). This patch attempts to mirror these exact semantics for Java tests. Here are the interesting changes: - In RetryRule, rerunFailingTestsCount is gone. The behavior is informed via the aforementioned environment variables instead. - In build-and-test.sh, if RUN_FLAKY_ONLY is set, parse the flaky test list into a series of --tests gradle command line arguments. - In dist-test.py, opt into the C++ flaky test handling (which reflects the above algorithm). There are also some small changes to flaky handling to accommodate Java's per-method flaky test tracking. Note: all of this assumes that there's no overlap between the names of any C++ or Java tests, which is currently true as all C++ tests have names like "tablet-test" or "master_cert_authority-itest" while all Java tests are prefixed with "org.apache.kudu...". If this were to change, we'd need to properly "namespace" the test results in the reporting infrastructure and fetch the flaky test lists separately for C++ and Java tests. For now there's just one flaky test list, and both ctest and gradle are OK with being asked to run irrelevant tests (they'll just be ignored). Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae --- M build-support/dist_test.py M build-support/jenkins/build-and-test.sh M java/gradle/tests.gradle M java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java 4 files changed, 137 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/12917/2 -- To view, visit http://gerrit.cloudera.org:8080/12917 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae Gerrit-Change-Number: 12917 Gerrit-PatchSet: 2 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Mike Percy
[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 ) Change subject: build: adapt new Java flaky test infrastructure to existing controls .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/dist_test.py File build-support/dist_test.py: http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/dist_test.py@369 PS1, Line 369: test_name = ".".join(k.split(".")[:-1]) if TEST_SHARD_RE.search(k) else k > If I understand right, this doesn't mangle the java names because there is Correct. And it looks like every class/package identifier needs to have at least one letter[1], so no Java test should ever match this regex. 1. https://stackoverflow.com/a/65490 http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh File build-support/jenkins/build-and-test.sh: http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh@246 PS1, Line 246: EXTRA_GRADLE_TEST_FLAGS="--tests $t $EXTRA_GRADLE_TEST_FLAGS" > Will this accidentally pass the flaky c++ tests to the Gradle --tests flag? Not accidentally, but yes. :) As I wrote in the commit message, the shortcut that simplified this patch was to _not_ separate the list of C++ and Java flakies. We can get away with it because: 1. There are no collisions between C++/Java test names. 2. Both ctest (with "-R $test_regex") and gradle (with multiple "--tests $t" entries) happily ignore tests they can't find. A cleaner way would be to enforce the separation, but that means doing one of: 1. Splitting the unified flaky list here into two lists based on some name-based criteria. 2. Adding another flaky list retrieval endpoint to test_result_server.py and modifying the SQL queries to do the same name-based criteria. 3. Like #2 but modifying the reporting to include a bool for C++ or Java test, and changing the results schema accordingly. I opted for quick-and-dirty. http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java File java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java: http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@51 PS1, Line 51: private static final int MAYBE_RETRY = 0; > Might be more clear to call this NO_EXPLICIT_RETRIES or something. Alternat Done http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@57 PS1, Line 57: this(MAYBE_RETRY, /*skipReporting=*/ false); > Should this be DEFAULT_RETRY_COUNT? Done http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@72 PS1, Line 72: String value = System.getenv("KUDU_FLAKY_TEST_LIST"); > I am not sure how costly these System.getenv calls are, but assuming this K System.getenv() should be cheap; it's not even a system call [1]. I don't think a map would save us any time, as we only call retryThisTest() once per RetryRule. Unless you're suggesting a static map created in a static block. I find static blocks somewhat icky, but it would reduce the number of times we read the file, so maybe it is worth doing. 1. https://stackoverflow.com/questions/46623018/does-a-program-make-a-system-call-to-get-the-value-of-an-environment-variable-in http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@128 PS1, Line 128: reporter != null > Do we need this condition? I think below it would always result in `actualR Yes, counter-intuitively it's OK to construct a RetryStatement with actualRetryCount == 0, because that's a test that won't be retried but will report its results. Put another way, reporting and retrying should be independently controllable (see the comment just above). I don't think we take advantage of this particular combination, but the C++ tests support it, so I figured I'd at least reach parity with them. -- To view, visit http://gerrit.cloudera.org:8080/12917 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae Gerrit-Change-Number: 12917 Gerrit-PatchSet: 1 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Wed, 03 Apr 2019 17:51:52 + Gerrit-HasComments: Yes
[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 ) Change subject: build: adapt new Java flaky test infrastructure to existing controls .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/dist_test.py File build-support/dist_test.py: http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/dist_test.py@369 PS1, Line 369: test_name = ".".join(k.split(".")[:-1]) if TEST_SHARD_RE.search(k) else k If I understand right, this doesn't mangle the java names because there is no java class or package that is purely numerical? http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh File build-support/jenkins/build-and-test.sh: http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh@246 PS1, Line 246: EXTRA_GRADLE_TEST_FLAGS="--tests $t $EXTRA_GRADLE_TEST_FLAGS" Will this accidentally pass the flaky c++ tests to the Gradle --tests flag? http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java File java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java: http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@51 PS1, Line 51: private static final int MAYBE_RETRY = 0; Might be more clear to call this NO_EXPLICIT_RETRIES or something. Alternatively, I am not sure you need a constant for the value 0. It would make the `retryCount != MAYBE_RETRY;` line below easier to read to just hard code 0. http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@57 PS1, Line 57: this(MAYBE_RETRY, /*skipReporting=*/ false); Should this be DEFAULT_RETRY_COUNT? http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@72 PS1, Line 72: String value = System.getenv("KUDU_FLAKY_TEST_LIST"); I am not sure how costly these System.getenv calls are, but assuming this KUDU_FLAKY_TEST_LIST could be fairly large, can we populate a map when we initialize the rule and consult it in this method? We could read/parse KUDU_RETRY_ALL_FAILED_TESTS and KUDU_FLAKY_TEST_ATTEMPTS at construction time too, though the impact is likely lower. http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@128 PS1, Line 128: reporter != null Do we need this condition? I think below it would always result in `actualRetryCount = 0` in the case one of the others isn't also true. -- To view, visit http://gerrit.cloudera.org:8080/12917 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae Gerrit-Change-Number: 12917 Gerrit-PatchSet: 1 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Wed, 03 Apr 2019 14:11:04 + Gerrit-HasComments: Yes
[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 ) Change subject: build: adapt new Java flaky test infrastructure to existing controls .. Patch Set 1: Verified+1 Overriding Jenkins. Several Java tests failed, but I think these are known flakes that were previously retried because we retried all tests three times. One was KUDU-1521 and there may not be JIRAs open for the others. I expect that once submitted we'll start tracking these failures in the dashboard, and then we'll retry them automatically. -- To view, visit http://gerrit.cloudera.org:8080/12917 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae Gerrit-Change-Number: 12917 Gerrit-PatchSet: 1 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Wed, 03 Apr 2019 03:32:47 + Gerrit-HasComments: No
[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls
Adar Dembo has removed Kudu Jenkins from this change. ( http://gerrit.cloudera.org:8080/12917 ) Change subject: build: adapt new Java flaky test infrastructure to existing controls .. Removed reviewer Kudu Jenkins with the following votes: * Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/12917 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae Gerrit-Change-Number: 12917 Gerrit-PatchSet: 1 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Mike Percy
[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 ) Change subject: build: adapt new Java flaky test infrastructure to existing controls .. Patch Set 1: Here's what I manually tested: dist-test + per-test flaky resistance: KUDU_FLAKY_TEST_ATTEMPTS=3 TEST_RESULT_SERVER=... build_and_test.sh report results + all tests flaky resistance: KUDU_REPORT_TEST_RESULTS=1 KUDU_FLAKY_TEST_ATTEMPTS=3 TEST_RESULT_SERVER=... KUDU_RETRY_ALL_FAILED_TESTS=1 build_and_test.sh report results + per-test flaky resistance + run just the flakes: KUDU_REPORT_TEST_RESULTS=1 KUDU_FLAKY_TEST_ATTEMPTS=3 TEST_RESULT_SERVER=... RUN_FLAKY_ONLY=1 ERROR_ON_TEST_FAILURE=0 build_and_test.sh Java-only report results: gradle -PrerunTests test --continue Java-only report results + per-test flaky resistance: KUDU_FLAKY_TEST_ATTEMPTS=3 TEST_RESULT_SERVER=... KUDU_FLAKY_TEST_LIST=... gradle -PrerunTests test --continue Java-only report results + all tests flaky resistance: KUDU_FLAKY_TEST_ATTEMPTS=3 TEST_RESULT_SERVER=... KUDU_RETRY_ALL_FAILED_TESTS=1 gradle -PrerunTests test --continue I think this generally reflects the myriad of combinations that we use, but let me know if I've missed something. -- To view, visit http://gerrit.cloudera.org:8080/12917 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae Gerrit-Change-Number: 12917 Gerrit-PatchSet: 1 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Wed, 03 Apr 2019 00:59:40 + Gerrit-HasComments: No
[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls
Hello Mike Percy, Andrew Wong, Grant Henke, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/12917 to review the following change. Change subject: build: adapt new Java flaky test infrastructure to existing controls .. build: adapt new Java flaky test infrastructure to existing controls Now that Java tests are reporting success/failure, we can use the existing flaky test controls to drive it. As a refresher, the C++ tests rely on these environment variables: - RUN_FLAKY_ONLY: whether to run just flaky tests or all tests - KUDU_FLAKY_TEST_ATTEMPTS: number of attempts for flaky tests - KUDU_FLAKY_TEST_LIST: path to list of flaky tests, one on each line - KUDU_RETRY_ALL_FAILED_TESTS: whether to retry all tests or just the ones in the flaky test list The algorithm is roughly: if RUN_FLAKY_ONLY or KUDU_FLAKY_TEST_ATTEMPTS > 1: populate KUDU_FLAKY_TEST_LIST from test result server if RUN_FLAKY_ONLY: testset = tests listed in KUDU_FLAKY_TEST_LIST else: testset = all tests for t in testset: if KUDU_RETRY_ALL_FAILED_TESTS or (KUDU_FLAKY_TEST_LIST and t in KUDU_FLAKY_TEST_LIST): num_attempts = KUDU_FLAKY_TEST_ATTEMPTS (or 1 if unset) else: num_attempts = 1 run t up to num_attempts times You can see it at work in build-and-test.sh/run-test.sh. You can also see it in dist-test.py though notably, it doesn't care about RUN_FLAKY_ONLY because we never used that particular combination (presumably the list of flaky tests is short enough that it wouldn't benefit from distributed testing). This patch attempts to mirror these exact semantics for Java tests. Here are the interesting changes: - In RetryRule, rerunFailingTestsCount is gone. The behavior is informed via the aforementioned environment variables instead. - In build-and-test.sh, if RUN_FLAKY_ONLY is set, parse the flaky test list into a series of --tests gradle command line arguments. - In dist-test.py, opt into the C++ flaky test handling (which reflects the above algorithm). There are also some small changes to flaky handling to accommodate Java's per-method flaky test tracking. Note: all of this assumes that there's no overlap between the names of any C++ or Java tests, which is currently true as all C++ tests have names like "tablet-test" or "master_cert_authority-itest" while all Java tests are prefixed with "org.apache.kudu...". If this were to change, we'd need to properly "namespace" the test results in the reporting infrastructure and fetch the flaky test lists separately for C++ and Java tests. For now there's just one flaky test list, and both ctest and gradle are OK with being asked to run irrelevant tests (they'll just be ignored). Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae --- M build-support/dist_test.py M build-support/jenkins/build-and-test.sh M java/gradle/tests.gradle M java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java 4 files changed, 139 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/12917/1 -- To view, visit http://gerrit.cloudera.org:8080/12917 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae Gerrit-Change-Number: 12917 Gerrit-PatchSet: 1 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Mike Percy