[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

2019-04-03 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing 
controls
..


Patch Set 3: Verified+1

More unknown Java flakes.


--
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Comment-Date: Wed, 03 Apr 2019 18:59:00 +
Gerrit-HasComments: No


[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

2019-04-03 Thread Adar Dembo (Code Review)
Adar Dembo has removed Kudu Jenkins from this change.  ( 
http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing 
controls
..


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Mike Percy 


[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

2019-04-03 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing 
controls
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Comment-Date: Wed, 03 Apr 2019 18:38:05 +
Gerrit-HasComments: No


[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

2019-04-03 Thread Adar Dembo (Code Review)
Hello Mike Percy, Andrew Wong, Grant Henke,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12917

to look at the new patch set (#3).

Change subject: build: adapt new Java flaky test infrastructure to existing 
controls
..

build: adapt new Java flaky test infrastructure to existing controls

Now that Java tests are reporting success/failure, we can use the existing
flaky test controls to drive it. As a refresher, the C++ tests rely on these
environment variables:
- RUN_FLAKY_ONLY: whether to run just flaky tests or all tests
- KUDU_FLAKY_TEST_ATTEMPTS: number of attempts for flaky tests
- KUDU_FLAKY_TEST_LIST: path to list of flaky tests, one on each line
- KUDU_RETRY_ALL_FAILED_TESTS: whether to retry all tests or just the ones
   in the flaky test list

The algorithm is roughly:
  if RUN_FLAKY_ONLY or KUDU_FLAKY_TEST_ATTEMPTS > 1:
populate KUDU_FLAKY_TEST_LIST from test result server

  if RUN_FLAKY_ONLY:
testset = tests listed in KUDU_FLAKY_TEST_LIST
  else:
testset = all tests

  for t in testset:
if KUDU_RETRY_ALL_FAILED_TESTS or (KUDU_FLAKY_TEST_LIST and
   t in KUDU_FLAKY_TEST_LIST):
  num_attempts = KUDU_FLAKY_TEST_ATTEMPTS (or 1 if unset)
else:
  num_attempts = 1

run t up to num_attempts times

You can see it at work in build-and-test.sh/run-test.sh. You can also see it
in dist-test.py though notably, it doesn't care about RUN_FLAKY_ONLY because
we never used that particular combination (presumably the list of flaky
tests is short enough that it wouldn't benefit from distributed testing).

This patch attempts to mirror these exact semantics for Java tests. Here are
the interesting changes:
- In RetryRule, rerunFailingTestsCount is gone. The behavior is informed via
  the aforementioned environment variables instead.
- In build-and-test.sh, if RUN_FLAKY_ONLY is set, parse the flaky test list
  into a series of --tests gradle command line arguments.
- In dist-test.py, opt into the C++ flaky test handling (which reflects the
  above algorithm). There are also some small changes to flaky handling to
  accommodate Java's per-method flaky test tracking.

Note: all of this assumes that there's no overlap between the names of any
C++ or Java tests, which is currently true as all C++ tests have names like
"tablet-test" or "master_cert_authority-itest" while all Java tests are
prefixed with "org.apache.kudu...". If this were to change, we'd need to
properly "namespace" the test results in the reporting infrastructure and
fetch the flaky test lists separately for C++ and Java tests. For now
there's just one flaky test list, and both ctest and gradle are OK with
being asked to run irrelevant tests (they'll just be ignored).

Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
---
M build-support/dist_test.py
M build-support/jenkins/build-and-test.sh
M java/gradle/tests.gradle
M java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java
4 files changed, 145 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/12917/3
--
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Mike Percy 


[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

2019-04-03 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing 
controls
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh
File build-support/jenkins/build-and-test.sh:

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh@246
PS1, Line 246:   EXTRA_GRADLE_TEST_FLAGS="--tests $t 
$EXTRA_GRADLE_TEST_FLAGS"
> Not accidentally, but yes. :)
Mind a comment here that acknowledges we intend to pass them to gradle? A 
mostly copy past of this is okay. I feel like it's something we would second 
guess later otherwise.



--
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Mike Percy 
Gerrit-Comment-Date: Wed, 03 Apr 2019 18:30:30 +
Gerrit-HasComments: Yes


[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

2019-04-03 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing 
controls
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh
File build-support/jenkins/build-and-test.sh:

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh@246
PS1, Line 246: # 1. There are no test name collisions between C++ and Java 
tests.
> Mind a comment here that acknowledges we intend to pass them to gradle? A m
Done



--
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Mike Percy 
Gerrit-Comment-Date: Wed, 03 Apr 2019 18:36:25 +
Gerrit-HasComments: Yes


[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

2019-04-03 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing 
controls
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Mike Percy 
Gerrit-Comment-Date: Wed, 03 Apr 2019 18:30:39 +
Gerrit-HasComments: No


[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

2019-04-03 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing 
controls
..


Patch Set 2: Verified+1

A Java test failed, but as before, I think it was an unknown flake that was 
previously retried because we retried all tests three times. Going forward we 
should start accumulating the failures in the flaky test dashboard.


--
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Comment-Date: Wed, 03 Apr 2019 18:25:50 +
Gerrit-HasComments: No


[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

2019-04-03 Thread Adar Dembo (Code Review)
Adar Dembo has removed Kudu Jenkins from this change.  ( 
http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing 
controls
..


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Mike Percy 


[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

2019-04-03 Thread Adar Dembo (Code Review)
Hello Mike Percy, Andrew Wong, Grant Henke,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12917

to look at the new patch set (#2).

Change subject: build: adapt new Java flaky test infrastructure to existing 
controls
..

build: adapt new Java flaky test infrastructure to existing controls

Now that Java tests are reporting success/failure, we can use the existing
flaky test controls to drive it. As a refresher, the C++ tests rely on these
environment variables:
- RUN_FLAKY_ONLY: whether to run just flaky tests or all tests
- KUDU_FLAKY_TEST_ATTEMPTS: number of attempts for flaky tests
- KUDU_FLAKY_TEST_LIST: path to list of flaky tests, one on each line
- KUDU_RETRY_ALL_FAILED_TESTS: whether to retry all tests or just the ones
   in the flaky test list

The algorithm is roughly:
  if RUN_FLAKY_ONLY or KUDU_FLAKY_TEST_ATTEMPTS > 1:
populate KUDU_FLAKY_TEST_LIST from test result server

  if RUN_FLAKY_ONLY:
testset = tests listed in KUDU_FLAKY_TEST_LIST
  else:
testset = all tests

  for t in testset:
if KUDU_RETRY_ALL_FAILED_TESTS or (KUDU_FLAKY_TEST_LIST and
   t in KUDU_FLAKY_TEST_LIST):
  num_attempts = KUDU_FLAKY_TEST_ATTEMPTS (or 1 if unset)
else:
  num_attempts = 1

run t up to num_attempts times

You can see it at work in build-and-test.sh/run-test.sh. You can also see it
in dist-test.py though notably, it doesn't care about RUN_FLAKY_ONLY because
we never used that particular combination (presumably the list of flaky
tests is short enough that it wouldn't benefit from distributed testing).

This patch attempts to mirror these exact semantics for Java tests. Here are
the interesting changes:
- In RetryRule, rerunFailingTestsCount is gone. The behavior is informed via
  the aforementioned environment variables instead.
- In build-and-test.sh, if RUN_FLAKY_ONLY is set, parse the flaky test list
  into a series of --tests gradle command line arguments.
- In dist-test.py, opt into the C++ flaky test handling (which reflects the
  above algorithm). There are also some small changes to flaky handling to
  accommodate Java's per-method flaky test tracking.

Note: all of this assumes that there's no overlap between the names of any
C++ or Java tests, which is currently true as all C++ tests have names like
"tablet-test" or "master_cert_authority-itest" while all Java tests are
prefixed with "org.apache.kudu...". If this were to change, we'd need to
properly "namespace" the test results in the reporting infrastructure and
fetch the flaky test lists separately for C++ and Java tests. For now
there's just one flaky test list, and both ctest and gradle are OK with
being asked to run irrelevant tests (they'll just be ignored).

Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
---
M build-support/dist_test.py
M build-support/jenkins/build-and-test.sh
M java/gradle/tests.gradle
M java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java
4 files changed, 137 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/12917/2
--
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Mike Percy 


[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

2019-04-03 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing 
controls
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/dist_test.py
File build-support/dist_test.py:

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/dist_test.py@369
PS1, Line 369: test_name = ".".join(k.split(".")[:-1]) if 
TEST_SHARD_RE.search(k) else k
> If I understand right, this doesn't mangle the java names because there is
Correct. And it looks like every class/package identifier needs to have at 
least one letter[1], so no Java test should ever match this regex.

1. https://stackoverflow.com/a/65490


http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh
File build-support/jenkins/build-and-test.sh:

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh@246
PS1, Line 246:   EXTRA_GRADLE_TEST_FLAGS="--tests $t 
$EXTRA_GRADLE_TEST_FLAGS"
> Will this accidentally pass the flaky c++ tests to the Gradle --tests flag?
Not accidentally, but yes. :)

As I wrote in the commit message, the shortcut that simplified this patch was 
to _not_ separate the list of C++ and Java flakies. We can get away with it 
because:
1. There are no collisions between C++/Java test names.
2. Both ctest (with "-R $test_regex") and gradle (with multiple "--tests $t" 
entries) happily ignore tests they can't find.

A cleaner way would be to enforce the separation, but that means doing one of:
1. Splitting the unified flaky list here into two lists based on some 
name-based criteria.
2. Adding another flaky list retrieval endpoint to test_result_server.py and 
modifying the SQL queries to do the same name-based criteria.
3. Like #2 but modifying the reporting to include a bool for C++ or Java test, 
and changing the results schema accordingly.

I opted for quick-and-dirty.


http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java
File 
java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java:

http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@51
PS1, Line 51:   private static final int MAYBE_RETRY = 0;
> Might be more clear to call this NO_EXPLICIT_RETRIES or something. Alternat
Done


http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@57
PS1, Line 57: this(MAYBE_RETRY, /*skipReporting=*/ false);
> Should this be DEFAULT_RETRY_COUNT?
Done


http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@72
PS1, Line 72: String value = System.getenv("KUDU_FLAKY_TEST_LIST");
> I am not sure how costly these System.getenv calls are, but assuming this K
System.getenv() should be cheap; it's not even a system call [1]. I don't think 
a map would save us any time, as we only call retryThisTest() once per 
RetryRule.

Unless you're suggesting a static map created in a static block. I find static 
blocks somewhat icky, but it would reduce the number of times we read the file, 
so maybe it is worth doing.

1. 
https://stackoverflow.com/questions/46623018/does-a-program-make-a-system-call-to-get-the-value-of-an-environment-variable-in


http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@128
PS1, Line 128: reporter != null
> Do we need this condition? I think below it would always result in `actualR
Yes, counter-intuitively it's OK to construct a RetryStatement with 
actualRetryCount == 0, because that's a test that won't be retried but will 
report its results. Put another way, reporting and retrying should be 
independently controllable (see the comment just above).

I don't think we take advantage of this particular combination, but the C++ 
tests support it, so I figured I'd at least reach parity with them.



--
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Mike Percy 
Gerrit-Comment-Date: Wed, 03 Apr 2019 17:51:52 +
Gerrit-HasComments: Yes


[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

2019-04-03 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing 
controls
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/dist_test.py
File build-support/dist_test.py:

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/dist_test.py@369
PS1, Line 369: test_name = ".".join(k.split(".")[:-1]) if 
TEST_SHARD_RE.search(k) else k
If I understand right, this doesn't mangle the java names because there is no 
java class or package that is purely numerical?


http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh
File build-support/jenkins/build-and-test.sh:

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh@246
PS1, Line 246:   EXTRA_GRADLE_TEST_FLAGS="--tests $t 
$EXTRA_GRADLE_TEST_FLAGS"
Will this accidentally pass the flaky c++ tests to the Gradle --tests flag?


http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java
File 
java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java:

http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@51
PS1, Line 51:   private static final int MAYBE_RETRY = 0;
Might be more clear to call this NO_EXPLICIT_RETRIES or something. 
Alternatively, I am not sure you need a constant for the value 0. It would make 
the `retryCount != MAYBE_RETRY;` line below easier to read to just hard code 0.


http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@57
PS1, Line 57: this(MAYBE_RETRY, /*skipReporting=*/ false);
Should this be DEFAULT_RETRY_COUNT?


http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@72
PS1, Line 72: String value = System.getenv("KUDU_FLAKY_TEST_LIST");
I am not sure how costly these System.getenv calls are, but assuming this 
KUDU_FLAKY_TEST_LIST could be fairly large, can we populate a map when we 
initialize the rule and consult it in this method?

We could read/parse KUDU_RETRY_ALL_FAILED_TESTS and KUDU_FLAKY_TEST_ATTEMPTS at 
construction time too, though the impact is likely lower.


http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@128
PS1, Line 128: reporter != null
Do we need this condition? I think below it would always result in 
`actualRetryCount = 0` in the case one of the others isn't also true.



--
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Mike Percy 
Gerrit-Comment-Date: Wed, 03 Apr 2019 14:11:04 +
Gerrit-HasComments: Yes


[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

2019-04-02 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing 
controls
..


Patch Set 1: Verified+1

Overriding Jenkins. Several Java tests failed, but I think these are known 
flakes that were previously retried because we retried all tests three times. 
One was KUDU-1521 and there may not be JIRAs open for the others.

I expect that once submitted we'll start tracking these failures in the 
dashboard, and then we'll retry them automatically.


--
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Comment-Date: Wed, 03 Apr 2019 03:32:47 +
Gerrit-HasComments: No


[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

2019-04-02 Thread Adar Dembo (Code Review)
Adar Dembo has removed Kudu Jenkins from this change.  ( 
http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing 
controls
..


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Mike Percy 


[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

2019-04-02 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing 
controls
..


Patch Set 1:

Here's what I manually tested:

dist-test + per-test flaky resistance:  
KUDU_FLAKY_TEST_ATTEMPTS=3 TEST_RESULT_SERVER=... build_and_test.sh

report results + all tests flaky resistance: KUDU_REPORT_TEST_RESULTS=1 
KUDU_FLAKY_TEST_ATTEMPTS=3 TEST_RESULT_SERVER=... KUDU_RETRY_ALL_FAILED_TESTS=1 
build_and_test.sh

report results + per-test flaky resistance + run just the flakes: 
KUDU_REPORT_TEST_RESULTS=1 KUDU_FLAKY_TEST_ATTEMPTS=3 TEST_RESULT_SERVER=... 
RUN_FLAKY_ONLY=1 ERROR_ON_TEST_FAILURE=0 build_and_test.sh

Java-only report results:  gradle -PrerunTests 
test --continue

Java-only report results + per-test flaky resistance: 
KUDU_FLAKY_TEST_ATTEMPTS=3 TEST_RESULT_SERVER=... KUDU_FLAKY_TEST_LIST=... 
gradle -PrerunTests test --continue

Java-only report results + all tests flaky resistance: 
KUDU_FLAKY_TEST_ATTEMPTS=3 TEST_RESULT_SERVER=... KUDU_RETRY_ALL_FAILED_TESTS=1 
gradle -PrerunTests test --continue

I think this generally reflects the myriad of combinations that we use, but let 
me know if I've missed something.


--
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Comment-Date: Wed, 03 Apr 2019 00:59:40 +
Gerrit-HasComments: No


[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

2019-04-02 Thread Adar Dembo (Code Review)
Hello Mike Percy, Andrew Wong, Grant Henke,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/12917

to review the following change.


Change subject: build: adapt new Java flaky test infrastructure to existing 
controls
..

build: adapt new Java flaky test infrastructure to existing controls

Now that Java tests are reporting success/failure, we can use the existing
flaky test controls to drive it. As a refresher, the C++ tests rely on these
environment variables:
- RUN_FLAKY_ONLY: whether to run just flaky tests or all tests
- KUDU_FLAKY_TEST_ATTEMPTS: number of attempts for flaky tests
- KUDU_FLAKY_TEST_LIST: path to list of flaky tests, one on each line
- KUDU_RETRY_ALL_FAILED_TESTS: whether to retry all tests or just the ones
   in the flaky test list

The algorithm is roughly:
  if RUN_FLAKY_ONLY or KUDU_FLAKY_TEST_ATTEMPTS > 1:
populate KUDU_FLAKY_TEST_LIST from test result server

  if RUN_FLAKY_ONLY:
testset = tests listed in KUDU_FLAKY_TEST_LIST
  else:
testset = all tests

  for t in testset:
if KUDU_RETRY_ALL_FAILED_TESTS or (KUDU_FLAKY_TEST_LIST and
   t in KUDU_FLAKY_TEST_LIST):
  num_attempts = KUDU_FLAKY_TEST_ATTEMPTS (or 1 if unset)
else:
  num_attempts = 1

run t up to num_attempts times

You can see it at work in build-and-test.sh/run-test.sh. You can also see it
in dist-test.py though notably, it doesn't care about RUN_FLAKY_ONLY because
we never used that particular combination (presumably the list of flaky
tests is short enough that it wouldn't benefit from distributed testing).

This patch attempts to mirror these exact semantics for Java tests. Here are
the interesting changes:
- In RetryRule, rerunFailingTestsCount is gone. The behavior is informed via
  the aforementioned environment variables instead.
- In build-and-test.sh, if RUN_FLAKY_ONLY is set, parse the flaky test list
  into a series of --tests gradle command line arguments.
- In dist-test.py, opt into the C++ flaky test handling (which reflects the
  above algorithm). There are also some small changes to flaky handling to
  accommodate Java's per-method flaky test tracking.

Note: all of this assumes that there's no overlap between the names of any
C++ or Java tests, which is currently true as all C++ tests have names like
"tablet-test" or "master_cert_authority-itest" while all Java tests are
prefixed with "org.apache.kudu...". If this were to change, we'd need to
properly "namespace" the test results in the reporting infrastructure and
fetch the flaky test lists separately for C++ and Java tests. For now
there's just one flaky test list, and both ctest and gradle are OK with
being asked to run irrelevant tests (they'll just be ignored).

Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
---
M build-support/dist_test.py
M build-support/jenkins/build-and-test.sh
M java/gradle/tests.gradle
M java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java
4 files changed, 139 insertions(+), 31 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/12917/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Mike Percy