Hello Alexey Serbin, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13454

to look at the new patch set (#2).

Change subject: [java] Attempt to deflake SecureKuduSinkTest
......................................................................

[java] Attempt to deflake SecureKuduSinkTest

Occasionally, SecureKuduSinkTest fails when running with TSAN binaries
because of the following sequence of operations:

1. The Kerberos ticket lifetime is set to 10s.
2. The test sets up a mini kudu cluster. This first sets up the KDC,
   which creates credentials for all of the Kudu servers and kinits
   using test user credentials for the test process.
3. The setup of the cluster takes > 10s.
4. At the end of the cluster setup, the test checks that setup
   succeeded in part by issuing a ListTabletServers RPC. This fails
   because the test user ticket has expired.
5. The test fails because it can't set up the cluster.

The failure looks like

21:50:06.500 [ERROR - main] (RetryRule.java:217) org.apache.kudu.flume.sink.SecureKuduSinkTest.testEventsWithShortTickets: failed attempt 1
java.io.IOException: ListTabletServers RPC failed: Client connection negotiation failed: client connection to 127.12.111.60:36425: server requires authentication, but client does not have Kerberos credentials available
  at org.apache.kudu.test.cluster.MiniKuduCluster.sendRequestToCluster(MiniKuduCluster.java:169) ~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
  at org.apache.kudu.test.cluster.MiniKuduCluster.start(MiniKuduCluster.java:234) ~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
  at org.apache.kudu.test.cluster.MiniKuduCluster.access$300(MiniKuduCluster.java:71) ~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
  at org.apache.kudu.test.cluster.MiniKuduCluster$MiniKuduClusterBuilder.build(MiniKuduCluster.java:658) ~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
  at org.apache.kudu.test.KuduTestHarness.before(KuduTestHarness.java:140) ~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
  at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46) ~[junit-4.12.jar:4.12]
  at org.apache.kudu.test.junit.RetryRule$RetryStatement.doOneAttempt(RetryRule.java:215) [kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
  at org.apache.kudu.test.junit.RetryRule$RetryStatement.evaluate(RetryRule.java:232) [kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
  ...

This patch attempts to deflake the test a bit by doubling the ticket
lifetime to 20s. It also raises the renewal lifetime to 35s from 30s,
to provide a bit of extra time between the ticket expiring and when
Flume needs to renew the ticket.

Before, the test waited 2x the renewable ticket lifetime. I made it so
the test waits until the renewable ticket lifetime plus one second has
passed, including the time spent in the test so far.

I tried to test this on dist-test using TSAN binaries. With the new
patch I saw 0/1000 failures, but without it I saw 830/1000 failures.
That's *way* flakier than any previous indication, so I don't trust
those results. The failures I sampled did seem to be related to the
same issue, but it was ConnectToCluster RPCs failing instead.

Change-Id: Icc936878d7f1496905e83ddaf93b9b049f417f72
---
M java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/SecureKuduSinkTest.java
1 file changed, 9 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/13454/2
--
To view, visit http://gerrit.cloudera.org:8080/13454
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icc936878d7f1496905e83ddaf93b9b049f417f72
Gerrit-Change-Number: 13454
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
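
For readers following the timing change in the commit message, the revised wait can be sketched roughly as below. This is not the code from the patch: the class name TicketWait, the method names, and the choice of where the start timestamp is taken are all assumptions made for illustration. Only the 35s renewable lifetime, the "plus one second" margin, and the old "2x the renewable lifetime" behaviour come from the message above.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of SecureKuduSinkTest.
final class TicketWait {
  // Renewable ticket lifetime after the patch, per the commit message.
  static final long RENEWABLE_LIFETIME_SECONDS = 35;

  // Old behaviour as described: a fixed wait of 2x the renewable
  // lifetime, regardless of how long the test has already run.
  static long oldWaitMillis(long renewableLifetimeSeconds) {
    return TimeUnit.SECONDS.toMillis(2 * renewableLifetimeSeconds);
  }

  // New behaviour as described: wait only until (renewable lifetime + 1s)
  // has elapsed since startMillis. Where startMillis is captured is an
  // assumption; the message only says time already spent is counted.
  static long remainingWaitMillis(long startMillis, long nowMillis,
                                  long renewableLifetimeSeconds) {
    long deadline =
        startMillis + TimeUnit.SECONDS.toMillis(renewableLifetimeSeconds + 1);
    // Never return a negative sleep if the deadline has already passed.
    return Math.max(0L, deadline - nowMillis);
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    // ... cluster setup and the body of the test would run here ...
    long toSleep = remainingWaitMillis(
        start, System.currentTimeMillis(), RENEWABLE_LIFETIME_SECONDS);
    System.out.println("would sleep for " + toSleep + " ms");
  }
}
```

Under these assumptions, a test that has already spent 12s before reaching the wait would sleep for the remaining 24s instead of a fixed multiple of the renewable lifetime.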