[ https://issues.apache.org/jira/browse/HADOOP-12622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163357#comment-15163357 ]
Hadoop QA commented on HADOOP-12622:
------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 0s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 24s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 51s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 22s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s {color} | {color:green} hadoop-common-project/hadoop-common: patch generated 0 new + 44 unchanged - 2 fixed = 44 total (was 46) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 17m 4s {color} | {color:red} hadoop-common in the patch failed with JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 47s {color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 75m 52s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Timed out junit tests | org.apache.hadoop.http.TestHttpServerLifecycle |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12789587/HADOOP-12622-v5.patch |
| JIRA Issue | HADOOP-12622 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 35abd7d03aa3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3369a4f |
| Default Java | 1.7.0_95 |
| Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/8705/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_72.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-HADOOP-Build/8705/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_72.txt |
| JDK v1.7.0_95 Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/8705/testReport/ |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/8705/console |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> RetryPolicies (other than FailoverOnNetworkExceptionRetry) should put on retry failed reason or the log from RMProxy's retry could be very misleading.
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-12622
> URL: https://issues.apache.org/jira/browse/HADOOP-12622
> Project: Hadoop Common
> Issue Type: Bug
> Components: auto-failover
> Affects Versions: 2.6.0, 2.7.0
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Critical
> Attachments: HADOOP-12622-v2.patch, HADOOP-12622-v3.1.patch, HADOOP-12622-v3.patch, HADOOP-12622-v4.patch, HADOOP-12622-v5.patch, HADOOP-12622.patch
>
>
> While debugging an NM retrying its connection to the RM (non-HA), we found the NM log during RM downtime to be very misleading:
> {noformat}
> 2015-12-07 11:37:14,098 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:15,099 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:16,101 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:17,103 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:18,105 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:19,107 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:20,109 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:21,112 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:22,113 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:23,115 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:54,120 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:55,121 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:56,123 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:57,125 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:58,126 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:37:59,128 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-12-07 11:38:00,130 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> {noformat}
> The log only shows the client-side retries on network connection failure. It includes no information from RetryInvocationHandler, which is where the actual retry policy is applied.
> The code below is from RetryInvocationHandler.java. A warning is logged only when failAction.reason is set. As a result, when retrying ends, nothing usually reports how much time was spent or how many attempts were made in the retry logic, and that makes debugging harder.
> {code}
> if (failAction != null) {
>   // Warn only if the retry policy explained why it gave up; otherwise rethrow silently.
>   if (failAction.reason != null) {
>     LOG.warn("Exception while invoking " + currentProxy.proxy.getClass()
>         + "." + method.getName() + " over " + currentProxy.proxyInfo
>         + ". Not retrying because " + failAction.reason, ex);
>   }
>   throw ex;
> }
> {code}
> We should set failAction.reason wherever possible across the retry policies.
> We should also use a consistent log level for messages during retry attempts. Currently ipc.Client logs at INFO, but RetryInvocationHandler logs at DEBUG (when not failing over). Mixing the two levels is very confusing.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
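The issue makes two proposals: every retry policy should set a failure reason, and retry attempts should be logged at one consistent level. The minimal Java sketch below illustrates both. It does not use Hadoop's real `org.apache.hadoop.io.retry` API. `RetryReasonSketch`, `FixedSleepPolicy`, `Action`, `Decision` and `invoke` are hypothetical stand-ins that only mimic the `RetryUpToMaximumCountWithFixedSleep(maxRetries, sleepTime)` semantics seen in the log above. A `List<String>` stands in for a real logger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch, NOT Hadoop's org.apache.hadoop.io.retry API: a fixed-sleep
 * retry policy that always reports why it stopped, and an invoker that logs each
 * retry at INFO and the final give-up at WARN with that reason.
 */
public class RetryReasonSketch {

    enum Decision { RETRY, FAIL }

    /** Hypothetical stand-in for a retry action; reason explains a FAIL decision. */
    static final class Action {
        final Decision decision;
        final long delayMillis;
        final String reason;

        Action(Decision decision, long delayMillis, String reason) {
            this.decision = decision;
            this.delayMillis = delayMillis;
            this.reason = reason;
        }
    }

    /** Mimics RetryUpToMaximumCountWithFixedSleep(maxRetries, sleepTime) semantics. */
    static final class FixedSleepPolicy {
        private final int maxRetries;
        private final long sleepTime;
        private final TimeUnit timeUnit;

        FixedSleepPolicy(int maxRetries, long sleepTime, TimeUnit timeUnit) {
            this.maxRetries = maxRetries;
            this.sleepTime = sleepTime;
            this.timeUnit = timeUnit;
        }

        Action shouldRetry(Exception e, int retries) {
            if (retries >= maxRetries) {
                // The proposal: never give up silently; state why retrying stopped.
                return new Action(Decision.FAIL, 0, "gave up after " + retries
                        + " retries (maxRetries=" + maxRetries + ", sleepTime="
                        + sleepTime + " " + timeUnit + ")");
            }
            return new Action(Decision.RETRY, timeUnit.toMillis(sleepTime), null);
        }
    }

    /** Runs op under the policy; the List stands in for a logger. */
    static <T> T invoke(Callable<T> op, FixedSleepPolicy policy, List<String> log)
            throws Exception {
        int retries = 0;
        while (true) {
            try {
                return op.call();
            } catch (Exception e) {
                Action action = policy.shouldRetry(e, retries);
                if (action.decision == Decision.FAIL) {
                    log.add("WARN Not retrying because " + action.reason);
                    throw e;
                }
                // Logged at INFO, the same level ipc.Client uses for its own retries.
                log.add("INFO Retry " + (retries + 1) + " after: " + e.getMessage()
                        + "; sleeping " + action.delayMillis + " ms");
                Thread.sleep(action.delayMillis);
                retries++;
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        FixedSleepPolicy policy = new FixedSleepPolicy(3, 10, TimeUnit.MILLISECONDS);
        try {
            invoke(() -> { throw new java.io.IOException("Connection refused"); }, policy, log);
        } catch (Exception e) {
            log.add("ERROR gave up: " + e.getMessage());
        }
        log.forEach(System.out::println);
    }
}
```

Running `main` prints three INFO retry lines, one WARN line naming the policy's limits, and a final ERROR line. The policy builds the reason string itself because only the policy knows its own limits. The invoker just logs whatever reason it receives.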