[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321037#comment-16321037 ]
genericqa commented on HADOOP-15129:
------------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 40s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 43s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 39s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch generated 1 new + 174 unchanged - 0 fixed = 175 total (was 174) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 55s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 43s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 80m 33s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15129 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12905521/HADOOP-15129.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 47dbfee78bbd 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 12d0645 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/13949/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13949/testReport/ |
| Max. process+thread count | 1467 (vs. ulimit of 5000) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13949/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> Datanode caches namenode DNS lookup failure and cannot startup
> --------------------------------------------------------------
>
> Key: HADOOP-15129
> URL: https://issues.apache.org/jira/browse/HADOOP-15129
> Project: Hadoop Common
> Issue Type: Bug
> Components: ipc
> Affects Versions: 2.8.2
> Environment: Google Compute Engine.
> I'm using Java 8, Debian 8, Hadoop 2.8.2.
> Reporter: Karthik Palaniappan
> Assignee: Karthik Palaniappan
> Priority: Minor
> Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch
>
>
> On startup, the Datanode creates an InetSocketAddress to register with each
> namenode. Though there are retries on connection failure throughout the
> stack, the same InetSocketAddress is reused.
> InetSocketAddress is an interesting class: it resolves DNS names to IP
> addresses on construction and never refreshes them. In some code paths Hadoop
> re-creates the InetSocketAddress in case the remote IP for a particular DNS
> name has changed: https://issues.apache.org/jira/browse/HADOOP-7472.
> On startup, you can see this in the Datanode log: "Namenode...remains
> unresolved", which means the DNS lookup failed.
> {code:java}
> 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: null
> 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode for null remains unresolved for ID null. Check your hdfs-site.xml file to ensure namenodes are configured properly.
> 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: <default>
> 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to cluster-32f5-m:8020 starting to offer service
> {code}
> The Datanode then proceeds to use this unresolved address, since it may work
> if the DN is configured to use a proxy.
> Since I'm not using a proxy, it prints this message forever:
> {code:java}
> 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: cluster-32f5-m:8020
> {code}
> Unfortunately, the log doesn't contain the exception that triggered it, but
> the culprit is actually in the IPC Client:
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444.
> This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487
> to give a clear error message when somebody misspells an address.
> However, the fix in HADOOP-7472 doesn't apply here, because that code runs
> in Client#getConnection after the Connection has been constructed.
> My proposed fix (will attach a patch) is to move this exception out of the
> constructor and into a place that will trigger HADOOP-7472's logic to
> re-resolve addresses. If the DNS failure was temporary, this allows the
> connection to succeed. If not, the connection fails after the IPC client
> retries are exhausted (by default, 10 seconds' worth of retries).
> I want to fix this in the IPC client rather than only in Datanode startup,
> because that fixes temporary DNS issues for all of Hadoop.
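The re-resolve-on-retry idea above can be illustrated with a minimal, standalone Java sketch. It assumes nothing about Hadoop's actual ipc.Client internals: the class name ReResolvingAddress, its resolve(maxAttempts, sleepMillis) method, and the pluggable resolver are hypothetical. It relies only on standard java.net behavior: new InetSocketAddress(host, port) does the DNS lookup once, at construction, and reports failure through isUnresolved() instead of throwing. The sketch therefore keeps the host name and port, builds a fresh address on each retry, and raises UnknownHostException only after the last attempt fails.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.function.BiFunction;

/**
 * Hypothetical sketch, not Hadoop's ipc.Client: keep the host name and port,
 * and redo the DNS lookup on each retry instead of trusting the
 * InetSocketAddress built once at construction time.
 */
public class ReResolvingAddress {
    private final String host;
    private final int port;
    // Performs one lookup. The default is the InetSocketAddress(String, int)
    // constructor, which resolves eagerly and never refreshes afterwards.
    private final BiFunction<String, Integer, InetSocketAddress> resolver;
    private InetSocketAddress current;

    public ReResolvingAddress(String host, int port,
                              BiFunction<String, Integer, InetSocketAddress> resolver) {
        this.host = host;
        this.port = port;
        this.resolver = resolver;
        // First lookup. A failure is only recorded (isUnresolved() == true);
        // nothing is thrown here, so a later retry can still succeed.
        this.current = resolver.apply(host, port);
    }

    public ReResolvingAddress(String host, int port) {
        this(host, port, InetSocketAddress::new);
    }

    /**
     * Returns a resolved address, performing at most maxAttempts lookups in
     * total (counting the one done by the constructor) and sleeping
     * sleepMillis between them. Throws only once every attempt has failed.
     */
    public InetSocketAddress resolve(int maxAttempts, long sleepMillis)
            throws UnknownHostException, InterruptedException {
        for (int attempt = 1; attempt < maxAttempts && current.isUnresolved(); attempt++) {
            Thread.sleep(sleepMillis);
            current = resolver.apply(host, port); // fresh DNS lookup
        }
        if (current.isUnresolved()) {
            throw new UnknownHostException(
                "Could not resolve " + host + ":" + port + " after " + maxAttempts + " attempts");
        }
        return current;
    }
}
```

With the default resolver, new ReResolvingAddress("cluster-32f5-m", 8020).resolve(10, 1000) would retry the lookup up to 10 times about one second apart (illustrative values only). The resolver is injectable only so the retry logic can be exercised without real DNS; the point of the design is that a temporary lookup failure is recorded rather than thrown, so it does not permanently poison the address.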
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)