[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408264#comment-17408264 ] Chris Nauroth commented on HADOOP-15129: [~ywskycn], thank you for the +1. I'm planning to commit this but will give it a little more time in case the earlier reviewers still want to add feedback. > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Assignee: Chris Nauroth >Priority: Minor > Labels: pull-request-available > Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch > > Time Spent: 20m > Remaining Estimate: 0h > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on startup, you cna see the Datanode log: "Namenode...remains > unresolved" -- referring to the fact that DNS lookup failed. > {code:java} > 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Refresh request received for nameservices: null > 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode > for null remains unresolved for ID null. Check your hdfs-site.xml file to > ensure namenodes are configured properly. > 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Starting BPOfferServices for nameservices: > 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool (Datanode Uuid unassigned) service to > cluster-32f5-m:8020 starting to offer service > {code} > The Datanode then proceeds to use this unresolved address, as it may work if > the DN is configured to use a proxy. Since I'm not using a proxy, it forever > prints out this message: > {code:java} > 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > {code} > Unfortunately, the log doesn't contain the exception that triggered it, but > the culprit is actually in IPC Client: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444. > This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 > to give a clear error message when somebody mispells an address. > However, the fix in HADOOP-7472 doesn't apply here, because that code happens > in Client#getConnection after the Connection is constructed. > My proposed fix (will attach a patch) is to move this exception out of the > constructor and into a place that will trigger HADOOP-7472's logic to > re-resolve addresses. If the DNS failure was temporary, this will allow the > connection to succeed. If not, the connection will fail after ipc client > retries (default 10 seconds worth of retries). > I want to fix this in ipc client rather than just in Datanode startup, as > this fixes temporary DNS issues for all of Hadoop. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17406251#comment-17406251 ] Chris Nauroth commented on HADOOP-15129: I forgot to mention one other change I made from the prior patch. I incorporated {{NetUtils.getHostname()}} into the exception message, but now I see Arpit raised a point in an earlier comment about multi-homed configurations. If it's preferred, I'm happy to switch that back to {{null}} and leave it undetermined in the exception message. > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Assignee: Chris Nauroth >Priority: Minor > Labels: pull-request-available > Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch > > Time Spent: 20m > Remaining Estimate: 0h > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on startup, you cna see the Datanode log: "Namenode...remains > unresolved" -- referring to the fact that DNS lookup failed. > {code:java} > 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Refresh request received for nameservices: null > 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode > for null remains unresolved for ID null. Check your hdfs-site.xml file to > ensure namenodes are configured properly. > 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Starting BPOfferServices for nameservices: > 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool (Datanode Uuid unassigned) service to > cluster-32f5-m:8020 starting to offer service > {code} > The Datanode then proceeds to use this unresolved address, as it may work if > the DN is configured to use a proxy. Since I'm not using a proxy, it forever > prints out this message: > {code:java} > 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > {code} > Unfortunately, the log doesn't contain the exception that triggered it, but > the culprit is actually in IPC Client: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444. > This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 > to give a clear error message when somebody mispells an address. > However, the fix in HADOOP-7472 doesn't apply here, because that code happens > in Client#getConnection after the Connection is constructed. > My proposed fix (will attach a patch) is to move this exception out of the > constructor and into a place that will trigger HADOOP-7472's logic to > re-resolve addresses. If the DNS failure was temporary, this will allow the > connection to succeed. If not, the connection will fail after ipc client > retries (default 10 seconds worth of retries). > I want to fix this in ipc client rather than just in Datanode startup, as > this fixes temporary DNS issues for all of Hadoop. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17406117#comment-17406117 ] Chris Nauroth commented on HADOOP-15129: Let's please also maintain credit for Karthik. (I submitted my patch with a Co-authored-by tag.) > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Assignee: Chris Nauroth >Priority: Minor > Labels: pull-request-available > Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch > > Time Spent: 10m > Remaining Estimate: 0h > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on startup, you cna see the Datanode log: "Namenode...remains > unresolved" -- referring to the fact that DNS lookup failed. > {code:java} > 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Refresh request received for nameservices: null > 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode > for null remains unresolved for ID null. Check your hdfs-site.xml file to > ensure namenodes are configured properly. > 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Starting BPOfferServices for nameservices: > 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool (Datanode Uuid unassigned) service to > cluster-32f5-m:8020 starting to offer service > {code} > The Datanode then proceeds to use this unresolved address, as it may work if > the DN is configured to use a proxy. Since I'm not using a proxy, it forever > prints out this message: > {code:java} > 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > {code} > Unfortunately, the log doesn't contain the exception that triggered it, but > the culprit is actually in IPC Client: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444. > This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 > to give a clear error message when somebody mispells an address. > However, the fix in HADOOP-7472 doesn't apply here, because that code happens > in Client#getConnection after the Connection is constructed. > My proposed fix (will attach a patch) is to move this exception out of the > constructor and into a place that will trigger HADOOP-7472's logic to > re-resolve addresses. If the DNS failure was temporary, this will allow the > connection to succeed. If not, the connection will fail after ipc client > retries (default 10 seconds worth of retries). > I want to fix this in ipc client rather than just in Datanode startup, as > this fixes temporary DNS issues for all of Hadoop. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17406113#comment-17406113 ] Hadoop QA commented on HADOOP-15129: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 12s{color} | {color:red}{color} | {color:red} HADOOP-15129 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HADOOP-15129 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12905521/HADOOP-15129.002.patch | | Console output | https://ci-hadoop.apache.org/job/PreCommit-HADOOP-Build/229/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org | This message was automatically generated. > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Assignee: Karthik Palaniappan >Priority: Minor > Labels: pull-request-available > Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch > > Time Spent: 10m > Remaining Estimate: 0h > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on startup, you cna see the Datanode log: "Namenode...remains > unresolved" -- referring to the fact that DNS lookup failed. > {code:java} > 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Refresh request received for nameservices: null > 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode > for null remains unresolved for ID null. Check your hdfs-site.xml file to > ensure namenodes are configured properly. > 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Starting BPOfferServices for nameservices: > 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool (Datanode Uuid unassigned) service to > cluster-32f5-m:8020 starting to offer service > {code} > The Datanode then proceeds to use this unresolved address, as it may work if > the DN is configured to use a proxy. Since I'm not using a proxy, it forever > prints out this message: > {code:java} > 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > {code} > Unfortunately, the log doesn't contain the exception that triggered it, but > the culprit is actually in IPC Client: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444. > This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 > to give a clear error message when somebody mispells an address. > However, the fix in HADOOP-7472 doesn't apply here, because that code happens > in Client#getConnection after the Connection is constructed. > My proposed fix (will attach a patch) is to move this exception out of the > constructor and into a place that will trigger HADOOP-7472's logic to > re-resolve addresses. If the DNS failure was temporary, this will allow the > connection to succeed. If not, the connection will fail after ipc client > retries (default 10 seconds worth of ret
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718798#comment-16718798 ] Ravi Prakash commented on HADOOP-15129: --- Hi Karthik! Thanks for your contribution. Could you please rebase the patch to the latest trunk? I usually apply patches using {code:java} $ git apply {code} A few suggestions: # Could you please use short descriptions in JIRA? I was told a long time ago. :) # When using JIRA numbers, could you please write HDFS-8068 (instead of just 8068) because issues often cut across several different projects, and this way JIRA creates nice links for viewers to click on? Patches are usually committed to trunk *first* and then a (possibly) different version of the patch may be committed to earlier branches like branch-2. So technically you could have used neat Lambdas in the trunk patch. ;) Its a nit though. I'm trying to find the wikipage that tried to explain certain errors. I'm afraid I rarely found them useful (its probably because we didn't really expand on those wiki pages ever), so I'm fine with a more helpful error in the logs. > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Assignee: Karthik Palaniappan >Priority: Minor > Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch > > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on startup, you cna see the Datanode log: "Namenode...remains > unresolved" -- referring to the fact that DNS lookup failed. > {code:java} > 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Refresh request received for nameservices: null > 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode > for null remains unresolved for ID null. Check your hdfs-site.xml file to > ensure namenodes are configured properly. > 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Starting BPOfferServices for nameservices: > 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool (Datanode Uuid unassigned) service to > cluster-32f5-m:8020 starting to offer service > {code} > The Datanode then proceeds to use this unresolved address, as it may work if > the DN is configured to use a proxy. Since I'm not using a proxy, it forever > prints out this message: > {code:java} > 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > {code} > Unfortunately, the log doesn't contain the exception that triggered it, but > the culprit is actually in IPC Client: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444. > This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 > to give a clear error message when somebody mispells an address. > However, the fix in HADOOP-7472 doesn't apply here, because that code happens > in Client#getConnection after the Connection is constructed. > My proposed fix (will attach a patch) is to move this exception out of the > constructor and into a place that will trigger HADOOP-7472's logic to > re-resolve addresses. If the DNS failure was temporary, this will allow the > connection to succeed. If not, the connection will fail after ipc client > retries (default 10 seconds worth of retries). > I want to fix this in ipc client rather than just i
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718800#comment-16718800 ] Hadoop QA commented on HADOOP-15129: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} HADOOP-15129 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HADOOP-15129 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12905521/HADOOP-15129.002.patch | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/15649/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Assignee: Karthik Palaniappan >Priority: Minor > Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch > > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on startup, you cna see the Datanode log: "Namenode...remains > unresolved" -- referring to the fact that DNS lookup failed. > {code:java} > 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Refresh request received for nameservices: null > 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode > for null remains unresolved for ID null. Check your hdfs-site.xml file to > ensure namenodes are configured properly. > 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Starting BPOfferServices for nameservices: > 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool (Datanode Uuid unassigned) service to > cluster-32f5-m:8020 starting to offer service > {code} > The Datanode then proceeds to use this unresolved address, as it may work if > the DN is configured to use a proxy. Since I'm not using a proxy, it forever > prints out this message: > {code:java} > 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > {code} > Unfortunately, the log doesn't contain the exception that triggered it, but > the culprit is actually in IPC Client: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444. > This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 > to give a clear error message when somebody mispells an address. > However, the fix in HADOOP-7472 doesn't apply here, because that code happens > in Client#getConnection after the Connection is constructed. > My proposed fix (will attach a patch) is to move this exception out of the > constructor and into a place that will trigger HADOOP-7472's logic to > re-resolve addresses. If the DNS failure was temporary, this will allow the > connection to succeed. If not, the connection will fail after ipc client > retries (default 10 seconds worth of retries). > I want to fix this in ipc client rather than just in Datanode startup, as > this fixes temporary DNS issues for all of Hadoop. -- This message was sent by Atlassian JIRA (v7.6.3#
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443525#comment-16443525 ] genericqa commented on HADOOP-15129: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 28s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 19s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m 4s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 50s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch generated 1 new + 174 unchanged - 0 fixed = 175 total (was 174) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 12s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 30s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 0s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}126m 11s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | HADOOP-15129 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12905521/HADOOP-15129.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux e97bda07cb2e 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 19:09:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e4c39f3 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/14503/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14503/testReport/ | | Max. process+thread count | 1767 (vs. ulimit of 1) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14503/console | | Powered by
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443428#comment-16443428 ] Karthik Palaniappan commented on HADOOP-15129: -- I forgot about this for a while – is there anything blocking merging the patch? > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Assignee: Karthik Palaniappan >Priority: Minor > Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch > > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on startup, you cna see the Datanode log: "Namenode...remains > unresolved" -- referring to the fact that DNS lookup failed. > {code:java} > 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Refresh request received for nameservices: null > 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode > for null remains unresolved for ID null. Check your hdfs-site.xml file to > ensure namenodes are configured properly. > 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Starting BPOfferServices for nameservices: > 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool (Datanode Uuid unassigned) service to > cluster-32f5-m:8020 starting to offer service > {code} > The Datanode then proceeds to use this unresolved address, as it may work if > the DN is configured to use a proxy. Since I'm not using a proxy, it forever > prints out this message: > {code:java} > 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > {code} > Unfortunately, the log doesn't contain the exception that triggered it, but > the culprit is actually in IPC Client: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444. > This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 > to give a clear error message when somebody mispells an address. > However, the fix in HADOOP-7472 doesn't apply here, because that code happens > in Client#getConnection after the Connection is constructed. > My proposed fix (will attach a patch) is to move this exception out of the > constructor and into a place that will trigger HADOOP-7472's logic to > re-resolve addresses. If the DNS failure was temporary, this will allow the > connection to succeed. If not, the connection will fail after ipc client > retries (default 10 seconds worth of retries). > I want to fix this in ipc client rather than just in Datanode startup, as > this fixes temporary DNS issues for all of Hadoop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366892#comment-16366892 ] Steve Loughran commented on HADOOP-15129: - Could always add a specific failure call and wiki link here > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Assignee: Karthik Palaniappan >Priority: Minor > Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch > > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on startup, you cna see the Datanode log: "Namenode...remains > unresolved" -- referring to the fact that DNS lookup failed. > {code:java} > 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Refresh request received for nameservices: null > 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode > for null remains unresolved for ID null. Check your hdfs-site.xml file to > ensure namenodes are configured properly. > 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Starting BPOfferServices for nameservices: > 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool (Datanode Uuid unassigned) service to > cluster-32f5-m:8020 starting to offer service > {code} > The Datanode then proceeds to use this unresolved address, as it may work if > the DN is configured to use a proxy. Since I'm not using a proxy, it forever > prints out this message: > {code:java} > 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > {code} > Unfortunately, the log doesn't contain the exception that triggered it, but > the culprit is actually in IPC Client: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444. > This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 > to give a clear error message when somebody mispells an address. > However, the fix in HADOOP-7472 doesn't apply here, because that code happens > in Client#getConnection after the Connection is constructed. > My proposed fix (will attach a patch) is to move this exception out of the > constructor and into a place that will trigger HADOOP-7472's logic to > re-resolve addresses. If the DNS failure was temporary, this will allow the > connection to succeed. If not, the connection will fail after ipc client > retries (default 10 seconds worth of retries). > I want to fix this in ipc client rather than just in Datanode startup, as > this fixes temporary DNS issues for all of Hadoop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366550#comment-16366550 ] Karthik Palaniappan commented on HADOOP-15129: -- [~arpitagarwal] yes both tests are necessary. The HostResolutionTimeout test proves that if the host is always unresolvable, the code still does throw an exception. TheFlakyHostResolution test shows that if the host is only temporarily resolvable, ipc client can recover. Other tests of course prove the final case that if the host is immediately resolvable it's fine. > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Assignee: Karthik Palaniappan >Priority: Minor > Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch > > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on startup, you cna see the Datanode log: "Namenode...remains > unresolved" -- referring to the fact that DNS lookup failed. > {code:java} > 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Refresh request received for nameservices: null > 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode > for null remains unresolved for ID null. Check your hdfs-site.xml file to > ensure namenodes are configured properly. > 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Starting BPOfferServices for nameservices: > 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool (Datanode Uuid unassigned) service to > cluster-32f5-m:8020 starting to offer service > {code} > The Datanode then proceeds to use this unresolved address, as it may work if > the DN is configured to use a proxy. Since I'm not using a proxy, it forever > prints out this message: > {code:java} > 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > {code} > Unfortunately, the log doesn't contain the exception that triggered it, but > the culprit is actually in IPC Client: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444. > This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 > to give a clear error message when somebody mispells an address. > However, the fix in HADOOP-7472 doesn't apply here, because that code happens > in Client#getConnection after the Connection is constructed. > My proposed fix (will attach a patch) is to move this exception out of the > constructor and into a place that will trigger HADOOP-7472's logic to > re-resolve addresses. If the DNS failure was temporary, this will allow the > connection to succeed. If not, the connection will fail after ipc client > retries (default 10 seconds worth of retries). > I want to fix this in ipc client rather than just in Datanode startup, as > this fixes temporary DNS issues for all of Hadoop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361506#comment-16361506 ] Ajay Kumar commented on HADOOP-15129: - {quote} The local IP address for the connection is indeterminate until the target hostname is resolved e.g. multi-homed setups. So I am okay with leaving this as null for now.{quote} I see your point. I was thinking that localhost "UNKOWN" may mislead someone in wrong diagnosis about the problem. Wondering if we can we improve the messaging. But this can be discussed separately and doesn't need to be part of this jira. > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Assignee: Karthik Palaniappan >Priority: Minor > Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch > > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on startup, you cna see the Datanode log: "Namenode...remains > unresolved" -- referring to the fact that DNS lookup failed. > {code:java} > 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Refresh request received for nameservices: null > 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode > for null remains unresolved for ID null. Check your hdfs-site.xml file to > ensure namenodes are configured properly. > 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Starting BPOfferServices for nameservices: > 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool (Datanode Uuid unassigned) service to > cluster-32f5-m:8020 starting to offer service > {code} > The Datanode then proceeds to use this unresolved address, as it may work if > the DN is configured to use a proxy. Since I'm not using a proxy, it forever > prints out this message: > {code:java} > 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > {code} > Unfortunately, the log doesn't contain the exception that triggered it, but > the culprit is actually in IPC Client: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444. > This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 > to give a clear error message when somebody mispells an address. > However, the fix in HADOOP-7472 doesn't apply here, because that code happens > in Client#getConnection after the Connection is constructed. > My proposed fix (will attach a patch) is to move this exception out of the > constructor and into a place that will trigger HADOOP-7472's logic to > re-resolve addresses. If the DNS failure was temporary, this will allow the > connection to succeed. If not, the connection will fail after ipc client > retries (default 10 seconds worth of retries). > I want to fix this in ipc client rather than just in Datanode startup, as > this fixes temporary DNS issues for all of Hadoop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361358#comment-16361358 ] Arpit Agarwal commented on HADOOP-15129: The test case {{testIpcFlakyHostResolution}} also lgtm. The other new test {{testIpcHostResolutionTimeout}} looks related to this change. Do we need it [~Karthik Palaniappan]? bq. can we pass "localhost" for NetUtils.wrapException instead of null. Above message in logs is little misleading. Good point. The local IP address for the connection is indeterminate until the target hostname is resolved e.g. multi-homed setups. So I am okay with leaving this as null for now. [~ajayydv], let me know if that sounds reasonable. > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Assignee: Karthik Palaniappan >Priority: Minor > Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch > > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on startup, you cna see the Datanode log: "Namenode...remains > unresolved" -- referring to the fact that DNS lookup failed. > {code:java} > 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Refresh request received for nameservices: null > 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode > for null remains unresolved for ID null. Check your hdfs-site.xml file to > ensure namenodes are configured properly. > 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Starting BPOfferServices for nameservices: > 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool (Datanode Uuid unassigned) service to > cluster-32f5-m:8020 starting to offer service > {code} > The Datanode then proceeds to use this unresolved address, as it may work if > the DN is configured to use a proxy. Since I'm not using a proxy, it forever > prints out this message: > {code:java} > 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > {code} > Unfortunately, the log doesn't contain the exception that triggered it, but > the culprit is actually in IPC Client: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444. > This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 > to give a clear error message when somebody mispells an address. > However, the fix in HADOOP-7472 doesn't apply here, because that code happens > in Client#getConnection after the Connection is constructed. > My proposed fix (will attach a patch) is to move this exception out of the > constructor and into a place that will trigger HADOOP-7472's logic to > re-resolve addresses. If the DNS failure was temporary, this will allow the > connection to succeed. If not, the connection will fail after ipc client > retries (default 10 seconds worth of retries). > I want to fix this in ipc client rather than just in Datanode startup, as > this fixes temporary DNS issues for all of Hadoop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344037#comment-16344037 ] Ajay Kumar commented on HADOOP-15129: - {quote}local host is: (unknown); {quote} can we pass "localhost" for {{NetUtils.wrapException}} instead of null. Above message in logs is little misleading. > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Assignee: Karthik Palaniappan >Priority: Minor > Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch > > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on startup, you cna see the Datanode log: "Namenode...remains > unresolved" -- referring to the fact that DNS lookup failed. > {code:java} > 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Refresh request received for nameservices: null > 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode > for null remains unresolved for ID null. Check your hdfs-site.xml file to > ensure namenodes are configured properly. > 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Starting BPOfferServices for nameservices: > 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool (Datanode Uuid unassigned) service to > cluster-32f5-m:8020 starting to offer service > {code} > The Datanode then proceeds to use this unresolved address, as it may work if > the DN is configured to use a proxy. Since I'm not using a proxy, it forever > prints out this message: > {code:java} > 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > {code} > Unfortunately, the log doesn't contain the exception that triggered it, but > the culprit is actually in IPC Client: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444. > This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 > to give a clear error message when somebody mispells an address. > However, the fix in HADOOP-7472 doesn't apply here, because that code happens > in Client#getConnection after the Connection is constructed. > My proposed fix (will attach a patch) is to move this exception out of the > constructor and into a place that will trigger HADOOP-7472's logic to > re-resolve addresses. If the DNS failure was temporary, this will allow the > connection to succeed. If not, the connection will fail after ipc client > retries (default 10 seconds worth of retries). > I want to fix this in ipc client rather than just in Datanode startup, as > this fixes temporary DNS issues for all of Hadoop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343948#comment-16343948 ] Arpit Agarwal commented on HADOOP-15129: I haven't looked at the test cases yet but the change looks fine to me. Will review the new tests. [~kihwal], do you have any thoughts since you added the original re-resolution logic? > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Assignee: Karthik Palaniappan >Priority: Minor > Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch > > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on startup, you cna see the Datanode log: "Namenode...remains > unresolved" -- referring to the fact that DNS lookup failed. > {code:java} > 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Refresh request received for nameservices: null > 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode > for null remains unresolved for ID null. Check your hdfs-site.xml file to > ensure namenodes are configured properly. > 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Starting BPOfferServices for nameservices: > 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool (Datanode Uuid unassigned) service to > cluster-32f5-m:8020 starting to offer service > {code} > The Datanode then proceeds to use this unresolved address, as it may work if > the DN is configured to use a proxy. Since I'm not using a proxy, it forever > prints out this message: > {code:java} > 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > {code} > Unfortunately, the log doesn't contain the exception that triggered it, but > the culprit is actually in IPC Client: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444. > This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 > to give a clear error message when somebody mispells an address. > However, the fix in HADOOP-7472 doesn't apply here, because that code happens > in Client#getConnection after the Connection is constructed. > My proposed fix (will attach a patch) is to move this exception out of the > constructor and into a place that will trigger HADOOP-7472's logic to > re-resolve addresses. If the DNS failure was temporary, this will allow the > connection to succeed. If not, the connection will fail after ipc client > retries (default 10 seconds worth of retries). > I want to fix this in ipc client rather than just in Datanode startup, as > this fixes temporary DNS issues for all of Hadoop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321037#comment-16321037 ] genericqa commented on HADOOP-15129: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 40s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 43s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 39s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch generated 1 new + 174 unchanged - 0 fixed = 175 total (was 174) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 55s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 43s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 80m 33s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HADOOP-15129 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12905521/HADOOP-15129.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 47dbfee78bbd 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 12d0645 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/13949/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13949/testReport/ | | Max. process+thread count | 1467 (vs. ulimit of 5000) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13949/console | | Powered by
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320125#comment-16320125 ] Steve Loughran commented on HADOOP-15129: - afraid not with try-with-resource. you can use lambdas with anonymous classes, as it lines you up for moving to Java 8 with ease in future; the IDE can take a class like that and switch to a Lambda at a button press. > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Assignee: Karthik Palaniappan >Priority: Minor > Attachments: HADOOP-15129.001.patch > > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on startup, you cna see the Datanode log: "Namenode...remains > unresolved" -- referring to the fact that DNS lookup failed. > {code:java} > 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Refresh request received for nameservices: null > 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode > for null remains unresolved for ID null. Check your hdfs-site.xml file to > ensure namenodes are configured properly. > 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Starting BPOfferServices for nameservices: > 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool (Datanode Uuid unassigned) service to > cluster-32f5-m:8020 starting to offer service > {code} > The Datanode then proceeds to use this unresolved address, as it may work if > the DN is configured to use a proxy. Since I'm not using a proxy, it forever > prints out this message: > {code:java} > 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > {code} > Unfortunately, the log doesn't contain the exception that triggered it, but > the culprit is actually in IPC Client: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444. > This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 > to give a clear error message when somebody mispells an address. > However, the fix in HADOOP-7472 doesn't apply here, because that code happens > in Client#getConnection after the Connection is constructed. > My proposed fix (will attach a patch) is to move this exception out of the > constructor and into a place that will trigger HADOOP-7472's logic to > re-resolve addresses. If the DNS failure was temporary, this will allow the > connection to succeed. If not, the connection will fail after ipc client > retries (default 10 seconds worth of retries). > I want to fix this in ipc client rather than just in Datanode startup, as > this fixes temporary DNS issues for all of Hadoop. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317088#comment-16317088 ] Karthik Palaniappan commented on HADOOP-15129: -- [~ste...@apache.org] I was working on incorporating your feedback, but it looks like Hadoop 2.8 still supports Java 7, so I can't use lambdas or try-with-resources. > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Assignee: Karthik Palaniappan >Priority: Minor > Attachments: HADOOP-15129.001.patch > > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on startup, you cna see the Datanode log: "Namenode...remains > unresolved" -- referring to the fact that DNS lookup failed. > {code:java} > 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Refresh request received for nameservices: null > 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode > for null remains unresolved for ID null. Check your hdfs-site.xml file to > ensure namenodes are configured properly. > 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Starting BPOfferServices for nameservices: > 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool (Datanode Uuid unassigned) service to > cluster-32f5-m:8020 starting to offer service > {code} > The Datanode then proceeds to use this unresolved address, as it may work if > the DN is configured to use a proxy. Since I'm not using a proxy, it forever > prints out this message: > {code:java} > 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > {code} > Unfortunately, the log doesn't contain the exception that triggered it, but > the culprit is actually in IPC Client: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444. > This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 > to give a clear error message when somebody mispells an address. > However, the fix in HADOOP-7472 doesn't apply here, because that code happens > in Client#getConnection after the Connection is constructed. > My proposed fix (will attach a patch) is to move this exception out of the > constructor and into a place that will trigger HADOOP-7472's logic to > re-resolve addresses. If the DNS failure was temporary, this will allow the > connection to succeed. If not, the connection will fail after ipc client > retries (default 10 seconds worth of retries). > I want to fix this in ipc client rather than just in Datanode startup, as > this fixes temporary DNS issues for all of Hadoop. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297389#comment-16297389 ] Karthik Palaniappan commented on HADOOP-15129: -- [~shahrs87]: It's related to both -- I'll add links to them. 1) This is not related to proxies like 8068 and 12125 2) This is a different proposed fix than 8068 -- instead of adding retries inside of HDFS, it adds retries down at the IPC layer. This is more in line with what [~jlowe] suggested in 12125. > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Assignee: Karthik Palaniappan >Priority: Minor > Attachments: HADOOP-15129.001.patch > > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on startup, you cna see the Datanode log: "Namenode...remains > unresolved" -- referring to the fact that DNS lookup failed. > {code:java} > 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Refresh request received for nameservices: null > 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode > for null remains unresolved for ID null. Check your hdfs-site.xml file to > ensure namenodes are configured properly. > 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Starting BPOfferServices for nameservices: > 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool (Datanode Uuid unassigned) service to > cluster-32f5-m:8020 starting to offer service > {code} > The Datanode then proceeds to use this unresolved address, as it may work if > the DN is configured to use a proxy. Since I'm not using a proxy, it forever > prints out this message: > {code:java} > 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > {code} > Unfortunately, the log doesn't contain the exception that triggered it, but > the culprit is actually in IPC Client: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444. > This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 > to give a clear error message when somebody mispells an address. > However, the fix in HADOOP-7472 doesn't apply here, because that code happens > in Client#getConnection after the Connection is constructed. > My proposed fix (will attach a patch) is to move this exception out of the > constructor and into a place that will trigger HADOOP-7472's logic to > re-resolve addresses. If the DNS failure was temporary, this will allow the > connection to succeed. If not, the connection will fail after ipc client > retries (default 10 seconds worth of retries). > I want to fix this in ipc client rather than just in Datanode startup, as > this fixes temporary DNS issues for all of Hadoop. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297380#comment-16297380 ] Karthik Palaniappan commented on HADOOP-15129: -- [~arpitagarwal]: Here are the notes from our investigation. startDataNode() calls refreshNamenodes(): https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L1425 refreshNamenodes has unresolved InetSocketAddress(es): https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolManager.java#L152, and the unresolved address(es) get passed to a BPOfferService. Note that a BPOfferService is only created once for a NameServiceId. BPOS caches that ISA: https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java#L134 And creates a BPServiceActor with that ISA: https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java#L138 When the BPServiceActor tries to get NN info, it hits that exception (and logs “Problem connecting to server”): https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java#L235 Here is the stacktrace of the exception: {code:java} java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "non-existent":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:221) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:263) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:752) at java.lang.Thread.run(Thread.java:748) Caused by: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "non-existent":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:744) at org.apache.hadoop.ipc.Client$Connection.(Client.java:446) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1524) at org.apache.hadoop.ipc.Client.call(Client.java:1375) at org.apache.hadoop.ipc.Client.call(Client.java:1339) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) at com.sun.proxy.$Proxy15.versionRequest(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.versionRequest(DatanodeProtocolClientSideTranslatorPB.java:274) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:216) ... 3 more {code} BPServiceActor uses a while(true) loop that runs every 5 seconds. I don’t think it ever stops (which is a little weird). > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Assignee: Karthik Palaniappan >Priority: Minor > Attachments: HADOOP-15129.001.patch > > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on st
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297246#comment-16297246 ] Arpit Agarwal commented on HADOOP-15129: Thanks for the heads up Steve. [~Karthik Palaniappan], do you have the callstack to go with this error message? {code} 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: cluster-32f5-m:8020 {code} It's not logged by default but perhaps you captured it while debugging. Also I added you as a contributor. > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Assignee: Karthik Palaniappan >Priority: Minor > Attachments: HADOOP-15129.001.patch > > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on startup, you cna see the Datanode log: "Namenode...remains > unresolved" -- referring to the fact that DNS lookup failed. > {code:java} > 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Refresh request received for nameservices: null > 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode > for null remains unresolved for ID null. Check your hdfs-site.xml file to > ensure namenodes are configured properly. > 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Starting BPOfferServices for nameservices: > 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool (Datanode Uuid unassigned) service to > cluster-32f5-m:8020 starting to offer service > {code} > The Datanode then proceeds to use this unresolved address, as it may work if > the DN is configured to use a proxy. Since I'm not using a proxy, it forever > prints out this message: > {code:java} > 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > {code} > Unfortunately, the log doesn't contain the exception that triggered it, but > the culprit is actually in IPC Client: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444. > This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 > to give a clear error message when somebody mispells an address. > However, the fix in HADOOP-7472 doesn't apply here, because that code happens > in Client#getConnection after the Connection is constructed. > My proposed fix (will attach a patch) is to move this exception out of the > constructor and into a place that will trigger HADOOP-7472's logic to > re-resolve addresses. If the DNS failure was temporary, this will allow the > connection to succeed. If not, the connection will fail after ipc client > retries (default 10 seconds worth of retries). > I want to fix this in ipc client rather than just in Datanode startup, as > this fixes temporary DNS issues for all of Hadoop. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296888#comment-16296888 ] Rushabh S Shah commented on HADOOP-15129: - Isn't this dupe of HDFS-8068 ? > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Priority: Minor > Attachments: HADOOP-15129.001.patch > > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on startup, you cna see the Datanode log: "Namenode...remains > unresolved" -- referring to the fact that DNS lookup failed. > {code:java} > 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Refresh request received for nameservices: null > 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode > for null remains unresolved for ID null. Check your hdfs-site.xml file to > ensure namenodes are configured properly. > 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Starting BPOfferServices for nameservices: > 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool (Datanode Uuid unassigned) service to > cluster-32f5-m:8020 starting to offer service > {code} > The Datanode then proceeds to use this unresolved address, as it may work if > the DN is configured to use a proxy. Since I'm not using a proxy, it forever > prints out this message: > {code:java} > 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > {code} > Unfortunately, the log doesn't contain the exception that triggered it, but > the culprit is actually in IPC Client: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444. > This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 > to give a clear error message when somebody mispells an address. > However, the fix in HADOOP-7472 doesn't apply here, because that code happens > in Client#getConnection after the Connection is constructed. > My proposed fix (will attach a patch) is to move this exception out of the > constructor and into a place that will trigger HADOOP-7472's logic to > re-resolve addresses. If the DNS failure was temporary, this will allow the > connection to succeed. If not, the connection will fail after ipc client > retries (default 10 seconds worth of retries). > I want to fix this in ipc client rather than just in Datanode startup, as > this fixes temporary DNS issues for all of Hadoop. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296865#comment-16296865 ] Steve Loughran commented on HADOOP-15129: - the IPC is very sensitive code, as DN/NN comms, so this is going to need review by people who care about that, maybe [~arpitagarwal]. to me, production code looks fine, but we need reviewers from HDFS too. Test wise, * use LambdaTestUtils.intercept() & java-8 lambda expressions for those tests which throw exceptions; if your lambda returns any object, toString() will be caused on it & the text used in the exception...this can be useful * and as Client is autocloseable, you can use try-with-resources to manage its lifecycle Test failures unrelated & covered elsewhere > Datanode caches namenode DNS lookup failure and cannot startup > -- > > Key: HADOOP-15129 > URL: https://issues.apache.org/jira/browse/HADOOP-15129 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.8.2 > Environment: Google Compute Engine. > I'm using Java 8, Debian 8, Hadoop 2.8.2. >Reporter: Karthik Palaniappan >Priority: Minor > Attachments: HADOOP-15129.001.patch > > > On startup, the Datanode creates an InetSocketAddress to register with each > namenode. Though there are retries on connection failure throughout the > stack, the same InetSocketAddress is reused. > InetSocketAddress is an interesting class, because it resolves DNS names to > IP addresses on construction, and it is never refreshed. Hadoop re-creates an > InetSocketAddress in some cases just in case the remote IP has changed for a > particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472. > Anyway, on startup, you cna see the Datanode log: "Namenode...remains > unresolved" -- referring to the fact that DNS lookup failed. > {code:java} > 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Refresh request received for nameservices: null > 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode > for null remains unresolved for ID null. Check your hdfs-site.xml file to > ensure namenodes are configured properly. > 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Starting BPOfferServices for nameservices: > 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Block pool (Datanode Uuid unassigned) service to > cluster-32f5-m:8020 starting to offer service > {code} > The Datanode then proceeds to use this unresolved address, as it may work if > the DN is configured to use a proxy. Since I'm not using a proxy, it forever > prints out this message: > {code:java} > 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Problem connecting to server: cluster-32f5-m:8020 > {code} > Unfortunately, the log doesn't contain the exception that triggered it, but > the culprit is actually in IPC Client: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444. > This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 > to give a clear error message when somebody mispells an address. > However, the fix in HADOOP-7472 doesn't apply here, because that code happens > in Client#getConnection after the Connection is constructed. > My proposed fix (will attach a patch) is to move this exception out of the > constructor and into a place that will trigger HADOOP-7472's logic to > re-resolve addresses. If the DNS failure was temporary, this will allow the > connection to succeed. If not, the connection will fail after ipc client > retries (default 10 seconds worth of retries). > I want to fix this in ipc client rather than just in Datanode startup, as > this fixes temporary DNS issues for all of Hadoop. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296134#comment-16296134 ] genericqa commented on HADOOP-15129: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 8s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 48s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 48s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 59s{color} | {color:red} hadoop-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 84m 4s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.fs.viewfs.TestViewFileSystemWithAuthorityLocalFileSystem | | | hadoop.fs.viewfs.TestViewFileSystemLocalFileSystem | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HADOOP-15129 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12902749/HADOOP-15129.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 2779bcca9d3c 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / c7499f2 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/13855/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13855/testReport/ | | Max. process+thread count | 1500 (vs. ulimit of 5000) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.or
[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup
[ https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296044#comment-16296044 ] genericqa commented on HADOOP-15129: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 34s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 17s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 40s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch generated 1 new + 174 unchanged - 0 fixed = 175 total (was 174) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 55s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 28s{color} | {color:red} hadoop-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 87m 4s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.ipc.TestIPC | | | hadoop.fs.viewfs.TestViewFileSystemLocalFileSystem | | | hadoop.fs.viewfs.TestViewFileSystemWithAuthorityLocalFileSystem | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HADOOP-15129 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12902738/HADOOP-15129.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 5551822d6052 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / c7a4dda | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/13853/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt | | unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/13853/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt | | Test Results