[ https://issues.apache.org/jira/browse/HDFS-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288424#comment-14288424 ]
Tsz Wo Nicholas Sze commented on HDFS-7392:
-------------------------------------------

I think we should fix this bug since the failure count does not work when a host name is resolved to more than one address. Will review the patch.

> org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever
> ---------------------------------------------------------------------
>
>                 Key: HDFS-7392
>                 URL: https://issues.apache.org/jira/browse/HDFS-7392
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>            Reporter: Frantisek Vacek
>            Assignee: Yi Liu
>         Attachments: 1.png, 2.png, HDFS-7392.diff
>
>
> In some specific circumstances,
> org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never times out
> and retries forever.
> The specific circumstances are:
> 1) The HDFS URI (hdfs://share.example.com:8020/someDir/someFile.txt) points
> to a valid IP address, but no name node service is running on it.
> 2) The host name in the URI resolves to at least 2 IP addresses. See output below:
> {quote}
> [~/proj/quickbox]$ nslookup share.example.com
> Server:         127.0.1.1
> Address:        127.0.1.1#53
> share.example.com canonical name =
> internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com.
> Name:   internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
> Address: 192.168.1.223
> Name:   internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
> Address: 192.168.1.65
> {quote}
> In such a case, org.apache.hadoop.ipc.Client.Connection.updateAddress()
> sometimes returns true (even if the address didn't actually change, see img. 1)
> and the timeoutFailures counter is reset to 0 (see img. 2). As a result,
> maxRetriesOnSocketTimeouts (45) is never reached and the connection attempt is
> repeated forever.
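
To illustrate the failure mode described in the issue, here is a minimal, self-contained simulation. It is a sketch and not the Hadoop code: the class RetryCounterSketch, the method attemptsUntilGiveUp, and the attemptBudget parameter are hypothetical names, and the strict alternation between the two resolved addresses is an assumption made to keep the model simple. Only the limit of 45 and the two IP addresses are taken from the report.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified model of a retry loop whose failure counter resets on address change. */
public class RetryCounterSketch {

    /**
     * Simulates connection attempts that always time out.
     * Returns the attempt number on which the client gives up,
     * or -1 if attemptBudget attempts pass without giving up.
     */
    public static int attemptsUntilGiveUp(List<String> resolvedAddresses,
                                          int maxRetries,
                                          int attemptBudget,
                                          boolean resetOnAddressChange) {
        String current = resolvedAddresses.get(0);
        int timeoutFailures = 0;
        for (int attempt = 1; attempt <= attemptBudget; attempt++) {
            // Assumption: every attempt times out, and re-resolving the
            // host name cycles through the resolved addresses in order.
            String resolved = resolvedAddresses.get(attempt % resolvedAddresses.size());
            if (!resolved.equals(current)) {
                current = resolved;
                if (resetOnAddressChange) {
                    timeoutFailures = 0; // the reset that hides earlier failures
                }
            }
            if (timeoutFailures >= maxRetries) {
                return attempt; // retry limit reached, give up
            }
            timeoutFailures++;
        }
        return -1; // limit never reached within the budget
    }

    public static void main(String[] args) {
        List<String> one = Arrays.asList("192.168.1.223");
        List<String> two = Arrays.asList("192.168.1.223", "192.168.1.65");
        System.out.println("one address, reset:    " + attemptsUntilGiveUp(one, 45, 10000, true));
        System.out.println("two addresses, reset:  " + attemptsUntilGiveUp(two, 45, 10000, true));
        System.out.println("two addresses, no reset: " + attemptsUntilGiveUp(two, 45, 10000, false));
    }
}
```

In this model a single address gives up once the counter reaches the limit, while two alternating addresses with the reset enabled never do, which matches the behavior reported above. Counting failures independently of address changes (the third call) is one possible way to restore the limit; whether the attached patch takes that approach is not stated here.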