[ 
https://issues.apache.org/jira/browse/HADOOP-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119913#comment-17119913
 ] 

Dhiraj Hegde commented on HADOOP-17052:
---------------------------------------

Sorry for the confusion about retries. I did not mean that DNS resolution needs
to be retried; I meant that the node retry logic in the hdfs client fails in
this situation. Let me clarify the problem:

NetUtils.connect() throws the unchecked exception
java.nio.channels.UnresolvedAddressException, which bubbles up all the way out
of the hdfs client. Normally the hdfs client code catches IOException and
responds by trying a different node, but in this case that process is aborted
because the unchecked exception is not an IOException. By throwing
java.net.UnknownHostException instead, we make sure the hdfs code catches the
exception and works as designed: when a request to one node fails, it catches
the exception and tries another.
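
To illustrate the idea, here is a minimal sketch. It is not the actual
NetUtils or hdfs client code: the class ConnectSketch and the methods connect()
and connectToAny() are hypothetical names made up for this example. It only
shows how translating UnresolvedAddressException into UnknownHostException lets
an ordinary catch (IOException) retry loop move on to the next node.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;
import java.util.List;

public class ConnectSketch {

    /** Hypothetical stand-in for the connect step; not the real NetUtils.connect(). */
    static SocketChannel connect(InetSocketAddress endpoint) throws IOException {
        SocketChannel channel = SocketChannel.open();
        try {
            channel.connect(endpoint);
            return channel;
        } catch (UnresolvedAddressException e) {
            // Unchecked exception: translate it into a checked IOException subclass
            // so that callers which retry on IOException can handle it.
            channel.close();
            UnknownHostException uhe = new UnknownHostException(endpoint.getHostString());
            uhe.initCause(e);
            throw uhe;
        } catch (IOException e) {
            channel.close();
            throw e;
        }
    }

    /** Hypothetical caller-side retry loop: try each node until one connects. */
    static SocketChannel connectToAny(List<InetSocketAddress> nodes) throws IOException {
        IOException last = null;
        for (InetSocketAddress node : nodes) {
            try {
                return connect(node);
            } catch (IOException e) {
                // UnknownHostException lands here, so the loop continues.
                last = e;
            }
        }
        throw last != null ? last : new IOException("no nodes to try");
    }

    public static void main(String[] args) {
        // createUnresolved() builds an address without doing a DNS lookup.
        InetSocketAddress bad = InetSocketAddress.createUnresolved("no-such-host.invalid", 8020);
        try {
            connectToAny(List.of(bad));
        } catch (IOException e) {
            System.out.println("caught as IOException: " + e.getClass().getName());
        }
    }
}
```

Without the translation in connect(), the UnresolvedAddressException would skip
the catch (IOException) block in connectToAny() and abort the loop on the first
unresolved node.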

> NetUtils.connect() throws an exception that prevents any retries when hostname 
> resolution fails
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-17052
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17052
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.10.0, 2.9.2, 3.2.1, 3.1.3
>            Reporter: Dhiraj Hegde
>            Assignee: Dhiraj Hegde
>            Priority: Major
>         Attachments: stack_trace2
>
>
> Hadoop components are increasingly being deployed on VMs and containers. One
> aspect of this environment is that DNS is dynamic. Hostname records get
> modified (or deleted/recreated) as a container in Kubernetes (or even a VM) is
> created or recreated. In such dynamic environments, the initial DNS
> resolution request might briefly return a resolution failure, as the DNS
> client doesn't always get the latest records. This has been observed in
> Kubernetes in particular. In such cases NetUtils.connect() appears to throw
> java.nio.channels.UnresolvedAddressException. Much of the Hadoop code (like
> DFSInputStream and DFSOutputStream) is designed to retry on IOException.
> However, since UnresolvedAddressException is not a subclass of IOException,
> no retry happens and the code aborts immediately. It would be much better if
> NetUtils.connect() threw java.net.UnknownHostException, as that is derived
> from IOException and the code would treat it as a retryable error.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
