[ https://issues.apache.org/jira/browse/HADOOP-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119913#comment-17119913 ]
Dhiraj Hegde commented on HADOOP-17052:
---------------------------------------
Sorry for the confusion about the retry issue. I did not mean that DNS lookups
need to be retried; what I meant was that all of the node retry logic in the
HDFS client fails in this situation. Let me clarify the problem:
NetUtils.connect() throws the unchecked exception
java.nio.channels.UnresolvedAddressException. This exception bubbles up all the
way out of the HDFS client. Normally, HDFS client code catches IOException and
responds by trying a different node, but in this case that process is aborted.
By converting this unchecked exception into java.net.UnknownHostException, we
ensure that the HDFS code catches the exception and works as designed: when a
request to one node fails, the client catches the exception and tries another.
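A minimal sketch of the proposed conversion, assuming the failure comes from SocketChannel.connect() being handed an unresolved InetSocketAddress. The class name ConnectSketch and the helper connectOrUnknownHost are hypothetical illustrations, not Hadoop's actual NetUtils code; the JDK calls used (SocketChannel.connect, InetSocketAddress.createUnresolved) are standard.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

public class ConnectSketch {

    /**
     * Hypothetical helper (not the real NetUtils.connect()): connects the
     * channel and converts the unchecked UnresolvedAddressException into a
     * checked UnknownHostException, which is a subclass of IOException.
     */
    static void connectOrUnknownHost(SocketChannel channel, InetSocketAddress endpoint)
            throws IOException {
        try {
            channel.connect(endpoint);
        } catch (UnresolvedAddressException e) {
            UnknownHostException uhe =
                new UnknownHostException("Unable to resolve host: " + endpoint.getHostString());
            uhe.initCause(e); // keep the original exception for diagnostics
            throw uhe;
        }
    }

    public static void main(String[] args) throws IOException {
        // An unresolved address makes SocketChannel.connect() throw
        // UnresolvedAddressException before any network I/O happens.
        InetSocketAddress unresolved =
            InetSocketAddress.createUnresolved("no-such-host.invalid", 8020);
        try (SocketChannel channel = SocketChannel.open()) {
            connectOrUnknownHost(channel, unresolved);
        } catch (UnknownHostException e) {
            // Reached only because the helper rethrew an IOException subtype.
            System.out.println("Caught as IOException: " + e.getMessage());
        }
    }
}
```

Keeping the original exception as the cause means nothing is lost for debugging, while callers that only catch IOException now see the failure.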
> NetUtils.connect() throws an exception that prevents any retries when hostname
> resolution fails
> ----------------------------------------------------------------------------------------------
>
> Key: HADOOP-17052
> URL: https://issues.apache.org/jira/browse/HADOOP-17052
> Project: Hadoop Common
> Issue Type: Bug
> Components: hdfs-client
> Affects Versions: 2.10.0, 2.9.2, 3.2.1, 3.1.3
> Reporter: Dhiraj Hegde
> Assignee: Dhiraj Hegde
> Priority: Major
> Attachments: stack_trace2
>
>
> Hadoop components are increasingly being deployed on VMs and containers. One
> aspect of these environments is that DNS is dynamic: hostname records get
> modified (or deleted and recreated) as a container in Kubernetes (or even a
> VM) is created or recreated. In such dynamic environments, the initial DNS
> resolution request might briefly fail, because the DNS client doesn't always
> have the latest records. This has been observed in Kubernetes in particular.
> In such cases NetUtils.connect() appears to throw
> java.nio.channels.UnresolvedAddressException. Much of the Hadoop code (such
> as DFSInputStream and DFSOutputStream) is designed to retry on IOException.
> However, since UnresolvedAddressException is not a subclass of IOException,
> no retry happens and the code aborts immediately. It would be much better if
> NetUtils.connect() threw java.net.UnknownHostException, which is derived
> from IOException, so that the code treats the failure as retryable.
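> To illustrate why the exception type matters, here is a simplified stand-in for a node retry loop, assuming the conversion proposed above. It is not the actual DFSInputStream/DFSOutputStream logic; RetryAcrossNodes, connect and connectToFirstReachable are hypothetical names. The first "node" has an unresolvable hostname and the second is a local listener, so the loop should skip the first and connect to the second.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;
import java.util.List;

public class RetryAcrossNodes {

    // Hypothetical helper: converts the unchecked exception into an IOException.
    static void connect(SocketChannel channel, InetSocketAddress endpoint) throws IOException {
        try {
            channel.connect(endpoint);
        } catch (UnresolvedAddressException e) {
            UnknownHostException uhe = new UnknownHostException(endpoint.getHostString());
            uhe.initCause(e);
            throw uhe;
        }
    }

    /** Tries each node in order, skipping any that fail with an IOException. */
    static SocketChannel connectToFirstReachable(List<InetSocketAddress> nodes)
            throws IOException {
        IOException last = null;
        for (InetSocketAddress node : nodes) {
            SocketChannel channel = SocketChannel.open();
            try {
                connect(channel, node);
                return channel; // success: the caller owns the open channel
            } catch (IOException e) {
                // Without the conversion, UnresolvedAddressException would
                // bypass this catch block and abort the whole loop.
                channel.close();
                last = e;
            }
        }
        throw new IOException("All nodes failed", last);
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Local listener on an ephemeral loopback port stands in for a healthy node.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            int port = server.socket().getLocalPort();
            List<InetSocketAddress> nodes = List.of(
                InetSocketAddress.createUnresolved("dead-node.invalid", port),
                new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
            try (SocketChannel channel = connectToFirstReachable(nodes)) {
                System.out.println("Connected to " + channel.getRemoteAddress());
            }
        }
    }
}
```

> If connect() let UnresolvedAddressException escape instead, the catch (IOException) block would never run and the second node would never be tried, which is the behavior reported in this issue.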
--
This message was sent by Atlassian Jira
(v8.3.4#803005)