[
https://issues.apache.org/jira/browse/HADOOP-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117602#comment-17117602
]
hemanthboyina commented on HADOOP-17052:
----------------------------------------
Thanks for providing more details, [~dhegde].
{quote}The code change could be made a level above in places like
newConnectedPeer()
{quote}
I don't think this will cover the write-call scenario.
IMO it would be better to handle this in NetUtils.connect() by catching the
exception, checking whether it is an instanceof UnresolvedAddressException, and
throwing the required exception so that the caller retries.
[~aajisaka] [~liuml07] thoughts?
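To make the suggestion concrete, here is a rough sketch of the translation idea.
It is not the actual NetUtils code: the class ResolvingConnect and its connect()
method are hypothetical names used only for illustration, and it connects
through a plain SocketChannel rather than Hadoop's own socket helpers. It only
shows UnresolvedAddressException being caught and rethrown as
java.net.UnknownHostException, which is an IOException.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

// Hypothetical helper for illustration; this is NOT Hadoop's NetUtils.
final class ResolvingConnect {

    // Connects to addr. The unchecked UnresolvedAddressException is translated
    // into the checked UnknownHostException (a subclass of IOException), so
    // callers that retry on IOException will also retry a failed resolution.
    static SocketChannel connect(InetSocketAddress addr) throws IOException {
        SocketChannel channel = SocketChannel.open();
        try {
            channel.connect(addr);
            return channel;
        } catch (UnresolvedAddressException e) {
            channel.close();
            UnknownHostException uhe = new UnknownHostException(addr.getHostString());
            uhe.initCause(e); // keep the original exception for diagnostics
            throw uhe;
        } catch (IOException | RuntimeException e) {
            channel.close(); // do not leak the channel on other failures
            throw e;
        }
    }

    public static void main(String[] args) {
        // createUnresolved() builds an address without attempting a DNS
        // lookup, which stands in for a hostname that failed to resolve.
        InetSocketAddress addr =
                InetSocketAddress.createUnresolved("no-such-host.invalid", 8020);
        try {
            connect(addr).close();
        } catch (IOException e) {
            System.out.println("retry-able: " + (e instanceof UnknownHostException));
        }
    }
}
```

Doing the translation at the connect level means both the read and write paths
see an IOException without each caller needing its own catch block.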
> NetUtils.connect() throws an exception that prevents any retries when hostname
> resolution fails
> ----------------------------------------------------------------------------------------------
>
> Key: HADOOP-17052
> URL: https://issues.apache.org/jira/browse/HADOOP-17052
> Project: Hadoop Common
> Issue Type: Bug
> Components: hdfs-client
> Affects Versions: 2.10.0, 2.9.2, 3.2.1, 3.1.3
> Reporter: Dhiraj Hegde
> Assignee: Dhiraj Hegde
> Priority: Major
> Attachments: stack_trace2
>
>
> Hadoop components are increasingly being deployed on VMs and containers. One
> aspect of this environment is that DNS is dynamic. Hostname records get
> modified (or deleted/recreated) as a container in Kubernetes (or even a VM) is
> created or recreated. In such dynamic environments, the initial DNS
> resolution request might briefly fail because the DNS client
> doesn't always get the latest records. This has been observed in Kubernetes
> in particular. In such cases NetUtils.connect() appears to throw
> java.nio.channels.UnresolvedAddressException. Much of the Hadoop code (like
> DFSInputStream and DFSOutputStream) is designed to retry on
> IOException. However, since UnresolvedAddressException is not a subclass of
> IOException, no retry happens and the code aborts immediately. It would be much
> better if NetUtils.connect() threw java.net.UnknownHostException, as that is
> derived from IOException and the code would treat it as a retry-able error.