[ https://issues.apache.org/jira/browse/HADOOP-19218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862579#comment-17862579 ]

Viraj Jasani commented on HADOOP-19218:
---------------------------------------

Thread dump ref:
{code:java}
"IPC Server listener on 8020" #92 daemon prio=5 os_prio=0 tid=0x00007f23a9592800 nid=0x81744 runnable [0x00007f23ad38a000]
   java.lang.Thread.State: RUNNABLE
        at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:867)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1302)
        at java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:815)
        - locked <0x00007f2bc29c6a10> (a java.net.InetAddress$NameServiceAddresses)
        at java.net.InetAddress.getAllByName0(InetAddress.java:1291)
        at java.net.InetAddress.getAllByName0(InetAddress.java:1211)
        at java.net.InetAddress.getHostFromNameService(InetAddress.java:637)
        at java.net.InetAddress.getHostName(InetAddress.java:562)
        at java.net.InetAddress.getHostName(InetAddress.java:534)
        at org.apache.hadoop.ipc.Server$Connection.<init>(Server.java:1916)
        at org.apache.hadoop.ipc.Server$ConnectionManager.register(Server.java:3841)
        at org.apache.hadoop.ipc.Server$Listener.doAccept(Server.java:1448)
        at org.apache.hadoop.ipc.Server$Listener.run(Server.java:1389)
{code}

> Avoid DNS lookup while creating IPC Connection object
> -----------------------------------------------------
>
>                 Key: HADOOP-19218
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19218
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>
> We have been running HADOOP-18628 in production for quite some time, and
> everything works fine as long as the HA DNS servers are available. Upgrading
> a single NS server at a time is also a common case and is not problematic.
> A DNS lookup generally takes about 1ms.
> However, we recently encountered a case where 2 out of 4 NS servers went down
> (temporarily, and this is rare). With a short DNS cache duration and a 2s NS
> fallback timeout configured in resolv.conf, any DNS lookup could now take 4s
> or more. This caused a NameNode outage: the listener is single-threaded, and
> it could not keep up with the large number of unique clients initiating
> connections on the listener port, since the number of DNS resolutions every
> few seconds grew in direct proportion to the number of unique clients.
> While having 2 out of 4 DNS servers offline is rare, and the NS fallback
> settings could also be improved, it is important to note that we don't need
> to perform DNS resolution for every new connection if the intention is to
> improve insight into VersionMismatch errors thrown by the server.
> The proposal is to delay the DNS resolution until the server throws the
> error for an incompatible header or version mismatch. This would also save
> the ~1ms spent even on a healthy DNS lookup.
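The proposal above can be sketched as a lazily resolved peer name. This is a minimal illustration under stated assumptions, not the actual change to Server$Connection: LazyPeerInfo, checkVersion and the IllegalStateException message are hypothetical placeholders, and the resolver is injected as a Function so the lookup can be stubbed. The accept path only ever uses the IP literal, and getHostName() runs at most once, when a mismatch error message actually needs the name.

{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch: connection metadata that defers reverse DNS. */
public class LazyPeerInfo {
    private final InetAddress address;
    private final Function<InetAddress, String> resolver;
    private String hostName; // cached after the first (possibly slow) lookup

    LazyPeerInfo(InetAddress address, Function<InetAddress, String> resolver) {
        this.address = address;
        this.resolver = resolver;
    }

    LazyPeerInfo(InetAddress address) {
        // Default: the JDK lookup seen in the thread dump, now deferred.
        this(address, InetAddress::getHostName);
    }

    /** Hot path (accept, logging): IP literal only, no DNS. */
    String hostAddress() {
        return address.getHostAddress();
    }

    /** Resolved on first use only, then cached. */
    synchronized String hostName() {
        if (hostName == null) {
            hostName = resolver.apply(address);
        }
        return hostName;
    }

    /** Hypothetical version check: DNS is touched only on mismatch. */
    void checkVersion(int clientVersion, int serverVersion) {
        if (clientVersion != serverVersion) {
            throw new IllegalStateException("Version mismatch from "
                + hostName() + "/" + hostAddress() + ": client "
                + clientVersion + ", server " + serverVersion);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        AtomicInteger lookups = new AtomicInteger();
        InetAddress addr = InetAddress.getByAddress(new byte[] {10, 0, 0, 7});
        // Stub resolver that counts calls instead of querying DNS.
        LazyPeerInfo peer = new LazyPeerInfo(addr, a -> {
            lookups.incrementAndGet();
            return "client-7.example.com";
        });

        peer.checkVersion(9, 9); // compatible: no lookup
        System.out.println(lookups.get()); // 0
        try {
            peer.checkVersion(8, 9); // mismatch: exactly one lookup
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(lookups.get()); // 1
    }
}
{code}

Injecting the resolver keeps the example deterministic; a real server would use the single-argument constructor, so a degraded resolver only slows down the rare connections that fail the version or header check.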



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
