[ https://issues.apache.org/jira/browse/HDFS-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476139#comment-13476139 ]
Daryn Sharp commented on HDFS-3990:
-----------------------------------

The caching is to prevent unnecessary DNS lookups whose count is a multiple of the number of datanodes, typically incurred just to view a JSP or query JSON, but also by other internal operations. Every time a node is checked against the include/exclude lists, it generates DNS queries numbering twice the number of datanodes. Counting the number of nodes causes a DNS query for every datanode.

Reassigning an IP should not require a restart of the NN. The DNs are tracked by their IP and storage ID. If a DN registers with a previously known IP or storage ID, the existing node is updated with the fields from the new node ID, which contain a refreshed lookup.

> NN's health report has severe performance problems
> --------------------------------------------------
>
>                  Key: HDFS-3990
>                  URL: https://issues.apache.org/jira/browse/HDFS-3990
>              Project: Hadoop HDFS
>           Issue Type: Bug
>           Components: name-node
>     Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>             Reporter: Daryn Sharp
>             Assignee: Daryn Sharp
>             Priority: Critical
>          Attachments: HDFS-3990.patch
>
>
> The dfshealth page places a read lock on the namespace while it performs
> a DNS lookup for every DN. On a multi-thousand-node cluster, this often
> results in 10s+ load times for the health page. 10 concurrent requests
> were found to cause 7m+ load times, during which write operations were
> blocked.
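The approach described in the comment can be sketched in Java as follows. The pattern is to resolve once at registration, cache the result on the node, track DNs by both IP and storage ID, and update the existing node in place when it re-registers. This is a hypothetical illustration, not the code in HDFS-3990.patch. `DatanodeRegistry`, `Resolver`, `Node`, `register`, `count`, and `isExcluded` are invented names, and `Resolver` stands in for a reverse DNS lookup so the number of lookups can be observed. For brevity, it does not handle a re-registration whose IP and storage ID match two *different* existing nodes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: DNS is resolved once per registration and cached, so
// counting nodes or checking include/exclude lists only reads cached fields.
final class DatanodeRegistry {

    /** Hypothetical hook standing in for a reverse DNS lookup. */
    interface Resolver {
        String hostnameFor(String ip);
    }

    /** A datanode record; hostname caches the result of the last lookup. */
    static final class Node {
        private String ip;
        private String storageId;
        private String hostname;

        Node(String ip, String storageId, String hostname) {
            this.ip = ip;
            this.storageId = storageId;
            this.hostname = hostname;
        }

        String ip() { return ip; }
        String storageId() { return storageId; }
        String hostname() { return hostname; }
    }

    private final Resolver resolver;
    // DNs are tracked by both IP and storage ID.
    private final Map<String, Node> byIp = new HashMap<>();
    private final Map<String, Node> byStorageId = new HashMap<>();

    DatanodeRegistry(Resolver resolver) {
        this.resolver = resolver;
    }

    /** The only method that performs a lookup: one per (re-)registration. */
    synchronized Node register(String ip, String storageId) {
        String hostname = resolver.hostnameFor(ip); // refreshed lookup
        Node node = byStorageId.get(storageId);
        if (node == null) {
            node = byIp.get(ip);
        }
        if (node == null) {
            node = new Node(ip, storageId, hostname);
        } else {
            // Previously known IP or storage ID: update the existing node in
            // place, so a reassigned IP needs no NameNode restart.
            byIp.remove(node.ip);
            byStorageId.remove(node.storageId);
            node.ip = ip;
            node.storageId = storageId;
            node.hostname = hostname;
        }
        byIp.put(ip, node);
        byStorageId.put(storageId, node);
        return node;
    }

    synchronized Node getByIp(String ip) {
        return byIp.get(ip);
    }

    /** Counting reads the map size; no per-node DNS query. */
    synchronized int count() {
        return byStorageId.size();
    }

    /** Include/exclude check against the cached IP and hostname only. */
    synchronized boolean isExcluded(Node node, Set<String> excludes) {
        return excludes.contains(node.ip) || excludes.contains(node.hostname);
    }
}
```

Because `count()` and `isExcluded()` only read cached fields, they issue no DNS traffic regardless of cluster size. The lookup cost moves to registration, where a single lookup per (re-)registration also refreshes the cached name after an IP change.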