[ 
https://issues.apache.org/jira/browse/HDFS-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476139#comment-13476139
 ] 

Daryn Sharp commented on HDFS-3990:
-----------------------------------

The caching prevents unnecessary DNS lookups whose count is a multiple of 
the number of datanodes.  These lookups are typically triggered just to view 
a JSP or query JSON, but also by other internal operations.  Every time a 
node is checked against the include/exclude lists, it generates DNS queries 
numbering 2X the datanodes.  Counting the number of nodes causes a DNS query 
for every datanode.
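
To illustrate the idea only (this is not the code in the attached patch), 
here is a minimal sketch of memoizing lookups so that repeated checks do not 
hit DNS again.  The class name CachingResolver and its methods are 
hypothetical, and the resolver is passed in as a function so the sketch runs 
without a network; in practice it might wrap something like 
InetAddress.getByName(ip).getHostName().

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper, not part of Hadoop: memoizes ip -> hostname lookups. */
class CachingResolver {
    // The expensive lookup (an assumption: any ip -> hostname function).
    private final Function<String, String> resolver;
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger lookups = new AtomicInteger();

    CachingResolver(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    /** Returns the cached hostname, resolving only on the first request. */
    String hostName(String ip) {
        return cache.computeIfAbsent(ip, key -> {
            lookups.incrementAndGet();
            return resolver.apply(key);
        });
    }

    /** Drops the cached entry so the next call resolves again. */
    void refresh(String ip) {
        cache.remove(ip);
    }

    /** Number of times the underlying resolver was actually invoked. */
    int lookupCount() {
        return lookups.get();
    }
}
```

With a cache like this, checking every node against a list costs one lookup 
per node the first time and none afterwards, until an entry is refreshed.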

Reassigning an IP should require no restart of the NN.  The DNs are tracked 
by their IP and storage ID.  If a DN registers with a previously known IP or 
storage ID, the existing node is updated with the fields in the new node ID, 
which contain a refreshed lookup.
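
The registration behaviour described above could be sketched roughly as 
follows.  This is a simplified illustration under stated assumptions, not the 
NN's actual data structures: NodeRegistry, Node, and register are hypothetical 
names, and a different node that previously held the same IP is simply 
dropped.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry, not Hadoop code: tracks nodes by ip and storage id. */
class NodeRegistry {
    static final class Node {
        String ip;
        String storageId;
        String hostName; // result of the lookup done at registration time
    }

    private final Map<String, Node> byIp = new HashMap<>();
    private final Map<String, Node> byStorageId = new HashMap<>();

    /** Registers a node, reusing the existing entry if ip or storage id is known. */
    Node register(String ip, String storageId, String hostName) {
        Node node = byStorageId.get(storageId);
        if (node == null) {
            node = byIp.get(ip);
        }
        if (node == null) {
            node = new Node();
        } else {
            // Remove the old keys before the fields change.
            byIp.remove(node.ip);
            byStorageId.remove(node.storageId);
        }
        // Update the node with the fields from the new registration.
        node.ip = ip;
        node.storageId = storageId;
        node.hostName = hostName;

        // Simplification: a different node that held this ip is dropped.
        Node displaced = byIp.put(ip, node);
        if (displaced != null) {
            byStorageId.remove(displaced.storageId);
        }
        byStorageId.put(storageId, node);
        return node;
    }

    Node getByIp(String ip) {
        return byIp.get(ip);
    }

    int size() {
        return byStorageId.size();
    }
}
```

In this sketch, a DN that re-registers with the same storage ID but a new IP 
keeps its single entry, which then carries the new IP and hostname.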
                
> NN's health report has severe performance problems
> --------------------------------------------------
>
>                 Key: HDFS-3990
>                 URL: https://issues.apache.org/jira/browse/HDFS-3990
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-3990.patch
>
>
> The dfshealth page places a read lock on the namespace while it does a 
> DNS lookup for every DN.  On a multi-thousand node cluster, this often 
> results in 10s+ load time for the health page.  10 concurrent requests were 
> found to cause 7m+ load times, during which write operations were blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
