[ https://issues.apache.org/jira/browse/HDFS-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867849#comment-16867849 ]

Stephen O'Donnell commented on HDFS-14579:
------------------------------------------

[~kihwal] So that is 54ms for the refreshNodes command to complete, including 
reading in the 4900 entries outside the lock and then iterating over them within 
the lock? That proves this can be fast even on large clusters.

What [~elgoiri] is seeing seems to be slow DNS affecting only the file-loading 
part, which happens outside the write lock. Therefore the command is slow, but it 
does not have much impact on the namenode. In theory we could make the DNS lookups 
parallel, but this seems better solved by addressing the DNS issues themselves 
(eg /etc/hosts entries or nscd) rather than complicating the code.
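For illustration only, a minimal sketch of what parallelising the lookups could 
look like, using a plain ExecutorService and InetAddress.getByName. The class and 
method names here are made up for the example; this is not what the attached 
patch does:

{code}
import java.net.InetAddress;
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch only: resolve every host from the include/exclude files
// in parallel, before any namesystem lock is taken.
public class ParallelResolveSketch {
  static Map<String, InetAddress> resolveAll(Collection<String> hosts)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(16);
    Map<String, Future<InetAddress>> pending = new HashMap<>();
    for (String host : hosts) {
      // Each lookup runs as its own task, so slow DNS entries overlap.
      pending.put(host, pool.submit(() -> InetAddress.getByName(host)));
    }
    Map<String, InetAddress> resolved = new HashMap<>();
    for (Map.Entry<String, Future<InetAddress>> e : pending.entrySet()) {
      try {
        resolved.put(e.getKey(), e.getValue().get());
      } catch (ExecutionException ex) {
        // Unresolvable host: skip it; how failures should surface is left open here.
      }
    }
    pool.shutdown();
    return resolved;
  }
}
{code}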

[~hexiaoqiao] commented in another Jira that their NN got blocked for a long 
time during refreshNodes. Can you confirm whether it was blocked reading the file, 
with the namenode still able to process other requests at the time, or whether it 
was blocking all other requests because it was holding the write lock?

> In refreshNodes, avoid performing a DNS lookup while holding the write lock
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-14579
>                 URL: https://issues.apache.org/jira/browse/HDFS-14579
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>         Attachments: HDFS-14579.001.patch
>
>
> When refreshNodes is called on a large cluster, or a cluster where DNS is not 
> performing well, it can cause the namenode to hang for a long time. This is 
> because the refreshNodes operation holds the global write lock while it is 
> running. Most of the refreshNodes code is simple and hence fast, but 
> unfortunately it performs a DNS lookup for each host in the cluster while the 
> lock is held.
> Right now, it calls:
> {code}
>   public void refreshNodes(final Configuration conf) throws IOException {
>     refreshHostsReader(conf);
>     namesystem.writeLock();
>     try {
>       refreshDatanodes();
>       countSoftwareVersions();
>     } finally {
>       namesystem.writeUnlock();
>     }
>   }
> {code}
> The line refreshHostsReader(conf); reads the new config file and does a DNS 
> lookup on each entry - the write lock is not held here. Then the main work is 
> done here:
> {code}
>   private void refreshDatanodes() {
>     final Map<String, DatanodeDescriptor> copy;
>     synchronized (this) {
>       copy = new HashMap<>(datanodeMap);
>     }
>     for (DatanodeDescriptor node : copy.values()) {
>       // Check if not include.
>       if (!hostConfigManager.isIncluded(node)) {
>         node.setDisallowed(true);
>       } else {
>         long maintenanceExpireTimeInMS =
>             hostConfigManager.getMaintenanceExpirationTimeInMS(node);
>         if (node.maintenanceNotExpired(maintenanceExpireTimeInMS)) {
>           datanodeAdminManager.startMaintenance(
>               node, maintenanceExpireTimeInMS);
>         } else if (hostConfigManager.isExcluded(node)) {
>           datanodeAdminManager.startDecommission(node);
>         } else {
>           datanodeAdminManager.stopMaintenance(node);
>           datanodeAdminManager.stopDecommission(node);
>         }
>       }
>       node.setUpgradeDomain(hostConfigManager.getUpgradeDomain(node));
>     }
>   }
> {code}
> All the isIncluded() and isExcluded() methods call node.getResolvedAddress(), 
> which does the DNS lookup. We could probably change things to perform all the 
> DNS lookups outside of the write lock, and then take the lock and process the 
> nodes. Also, isIncluded() etc could be changed or overloaded to take the 
> resolved address rather than the DatanodeDescriptor (a rough sketch follows 
> below).
> This would not shorten the overall time the operation takes, but it would move 
> the long-running part out of the write lock and avoid blocking the namenode 
> for the entire time.
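> For illustration, a rough sketch of that two-phase shape. The refreshDatanodes 
> overload and the isIncluded(InetSocketAddress) variant are hypothetical, and 
> getResolvedAddress() is assumed to return an InetSocketAddress; this is not the 
> attached patch:
> {code}
>   public void refreshNodes(final Configuration conf) throws IOException {
>     refreshHostsReader(conf);  // reads the new host files; DNS here, no lock held
>     // Phase 1 (no lock): snapshot the datanode map and resolve every address.
>     final Map<String, DatanodeDescriptor> copy;
>     synchronized (this) {
>       copy = new HashMap<>(datanodeMap);
>     }
>     final Map<DatanodeDescriptor, InetSocketAddress> resolved = new HashMap<>();
>     for (DatanodeDescriptor node : copy.values()) {
>       resolved.put(node, node.getResolvedAddress());  // the DNS lookup
>     }
>     // Phase 2 (write lock): apply include/exclude/maintenance decisions using
>     // the pre-resolved addresses, e.g. via an isIncluded(InetSocketAddress) overload.
>     namesystem.writeLock();
>     try {
>       refreshDatanodes(copy, resolved);  // hypothetical overload taking the cache
>       countSoftwareVersions();
>     } finally {
>       namesystem.writeUnlock();
>     }
>   }
> {code}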



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
