[ https://issues.apache.org/jira/browse/HDFS-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867098#comment-16867098 ]
Íñigo Goiri commented on HDFS-14579:
------------------------------------

Yes, refreshNodes is slightly different:
{code}
"IPC Server handler 124 on 8020" #506 daemon prio=5 os_prio=0 tid=0x000000006f23f000 nid=0xc1c runnable [0x0000001a8fcfd000]
   java.lang.Thread.State: RUNNABLE
	at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
	at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
	at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
	at java.net.InetAddress.getAllByName(InetAddress.java:1192)
	at java.net.InetAddress.getAllByName(InetAddress.java:1126)
	at java.net.InetAddress.getByName(InetAddress.java:1076)
	at java.net.InetSocketAddress.<init>(InetSocketAddress.java:220)
	at org.apache.hadoop.hdfs.server.blockmanagement.HostFileManager.parseEntry(HostFileManager.java:94)
	at org.apache.hadoop.hdfs.server.blockmanagement.HostFileManager.readFile(HostFileManager.java:80)
	at org.apache.hadoop.hdfs.server.blockmanagement.HostFileManager.refresh(HostFileManager.java:157)
	at org.apache.hadoop.hdfs.server.blockmanagement.HostFileManager.refresh(HostFileManager.java:70)
	at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.refreshHostsReader(DatanodeManager.java:1183)
	at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.refreshNodes(DatanodeManager.java:1165)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.refreshNodes(FSNamesystem.java:4554)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.refreshNodes(NameNodeRpcServer.java:1215)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.refreshNodes(ClientNamenodeProtocolServerSideTranslatorPB.java:823)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:514)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1011)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:889)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:835)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2639)
{code}
I think for my case it may make more sense to do this in parallel.

> In refreshNodes, avoid performing a DNS lookup while holding the write lock
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-14579
>                 URL: https://issues.apache.org/jira/browse/HDFS-14579
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>         Attachments: HDFS-14579.001.patch
>
>
> When refreshNodes is called on a large cluster, or on a cluster where DNS is
> not performing well, it can cause the namenode to hang for a long time. This
> is because the refreshNodes operation holds the global write lock while it is
> running. Most of the refreshNodes code is simple and therefore fast, but
> unfortunately it performs a DNS lookup for each host in the cluster while the
> lock is held.
> Right now, it calls:
> {code}
> public void refreshNodes(final Configuration conf) throws IOException {
>   refreshHostsReader(conf);
>   namesystem.writeLock();
>   try {
>     refreshDatanodes();
>     countSoftwareVersions();
>   } finally {
>     namesystem.writeUnlock();
>   }
> }
> {code}
> The line refreshHostsReader(conf) reads the new config file and performs a
> DNS lookup on each entry; the write lock is not held there.
> Then the main work is done here:
> {code}
> private void refreshDatanodes() {
>   final Map<String, DatanodeDescriptor> copy;
>   synchronized (this) {
>     copy = new HashMap<>(datanodeMap);
>   }
>   for (DatanodeDescriptor node : copy.values()) {
>     // Check if not include.
>     if (!hostConfigManager.isIncluded(node)) {
>       node.setDisallowed(true);
>     } else {
>       long maintenanceExpireTimeInMS =
>           hostConfigManager.getMaintenanceExpirationTimeInMS(node);
>       if (node.maintenanceNotExpired(maintenanceExpireTimeInMS)) {
>         datanodeAdminManager.startMaintenance(
>             node, maintenanceExpireTimeInMS);
>       } else if (hostConfigManager.isExcluded(node)) {
>         datanodeAdminManager.startDecommission(node);
>       } else {
>         datanodeAdminManager.stopMaintenance(node);
>         datanodeAdminManager.stopDecommission(node);
>       }
>     }
>     node.setUpgradeDomain(hostConfigManager.getUpgradeDomain(node));
>   }
> }
> {code}
> All the isIncluded() and isExcluded() methods call node.getResolvedAddress(),
> which performs the DNS lookup. We could probably change things to perform all
> the DNS lookups outside of the write lock, and then take the lock and process
> the nodes. We could also change or overload isIncluded() etc. to take the
> InetAddress rather than the DatanodeDescriptor.
> This would not shorten the overall time the operation takes to run, but it
> would move the long-running work outside of the write lock and avoid blocking
> the namenode for the entire time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
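The two-phase pattern the description proposes (resolve every host first, then take the write lock only for the in-memory update) could be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual DatanodeManager code: the RefreshNodesSketch class, the allowed map, and the host-list parameters are all invented for the example.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RefreshNodesSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // host name -> whether it appears in the (resolved) include list
    private final Map<String, Boolean> allowed = new HashMap<>();

    public void refreshNodes(List<String> includeHosts, List<String> registeredHosts) {
        // Phase 1: resolve every include-file entry BEFORE taking the write
        // lock, so a slow or failing DNS server cannot stall readers.
        Map<String, InetAddress> resolved = new HashMap<>();
        for (String host : includeHosts) {
            try {
                resolved.put(host, InetAddress.getByName(host));
            } catch (UnknownHostException e) {
                // unresolvable entries are simply skipped in this sketch
            }
        }
        // Phase 2: take the write lock only for the fast, in-memory update.
        lock.writeLock().lock();
        try {
            for (String host : registeredHosts) {
                allowed.put(host, resolved.containsKey(host));
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    public boolean isAllowed(String host) {
        lock.readLock().lock();
        try {
            return Boolean.TRUE.equals(allowed.get(host));
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

As the comment above notes, phase 1 could further be parallelized (e.g. resolving the hosts on an executor) since the lookups are independent; the total wall-clock time of refreshNodes would shrink, but the key win is that the lock is no longer held during any lookup.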