ZhiWei Shi created HDFS-16574:
---------------------------------

             Summary: Reduce the time a single FSNamesystem write lock hold takes to remove blocks associated with dead datanodes
                 Key: HDFS-16574
                 URL: https://issues.apache.org/jira/browse/HDFS-16574
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: namenode
            Reporter: ZhiWei Shi
            Assignee: ZhiWei Shi
In {{BlockManager.removeBlocksAssociatedTo(final DatanodeDescriptor node)}}:
{code:java}
  /** Remove the blocks associated to the given datanode. */
  void removeBlocksAssociatedTo(final DatanodeDescriptor node) {
    providedStorageMap.removeDatanode(node);
    final Iterator<BlockInfo> it = node.getBlockIterator();
    while(it.hasNext()) {
      removeStoredBlock(it.next(), node);
    }
    // Remove all pending DN messages referencing this DN.
    pendingDNMessages.removeAllMessagesForDatanode(node);

    node.resetBlocks();
    invalidateBlocks.remove(node);
  }
{code}
the FSNamesystem write lock is held while all blocks associated with the dead datanode are removed, which can take a long time when the datanode has a large number of blocks, e.g. 49.2s for 22.30M blocks and 7.9s for 2.84M blocks:
{code:java}
# ${ip-1} has 22304612 blocks
2022-05-08 13:43:35,864 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/${ip-1}:800
2022-05-08 13:43:35,864 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 49184 ms via
java.lang.Thread.getStackTrace(Thread.java:1552)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1021)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:263)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:220)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1579)
org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager.heartbeatCheck(HeartbeatManager.java:409)
org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor.run(HeartbeatManager.java:436)
java.lang.Thread.run(Thread.java:745)
Number of suppressed write-lock reports: 0
Longest write-lock held interval: 49184

# ${ip-2} has 2.84M blocks
2022-05-08 08:11:55,559 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/${ip-2}:800
2022-05-08 08:11:55,560 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 7925 ms via
java.lang.Thread.getStackTrace(Thread.java:1552)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1021)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:263)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:220)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1579)
org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager.heartbeatCheck(HeartbeatManager.java:409)
org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor.run(HeartbeatManager.java:436)
java.lang.Thread.run(Thread.java:745)
Number of suppressed write-lock reports: 0
Longest write-lock held interval: 7925
{code}
This blocks all RPC requests that require the FSNamesystem lock, so health-check requests cannot be processed in time, which eventually leads to a NameNode failover:
{code:java}
2022-05-08 13:43:32,279 WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at hd044.corp.yodao.com/10.108.162.60:8000: java.net.SocketTimeoutException: 45000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.108.162.60:16861 remote=hd044.corp.yodao.com/10.108.162.60:8000] Call From hd044.corp.yodao.com/10.108.162.60 to hd044.corp.yodao.com:8000 failed on socket timeout exception: java.net.SocketTimeoutException: 45000 millis timeout while waiting for channel to be ready for read.
ch : java.nio.channels.SocketChannel[connected local=/10.108.162.60:16861 remote=hd044.corp.yodao.com/10.108.162.60:8000]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
2022-05-08 13:43:32,283 INFO org.apache.hadoop.ha.HealthMonitor: Entering state SERVICE_NOT_RESPONDING
2022-05-08 13:43:32,283 INFO org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at hd044.corp.yodao.com/10.108.162.60:8000 entered state: SERVICE_NOT_RESPONDING
2022-05-08 13:43:32,922 INFO org.apache.hadoop.hdfs.tools.DFSZKFailoverController: -- Local NN thread dump --
Process Thread Dump:
......
Thread 100 (IPC Server handler 0 on 8000):
  State: WAITING
  Blocked count: 14895292
  Waited count: 210385351
  Waiting on java.util.concurrent.locks.ReentrantReadWriteLock$FairSync@5723ab57
  Stack:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
    java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
    org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readLock(FSNamesystemLock.java:144)
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readLock(FSNamesystem.java:1556)
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1842)
    org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:707)
    org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
    org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
    org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
    org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
    org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
    java.security.AccessController.doPrivileged(Native Method)
    javax.security.auth.Subject.doAs(Subject.java:422)
    org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
    org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
......
Thread 41 (org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@7c351808):
  State: RUNNABLE
  Blocked count: 10
  Waited count: 720146
  Stack:
    org.apache.hadoop.hdfs.util.LightWeightHashSet.resize(LightWeightHashSet.java:479)
    org.apache.hadoop.hdfs.util.LightWeightHashSet.expandIfNecessary(LightWeightHashSet.java:497)
    org.apache.hadoop.hdfs.util.LightWeightHashSet.add(LightWeightHashSet.java:244)
    org.apache.hadoop.hdfs.server.blockmanagement.UnderReplicatedBlocks.update(UnderReplicatedBlocks.java:312)
    org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.updateNeededReplications(BlockManager.java:3718)
    org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeStoredBlock(BlockManager.java:3310)
    org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlocksAssociatedTo(BlockManager.java:1314)
    org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDatanode(DatanodeManager.java:638)
    org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDeadDatanode(DatanodeManager.java:683)
    org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager.heartbeatCheck(HeartbeatManager.java:407)
    org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor.run(HeartbeatManager.java:436)
    java.lang.Thread.run(Thread.java:745)
.....
-- Local NN thread dump --
2022-05-08 13:43:32,923 INFO org.apache.hadoop.ha.ZKFailoverController: Quitting master election for NameNode at hd044.corp.yodao.com/10.108.162.60:8000 and marking that fencing is necessary
2022-05-08 13:43:32,923 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
2022-05-08 13:43:32,925 INFO org.apache.zookeeper.ZooKeeper: Session: 0x97e5d7aac530e03 closed
2022-05-08 13:43:32,925 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x97e5d7aac530e03
2022-05-08 13:43:32,926 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2022-05-08 13:43:36,186 INFO org.apache.hadoop.ha.HealthMonitor: Entering state SERVICE_HEALTHY{code}
After the removal has processed a certain number of blocks, or has spent a certain amount of time, it should release the FSNamesystem write lock for a while so that other RPC requests can be processed in time.
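As a rough illustration of that idea, the loop in {{removeBlocksAssociatedTo}} could yield the write lock every N blocks. The sketch below assumes a hypothetical tuning constant {{BLOCKS_PER_LOCK_RELEASE}} and that {{BlockManager}} can drop and re-take the lock through its {{namesystem}} reference; it is only meant to show where the lock could be released, not the actual patch:
{code:java}
  /**
   * Sketch only: remove the dead datanode's blocks in batches and briefly
   * drop the FSNamesystem write lock between batches so that queued RPC
   * handlers can make progress. BLOCKS_PER_LOCK_RELEASE is a hypothetical
   * constant; a time-based threshold could be used instead.
   */
  void removeBlocksAssociatedTo(final DatanodeDescriptor node) {
    providedStorageMap.removeDatanode(node);
    final Iterator<BlockInfo> it = node.getBlockIterator();
    int processed = 0;
    while (it.hasNext()) {
      removeStoredBlock(it.next(), node);
      if (++processed % BLOCKS_PER_LOCK_RELEASE == 0 && it.hasNext()) {
        // Yield the write lock so readers and other writers get a turn,
        // then re-acquire it before continuing with the next batch.
        namesystem.writeUnlock();
        namesystem.writeLock();
      }
    }
    // Remove all pending DN messages referencing this DN.
    pendingDNMessages.removeAllMessagesForDatanode(node);
    node.resetBlocks();
    invalidateBlocks.remove(node);
  }
{code}
Whether {{node.getBlockIterator()}} remains safe to use across a lock release, and how concurrently arriving block reports interact with a partially removed datanode, are questions the real change would still have to answer; the sketch only marks where the lock could be yielded.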