[ https://issues.apache.org/jira/browse/HDFS-13946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653096#comment-16653096 ]
Yiqun Lin commented on HDFS-13946:
----------------------------------

Hi [~xkrogen], thanks for the review. The latest patch addresses your comments. The main changes in the v02 patch:
* For the read lock, use an {{AtomicReference}} to store both the stack trace and the hold interval.
* For the write lock, add a new method {{getMaxValue(String recorderName, int idx)}} that gets the max value from the summary info when deciding whether to log.

Please review.

> Log longest FSN write/read lock held stack trace
> ------------------------------------------------
>
>                 Key: HDFS-13946
>                 URL: https://issues.apache.org/jira/browse/HDFS-13946
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.1.1
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>            Priority: Minor
>         Attachments: HDFS-13946.001.patch
>
> The FSN write/read lock log statement only prints the longest lock-held interval, not its stack trace, during the suppress-warning interval. Only the current thread's stack is printed, which is not very useful. Once the NameNode is slowing down, the most important thing to know is which operation held the lock the longest.
> The following log is printed by the current logic:
> {noformat}
> 2018-09-30 13:56:06,700 INFO [IPC Server handler 119 on 8020] org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 11 ms via
> java.lang.Thread.getStackTrace(Thread.java:1589)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1688)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4281)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4247)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4183)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4167)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:848)
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2222)
> java.security.AccessController.doPrivileged(Native Method)
> javax.security.auth.Subject.doAs(Subject.java:415)
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2220)
> Number of suppressed write-lock reports: 14
> Longest write-lock held interval: 70
> {noformat}
> It will also be useful for troubleshooting.
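The read-lock change described in the comment, using a single {{AtomicReference}} so the interval and its stack trace are updated together, could be sketched roughly as follows. This is only an illustration of the technique, not the actual HDFS-13946 patch; the class and method names here are made up for the example.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: track the longest lock hold together with the
// stack trace that produced it. A single AtomicReference holds an
// immutable (interval, stack trace) pair, so readers never observe an
// interval paired with the wrong trace.
public class LongestLockHoldTracker {

  // Immutable pair of (hold interval in ms, stack trace text).
  static final class LockHeldInfo {
    final long intervalMs;
    final String stackTrace;
    LockHeldInfo(long intervalMs, String stackTrace) {
      this.intervalMs = intervalMs;
      this.stackTrace = stackTrace;
    }
  }

  private final AtomicReference<LockHeldInfo> longestHold =
      new AtomicReference<>(new LockHeldInfo(0, ""));

  // Would be called from readUnlock(): record this hold if it is the
  // longest seen so far. The CAS loop avoids losing a larger value
  // when several threads unlock concurrently.
  public void recordLockHold(long intervalMs, String stackTrace) {
    LockHeldInfo current;
    do {
      current = longestHold.get();
      if (intervalMs <= current.intervalMs) {
        return; // not a new maximum, nothing to record
      }
    } while (!longestHold.compareAndSet(
        current, new LockHeldInfo(intervalMs, stackTrace)));
  }

  // Would be called when the suppress-warning interval elapses:
  // atomically fetch the longest hold and reset for the next window.
  public LockHeldInfo getAndResetLongest() {
    return longestHold.getAndSet(new LockHeldInfo(0, ""));
  }

  public static void main(String[] args) {
    LongestLockHoldTracker t = new LongestLockHoldTracker();
    t.recordLockHold(11, "traceA");
    t.recordLockHold(70, "traceB");
    t.recordLockHold(5, "traceC");
    LockHeldInfo longest = t.getAndResetLongest();
    System.out.println(longest.intervalMs + " " + longest.stackTrace);
    // prints "70 traceB"
  }
}
```

With this shape, the periodic report can print both the longest interval and the stack trace that caused it, instead of only the interval as in the log above.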