[ https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245565#comment-17245565 ]
Ahmed Hussein commented on HDFS-15217:
--------------------------------------

We have seen a slowdown in token ops compared to older releases, so we were looking into ways to optimize 3.x, and I came across this jira by chance. [~brfrn169], I have a couple of questions about the justification for adding that code:
* The ops information can be retrieved from the AuditLog. Isn't the AuditLog enough to see the ops?
* Was there a concern about a possible deadlock? If so, why not use {{debug}} instead of adding that overhead to hot production code? (See the second sketch at the end of this message.)

I can see that all unlock operations call {{getLockReportInfoSupplier()}}; even {{getLockReportInfoSupplier(null)}} is not a no-op. I saw quick evaluations of the overhead, but I am concerned for the following reasons:
* There is overhead even though the string evaluation is lazy: a new {{Supplier<String>}} object is allocated even when the supplier's {{get()}} method is never called. Because this is a capturing lambda, it has to be allocated on every call (see the first sketch at the end of this message).
* In a production system, allocating an object while releasing a lock is dangerous because that last allocation could trigger a GC. This also makes evaluating the patch tricky, because there is a considerably large error margin depending on whether or not a GC was triggered. The worst-case scenario is to trigger a GC while allocating an object that is going to be suppressed in the end.
* On a production system this path is "hot", especially since {{getLockReportInfoSupplier()}} is added to all the token ops, such as {{getDelegationToken()}}.
* The overhead of allocating the supplier can be entirely wasted, because {{writeUnlock}} will suppress the info anyway.

CC: [~ayushtkn], [~inigoiri]

> Add more information to longest write/read lock held log
> ---------------------------------------------------------
>
>                 Key: HDFS-15217
>                 URL: https://issues.apache.org/jira/browse/HDFS-15217
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Toshihiro Suzuki
>            Assignee: Toshihiro Suzuki
>            Priority: Major
>             Fix For: 3.4.0
>
>
> Currently, we can see the stack trace in the longest write/read lock held
> log, but sometimes we need more information, for example, the target path of
> a deletion:
> {code:java}
> 2020-03-10 21:51:21,116 [main] INFO namenode.FSNamesystem
> (FSNamesystemLock.java:writeUnlock(276)) - Number of suppressed
> write-lock reports: 0
> Longest write-lock held at 2020-03-10 21:51:21,107+0900 for 6ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1058)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:257)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:233)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1706)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3188)
> ...
> {code}
> Adding more information (opName, path, etc.) to the log is useful for
> troubleshooting.
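
To make the capturing-lambda point concrete, here is a minimal standalone sketch. The names ({{describe}}, the op/path strings) are hypothetical illustrations, not the actual {{FSNamesystemLock}} code; the behavior shown matches current HotSpot, where a capturing lambda is typically translated into a fresh object per evaluation:

{code:java}
import java.util.function.Supplier;

public class SupplierAllocationDemo {

  // Hypothetical stand-in for getLockReportInfoSupplier(): the lambda
  // captures opName and path, so the JVM cannot cache it as a singleton
  // the way it can a non-capturing lambda.
  static Supplier<String> describe(String opName, String path) {
    return () -> "op=" + opName + ", path=" + path;
  }

  public static void main(String[] args) {
    Supplier<String> a = describe("getDelegationToken", "/");
    Supplier<String> b = describe("getDelegationToken", "/");
    // Two distinct Supplier objects were allocated, even though get()
    // was never invoked on either of them.
    System.out.println(a == b);  // false on HotSpot: one allocation per call
    System.out.println(a.get()); // the report string is only built here
  }
}
{code}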
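
And a sketch of the {{debug}}-guard alternative I had in mind (again hypothetical; it assumes the slf4j logger Hadoop already uses and elides the actual lock bookkeeping). The point is that the capturing supplier is only allocated when the report could actually be emitted, so the unlock path on a production INFO-level system stays allocation-free:

{code:java}
import java.util.function.Supplier;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class GuardedUnlockSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(GuardedUnlockSketch.class);

  void writeUnlock(String opName, String path) {
    // Allocate the capturing lambda only if debug logging is enabled.
    Supplier<String> info =
        LOG.isDebugEnabled() ? () -> "op=" + opName + ", path=" + path : null;

    // ... release the lock and compute the held time here ...

    if (info != null) {
      LOG.debug("Longest write-lock report: {}", info.get());
    }
  }
}
{code}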