[ https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245565#comment-17245565 ]

Ahmed Hussein commented on HDFS-15217:
--------------------------------------

We have seen a slowdown in token ops compared to older releases, so we were 
looking into ways to optimize them in 3.x.
I came across this jira by chance.

[~brfrn169], I have a couple of questions about the justification for adding 
that code:
* The ops information can be retrieved from the AuditLog. Isn't the AuditLog 
enough to see the ops?
* Was there a concern about a possible deadlock? If so, why not use 
{{debug}} instead of adding that overhead to hot production code? (See the 
sketch after this list.)
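
To make the {{debug}} suggestion concrete, here is a minimal sketch (class and 
method names are illustrative, not the actual {{FSNamesystemLock}} code) of 
guarding the report construction behind an {{isDebugEnabled()}} check so the 
common unlock path pays nothing:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative sketch only; class and method names are hypothetical.
public class DebugGuardedUnlockExample {
  private static final Logger LOG =
      LoggerFactory.getLogger(DebugGuardedUnlockExample.class);

  void writeUnlock(String opName, String path) {
    // ... release the actual lock here ...
    if (LOG.isDebugEnabled()) {
      // The report string is built only when debug logging is enabled,
      // so the hot production path does no extra work or allocation.
      LOG.debug("Released write lock for op=" + opName + " path=" + path);
    }
  }
}
{code}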

I can see that all unlock operations call 
{{getLockReportInfoSupplier()}}. Even {{getLockReportInfoSupplier(null)}} is 
not a no-op.
I saw quick evaluations of the overhead, but I am still concerned, for the 
following reasons:
* There is an overhead even though the string evaluation is lazy: a new 
{{Supplier<String>}} object is allocated even when the supplier's {{get()}} 
method is never called. Because this is a capturing lambda, it has to be 
allocated on every invocation (see the sketch after this list).
* In a production system, allocating an object while releasing a lock is 
dangerous because that last allocation could trigger a GC. This makes 
evaluating the patch tricky because there is a considerably large delta error 
depending on whether or not a GC has been triggered. The worst-case scenario is 
triggering a GC while allocating an object whose report is going to be 
suppressed in the end.
* On a production system, this path is "hot", especially since 
{{getLockReportInfoSupplier()}} is added to all the token ops such as 
{{getDelegationToken()}}.
* The overhead of allocating the supplier can be entirely wasted because 
{{writeUnlock}} will suppress the info.
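
As a rough illustration of the allocation point above (a sketch with 
hypothetical names, not the actual HDFS code): because the lambda captures 
local variables, a fresh {{Supplier<String>}} is allocated on every call, even 
when the report ends up suppressed and {{get()}} is never invoked:

{code:java}
import java.util.function.Supplier;

// Minimal sketch of the allocation concern; names are illustrative, not the
// actual HDFS code.
public class CapturingLambdaExample {

  // 'longestSoFar' stands in for the condition under which the lock report
  // would actually be logged instead of suppressed.
  static void unlockWithReport(String opName, String path, long heldMs,
      boolean longestSoFar) {
    // Capturing lambda: opName, path and heldMs are captured, so a new
    // Supplier object is allocated here on every unlock...
    Supplier<String> report =
        () -> "op=" + opName + " path=" + path + " heldMs=" + heldMs;
    // ...but when the report is suppressed (the common case), get() is never
    // called and the allocation was pure overhead.
    if (longestSoFar) {
      System.out.println(report.get());
    }
  }

  public static void main(String[] args) {
    unlockWithReport("getDelegationToken", "/user/foo", 3L, false);
  }
}
{code}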


CC: [~ayushtkn], [~inigoiri]


> Add more information to longest write/read lock held log
> --------------------------------------------------------
>
>                 Key: HDFS-15217
>                 URL: https://issues.apache.org/jira/browse/HDFS-15217
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Toshihiro Suzuki
>            Assignee: Toshihiro Suzuki
>            Priority: Major
>             Fix For: 3.4.0
>
>
> Currently, we can see the stack trace in the longest write/read lock held 
> log, but sometimes we need more information, for example, a target path of 
> deletion:
> {code:java}
> 2020-03-10 21:51:21,116 [main] INFO  namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(276)) -         Number of suppressed 
> write-lock reports: 0
>       Longest write-lock held at 2020-03-10 21:51:21,107+0900 for 6ms via 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1058)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:257)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:233)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1706)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3188)
> ...
> {code}
> Adding more information (opName, path, etc.) to the log is useful for 
> troubleshooting.


