[ 
https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245950#comment-17245950
 ] 

Ahmed Hussein commented on HDFS-15217:
--------------------------------------

Thank you [~brfrn169] and [~inigoiri] for the explanation.
> Do you have specific numbers?

As we were moving from 2.8 to 2.10, [~daryn] reported that token ops in 2.10 
are ~1.3X slower than in 2.8.
His initial intuition is as follows:
* Adding token ops to the audit log. In doing so, multiple UGIs are 
accidentally created merely to get the user name, even though it already 
exists in the token identifier (a cheaper path is sketched right after this 
list).
* Token operations acquire the FSN write lock. As with lease renewal and 
token expiration, the read lock is sufficient because the namesystem is not 
mutated.
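
To make the first point concrete, here is a minimal sketch (illustrative 
code, not an existing HDFS method): {{AbstractDelegationTokenIdentifier}} 
already stores the owner as a {{Text}} field, while {{getUser()}} constructs 
a fresh UGI on every call.
{code:java}
import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;

// Illustrative helper (not existing HDFS code): read the user straight
// from the identifier instead of materializing a UGI for the audit log.
class AuditUserSketch {
  static String auditUser(DelegationTokenIdentifier id) {
    // Cheap: getOwner() returns the Text field already stored in the
    // identifier -- no new objects beyond the String.
    return id.getOwner().toString();
    // Expensive path the audit log accidentally takes:
    //   id.getUser().getShortUserName()
    // getUser() constructs a fresh UserGroupInformation on every call.
  }
}
{code}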

My concern with {{getLockReportInfoSupplier()}} is that -- similar to the 
first point -- it accidentally increases the overhead by creating UGIs and 
detailed information that already exist in the identifier.
The same information is then generated again in the audit log.
For example, assume we have ~1 million lock operations within an hour; 
{{getLockReportInfoSupplier()}} then allocates over a million objects on the 
Java heap. That is a significant impact on a long-running JVM, considering 
that only a few of those million objects are ever consumed, namely when the 
lock hold time exceeds the threshold.
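
For illustration, the allocation pattern looks roughly like this (simplified 
names, not the actual {{FSNamesystemLock}} code):
{code:java}
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Simplified sketch: the caller allocates a Supplier on every lock op,
// but get() only runs when the hold time crosses the threshold.
class ReportingLock {
  private final ReentrantLock lock = new ReentrantLock();
  private final long reportThresholdMs = 5000L;

  void unlock(long heldTimeMs, Supplier<String> lockReportInfo) {
    lock.unlock();
    if (heldTimeMs > reportThresholdMs) {
      System.out.println("Lock held " + heldTimeMs + "ms: "
          + lockReportInfo.get());
    }
    // Even when the branch is not taken, the Supplier (plus any state it
    // captured, e.g. a UGI or a path) has already been allocated --
    // roughly one object per lock op, ~1M/hour in the example above.
  }
}
{code}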

I propose the following three optimizations to try:
# Try out [~daryn]'s suggestion of replacing {{writeLock}} with {{readLock}} 
in some ops (see the sketch after this list).
# Eliminate or reduce UGI creation within the IPC handlers because of its 
performance impact.
# Since [~brfrn169] asserts that lock information is important, we can create 
a follow-up jira to re-evaluate the overhead (memory and time). If necessary, 
some optimization could reuse {{getLockReportInfoSupplier()}} for both 
{{auditLog}} and {{unlock()}}; that way, allocating the supplier objects 
would pay off.
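
For the first item, the shape of the change would be roughly the following. 
This is a sketch under the assumption that the secret manager's internal 
synchronization is sufficient; the real {{FSNamesystem}} method has 
additional checks (safe mode, HA state) that are omitted here.
{code:java}
// Inside FSNamesystem (sketch, not a patch): a token op that does not
// mutate the namesystem takes the read lock instead of the write lock.
long renewDelegationToken(Token<DelegationTokenIdentifier> token)
    throws IOException {
  readLock();                      // was: writeLock()
  try {
    // Token state lives in the secret manager, which synchronizes
    // internally; the namesystem itself is not mutated here.
    return dtSecretManager.renewToken(
        token, getRemoteUser().getShortUserName());
  } finally {
    readUnlock();                  // was: writeUnlock()
  }
}
{code}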

WDYT guys?


> Add more information to longest write/read lock held log
> --------------------------------------------------------
>
>                 Key: HDFS-15217
>                 URL: https://issues.apache.org/jira/browse/HDFS-15217
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Toshihiro Suzuki
>            Assignee: Toshihiro Suzuki
>            Priority: Major
>             Fix For: 3.4.0
>
>
> Currently, we can see the stack trace in the longest write/read lock held 
> log, but sometimes we need more information, for example, the target path 
> of a deletion:
> {code:java}
> 2020-03-10 21:51:21,116 [main] INFO  namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(276)) -         Number of suppressed 
> write-lock reports: 0
>       Longest write-lock held at 2020-03-10 21:51:21,107+0900 for 6ms via 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1058)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:257)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:233)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1706)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3188)
> ...
> {code}
> Adding more information (opName, path, etc.) to the log would be useful 
> for troubleshooting.



