[jira] [Commented] (HDFS-15217) Add more information to longest write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246225#comment-17246225 ]

Toshihiro Suzuki commented on HDFS-15217:
-----------------------------------------

Sounds good. Thank you [~ahussein].

> Add more information to longest write/read lock held log
> --------------------------------------------------------
>
> Key: HDFS-15217
> URL: https://issues.apache.org/jira/browse/HDFS-15217
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Toshihiro Suzuki
> Assignee: Toshihiro Suzuki
> Priority: Major
> Fix For: 3.4.0
>
> Currently, we can see the stack trace in the longest write/read lock held
> log, but sometimes we need more information, for example, a target path of
> deletion:
> {code:java}
> 2020-03-10 21:51:21,116 [main] INFO namenode.FSNamesystem
> (FSNamesystemLock.java:writeUnlock(276)) - Number of suppressed
> write-lock reports: 0
> Longest write-lock held at 2020-03-10 21:51:21,107+0900 for 6ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1058)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:257)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:233)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1706)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3188)
> ...
> {code}
> Adding more information (opName, path, etc.) to the log is useful to
> troubleshoot.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
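The mechanism discussed in this thread can be illustrated with a minimal sketch. This is not the actual {{FSNamesystemLock}} code; the class name, the threshold of zero (so the sketch always reports), and the report wording are all made up for illustration. The point is that the unlock path accepts a lazily evaluated description so building the opName/path string is deferred until a report is actually emitted:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical sketch of a lock wrapper that records extra operation
// information in its "longest lock held" report, as HDFS-15217 proposes.
public class InstrumentedLock {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Zero here so the sketch always reports; the real code only reports
    // holds that exceed a configured threshold.
    private static final long THRESHOLD_MS = 0;
    private long acquiredAt;
    private String lastReport;  // captured for inspection in this sketch

    public void writeLock() {
        lock.writeLock().lock();
        acquiredAt = System.currentTimeMillis();
    }

    // The Supplier is only invoked when the hold time exceeds the
    // threshold, so formatting the opName/path string stays off the
    // common path.
    public void writeUnlock(Supplier<String> opInfo) {
        long heldMs = System.currentTimeMillis() - acquiredAt;
        lock.writeLock().unlock();
        if (heldMs >= THRESHOLD_MS) {
            lastReport = "write-lock held " + heldMs + "ms by " + opInfo.get();
        }
    }

    public String getLastReport() {
        return lastReport;
    }
}
```

Callers would pass something like {{() -> "delete (src=/file)"}} at unlock time; the lambda body runs only when a report fires.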
[jira] [Commented] (HDFS-15217) Add more information to longest write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245950#comment-17245950 ]

Ahmed Hussein commented on HDFS-15217:
--------------------------------------

Thank you [~brfrn169] and [~inigoiri] for the explanation.

{quote}Do you have specific numbers?{quote}

As we were moving from 2.8 to 2.10, [~daryn] reported that 2.10 token ops are ~1.3x slower. His initial intuition is as follows:
* Adding token ops to the audit log. In doing so, multiple ugis are accidentally created merely to get the user name when it already exists in the identifier.
* Token operations are acquiring the fsn write lock. For operations like lease renewal and token expiration, the read lock is sufficient because the namesystem is not mutated.

My concern with {{getLockReportInfoSupplier()}} is that -- similar to the first point -- it accidentally increases the overhead of creating ugis and detailed information that already exist in the identifier. The same information is going to be generated again in the audit log. For example, assume that we have ~1 million lock operations within an hour; then {{getLockReportInfoSupplier()}} allocates 1+ million objects on the Java heap. That's a significant impact on a long-running JVM, considering that only a few of those million objects are used when the lock time exceeds the threshold.

My proposal is that we have the following three optimizations to try:
# Try out [~daryn]'s suggestion of replacing {{writeLock}} with {{readLock}} in some ops.
# Eliminate/reduce ugi creation within the ipc handlers due to the performance impact.
# Since [~brfrn169] asserts that lock information is important, we can create a follow-up jira to re-evaluate the overhead (memory and time). If necessary, maybe some optimizations can be done to reuse {{getLockReportInfoSupplier()}} for both {{auditLog}} and {{unlock()}}. That way, allocating the supplier objects would pay off.

WDYT?
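The first proposed optimization can be sketched with a {{ReentrantReadWriteLock}}. This is a minimal illustration, not HDFS code; the class name, fields, and methods are made up. The idea is that operations which do not mutate shared state (analogous to lease renewal and token expiration checks) take the read lock and can proceed concurrently, instead of serializing on the write lock:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: read lock for non-mutating ops, write lock for
// mutating ops, mirroring the readLock/writeLock suggestion above.
public class TokenOpLockSketch {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
    private long tokenExpiry;

    // Mutating operation: needs the exclusive write lock.
    public void setExpiry(long t) {
        fsLock.writeLock().lock();
        try {
            tokenExpiry = t;
        } finally {
            fsLock.writeLock().unlock();
        }
    }

    // Read-only operation: the shared read lock suffices because nothing
    // is mutated, so many such checks can run concurrently.
    public boolean isExpired(long now) {
        fsLock.readLock().lock();
        try {
            return now > tokenExpiry;
        } finally {
            fsLock.readLock().unlock();
        }
    }
}
```

Whether a given NameNode op truly never mutates the namesystem is exactly the kind of case-by-case audit the proposal calls for.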
[jira] [Commented] (HDFS-15217) Add more information to longest write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245749#comment-17245749 ]

Toshihiro Suzuki commented on HDFS-15217:
-----------------------------------------

{quote}
The ops information can be retrieved from the AuditLog. Isn't the AuditLog enough to see the ops?
Was there a concern regarding a possible deadLock? Then, why not using debug instead of adding that overhead to hot production code?
{quote}

I don't think the AuditLog is enough to see the ops when there is a huge number of ops. As mentioned in the Description, I faced an issue where the write lock was held for a long time, but I was not able to identify which operation caused the long hold from the NN log and the AuditLog. That's why I thought we needed this change and added more information to the longest write/read lock held log.
[jira] [Commented] (HDFS-15217) Add more information to longest write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245740#comment-17245740 ]

Toshihiro Suzuki commented on HDFS-15217:
-----------------------------------------

Thank you [~ahussein]. Are we sure the slowdown is caused by the changes in this Jira?
[jira] [Commented] (HDFS-15217) Add more information to longest write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245698#comment-17245698 ]

Íñigo Goiri commented on HDFS-15217:
------------------------------------

[~ahussein] that's concerning. Do you have specific numbers?
[jira] [Commented] (HDFS-15217) Add more information to longest write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245565#comment-17245565 ]

Ahmed Hussein commented on HDFS-15217:
--------------------------------------

We have seen a slowdown in token ops compared to older releases, so we were looking into optimization opportunities in 3.x. I came across this jira by chance. [~brfrn169], I have a couple of questions about the justification behind adding that code:
* The ops information can be retrieved from the AuditLog. Isn't the AuditLog enough to see the ops?
* Was there a concern regarding a possible deadlock? If so, why not use {{debug}} instead of adding that overhead to hot production code?

I can see that all unlock operations are calling {{getLockReportInfoSupplier()}}. Even {{getLockReportInfoSupplier(null)}} is not a no-op. I saw quick evaluations of the overhead, but I am concerned for the following reasons:
* There is an overhead even though the string evaluation is lazy: a new {{Supplier}} object is allocated even though the supplier's get method is not being called. This is a capturing lambda, so it has to be allocated on every call.
* In a production system, allocating an object while releasing a lock is dangerous because that last allocation could trigger a GC. This makes evaluating the patch tricky, because there is a considerably large delta error depending on whether or not a GC has been triggered. The worst-case scenario is to trigger a GC while allocating an object that is going to be suppressed at the end.
* On a production system, this path is hot, especially since {{getLockReportInfoSupplier()}} is added to all the token ops such as {{getDelegationToken()}}.
* The overhead of allocating the supplier could be useless because {{writeUnlock}} will suppress the info.
CC: [~ayushtkn], [~inigoiri]
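The capturing-lambda concern raised above can be demonstrated with a small example. The method name below is a hypothetical stand-in for {{getLockReportInfoSupplier()}}, not the HDFS signature, and the per-call allocation behavior is what standard HotSpot JVMs do (the language spec permits, but does not require, reuse):

```java
import java.util.function.Supplier;

public class CapturingLambdaDemo {
    // Hypothetical stand-in for getLockReportInfoSupplier(): because the
    // lambda captures 'opName', the JVM allocates a fresh Supplier object
    // on every call, even if get() is never invoked.
    static Supplier<String> lockReportInfo(String opName) {
        return () -> "op=" + opName;
    }

    public static void main(String[] args) {
        Supplier<String> first = lockReportInfo("getDelegationToken");
        Supplier<String> second = lockReportInfo("getDelegationToken");
        // Two distinct heap objects on HotSpot: the allocation cost is paid
        // up front regardless of whether the report is later suppressed.
        System.out.println(first == second);
        // The string itself is only built when get() is finally called.
        System.out.println(first.get());
    }
}
```

A non-capturing lambda (one that references no enclosing locals) can be cached as a singleton by the JVM, which is one reason the {{getLockReportInfoSupplier(null)}} case still has a cost here: the argument makes the lambda capturing.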
[jira] [Commented] (HDFS-15217) Add more information to longest write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093197#comment-17093197 ]

Hudson commented on HDFS-15217:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18187 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18187/])
HDFS-15298 Fix the findbugs warnings introduced in HDFS-15217 (#1979) (github: rev 62c26b91fd06f505a6e64fd32a36e5e67d06fa30)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
[jira] [Commented] (HDFS-15217) Add more information to longest write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091055#comment-17091055 ]

Toshihiro Suzuki commented on HDFS-15217:
-----------------------------------------

Filed: https://issues.apache.org/jira/browse/HDFS-15298
[jira] [Commented] (HDFS-15217) Add more information to longest write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091054#comment-17091054 ]

Toshihiro Suzuki commented on HDFS-15217:
-----------------------------------------

[~ayushtkn] I checked the report again, and yes, the changes in this Jira introduced the findbugs warnings. Sorry, I misunderstood the warnings. I will create a new JIRA to address them. Thank you for pointing it out.
[jira] [Commented] (HDFS-15217) Add more information to longest write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090971#comment-17090971 ]

Ayush Saxena commented on HDFS-15217:
-------------------------------------

[~brfrn169] Can you check the report once again? It is because of your changes:
{noformat}
Known null at FSNamesystem.java:[line 7434] --> This line is added in this commit only
{noformat}
Another way to confirm that it is caused by these changes is the Yetus report from your PR: https://github.com/apache/hadoop/pull/1954#issuecomment-613238448
The trunk findbugs check is passing, but your patch's findbugs check is failing. Let me know if there is still any confusion.
[jira] [Commented] (HDFS-15217) Add more information to longest write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090933#comment-17090933 ]

Toshihiro Suzuki commented on HDFS-15217:
-----------------------------------------

[~ayushtkn] [~elgoiri] I was actually aware of the findbugs warnings, but I don't think they were introduced by the changes in this Jira. To fix the findbugs warnings, we need to change the original code.
[jira] [Commented] (HDFS-15217) Add more information to longest write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090797#comment-17090797 ]

Íñigo Goiri commented on HDFS-15217:
------------------------------------

My bad, I had seen it at the beginning but forgot about it. [~brfrn169] do you mind opening a JIRA with the fix?
[jira] [Commented] (HDFS-15217) Add more information to longest write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090503#comment-17090503 ]

Ayush Saxena commented on HDFS-15217:
-------------------------------------

Hi [~brfrn169] [~elgoiri], it seems this commit has introduced findbugs warnings:
[https://builds.apache.org/job/hadoop-multibranch/job/PR-1954/5/artifact/out/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html]
It was in the PR Yetus report as well; can you check once?
[jira] [Commented] (HDFS-15217) Add more information to longest write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086654#comment-17086654 ]

Hudson commented on HDFS-15217:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18163 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18163/])
HDFS-15217 Add more information to longest write/read lock held log (github: rev 1824aee9da4056de0fb638906b2172e486bbebe7)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystemLockReport.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystemLock.java
[jira] [Commented] (HDFS-15217) Add more information to longest write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17081309#comment-17081309 ]

Toshihiro Suzuki commented on HDFS-15217:
-----------------------------------------

I created a PR for this. After this patch, we can see additional information in the lock report message as follows:
{code:java}
2020-04-11 23:04:36,020 [IPC Server handler 5 on default port 62641] INFO namenode.FSNamesystem (FSNamesystemLock.java:writeUnlock(321)) - Number of suppressed write-lock reports: 0
Longest write-lock held at 2020-04-11 23:04:36,020+0900 for 3ms by delete (ugi=bob (auth:SIMPLE),ip=/127.0.0.1,src=/file,dst=null,perm=null) via
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1058)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:302)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:261)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1746)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3274)
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1130)
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:724)
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1016)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:944)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:422)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2948)
Total suppressed write-lock held time: 0.0
{code}
This patch adds the additional information *"by delete (ugi=bob (auth:SIMPLE),ip=/127.0.0.1,src=/file,dst=null,perm=null)"*, which is similar to the audit log format.