[jira] [Assigned] (HDFS-16408) Ensure LeaseRecheckIntervalMs is greater than zero
     [ https://issues.apache.org/jira/browse/HDFS-16408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell reassigned HDFS-16408:
----------------------------------------

    Assignee: Jingxuan Fu  (was: Stephen O'Donnell)

> Ensure LeaseRecheckIntervalMs is greater than zero
> --------------------------------------------------
>
>                 Key: HDFS-16408
>                 URL: https://issues.apache.org/jira/browse/HDFS-16408
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.1.3, 3.3.1
>            Reporter: Jingxuan Fu
>            Assignee: Jingxuan Fu
>            Priority: Major
>              Labels: pull-request-available
>   Original Estimate: 1h
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> There is a problem with the try/catch statement in the LeaseMonitor daemon
> (in LeaseManager.java): when an unexpected exception is caught, it simply
> logs a warning and continues with the next loop iteration.
> An extreme case arises when the configuration item
> 'dfs.namenode.lease-recheck-interval-ms' is accidentally set to a negative
> number. Because the item is read without a range check,
> 'fsnamesystem.getLeaseRecheckIntervalMs()' returns this value, which is then
> passed to Thread.sleep(). A negative argument makes Thread.sleep() throw an
> IllegalArgumentException, which is caught by 'catch(Throwable e)' and logged
> as a warning.
> This repeats on every subsequent iteration, so a huge number of identical
> messages is written to the log file in a short period of time, quickly
> consuming disk space and affecting the operation of the system.
> As the listing below shows, the NameNode log grew to about 178 MB within
> roughly one minute.
>
> {code:java}
> ll logs/
> total 174456
> drwxrwxr-x  2 hadoop hadoop      4096 Jan 3 15:13 ./
> drwxr-xr-x 11 hadoop hadoop      4096 Jan 3 15:13 ../
> -rw-rw-r--  1 hadoop hadoop     36342 Jan 3 15:14 hadoop-hadoop-datanode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop      1243 Jan 3 15:13 hadoop-hadoop-datanode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop 178545466 Jan 3 15:14 hadoop-hadoop-namenode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop       692 Jan 3 15:13 hadoop-hadoop-namenode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop     33201 Jan 3 15:14 hadoop-hadoop-secondarynamenode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop      3764 Jan 3 15:14 hadoop-hadoop-secondarynamenode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop         0 Jan 3 15:13 SecurityAuth-hadoop.audit
>
> tail -n 15 logs/hadoop-hadoop-namenode-ljq1.log
> 2022-01-03 15:14:46,032 WARN org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable:
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> 2022-01-03 15:14:46,033 WARN org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable:
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> 2022-01-03 15:14:46,033 WARN org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable:
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
>
> I think there are two potential solutions.
> The first is to reposition the try/catch statement in the LeaseMonitor
> daemon by moving 'catch(Throwable e)' outside the loop body. This mirrors
> the NameNodeResourceMonitor daemon, which ends the thread when an unexpected
> exception is caught.
> The second is to use Preconditions.checkArgument() to range-check the
> configuration item 'dfs.namenode.lease-recheck-interval-ms' when it is read,
> so that an invalid value cannot affect the subsequent operation of the
> program.
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
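
The two options proposed in the issue can be compared in a small standalone sketch. It is not the Hadoop code or the eventual patch: the class LeaseRecheckSketch and all of its methods are hypothetical, the local checkArgument() helper only stands in for Guava's Preconditions.checkArgument() so that the example has no dependencies, and a counter stands in for the warning that the real monitor writes to the log.

```java
// Hypothetical, dependency-free sketch; none of these names exist in Hadoop.
class LeaseRecheckSketch {

    // Local stand-in for Guava's Preconditions.checkArgument(boolean, Object).
    static void checkArgument(boolean expression, String message) {
        if (!expression) {
            throw new IllegalArgumentException(message);
        }
    }

    // Option 2: range-check the interval at the point where it is read.
    public static long validateRecheckInterval(long intervalMs) {
        checkArgument(intervalMs > 0,
            "dfs.namenode.lease-recheck-interval-ms must be greater than zero, got "
                + intervalMs);
        return intervalMs;
    }

    // Behaviour described in the issue: the catch sits inside the loop, so the
    // same failure is reported again on every iteration.
    public static int catchInsideLoop(long intervalMs, int maxIterations) {
        int warnings = 0;
        for (int i = 0; i < maxIterations; i++) {
            try {
                Thread.sleep(intervalMs); // throws IllegalArgumentException if negative
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                break;
            } catch (Throwable t) {
                warnings++; // stands in for one "Unexpected throwable" log entry
            }
        }
        return warnings;
    }

    // Option 1: the catch sits outside the loop, so the first unexpected
    // failure ends the monitor instead of repeating.
    public static int catchOutsideLoop(long intervalMs, int maxIterations) {
        int warnings = 0;
        try {
            for (int i = 0; i < maxIterations; i++) {
                Thread.sleep(intervalMs);
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        } catch (Throwable t) {
            warnings++; // reported once, then the loop is over
        }
        return warnings;
    }

    public static void main(String[] args) {
        System.out.println("catch inside loop:  " + catchInsideLoop(-1L, 1000) + " warnings");
        System.out.println("catch outside loop: " + catchOutsideLoop(-1L, 1000) + " warnings");
        try {
            validateRecheckInterval(-1L);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected at read time: " + e.getMessage());
        }
    }
}
```

With a negative interval, the first variant produces one warning per iteration while the second produces a single warning and stops. The range check rejects the value before any monitor loop would start, which is the only one of the three that reports the misconfiguration by name.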
[jira] [Assigned] (HDFS-16408) Ensure LeaseRecheckIntervalMs is greater than zero
     [ https://issues.apache.org/jira/browse/HDFS-16408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell reassigned HDFS-16408:
----------------------------------------

    Assignee: Stephen O'Donnell