[ https://issues.apache.org/jira/browse/HDFS-16697?focusedWorklogId=795635&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795635 ]
ASF GitHub Bot logged work on HDFS-16697:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Jul/22 12:30
            Start Date: 27/Jul/22 12:30
    Worklog Time Spent: 10m
      Work Description: Likkey opened a new pull request, #4641:
URL: https://github.com/apache/hadoop/pull/4641

   ### Description of PR

   Add a Preconditions.checkArgument() call to validate "dfs.namenode.resource.checked.volumes.minimum" and report an error when its value is greater than the number of NameNode storage volumes, so that a misconfiguration cannot leave the NameNode permanently unable to exit safe mode.

   ### How was this patch tested?

   ### For code changes:

   - [ ] Does the title or this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
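   A minimal sketch of the proposed validation (class and method names here are illustrative, not the actual patch; a plain IllegalArgumentException stands in for Guava's Preconditions.checkArgument(), which throws the same exception type on a failed check):

   ```java
   // Hypothetical illustration of the proposed startup check: reject a
   // configured minimum that exceeds the number of NameNode storage volumes,
   // since such a value makes leaving safe mode impossible.
   public class CheckedVolumesValidator {

     static void checkMinimumRedundantVolumes(int minimumRedundantVolumes,
                                              int totalVolumes) {
       if (minimumRedundantVolumes > totalVolumes) {
         throw new IllegalArgumentException(
             "dfs.namenode.resource.checked.volumes.minimum ("
             + minimumRedundantVolumes + ") is greater than the number of "
             + "NameNode storage volumes (" + totalVolumes + ")");
       }
     }

     public static void main(String[] args) {
       // Default value 1 against a single storage volume: accepted.
       checkMinimumRedundantVolumes(1, 1);
       // The misconfiguration from the report (minimum 2, 1 volume): rejected
       // at startup instead of silently pinning the NameNode in safe mode.
       try {
         checkMinimumRedundantVolumes(2, 1);
         System.out.println("no exception");
       } catch (IllegalArgumentException e) {
         System.out.println("rejected: " + e.getMessage());
       }
     }
   }
   ```

   Failing fast at configuration-load time surfaces the root cause directly, rather than leaving the operator to chase misleading "resources are low" log messages.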
Issue Time Tracking
-------------------

    Worklog Id:     (was: 795635)
    Remaining Estimate: 0h
            Time Spent: 10m

> Randomly setting "dfs.namenode.resource.checked.volumes.minimum" will always
> prevent safe mode from being turned off
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-16697
>                 URL: https://issues.apache.org/jira/browse/HDFS-16697
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.1.3
>         Environment: Linux version 4.15.0-142-generic (buildd@lgw01-amd64-039) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12))
>                      java version "1.8.0_162"
>                      Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
>                      Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
>            Reporter: Jingxuan Fu
>            Assignee: Jingxuan Fu
>            Priority: Major
>             Fix For: 3.1.3
>
>   Time Spent: 10m
>   Remaining Estimate: 0h
>
> {code:java}
> <property>
>   <name>dfs.namenode.resource.checked.volumes.minimum</name>
>   <value>1</value>
>   <description>
>     The minimum number of redundant NameNode storage volumes required.
>   </description>
> </property>{code}
> I found that when "dfs.namenode.resource.checked.volumes.minimum" is set to a value greater than the total number of storage volumes in the NameNode, it is impossible to ever turn off safe mode. While in safe mode the file system accepts only read requests and rejects delete, modify, and other change requests, so its functionality is severely limited.
> The default value of this configuration item is 1; we set it to 2 as an example. After starting HDFS, the logs and the client throw the following messages.
> {code:java}
> 2022-07-27 17:37:31,772 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on available disk space. Already in safe mode.
> 2022-07-27 17:37:31,772 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode is ON.
> Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
> {code}
> {code:java}
> org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /hdfsapi/test. Name node is in safe mode.
> Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off. NamenodeHostName:192.168.1.167
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.newSafemodeException(FSNamesystem.java:1468)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1455)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3174)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1145)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:714)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1000)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
>         at java.base/java.security.AccessController.doPrivileged(Native Method)
>         at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2916){code}
> The prompt suggests that there is not enough disk space to satisfy the conditions for leaving safe mode. However, even after adding or freeing up more resources and lowering the resource threshold "dfs.namenode.resource.du.reserved", leaving safe mode still fails with the same prompt.
> From the source code we know that the NameNode enters safe mode whenever the number of storage volumes with redundant space is less than the minimum set by "dfs.namenode.resource.checked.volumes.minimum".
> After debugging, *we found that the current NameNode storage volumes have abundant space, but because the total number of NameNode storage volumes is less than the configured value, the number of volumes with redundant space is necessarily also less than that value, so the NameNode always enters safe mode.*
> In summary, this configuration item lacks a validity check and an associated exception handling mechanism, which makes it impossible to find the root cause when a misconfiguration occurs.
> The solution I propose is to use Preconditions.checkArgument() to check the value of the configuration item and report an error when it is greater than the number of NameNode storage volumes, so that the misconfiguration cannot affect the subsequent operation of the program.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org