[ https://issues.apache.org/jira/browse/HDFS-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149707#comment-17149707 ]
Ayush Saxena commented on HDFS-15446:
-------------------------------------

We are not performing the createSnapshot operation here for the first time; this code runs while we are replaying the edit logs. The entry was written to the edit log because these checks passed. If the traversal or permission checks had failed, the entry would not be in the edit logs. Edit logs contain only successful operations that change the state of the filesystem, so that they can be replayed to reach the same filesystem state. If checkTraverse() or a permission check has to fail, it fails when the client first makes the call to the namenode. Such a call never makes it into the edit logs, because it did not change the filesystem state.

Secondly, let's go further and suppose checkTraverse(..) were there. It is a void method, so it would either stay silent or throw an exception; it won't change anything at the FS layer, correct? If it stays silent, it does nothing, so there is no use in calling it. If it throws an exception during edit loading, the namenode would crash, as happened here. If that happens, it is a critical bug: how can the FS state be different when re-applying the edits, and how did the client call succeed when replaying its edit entry does not? In that case, too, we need to fix that bug rather than add a check here.

Going even further: this is the reason why, when you change the behavior of an API to throw an exception in some scenario, you will find {{unprotectedMethodName}} being called for the edit logs. These methods don't throw that exception. This is done for two reasons. Firstly, someone may be replaying edits written before the exception behavior was introduced, and their edit loading should not fail. Secondly, the new behavior stays intact anyway, since the edit log entry would not exist if the operation had thrown an exception when the client first made the call.
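To make the split between the client-facing call and the replay path concrete, here is a minimal Java sketch. It is a toy model, not HDFS code: {{ToyNamesystem}}, {{CreateSnapshotEdit}}, {{replay}} and the rule that only the "hdfs" user may create snapshots are all assumptions made up for illustration. Only the naming pattern (a checked method plus an {{unprotected}} variant used for replay) comes from the discussion above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these names are real HDFS classes or methods.
class EditLogReplaySketch {

    /** One logged createSnapshot operation that already succeeded once. */
    static final class CreateSnapshotEdit {
        final String dir;
        final String name;

        CreateSnapshotEdit(String dir, String name) {
            this.dir = dir;
            this.name = name;
        }
    }

    /** Hypothetical stand-in for the namespace plus its edit log. */
    static final class ToyNamesystem {
        final Set<String> snapshots = new HashSet<>();
        final List<CreateSnapshotEdit> editLog = new ArrayList<>();

        /** Client-facing call: validate, apply the change, then log it. */
        void createSnapshot(String user, String dir, String name) {
            checkPermission(user, dir); // throws before anything is applied or logged
            unprotectedCreateSnapshot(dir, name);
            editLog.add(new CreateSnapshotEdit(dir, name));
        }

        /** Replay-safe variant: applies the state change without client checks. */
        void unprotectedCreateSnapshot(String dir, String name) {
            snapshots.add(dir + "/.snapshot/" + name);
        }

        /** Assumed rule for this sketch: only the "hdfs" user may snapshot. */
        void checkPermission(String user, String dir) {
            if (!"hdfs".equals(user)) {
                throw new SecurityException("Permission denied: " + user + " on " + dir);
            }
        }
    }

    /** Rebuilds state on a restarted node using only the unprotected path. */
    static ToyNamesystem replay(List<CreateSnapshotEdit> edits) {
        ToyNamesystem restarted = new ToyNamesystem();
        for (CreateSnapshotEdit edit : edits) {
            restarted.unprotectedCreateSnapshot(edit.dir, edit.name);
        }
        return restarted;
    }

    public static void main(String[] args) {
        ToyNamesystem active = new ToyNamesystem();
        active.createSnapshot("hdfs", "/app-logs", "testSS");
        try {
            active.createSnapshot("guest", "/app-logs", "denied");
        } catch (SecurityException expected) {
            // The rejected call changed nothing, so it left no edit behind.
        }
        ToyNamesystem standby = replay(active.editLog);
        System.out.println(active.editLog.size());                      // prints 1
        System.out.println(standby.snapshots.equals(active.snapshots)); // prints true
    }
}
```

In this sketch the rejected call never reaches the edit log, so the replay path has nothing left to validate. A permission or traversal check inside {{replay}} could only stay silent or abort loading.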

Let me know if anything is still unclear. You can both try this once: create an edit entry for createSnapshot where checkTraverse(..) succeeded when the client called it, so that the operation has an entry in the edit logs, and then get {{checkTraverse(..)}} to throw an exception while that edit is being replayed. Ideally that shouldn't happen. Give it a try. We will hold this till then...

> CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with
> error java.io.FileNotFoundException: Directory does not exist:
> /.reserved/raw/path
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-15446
>                 URL: https://issues.apache.org/jira/browse/HDFS-15446
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>   Affects Versions: 3.2.0, 3.3.0
>            Reporter: Srinivasu Majeti
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: reserved-word, snapshot
>         Attachments: HDFS-15446.001.patch, HDFS-15446.002.patch,
> HDFS-15446.003.patch
>
>
> After allowing snapshot creation for a path, say /app-logs, when we try to
> create a snapshot on /.reserved/raw/app-logs, snapshot creation succeeds,
> but later, when the Standby Namenode is restarted and tries to load the edit
> record OP_CREATE_SNAPSHOT, we see it failing and the Standby Namenode shuts
> down with an exception "java.io.FileNotFoundException: Directory does not
> exist: /.reserved/raw/app-logs".
> Here are the steps to reproduce:
> {code:java}
> # hdfs dfs -ls /.reserved/raw/
> Found 15 items
> drwxrwxrwt - yarn hadoop 0 2020-06-29 10:27 /.reserved/raw/app-logs
> drwxr-xr-x - hive hadoop 0 2020-06-29 10:29 /.reserved/raw/prod
> ++++++++++++++
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /app-logs
> Allowing snapshot on /app-logs succeeded
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /prod
> Allowing snapshot on /prod succeeded
> ++++++++++++++
> # hdfs lsSnapshottableDir
> drwxrwxrwt 0 yarn hadoop 0 2020-06-29 10:27 1 65536 /app-logs
> drwxr-xr-x 0 hive hadoop 0 2020-06-29 10:29 1 65536 /prod
> ++++++++++++++
> [root@c3230-node2 ~]# hdfs dfs -createSnapshot /.reserved/raw/app-logs testSS
> Created snapshot /.reserved/raw/app-logs/.snapshot/testSS
> {code}
> Exception we see in the Standby namenode while loading the snapshot creation
> edit record:
> {code:java}
> 2020-06-29 10:33:25,488 ERROR namenode.NameNode (NameNode.java:main(1715)) - Failed to start namenode.
> java.io.FileNotFoundException: Directory does not exist: /.reserved/raw/app-logs
>         at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.valueOf(INodeDirectory.java:60)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.getSnapshottableRoot(SnapshotManager.java:259)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.createSnapshot(SnapshotManager.java:307)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:772)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:257)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)