[ 
https://issues.apache.org/jira/browse/HDFS-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149707#comment-17149707
 ] 

Ayush Saxena commented on HDFS-15446:
-------------------------------------

We are not performing the createSnapshot operation for the first time here; 
this is when we are replaying the edit logs. The entry was written to the edit 
log because these checks passed. If the traversal or permissions had not been 
correct, it would not be in the edit logs. Edit logs contain only successful 
entries that change the state of the filesystem, so that they can be replayed 
to reach the same filesystem state. If checkTraverse() or a permission check 
has to fail, it fails when the client first makes the call to the namenode. 
Such a call never makes it into the edit logs, because it did not change the 
FileSystem state.
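
To illustrate with a toy model (this is not Hadoop code; the class 
ToyEditLog and all of its methods are made up for this sketch): the check runs 
before anything is logged, so a rejected call leaves no trace in the log.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only, not Hadoop code: every name here is hypothetical. */
final class ToyEditLog {
    private final Set<String> snapshottableDirs = new HashSet<>();
    private final List<String> editLog = new ArrayList<>();

    void allowSnapshot(String dir) {
        snapshottableDirs.add(dir);
    }

    /** Client-facing call: validate first, and log only if validation passed. */
    void createSnapshot(String dir, String name) {
        if (!snapshottableDirs.contains(dir)) {
            // Rejected before any state change, so nothing reaches the log.
            throw new IllegalStateException("Directory is not snapshottable: " + dir);
        }
        editLog.add("OP_CREATE_SNAPSHOT " + dir + " " + name);
    }

    List<String> edits() {
        return editLog;
    }

    public static void main(String[] args) {
        ToyEditLog log = new ToyEditLog();
        log.allowSnapshot("/app-logs");
        log.createSnapshot("/app-logs", "testSS");   // succeeds, gets logged
        try {
            log.createSnapshot("/prod", "s1");       // fails, never logged
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(log.edits());
    }
}
```

Only the successful call ends up in edits(), which is why a replay should 
never have to re-run the check.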

Secondly, let's go further and suppose checkTraverse(..) were there. It is a 
void method: it either stays silent or throws an exception, and it does not 
change anything at the FS layer, correct? If it stays silent, it does nothing, 
so there is no use in calling it. If it throws an exception here, during edit 
loading, the namenode would crash, as happened here. If that happens, it is a 
critical bug: how can the FS state be different when re-applying the edits, 
and how did the client call succeed when replaying its edit entry does not? In 
that case too, we need to fix that bug rather than add a check here.

Going even further, this is the reason why, when you change the behavior of an 
API to throw an exception in some scenario, you will find 
{{unprotectedMethodName}} being called for the edit logs; those methods do not 
throw that exception. This is done for two reasons. Firstly, if someone replays 
edits written before the new exception behavior was introduced, their edit 
loading would otherwise fail. Secondly, the new behavior stays intact as well, 
since the edit log entry would not be there at all if the operation had thrown 
an exception when the client first made the call.
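
A minimal sketch of that pattern, again as a toy model rather than Hadoop 
code (ToySnapshots, its methods, and the "no spaces in the name" rule are all 
assumptions made up for illustration): the client path runs the new check, 
while the replay path calls the unprotected method and so still accepts an 
entry that was logged before the rule existed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only, not Hadoop code: every name and rule here is hypothetical. */
final class ToySnapshots {
    private final List<String> snapshots = new ArrayList<>();
    private final List<String> editLog = new ArrayList<>();

    /** Client path: run the check, apply the change, then log it. */
    void createSnapshot(String name) {
        if (name.contains(" ")) { // hypothetical rule added in a later release
            throw new IllegalArgumentException("Invalid snapshot name: " + name);
        }
        unprotectedCreateSnapshot(name);
        editLog.add(name);
    }

    /** Replay path: apply the change without any checks. */
    void unprotectedCreateSnapshot(String name) {
        snapshots.add(name);
    }

    /** Rebuild state from logged entries using only the unprotected method. */
    static ToySnapshots replay(List<String> edits) {
        ToySnapshots fs = new ToySnapshots();
        for (String name : edits) {
            fs.unprotectedCreateSnapshot(name);
        }
        return fs;
    }

    List<String> snapshots() {
        return snapshots;
    }

    List<String> edits() {
        return editLog;
    }

    public static void main(String[] args) {
        // An entry logged before the hypothetical rule was introduced.
        List<String> oldEdits = Arrays.asList("nightly backup");
        ToySnapshots restored = ToySnapshots.replay(oldEdits);
        System.out.println(restored.snapshots());
        try {
            restored.createSnapshot("another bad name");
        } catch (IllegalArgumentException e) {
            System.out.println("client call rejected: " + e.getMessage());
        }
    }
}
```

The old entry replays cleanly, and a new client call that violates the rule 
is rejected before it can reach the log.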

Let me know if anything is still unclear. You can both try to produce such a 
case: make an edit entry for create snapshot where checkTraverse(..) succeeded 
when the client called it (so the operation succeeded and there is an entry in 
the edit logs), and where {{checkTraverse(..)}} then throws an exception when 
the edit is replayed. Ideally that shouldn't happen. Give it a try. We will 
hold this till then...


> CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with 
> error java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/path 
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15446
>                 URL: https://issues.apache.org/jira/browse/HDFS-15446
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.2.0, 3.3.0
>            Reporter: Srinivasu Majeti
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: reserved-word, snapshot
>         Attachments: HDFS-15446.001.patch, HDFS-15446.002.patch, 
> HDFS-15446.003.patch
>
>
> After allowing snapshot creation for a path, say /app-logs, creating a 
> snapshot on /.reserved/raw/app-logs succeeds. Later, however, when the 
> Standby Namenode is restarted and tries to load the edit record 
> OP_CREATE_SNAPSHOT, it fails and the Standby Namenode shuts down with the 
> exception "java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/app-logs".
> Here are the steps to reproduce:
> {code:java}
> # hdfs dfs -ls /.reserved/raw/
> Found 15 items
> drwxrwxrwt   - yarn   hadoop          0 2020-06-29 10:27 /.reserved/raw/app-logs
> drwxr-xr-x   - hive   hadoop          0 2020-06-29 10:29 /.reserved/raw/prod
> ++++++++++++++
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /app-logs
> Allowing snapshot on /app-logs succeeded
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /prod
> Allowing snapshot on /prod succeeded
> ++++++++++++++
> # hdfs lsSnapshottableDir
> drwxrwxrwt 0 yarn hadoop 0 2020-06-29 10:27 1 65536 /app-logs
> drwxr-xr-x 0 hive hadoop 0 2020-06-29 10:29 1 65536 /prod
> ++++++++++++++
> [root@c3230-node2 ~]# hdfs dfs -createSnapshot /.reserved/raw/app-logs testSS
> Created snapshot /.reserved/raw/app-logs/.snapshot/testSS
> {code}
> Exception we see in Standby namenode while loading the snapshot creation edit 
> record.
> {code:java}
> 2020-06-29 10:33:25,488 ERROR namenode.NameNode (NameNode.java:main(1715)) - Failed to start namenode.
> java.io.FileNotFoundException: Directory does not exist: /.reserved/raw/app-logs
>         at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.valueOf(INodeDirectory.java:60)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.getSnapshottableRoot(SnapshotManager.java:259)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.createSnapshot(SnapshotManager.java:307)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:772)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:257)
> {code}


