[ 
https://issues.apache.org/jira/browse/HDFS-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149645#comment-17149645
 ] 

Stephen O'Donnell commented on HDFS-15446:
------------------------------------------

I am not sure whether we need to call checkTraverse. The tests all seem to 
pass without it, which suggests we don't need it in this use case, as 
permissions etc. should all have been checked before the edit made it into the 
edit stream.

However, when the namenode receives a "createSnapshot" call, it eventually 
calls this code in FSDirSnapshotOp.java:

{code}
  static String createSnapshot(
      FSDirectory fsd, FSPermissionChecker pc, SnapshotManager snapshotManager,
      String snapshotRoot, String snapshotName, boolean logRetryCache)
      throws IOException {
    final INodesInPath iip = fsd.resolvePath(pc, snapshotRoot, DirOp.WRITE);
{code}

As you can see, it calls fsd.resolvePath(...), and that is why my original 
patch changed to call this same method when loading the edits.

I think it would be safer to use the same code in this change too, i.e. what 
was done in the 002 patch.
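
To illustrate why resolving the path matters when the edit is replayed, here 
is a minimal, self-contained sketch. It is a toy model only, not Hadoop's 
actual code: the class ReservedRawPathSketch and all of its methods are 
hypothetical names, and the prefix handling is a simplification of what 
fsd.resolvePath(...) does. It assumes the edit record stores the path exactly 
as the client supplied it.

```java
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Set;

// Toy model (hypothetical, not Hadoop code): a namespace that only knows
// real directory paths, plus a resolver for the /.reserved/raw prefix.
public class ReservedRawPathSketch {
    static final String RAW_PREFIX = "/.reserved/raw";

    // The directories that actually exist in the toy namespace.
    private final Set<String> directories = new HashSet<>();

    void mkdir(String path) {
        directories.add(path);
    }

    // Simplified stand-in for path resolution: map a /.reserved/raw path
    // to the real path underneath it, and leave other paths unchanged.
    static String resolve(String path) {
        if (path.equals(RAW_PREFIX)) {
            return "/";
        }
        if (path.startsWith(RAW_PREFIX + "/")) {
            return path.substring(RAW_PREFIX.length());
        }
        return path;
    }

    // Direct lookup: fails for any path that is not a real directory.
    String lookupDirectory(String path) throws FileNotFoundException {
        if (!directories.contains(path)) {
            throw new FileNotFoundException("Directory does not exist: " + path);
        }
        return path;
    }

    // Replay that looks up the recorded path as-is.
    String replayWithoutResolve(String snapshotRoot) throws FileNotFoundException {
        return lookupDirectory(snapshotRoot);
    }

    // Replay that resolves the recorded path first, as the RPC path does.
    String replayWithResolve(String snapshotRoot) throws FileNotFoundException {
        return lookupDirectory(resolve(snapshotRoot));
    }

    public static void main(String[] args) throws Exception {
        ReservedRawPathSketch ns = new ReservedRawPathSketch();
        ns.mkdir("/app-logs");

        String recorded = "/.reserved/raw/app-logs";
        System.out.println("resolved: " + ns.replayWithResolve(recorded));
        try {
            ns.replayWithoutResolve(recorded);
        } catch (FileNotFoundException e) {
            System.out.println("unresolved replay failed: " + e.getMessage());
        }
    }
}
```

In this sketch the replay only succeeds when it resolves the recorded path 
the same way the original call did, which is the argument for sharing the 
code path.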

[~ayushtkn] was concerned about the performance overhead, hence we created the 
cut-down method in the 003 patch. However, snapshot operations are relatively 
rare, maybe a few thousand per day in an extreme case, and the cost of 
dropping a large snapshot is often several seconds of runtime. The cost of 
getting the IIP should be small compared to that. It also seems the original 
processing code was calling checkTraverse anyway, so we already had that 
overhead.

[~ayushtkn] What do you think? Is there any risk from not calling 
checkTraverse(...) ?



> CreateSnapshotOp fails during edit log loading for /.reserved/raw/path with 
> error java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/path 
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15446
>                 URL: https://issues.apache.org/jira/browse/HDFS-15446
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.2.0, 3.3.0
>            Reporter: Srinivasu Majeti
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: reserved-word, snapshot
>         Attachments: HDFS-15446.001.patch, HDFS-15446.002.patch, 
> HDFS-15446.003.patch
>
>
> After allowing snapshots on a path, say /app-logs, creating a snapshot on 
> /.reserved/raw/app-logs succeeds. However, when the Standby Namenode is 
> later restarted and tries to load the OP_CREATE_SNAPSHOT edit record, it 
> fails and the Standby Namenode shuts down with the exception 
> "java.io.FileNotFoundException: Directory does not exist: 
> /.reserved/raw/app-logs".
> Here are the steps to reproduce :
> {code:java}
> # hdfs dfs -ls /.reserved/raw/
> Found 15 items
> drwxrwxrwt   - yarn   hadoop          0 2020-06-29 10:27 /.reserved/raw/app-logs
> drwxr-xr-x   - hive   hadoop          0 2020-06-29 10:29 /.reserved/raw/prod
> ++++++++++++++
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /app-logs
> Allowing snapshot on /app-logs succeeded
> [root@c3230-node2 ~]# hdfs dfsadmin -allowSnapshot /prod
> Allowing snapshot on /prod succeeded
> ++++++++++++++
> # hdfs lsSnapshottableDir
> drwxrwxrwt 0 yarn hadoop 0 2020-06-29 10:27 1 65536 /app-logs
> drwxr-xr-x 0 hive hadoop 0 2020-06-29 10:29 1 65536 /prod
> ++++++++++++++
> [root@c3230-node2 ~]# hdfs dfs -createSnapshot /.reserved/raw/app-logs testSS
> Created snapshot /.reserved/raw/app-logs/.snapshot/testSS
> {code}
> This is the exception we see in the Standby Namenode while it loads the 
> snapshot creation edit record:
> {code:java}
> 2020-06-29 10:33:25,488 ERROR namenode.NameNode (NameNode.java:main(1715)) - Failed to start namenode.
> java.io.FileNotFoundException: Directory does not exist: /.reserved/raw/app-logs
>         at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.valueOf(INodeDirectory.java:60)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.getSnapshottableRoot(SnapshotManager.java:259)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.createSnapshot(SnapshotManager.java:307)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:772)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:257)
> {code}



