[ https://issues.apache.org/jira/browse/HDFS-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-6647:
---------------------------------

    Attachment: HDFS-6647-failing-test.patch

I'm attaching a test case which illustrates the problem. When this problem 
occurs, the NN will be unable to read the edit log and will fail to start with 
an error like the following:

{noformat}
java.io.FileNotFoundException: File does not exist: /test-file
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:64)
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)
  at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:444)
  at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:227)
  at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:136)
  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:816)
  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:676)
  at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:279)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:964)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:711)
  at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:530)
  at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:586)
  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:752)
  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:736)
  at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1412)
{noformat}

The sequence of events that I've identified as causing this is the following 
(a reproduction is sketched after the list):

# A file is opened for write and some data is written/flushed to it, causing a 
block to be allocated.
# A snapshot is taken which includes the file.
# The file is deleted from the present file system, though the client has not 
yet closed it. This logs an OP_DELETE to the edit log.
# Some error triggers pipeline recovery, which logs an OP_UPDATE_BLOCKS to the 
edit log.
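
For reference, here's a rough sketch of a reproduction along these lines. The 
attached patch is authoritative; the class name and exact setup below are my 
assumptions, modeled on other MiniDFSCluster-based snapshot tests:

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.Test;

// Sketch only; not the attached patch verbatim.
public class TestUpdatePipelineAfterDelete {
  @Test
  public void testEditLogReplayAfterDeleteAndPipelineRecovery()
      throws Exception {
    Configuration conf = new Configuration();
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
    try {
      DistributedFileSystem fs = cluster.getFileSystem();
      Path dir = new Path("/");
      Path file = new Path("/test-file");

      // 1. Open the file for write and flush some data so a block is
      //    allocated.
      FSDataOutputStream out = fs.create(file);
      out.write(new byte[1024]);
      out.hflush();

      // 2. Take a snapshot which includes the file.
      fs.allowSnapshot(dir);
      fs.createSnapshot(dir, "s1");

      // 3. Delete the file from the present file system while the client
      //    still has it open; this logs an OP_DELETE.
      fs.delete(file, true);

      // 4. Kill a datanode in the pipeline and keep writing, forcing
      //    pipeline recovery and thus an OP_UPDATE_BLOCKS for the
      //    now-deleted path.
      cluster.stopDataNode(0);
      out.write(new byte[1024]);
      out.hflush();

      // Restarting the NN replays the edit log, which fails with the
      // FileNotFoundException shown above.
      cluster.restartNameNode();
    } finally {
      cluster.shutdown();
    }
  }
}
{noformat}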

This is possible basically because the {{updatePipeline}} RPC never checks 
whether the file actually exists; it just finds the file INode based on the 
block ID being replaced in the pipeline. Later, however, when reading the 
{{OP_UPDATE_BLOCKS}} op from the edit log, we try to find the file INode by 
the file's path name, which no longer exists because of the earlier delete.
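
To make the asymmetry concrete, the two lookups go roughly like this 
(paraphrased from memory, not the exact 2.4.1 code):

{noformat}
// Live side: FSNamesystem#updatePipeline resolves the INode via the blocks
// map, where the under-construction block still points at the INode even
// though the file is now only reachable through the snapshot.
INodeFile pendingFile = (INodeFile) storedBlock.getBlockCollection();

// Replay side: FSEditLogLoader#applyEditLogOp for OP_UPDATE_BLOCKS resolves
// the INode by path, which fails once the file has been deleted:
INodeFile oldFile = INodeFile.valueOf(fsDir.getINode(updateOp.path),
    updateOp.path);
//   -> java.io.FileNotFoundException: File does not exist: /test-file
{noformat}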

It's not entirely obvious to me what the right solution to this issue should 
be. It shouldn't be difficult to change the {{FSEditLogLoader}} so that it can 
read the {{OP_UPDATE_BLOCKS}} op by looking up the INode by block ID instead 
of by path. On the other hand, I'm not entirely sure we should even be 
allowing this sequence of edit log ops in the first place. It doesn't seem 
unreasonable to me that the {{updatePipeline}} RPC should check that the file 
actually exists in the present file system and throw an error if it doesn't, 
since continuing to write to a file that only exists in a snapshot doesn't 
make much sense. Along similar lines, it seems a little odd to me that an 
INode that only exists in a snapshot would continue to be considered 
under-construction, but perhaps that's not unreasonable in itself.
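
If we went the route of rejecting the RPC, the guard might be as simple as 
something like the following (hypothetical; the exact method and field names 
here are assumptions):

{noformat}
// Hypothetical check in FSNamesystem#updatePipeline: refuse pipeline updates
// for files that no longer exist in the present namespace.
INodeFile pendingFile = (INodeFile) storedBlock.getBlockCollection();
String path = pendingFile.getFullPathName();
if (fsDir.getINode(path) == null) {
  // The file only exists in a snapshot; refuse the update rather than
  // logging an OP_UPDATE_BLOCKS that replay cannot resolve by path.
  throw new FileNotFoundException("File " + path + " has been deleted and" +
      " only exists in a snapshot; cannot update pipeline");
}
{noformat}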

Would love to hear others' thoughts on this.

> Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
> --------------------------------------------------------------------------------------
>
>                 Key: HDFS-6647
>                 URL: https://issues.apache.org/jira/browse/HDFS-6647
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, snapshots
>    Affects Versions: 2.4.1
>            Reporter: Aaron T. Myers
>         Attachments: HDFS-6647-failing-test.patch
>
>
> I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the 
> edit log for a file after an OP_DELETE has previously been logged for that 
> file. Such an edit log sequence cannot then be successfully read by the 
> NameNode.
> More details in the first comment.


