[ https://issues.apache.org/jira/browse/HDFS-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron T. Myers updated HDFS-6647:
---------------------------------
    Attachment: HDFS-6647-failing-test.patch

I'm attaching a test case which illustrates the problem.

When this problem occurs, the NN will be unable to read the edit log and will fail to start with an error like the following:

{noformat}
java.io.FileNotFoundException: File does not exist: /test-file
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:64)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:444)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:227)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:136)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:816)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:676)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:279)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:964)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:711)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:530)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:586)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:752)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:736)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1412)
{noformat}

The sequence of events I've identified that can cause this is the following:

# A file is opened for write and some data has been written/flushed to it, causing a block to be allocated.
# A snapshot is taken which includes the file.
# The file is deleted from the present file system, though the client has not yet closed the file. This will log an {{OP_DELETE}} to the edit log.
# Some error happens triggering pipeline recovery, which logs an {{OP_UPDATE_BLOCKS}} to the edit log.

The reason it's possible for this to happen is that the {{updatePipeline}} RPC never checks whether the file actually exists, but instead just finds the file INode based on the block ID being replaced in the pipeline. Later, when we're reading the {{OP_UPDATE_BLOCKS}} from the edit log, however, we try to find the file INode based on the path name of the file, which no longer exists because of the previous delete.

It's not entirely obvious to me what the right solution to this issue should be. It shouldn't be difficult to change the {{FSEditLogLoader}} to be able to read the {{OP_UPDATE_BLOCKS}} op if we just change it to look up the INode by block ID. On the other hand, I'm not entirely sure we should even be allowing this sequence of edit log ops in the first place. It doesn't seem unreasonable to me that we might check that the file actually exists in the present file system in the {{updatePipeline}} RPC call and throw an error if it doesn't, since continuing to write to a file that only exists in a snapshot doesn't make much sense. Along similar lines, it seems a little odd to me that an INode that only exists in the snapshot would continue to be considered under-construction, but perhaps that's not unreasonable in itself.

Would love to hear others' thoughts on this.

> Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
> --------------------------------------------------------------------------------------
>
>                 Key: HDFS-6647
>                 URL: https://issues.apache.org/jira/browse/HDFS-6647
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, snapshots
>    Affects Versions: 2.4.1
>            Reporter: Aaron T. Myers
>         Attachments: HDFS-6647-failing-test.patch
>
>
> I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit log for a file after an OP_DELETE has previously been logged for that file. Such an edit log sequence cannot then be successfully read by the NameNode.
> More details in the first comment.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
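
For readers who want to see the failure mode in isolation, here is a minimal toy model of the op sequence described in the comment. It is only a sketch and uses no Hadoop code: {{EditLogReplaySketch}}, {{Op}}, {{livePaths}}, {{blockOwner}}, {{replay}} and {{problemLog}} are hypothetical names invented for illustration, and the model assumes that a deleted file's block stays mapped because a snapshot still references it. It contrasts replaying {{OP_UPDATE_BLOCKS}} with a path lookup against replaying it with a block ID lookup.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the scenario; none of these types are real HDFS classes. */
public class EditLogReplaySketch {

    enum OpCode { OP_ADD, OP_DELETE, OP_UPDATE_BLOCKS }

    /** Simplified edit log record: an opcode, the file path, and a block ID. */
    static final class Op {
        final OpCode code;
        final String path;
        final long blockId;

        Op(OpCode code, String path, long blockId) {
            this.code = code;
            this.path = path;
            this.blockId = blockId;
        }
    }

    /** Paths visible in the present file system. */
    private final Set<String> livePaths = new HashSet<>();

    /** Block ID -> owning file. Assumption: entries outlive a delete while a snapshot holds the file. */
    private final Map<Long, String> blockOwner = new HashMap<>();

    /** Applies one op; lookupByBlockId selects how OP_UPDATE_BLOCKS locates its file. */
    void apply(Op op, boolean lookupByBlockId) throws FileNotFoundException {
        switch (op.code) {
            case OP_ADD:
                livePaths.add(op.path);
                blockOwner.put(op.blockId, op.path);
                break;
            case OP_DELETE:
                // Only the present namespace forgets the file; its block stays mapped.
                livePaths.remove(op.path);
                break;
            case OP_UPDATE_BLOCKS:
                boolean found = lookupByBlockId
                        ? blockOwner.containsKey(op.blockId)
                        : livePaths.contains(op.path);
                if (!found) {
                    throw new FileNotFoundException("File does not exist: " + op.path);
                }
                break;
        }
    }

    /** Replays the log into a fresh state; returns true if every op applied cleanly. */
    static boolean replay(List<Op> log, boolean lookupByBlockId) {
        EditLogReplaySketch state = new EditLogReplaySketch();
        try {
            for (Op op : log) {
                state.apply(op, lookupByBlockId);
            }
            return true;
        } catch (FileNotFoundException e) {
            System.out.println("Replay failed: " + e.getMessage());
            return false;
        }
    }

    /** The problematic sequence: create and write, (snapshot), delete, pipeline recovery. */
    static List<Op> problemLog() {
        List<Op> log = new ArrayList<>();
        log.add(new Op(OpCode.OP_ADD, "/test-file", 1L));
        // The snapshot is not modeled as an op here; it is the reason block 1 stays mapped.
        log.add(new Op(OpCode.OP_DELETE, "/test-file", 1L));
        log.add(new Op(OpCode.OP_UPDATE_BLOCKS, "/test-file", 1L));
        return log;
    }

    public static void main(String[] args) {
        System.out.println("lookup by path:     " + replay(problemLog(), false));
        System.out.println("lookup by block ID: " + replay(problemLog(), true));
    }
}
```

In this model the path-based replay stops at the {{OP_UPDATE_BLOCKS}} record, while the block-ID-based replay completes. The other option raised in the comment, rejecting the {{updatePipeline}} call up front, would correspond to never appending the third record to the log at all.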