Yongjun Zhang created HDFS-7707:
-----------------------------------

             Summary: Edit log corruption due to delayed block removal again
                 Key: HDFS-7707
                 URL: https://issues.apache.org/jira/browse/HDFS-7707
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.6.0
            Reporter: Yongjun Zhang
            Assignee: Yongjun Zhang


Edit log corruption is seen again, even with the HDFS-6825 fix in place. 

Prior to the HDFS-6825 fix, if dirX was deleted recursively, an OP_CLOSE could still get 
into the edit log for a fileY under dirX, corrupting the edit log 
(restarting the NN with that edit log would fail). 

HDFS-6825 fixes that issue by detecting whether fileY has already been deleted: it 
checks the ancestor dirs on fileY's path, and if any of them does not exist, fileY 
is considered already deleted and no OP_CLOSE is written to the edit log for the 
file.
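
Below is a minimal sketch of that kind of name-based check against a toy namespace, just to illustrate the idea; the method and field names are made up and this is not the actual FSNamesystem code, which walks INodesInPath internally.

{code:java}
import java.util.HashSet;
import java.util.Set;

public class NameBasedCheck {
  // Toy namespace: the set of directory paths that currently exist.
  static Set<String> existingDirs = new HashSet<>();

  // Treat the file as deleted if any ancestor directory on its path is gone.
  static boolean looksDeleted(String filePath) {
    String parent = filePath.substring(0, filePath.lastIndexOf('/'));
    while (!parent.isEmpty()) {
      if (!existingDirs.contains(parent)) {
        return true;          // an ancestor is missing => treat file as deleted
      }
      parent = parent.substring(0, parent.lastIndexOf('/'));
    }
    return false;             // every ancestor resolves => assume file still exists
  }

  public static void main(String[] args) {
    existingDirs.add("/dirX");
    System.out.println(looksDeleted("/dirX/fileY"));   // false: /dirX exists
    existingDirs.remove("/dirX");
    System.out.println(looksDeleted("/dirX/fileY"));   // true: ancestor is gone
  }
}
{code}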

For this new edit log corruption, what I found is that the client first deleted 
dirX recursively, then created another dir with exactly the same name as dirX 
right away. Because HDFS-6825 relies on the namespace check (whether dirX 
exists in its parent dir) to decide whether a file has been deleted, the newly 
created dirX defeats that check, so an OP_CLOSE for the already-deleted file 
gets into the edit log, again due to delayed block removal.
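
A rough reproduction sketch of that client sequence using the public FileSystem API (paths are examples, and the exact timing needed to hit delayed block removal is not guaranteed by this code):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Repro {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dirX = new Path("/dirX");
    Path fileY = new Path(dirX, "fileY");

    FSDataOutputStream out = fs.create(fileY);   // fileY is under construction
    out.write(new byte[]{1, 2, 3});
    out.hflush();                                // blocks exist on DataNodes

    fs.delete(dirX, true);                       // recursive delete of dirX
    fs.mkdirs(dirX);                             // recreate a dir with the same name right away

    // If lease recovery / block synchronization for fileY runs at this point, the
    // HDFS-6825 ancestor check sees /dirX present again and an OP_CLOSE for the
    // already-deleted file can be logged.
  }
}
{code}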

We need a more robust way to detect whether a file has been deleted.
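
One possible direction, shown here only as a sketch against a toy inode model (not the eventual patch): instead of resolving the path by name, follow the file's own parent references up to the root. A recreated /dirX is a different inode object, so the deleted file's chain no longer reaches the root even though the name resolves again.

{code:java}
public class IdentityBasedCheck {
  static class INode {
    final String name;
    INode parent;                        // set to null once detached (deleted)
    INode(String name, INode parent) { this.name = name; this.parent = parent; }
  }

  // The file is considered deleted unless walking parent pointers reaches the root.
  static boolean isDeleted(INode file, INode root) {
    for (INode cur = file; cur != null; cur = cur.parent) {
      if (cur == root) {
        return false;                    // still attached to the namespace tree
      }
    }
    return true;                         // chain is broken => file was deleted
  }

  public static void main(String[] args) {
    INode root = new INode("/", null);
    INode dirX = new INode("dirX", root);
    INode fileY = new INode("fileY", dirX);

    System.out.println(isDeleted(fileY, root));  // false: still reachable

    dirX.parent = null;                          // recursive delete of dirX
    INode newDirX = new INode("dirX", root);     // same name, different inode

    System.out.println(isDeleted(fileY, root));  // true, even though /dirX exists again
  }
}
{code}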





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
