[ https://issues.apache.org/jira/browse/HDFS-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303709#comment-14303709 ]
Kihwal Lee commented on HDFS-7707:
----------------------------------

As for {{TestFailureToReadEdits}}, the test is flawed. Since the port for QJM is hard-coded, it sometimes does not work. HDFS-6054 is supposed to fix it.

> Edit log corruption due to delayed block removal again
> ------------------------------------------------------
>
>                 Key: HDFS-7707
>                 URL: https://issues.apache.org/jira/browse/HDFS-7707
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-7707.001.patch, HDFS-7707.002.patch, HDFS-7707.003.patch, reproduceHDFS-7707.patch
>
>
> Edit log corruption is seen again, even with the fix of HDFS-6825.
> Prior to the HDFS-6825 fix, if dirX was deleted recursively, an OP_CLOSE could still get into the edit log for fileY under dirX, corrupting the edit log (restarting the NN with that edit log would fail).
> What HDFS-6825 does to fix this issue is detect whether fileY has already been deleted by checking the ancestor dirs on its path: if any of them does not exist, fileY has been deleted, and no OP_CLOSE is written to the edit log for the file.
> For this new edit log corruption, what I found was that the client first deleted dirX recursively, then created another dir with exactly the same name right away. Because HDFS-6825 relies on the namespace check (whether dirX exists in its parent dir) to decide whether a file has been deleted, the newly created dirX defeats that check, so an OP_CLOSE for the already-deleted file gets into the edit log, due to delayed block removal.
> What we need is a more robust way to detect whether a file has been deleted.
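To make the failure mode concrete, below is a minimal, self-contained Java sketch (hypothetical {{INode}}/{{INodeDirectory}} classes, not the actual HDFS source): after dirX is deleted and immediately recreated under the same name, a name-based ancestor check still passes, while a walk up the parent pointers that compares inode object identity correctly reports fileY as deleted. The identity-based check is only an illustration of the kind of "more robust" detection the description calls for, not the actual HDFS-7707 patch.

{code:java}
import java.util.HashMap;
import java.util.Map;

class INode {
    final String name;
    INode parent; // null once the inode has been unlinked from the tree
    INode(String name) { this.name = name; }
}

class INodeDirectory extends INode {
    final Map<String, INode> children = new HashMap<>();
    INodeDirectory(String name) { super(name); }
    void addChild(INode child) { children.put(child.name, child); child.parent = this; }
    void removeChild(INode child) { children.remove(child.name); child.parent = null; }
}

public class DeletedFileCheck {

    // Identity-based check: walk up from the file via parent pointers and
    // verify each ancestor still maps the child's name to the *same* inode
    // object. A recreated directory with the same name is a different object,
    // so a file under the deleted directory is correctly reported as gone.
    static boolean isReachableFromRoot(INode file, INodeDirectory root) {
        INode cur = file;
        while (cur.parent != null) {
            INodeDirectory parent = (INodeDirectory) cur.parent;
            if (parent.children.get(cur.name) != cur) {
                return false; // parent no longer references this exact inode
            }
            cur = parent;
        }
        return cur == root; // reached the top; a deleted subtree never does
    }

    public static void main(String[] args) {
        INodeDirectory root = new INodeDirectory("/");
        INodeDirectory dirX = new INodeDirectory("dirX");
        INode fileY = new INode("fileY");
        root.addChild(dirX);
        dirX.addChild(fileY);

        // Client deletes dirX recursively, then recreates it by the same name.
        root.removeChild(dirX);
        root.addChild(new INodeDirectory("dirX"));

        // Name-based ancestor check (the HDFS-6825 approach): wrongly passes,
        // because *a* dirX exists, just not the one fileY lived in.
        System.out.println("dirX exists by name: "
                + root.children.containsKey("dirX"));   // true

        // Identity-based walk: correctly reports fileY as deleted.
        System.out.println("fileY reachable from root: "
                + isReachableFromRoot(fileY, root));    // false
    }
}
{code}

The point of the sketch is only that object identity (or, equivalently, an inode id) survives a delete-and-recreate where a pathname does not, so a check based on it is not fooled by the newly created dirX.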