[ https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032063#comment-14032063 ]
Jing Zhao commented on HDFS-6527:
---------------------------------

The v3 patch may not work when the file is contained in a snapshot. The new unit test can fail if we create a snapshot on root after the file creation:
{code}
FSDataOutputStream out = null;
out = fs.create(filePath);
SnapshotTestHelper.createSnapshot(fs, new Path("/"), "s1");
Thread deleteThread = new DeleteThread(fs, filePath, true);
{code}

Instead of the changes made in the v3 patch, I guess the v2 patch may work with the following change:
{code}
@@ -3018,6 +3036,13 @@ private INodeFile checkLease(String src, String holder, INode inode,
           + (lease != null ? lease.toString()
               : "Holder " + holder + " does not have any open files."));
     }
+    // If parent is not there or we mark the file as deleted in its snapshot
+    // feature, it must have been deleted.
+    if (file.getParent() == null
+        || (file.isWithSnapshot() && file.getFileWithSnapshotFeature()
+            .isCurrentFileDeleted())) {
+      throw new FileNotFoundException(src);
+    }
     String clientName = file.getFileUnderConstructionFeature().getClientName();
     if (holder != null && !clientName.equals(holder)) {
       throw new LeaseExpiredException("Lease mismatch on " + ident +
{code}

> Edit log corruption due to defered INode removal
> ------------------------------------------------
>
>                 Key: HDFS-6527
>                 URL: https://issues.apache.org/jira/browse/HDFS-6527
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Blocker
>         Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, HDFS-6527.v3.patch
>
>
> We have seen an SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode map.
> Since getAdditionalBlock() acquires the FSN read lock and then the write lock, a
> deletion can happen in between. Because deferred inode removal happens outside the FSN
> write lock, getAdditionalBlock() can get the deleted inode from the inode map
> with the FSN write lock held. This allows a block to be added to a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by
> OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up or the SBN
> crashes.
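
For readers who want the race in isolation, here is a minimal sketch of the idea behind the parent check, written as a toy model rather than as HDFS code. Every name in it (DeferredRemovalSketch, Node, purge, and so on) is hypothetical, and the snapshot case from the comment above is not modelled. It assumes only what the description states: delete unlinks the inode under the write lock, while removal from the inode map is deferred, so a later lookup under the write lock can still return the deleted inode.

```java
import java.io.FileNotFoundException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of deferred inode removal; not HDFS code, all names are hypothetical. */
final class DeferredRemovalSketch {
  static final class Node {
    Object parent = new Object(); // non-null while the file is linked into the tree
    int blocks = 0;
  }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, Node> inodeMap = new ConcurrentHashMap<>();

  void create(String path) {
    lock.writeLock().lock();
    try {
      inodeMap.put(path, new Node());
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Unlinks the file under the write lock; the map entry is left behind. */
  void delete(String path) {
    lock.writeLock().lock();
    try {
      Node n = inodeMap.get(path);
      if (n != null) {
        n.parent = null;
      }
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Deferred cleanup, run later and outside the write lock. */
  void purge(String path) {
    inodeMap.remove(path);
  }

  /** Adds a block, rejecting inodes that are unlinked but still in the map. */
  void addBlock(String path) throws FileNotFoundException {
    lock.writeLock().lock();
    try {
      Node n = inodeMap.get(path); // may still return a deleted inode
      if (n == null || n.parent == null) {
        throw new FileNotFoundException(path);
      }
      n.blocks++;
    } finally {
      lock.writeLock().unlock();
    }
  }

  public static void main(String[] args) throws Exception {
    DeferredRemovalSketch ns = new DeferredRemovalSketch();
    ns.create("/xxx");
    ns.delete("/xxx"); // purge("/xxx") has not run yet
    try {
      ns.addBlock("/xxx");
      System.out.println("block added to a deleted file");
    } catch (FileNotFoundException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Without the `n.parent == null` test, the call in `main` would succeed and the toy equivalent of OP_ADD_BLOCK would follow OP_DELETE, which is the ordering the description says cannot be replayed.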