[ https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031114#comment-14031114 ]

Kihwal Lee commented on HDFS-6527:
----------------------------------

Thanks for the comment, [~jingzhao]. We can null out the client name when a 
file is deleted. Then the lease check is guaranteed to fail.

In {{INodeFile#destroyAndCollectBlocks()}}, we can delete the client name.
{code}
     if (sf != null) {
       sf.clearDiffs();
     }
+
+    // Clear the client name if under construction. This destroys half of
+    // the lease. The other half will be removed later by the LeaseManager.
+    FileUnderConstructionFeature uc = getFileUnderConstructionFeature();
+    if (uc != null) {
+      uc.setClientName(null);
+    }
   }
{code}

And in {{FSNamesystem#checkLease()}}, we can replace the {{parent == null}} 
check with the following:
{code}
     String clientName = file.getFileUnderConstructionFeature().getClientName();
+    if (clientName == null) {
+      // clientName is removed when the file is deleted.
+      throw new FileNotFoundException(src);
+    }
{code}

This will make lease checks fail once the "real" file is deleted, whether or 
not it is in a snapshot. Do you think this is reasonable?
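To check the idea in isolation, here is a minimal, self-contained sketch of the two changes working together. {{UnderConstruction}}, {{FileNode}} and {{LeaseCheckDemo}} are hypothetical stand-ins, not the real {{FileUnderConstructionFeature}}, {{INodeFile}} or {{FSNamesystem}}, and the holder-mismatch branch is simplified.

```java
import java.io.FileNotFoundException;

// Hypothetical, simplified stand-in for FileUnderConstructionFeature.
class UnderConstruction {
  private String clientName;
  UnderConstruction(String clientName) { this.clientName = clientName; }
  String getClientName() { return clientName; }
  void setClientName(String clientName) { this.clientName = clientName; }
}

// Hypothetical, simplified stand-in for an INodeFile under construction.
class FileNode {
  private final UnderConstruction uc;
  FileNode(String client) { this.uc = new UnderConstruction(client); }
  UnderConstruction getFileUnderConstructionFeature() { return uc; }

  // Mirrors the proposed addition to destroyAndCollectBlocks(): clearing the
  // client name destroys the inode's half of the lease.
  void destroyAndCollectBlocks() {
    if (uc != null) {
      uc.setClientName(null);
    }
  }
}

class LeaseCheckDemo {
  // Mirrors the proposed check in checkLease(): a null client name means the
  // file was deleted, even if the inode object is still reachable (for example
  // from a snapshot, or before a deferred inode-map removal has run).
  static void checkLease(String src, FileNode file, String holder)
      throws FileNotFoundException {
    String clientName = file.getFileUnderConstructionFeature().getClientName();
    if (clientName == null) {
      throw new FileNotFoundException(src);
    }
    if (!clientName.equals(holder)) {
      // Simplification: the real lease check uses a different exception here.
      throw new IllegalStateException(src + " is leased to " + clientName);
    }
  }

  public static void main(String[] args) throws FileNotFoundException {
    FileNode file = new FileNode("DFSClient_1");   // hypothetical client name
    checkLease("/xxx", file, "DFSClient_1");       // passes while the file is live
    file.destroyAndCollectBlocks();                // the file is deleted
    try {
      checkLease("/xxx", file, "DFSClient_1");
      System.out.println("lease check unexpectedly passed");
    } catch (FileNotFoundException e) {
      System.out.println("lease check failed for " + e.getMessage());
    }
  }
}
```

Running {{main}} prints {{lease check failed for /xxx}}: once the client name is cleared, a stale reference to the inode can no longer pass the lease check, regardless of how it was reached.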

> Edit log corruption due to defered INode removal
> ------------------------------------------------
>
>                 Key: HDFS-6527
>                 URL: https://issues.apache.org/jira/browse/HDFS-6527
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Blocker
>         Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch
>
>
> We have seen an SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode map.
> Since getAdditionalBlock() acquires the FSN read lock and then the write lock,
> a deletion can happen in between. Because the inode is removed from the inode
> map later, outside the FSN write lock, getAdditionalBlock() can still get the
> deleted inode from the inode map while holding the FSN write lock. This allows
> a block to be added to a deleted file. As a result, the edit log will contain
> OP_ADD and OP_DELETE followed by OP_ADD_BLOCK. This sequence cannot be
> replayed, so the NN fails to start up or the SBN crashes.
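The race and the resulting unreplayable edit log can be reproduced in a toy, single-threaded model that forces the interleaving: read-lock phase, delete, then write-lock phase, with the inode-map removal deferred until after the delete releases the lock. Everything here ({{ToyNamesystem}}, {{Hdfs6527Demo}}, the string edit-log format, the {{fixApplied}} flag) is hypothetical and far simpler than the real {{FSNamesystem}} and {{FSEditLogLoader}}; {{fixApplied}} models the client-name proposal from the comment.

```java
import java.io.FileNotFoundException;
import java.util.*;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy namesystem; every name here is hypothetical, not the real FSNamesystem API.
class ToyNamesystem {
  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
  private final Map<String, Long> namespace = new HashMap<>();  // path -> inode id
  private final Map<Long, String> inodeMap = new HashMap<>();   // inode id -> path
  private final Map<Long, String> clientName = new HashMap<>(); // inode id -> lease holder
  private final List<Long> pendingRemoval = new ArrayList<>();  // deferred inode-map removals
  private final boolean fixApplied;
  private long nextId = 1;
  final List<String> editLog = new ArrayList<>();

  ToyNamesystem(boolean fixApplied) { this.fixApplied = fixApplied; }

  long create(String path, String client) {
    fsnLock.writeLock().lock();
    try {
      long id = nextId++;
      namespace.put(path, id);
      inodeMap.put(id, path);
      clientName.put(id, client);
      editLog.add("OP_ADD " + path);
      return id;
    } finally { fsnLock.writeLock().unlock(); }
  }

  void delete(String path) {
    fsnLock.writeLock().lock();
    try {
      Long id = namespace.remove(path);
      if (id != null) {
        if (fixApplied) clientName.put(id, null); // proposal: clear the client name
        pendingRemoval.add(id); // inode-map removal is deferred until after unlock
      }
      editLog.add("OP_DELETE " + path);
    } finally { fsnLock.writeLock().unlock(); }
  }

  void removeDeferredInodes() {
    for (long id : pendingRemoval) inodeMap.remove(id);
    pendingRemoval.clear();
  }

  // Phase 1 of a getAdditionalBlock()-like call, under the read lock.
  void analyze(long id) throws FileNotFoundException {
    fsnLock.readLock().lock();
    try {
      if (!inodeMap.containsKey(id)) throw new FileNotFoundException("inode " + id);
    } finally { fsnLock.readLock().unlock(); }
  }

  // Phase 2, under the write lock; a deleted inode may still be in inodeMap.
  void commitBlock(long id) throws FileNotFoundException {
    fsnLock.writeLock().lock();
    try {
      String path = inodeMap.get(id);
      if (path == null || clientName.get(id) == null) throw new FileNotFoundException("File does not exist: " + path);
      editLog.add("OP_ADD_BLOCK " + path);
    } finally { fsnLock.writeLock().unlock(); }
  }
}

class Hdfs6527Demo {
  // Replays ops by path against an empty namespace, as a tailing SBN would.
  static void replay(List<String> ops) throws FileNotFoundException {
    Set<String> files = new HashSet<>();
    for (String op : ops) {
      String[] p = op.split(" ", 2);
      if (p[0].equals("OP_ADD")) files.add(p[1]);
      else if (p[0].equals("OP_DELETE")) files.remove(p[1]);
      else if (!files.contains(p[1])) throw new FileNotFoundException("File does not exist: " + p[1]);
    }
  }

  static List<String> runScenario(boolean fixApplied) {
    ToyNamesystem fsn = new ToyNamesystem(fixApplied);
    long id = fsn.create("/xxx", "client1");
    try {
      fsn.analyze(id);     // read-lock phase succeeds
      fsn.delete("/xxx");  // a delete slips in between the two lock phases
      fsn.commitBlock(id); // write-lock phase
    } catch (FileNotFoundException e) {
      // with the fix, commitBlock() rejects the deleted file and nothing is logged
    }
    fsn.removeDeferredInodes(); // deferred cleanup runs only after the race
    return fsn.editLog;
  }

  public static void main(String[] args) {
    for (boolean fix : new boolean[] {false, true}) {
      List<String> log = runScenario(fix);
      String result = "replays cleanly";
      try { replay(log); } catch (FileNotFoundException e) { result = "replay fails: " + e.getMessage(); }
      System.out.println("fix=" + fix + " " + log + " -> " + result);
    }
  }
}
```

Without the fix, the toy log is OP_ADD, OP_DELETE, OP_ADD_BLOCK and replay fails with {{File does not exist: /xxx}}, matching the SBN error above. With the fix, the write-lock phase sees the cleared client name, so OP_ADD_BLOCK is never logged and the log replays cleanly. Holding the write lock alone is not enough, because the inode-map lookup still succeeds for the deleted inode; the second phase has to recheck state that the delete itself changes under the lock.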



--
This message was sent by Atlassian JIRA
(v6.2#6252)