[ 
https://issues.apache.org/jira/browse/HDFS-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854185#action_12854185
 ] 

Erik Steffl commented on HDFS-1072:
-----------------------------------

Further investigation revealed that the following sequence leads to 
AlreadyBeingCreatedException:

  - LEASE_LIMIT=500; cluster.setLeasePeriod(LEASE_LIMIT, LEASE_LIMIT);

  - thread A gets a lease on a file

  - thread B sleeps 2*soft limit

  - thread B tries to get a lease on the same file, which triggers lease 
recovery, and gets RecoveryInProgressException

  - before lease recovery ends, the namenode (LeaseManager.java:checkLeases) 
finds that the hard limit has also expired, starts a new recovery, and resets 
the timeouts

  - thread B tries to get the lease again; the timeout has not expired (it was 
reset in the previous step), so it gets AlreadyBeingCreatedException
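
The sequence above can be replayed with a toy model. This is only a sketch 
under stated assumptions, not the actual NameNode code: the class 
LeaseRaceSketch, its fields, and the methods tryCreate/checkLeases are 
hypothetical stand-ins, the timestamps are made up, and the model assumes the 
namenode becomes the lease holder when the hard-limit check fires (as the 
issue title suggests).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the lease timeline described above; NOT the real NameNode code. */
class LeaseRaceSketch {
    // Same value for soft and hard limit, as in the failing test setup (ms).
    static final long LEASE_LIMIT = 500;

    String holder;      // current lease holder
    long lastRenewed;   // time the lease was last renewed or reset
    boolean recovering; // true while a (modelled) lease recovery is in flight

    LeaseRaceSketch(String holder, long now) {
        this.holder = holder;
        this.lastRenewed = now;
    }

    /** Hypothetical stand-in for the check made when another client wants the lease. */
    String tryCreate(String client, long now) {
        boolean limitExpired = now - lastRenewed > LEASE_LIMIT;
        if (limitExpired) {
            recovering = true; // expired limit: recovery is triggered
            return "RecoveryInProgressException";
        }
        // Limit not expired: the ongoing recovery is ignored and the file
        // simply looks like it is owned by someone else.
        return "AlreadyBeingCreatedException (holder=" + holder + ")";
    }

    /** Hypothetical stand-in for the periodic hard-limit check. */
    void checkLeases(long now) {
        if (now - lastRenewed > LEASE_LIMIT) {
            holder = "HDFS_NameNode"; // assumption: namenode takes over the lease
            lastRenewed = now;        // timeouts are reset
            recovering = true;        // a new recovery starts
        }
    }

    /** Replays the steps from the comment and returns what thread B observes. */
    static List<String> replay() {
        List<String> seen = new ArrayList<>();
        LeaseRaceSketch lease = new LeaseRaceSketch("threadA", 0);
        long t = 2 * LEASE_LIMIT;                     // thread B slept 2 * soft limit
        seen.add(lease.tryCreate("threadB", t));      // first attempt
        lease.checkLeases(t + 10);                    // hard limit expired as well
        seen.add(lease.tryCreate("threadB", t + 20)); // second attempt
        return seen;
    }

    public static void main(String[] args) {
        replay().forEach(System.out::println);
    }
}
```

In this model the second attempt fails differently from the first only 
because checkLeases reset the timestamp in between, which is the interaction 
described in the steps above.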

There are two problems in the code that lead to this:

  - the hard limit should not be set to such a low value; it makes it very 
likely that a recovery will not finish before another recovery takes over 
(because the hard limit has expired)

  - the namenode should recognize that, even though the limit has not expired, 
a recovery is ongoing and return RecoveryInProgressException instead of 
AlreadyBeingCreatedException (in FSNamesystem.java:startFileInternal, where it 
decides what to do if the file is under construction)
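
A minimal sketch of the decision suggested in the second point, assuming the 
namenode can tell whether a recovery is already running. The class 
StartFileDecisionSketch, the method decide, and its two boolean parameters are 
hypothetical and do not correspond to actual FSNamesystem code.

```java
/** Hypothetical decision helper; names are illustrative, not from FSNamesystem. */
class StartFileDecisionSketch {

    /**
     * Returns the name of the exception a second writer would see for a file
     * that is under construction.
     *
     * @param limitExpired    whether the lease limit has expired
     * @param recoveryOngoing whether a lease recovery is already in progress
     */
    static String decide(boolean limitExpired, boolean recoveryOngoing) {
        if (recoveryOngoing) {
            // Proposed behaviour: an ongoing recovery wins over the timer state.
            return "RecoveryInProgressException";
        }
        if (limitExpired) {
            // Existing behaviour from the sequence: recovery is triggered and
            // the caller is told that it is in progress.
            return "RecoveryInProgressException";
        }
        // No recovery and a live lease: the file really is held by another writer.
        return "AlreadyBeingCreatedException";
    }

    public static void main(String[] args) {
        // The problematic case: timer was reset, but recovery is still running.
        System.out.println(decide(false, true));
    }
}
```

With this ordering, a reset timeout alone no longer turns an ongoing 
recovery into AlreadyBeingCreatedException.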

> AlreadyBeingCreatedException with HDFS_NameNode as the lease holder
> -------------------------------------------------------------------
>
>                 Key: HDFS-1072
>                 URL: https://issues.apache.org/jira/browse/HDFS-1072
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client, name-node
>    Affects Versions: 0.21.0
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Erik Steffl
>             Fix For: 0.21.0
>
>
> TestReadWhileWriting may fail with AlreadyBeingCreatedException with 
> HDFS_NameNode as the lease holder, which indicates that lease recovery is in 
> an inconsistent state.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
