[ https://issues.apache.org/jira/browse/HDFS-11945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040927#comment-16040927 ]

Kihwal Lee commented on HDFS-11945:
-----------------------------------

We could change the namenode lease holder ID every hour. Normally, two IDs 
would be active at the same time only briefly, around each changeover. More 
can be active if there are failures. If the ID is suffixed with a timestamp or 
date string, the log message for recovery will show how old the leases are.
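
A minimal sketch of the hourly holder ID idea, assuming a UTC hour stamp as 
the suffix. The class name {{HourlyLeaseHolderId}} and the {{HDFS_NameNode-}} 
prefix are hypothetical and not taken from HDFS; the point is only that every 
instant within one hour maps to the same ID and that the age can be read back 
from the ID for a recovery log message.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Sketch only: the class name and the "HDFS_NameNode-" prefix are hypothetical.
final class HourlyLeaseHolderId {
    static final String PREFIX = "HDFS_NameNode-";

    // Hour-resolution UTC stamp, e.g. 2017-06-07T13.
    private static final DateTimeFormatter HOUR =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH").withZone(ZoneOffset.UTC);

    // Same ID for every instant in a given UTC hour, so a new ID appears once per hour.
    static String holderFor(Instant now) {
        return PREFIX + HOUR.format(now);
    }

    // Whole hours since the hour encoded in the ID; usable in a lease recovery log message.
    static long ageInHours(String holderId, Instant now) {
        String stamp = holderId.substring(PREFIX.length()); // e.g. "2017-06-07T13"
        // Append ":00" so the stamp parses as an ISO local date-time (HH:mm).
        Instant minted = LocalDateTime.parse(stamp + ":00").toInstant(ZoneOffset.UTC);
        return Duration.between(minted, now).toHours();
    }
}
```

Once the hour rolls over, new recoveries renew only the new ID's lease, so the 
older ID stops being renewed and can reach the hard limit.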

The major cause of lease recovery failures is datanodes having problems during 
block recoveries. One interesting case is when the namenode throws "server too 
busy" to datanodes. A {{commitBlockSynchronization()}} call can fail for this 
reason and won't be retried.

> Internal lease recovery may not be retried for a long time
> ----------------------------------------------------------
>
>                 Key: HDFS-11945
>                 URL: https://issues.apache.org/jira/browse/HDFS-11945
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Kihwal Lee
>
> A lease is assigned per client, which is identified by its holder ID or 
> client ID, so renewing or expiring a lease affects all files being written 
> by that client.
> When a client/writer dies without closing a file, its lease expires after one 
> hour (the hard limit) and the namenode tries to recover the lease. As part of 
> the process, the namenode takes ownership of the lease and renews it. If 
> the recovery does not finish successfully, the lease will expire in one hour 
> and the namenode will try again to recover the lease.
> However, if another lease in the file system expires within that hour, its 
> recovery renews the lease held by the namenode and pushes that lease's 
> expiration forward. As a result, failed lease recoveries may not be retried 
> for a long time. We have seen this happen for days.
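
The starvation described above can be illustrated with a toy model. This is 
not the real HDFS {{LeaseManager}}: the class, its methods, and the holder 
strings are hypothetical, a dead client is simply assumed to expire every 30 
minutes, and "retried" is modeled as the namenode lease reaching the one-hour 
hard limit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of per-holder leases; not the real HDFS LeaseManager.
final class LeaseStarvationSketch {
    static final long HARD_LIMIT_MS = 60L * 60 * 1000; // one-hour hard limit
    static final long STEP_MS = 30L * 60 * 1000;      // another client dies every 30 minutes

    // One lease per holder; renewing it covers every file the holder has open.
    static final class Lease {
        final List<String> paths = new ArrayList<>();
        long lastRenewedMs;

        Lease(long nowMs) { lastRenewedMs = nowMs; }

        boolean expired(long nowMs) { return nowMs - lastRenewedMs > HARD_LIMIT_MS; }
    }

    final Map<String, Lease> leases = new HashMap<>();

    void addFile(String holder, String path, long nowMs) {
        leases.computeIfAbsent(holder, h -> new Lease(nowMs)).paths.add(path);
    }

    // Recovery start: the namenode takes over the dead client's files and renews its own lease.
    void startRecovery(String deadHolder, String nnHolder, long nowMs) {
        Lease nn = leases.computeIfAbsent(nnHolder, h -> new Lease(nowMs));
        nn.paths.addAll(leases.remove(deadHolder).paths);
        nn.lastRenewedMs = nowMs; // also pushes out expiry of files whose recovery already failed
    }

    // Hypothetical holder strings: a fixed ID, or one suffixed with the hour index.
    static String nnHolder(boolean rotateHourly, long nowMs) {
        return rotateHourly ? "HDFS_NameNode-" + (nowMs / HARD_LIMIT_MS) : "HDFS_NameNode";
    }

    // True if the namenode lease holding the failed recovery of /stuck expires
    // (i.e. the recovery would be retried) within horizonMs.
    static boolean stuckFileRetried(boolean rotateHourly, long horizonMs) {
        LeaseStarvationSketch lm = new LeaseStarvationSketch();
        String first = nnHolder(rotateHourly, 0);
        lm.addFile("client-0", "/stuck", 0);
        lm.startRecovery("client-0", first, 0); // assume this recovery fails
        for (long t = STEP_MS; t <= horizonMs; t += STEP_MS) {
            if (lm.leases.get(first).expired(t)) {
                return true;
            }
            // For brevity the dead client's lease is created and recovered at the same instant.
            String client = "client-" + t;
            lm.addFile(client, "/other-" + t, t);
            lm.startRecovery(client, nnHolder(rotateHourly, t), t);
        }
        return false;
    }
}
```

With a fixed holder ID, {{stuckFileRetried(false, ...)}} stays false however 
long the horizon, because every new recovery renews the same lease. With an 
hourly ID, the first namenode lease is last renewed 30 minutes in and expires 
at the two-hour mark.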



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)