[ http://issues.apache.org/jira/browse/HADOOP-563?page=comments#action_12438246 ]

Owen O'Malley commented on HADOOP-563:
--------------------------------------

One minute is awfully short, given that losing your lease kills a day's worth of
work. However, if we make the leases longer, that will interact badly with a
replacement reduce task starting. One approach that might be reasonable is to
have two time limits:

lease becomes losable: 1 minute
lease is lost: 1 hour

A losable lease is lost when someone tries to create the same file. We need to
have a forced timeout to handle the case of clients that disappear and whose
filenames are never written again.

You need to separate out the handling of losable and lost leases on the
namenode, because once the lease is declared lost on the namenode, the blocks
will be deleted.
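
A minimal sketch of the two-limit scheme, assuming a namenode-side table keyed by
path with explicit millisecond timestamps. The class and method names
(`TwoTierLeaseTable`, `create`, `renew`, `reclaimLost`) are hypothetical, not
Hadoop's actual lease manager. The periodic sweep reclaims (and would delete
blocks for) only leases past the one-hour limit. A lease that is merely losable
is taken over only when another client creates the same path, and until then its
holder can still renew it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical namenode-side lease table illustrating the two proposed limits.
// Not Hadoop's real lease manager; times are passed in explicitly (milliseconds).
class TwoTierLeaseTable {
    static final long SOFT_LIMIT_MS = 60_000L;        // lease becomes losable: 1 minute
    static final long HARD_LIMIT_MS = 60L * 60_000L;  // lease is lost: 1 hour

    enum State { HELD, LOSABLE, LOST }

    private static final class Lease {
        final String holder;
        long lastRenewedMs;

        Lease(String holder, long nowMs) {
            this.holder = holder;
            this.lastRenewedMs = nowMs;
        }
    }

    private final Map<String, Lease> leasesByPath = new HashMap<>();

    static State stateOf(long lastRenewedMs, long nowMs) {
        long idleMs = nowMs - lastRenewedMs;
        if (idleMs >= HARD_LIMIT_MS) return State.LOST;
        if (idleMs >= SOFT_LIMIT_MS) return State.LOSABLE;
        return State.HELD;
    }

    /** A losable lease can still be renewed by its holder; a lost or taken-over one cannot. */
    boolean renew(String path, String holder, long nowMs) {
        Lease lease = leasesByPath.get(path);
        if (lease == null || !lease.holder.equals(holder)
                || stateOf(lease.lastRenewedMs, nowMs) == State.LOST) {
            return false;
        }
        lease.lastRenewedMs = nowMs;
        return true;
    }

    /** Another client's create succeeds only if the current lease is losable or lost. */
    boolean create(String path, String holder, long nowMs) {
        Lease existing = leasesByPath.get(path);
        if (existing != null && !existing.holder.equals(holder)
                && stateOf(existing.lastRenewedMs, nowMs) == State.HELD) {
            return false; // lease still held: the create is rejected
        }
        leasesByPath.put(path, new Lease(holder, nowMs)); // a losable (or lost) lease is taken over here
        return true;
    }

    /** Periodic sweep: only leases past the hard limit are reclaimed (and would have blocks deleted). */
    List<String> reclaimLost(long nowMs) {
        List<String> reclaimed = new ArrayList<>();
        Iterator<Map.Entry<String, Lease>> it = leasesByPath.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Lease> entry = it.next();
            if (stateOf(entry.getValue().lastRenewedMs, nowMs) == State.LOST) {
                reclaimed.add(entry.getKey()); // a real namenode would delete the file's blocks here
                it.remove();
            }
        }
        return reclaimed;
    }
}
```

Keeping the sweep and the create-conflict check as separate paths is what lets a
slow but live writer survive a missed renewal. Only the hard limit, or a
competing create, ever destroys its work.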

> DFS client should try to re-new lease if it gets a lease expiration exception 
> when it adds a block to a file
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-563
>                 URL: http://issues.apache.org/jira/browse/HADOOP-563
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: Runping Qi
>
> In the current DFS client implementation, there is one thread responsible for
> renewing leases. If, for whatever reason, that thread falls behind, the lease
> may expire. The client then gets a lease expiration exception when writing a
> block. The consequence is devastating: the client can no longer write to the
> file, and all the partial results up to that point are gone! This is
> especially costly for some map-reduce jobs, where a reducer may take hours or
> even days to sort the intermediate results before the actual reduce work can
> start.
> The problem would be solved if the flush method of the DFS client could renew
> the lease on demand. That is, it should try to renew the lease when it catches
> a lease expiration exception. That way, even when the system is under heavy
> load and the lease-renewing thread falls behind, the reducer task (or whatever
> task uses that client) can proceed. That will be a huge saving in some cases
> (where sorting the intermediate results takes a long time to finish). We can
> set a limit on the number of retries, and may even make it configurable (or
> changeable at runtime).
> The namenode can use a different expiration time, much longer than the
> current 1-minute lease expiration time, for cleaning up abandoned unclosed
> files.
>  
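
The on-demand renewal proposed in the issue could look roughly like this sketch,
applied at the point where the client adds a block. `Namenode`,
`LeaseExpiredException`, and `RenewOnDemandWriter` are hypothetical stand-ins,
not the real DFS client classes. The retry limit is a constructor argument with
a setter, so it can be configured and also changed at runtime.

```java
import java.io.IOException;

// Hypothetical stand-ins for the namenode RPC and its exception; not Hadoop's real classes.
class LeaseExpiredException extends IOException {
    LeaseExpiredException(String message) {
        super(message);
    }
}

interface Namenode {
    void addBlock(String path, String clientName) throws IOException;

    void renewLease(String clientName) throws IOException;
}

// Sketch of a client that renews its lease on demand instead of relying only on
// the background renewal thread.
class RenewOnDemandWriter {
    private final Namenode namenode;
    private final String clientName;
    private volatile int maxLeaseRetries; // configurable, and changeable at runtime

    RenewOnDemandWriter(Namenode namenode, String clientName, int maxLeaseRetries) {
        this.namenode = namenode;
        this.clientName = clientName;
        this.maxLeaseRetries = maxLeaseRetries;
    }

    void setMaxLeaseRetries(int maxLeaseRetries) {
        this.maxLeaseRetries = maxLeaseRetries;
    }

    /** Adds a block, renewing the lease and retrying on expiration up to maxLeaseRetries times. */
    void addBlockWithRenewal(String path) throws IOException {
        int retries = 0;
        while (true) {
            try {
                namenode.addBlock(path, clientName);
                return;
            } catch (LeaseExpiredException e) {
                if (retries >= maxLeaseRetries) {
                    throw e; // retry budget exhausted: surface the original failure
                }
                retries++;
                namenode.renewLease(clientName); // may itself fail if the lease is truly gone
            }
        }
    }
}
```

Renewing on demand only helps while the namenode still treats the lease as
renewable. Under the two-limit scheme in the comment above, that means the lease
must not yet be lost or taken over by another writer.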

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
