[ https://issues.apache.org/jira/browse/HDFS-9909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bogdan Raducanu updated HDFS-9909:
----------------------------------
    Description: 
If HDFS is restarted while a file is open for writing, then new clients cannot 
read that file until the hard lease limit expires and block recovery starts.

Scenario:
1. Write to a file and call hflush.
2. Without closing the file, restart HDFS.
3. After HDFS is back up, opening the file for reading from a new client fails 
for 1 hour.

Repro attached (Main.java). A rough sketch of the scenario is shown below.
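
The sketch below is illustrative only and is not the attached Main.java; the 
path, the test data, and the manual restart step are assumptions.

{code:java}
// Rough illustrative sketch only; this is not the attached Main.java.
// Assumes fs.defaultFS points at the cluster and that HDFS is restarted
// manually between steps 2 and 3.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OpenFileAfterRestartSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path file = new Path("/tmp/open-file-after-restart");

    // 1. Write to the file and hflush, but never close the stream.
    FileSystem writerFs = FileSystem.newInstance(conf);
    FSDataOutputStream out = writerFs.create(file, true);
    out.writeBytes("some data");
    out.hflush();

    // 2. Restart HDFS here, out of band, while the file is still open.
    System.out.println("Restart HDFS now, then press Enter...");
    System.in.read();

    // 3. A new client fails to open the file with a plain IOException
    //    until the hard lease limit (1 hour) expires and recovery runs.
    try (FileSystem readerFs = FileSystem.newInstance(conf)) {
      readerFs.open(file).close();
      System.out.println("read succeeded");
    }
  }
}
{code}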

Thoughts:
* Possibly this also happens in other cases, not just when HDFS is restarted 
(e.g. when only all the DataNodes in the pipeline are restarted).
* As far as I can tell, this happens because the last block is in the RWR 
(ReplicaWaitingToBeRecovered) state and getReplicaVisibleLength returns -1 for 
it. Recovery starts only after the hard lease limit expires, so the file 
becomes readable only after 1 hour.
* One can call recoverLease, which will start lease recovery sooner, BUT how 
can a client know when to call it? The exception thrown is a plain 
IOException, which can happen for other reasons as well. (A sketch of this 
workaround follows the list.)
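
A minimal sketch of that workaround, assuming the file system is a 
DistributedFileSystem; the 1-second retry interval is an arbitrary choice.

{code:java}
// Sketch of the workaround above: after a read fails with a generic
// IOException, blindly trigger lease recovery and poll until it completes.
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class ForceLeaseRecovery {
  static void recover(DistributedFileSystem dfs, Path file)
      throws IOException, InterruptedException {
    // recoverLease() returns true once the file has been closed,
    // i.e. once it is readable again.
    while (!dfs.recoverLease(file)) {
      // The caller cannot tell from the IOException alone that lease
      // recovery is the right remedy; that is exactly the problem
      // described in the last bullet above.
      Thread.sleep(1000L);
    }
  }
}
{code}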

I think a reasonable solution would be to throw a specialized exception 
(similar to AlreadyBeingCreatedException, which is thrown when trying to write 
to a file that is already open), so that clients can recognize this condition 
and decide to call recoverLease.
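
Purely as an illustration of the proposal: the exception name below is 
hypothetical and does not exist in HDFS today.

{code:java}
// Hypothetical sketch of the proposed specialized exception. This class does
// not exist in HDFS today; the name is made up and the idea is modeled on
// AlreadyBeingCreatedException.
import java.io.IOException;

public class BlockUnderRecoveryException extends IOException {
  public BlockUnderRecoveryException(String message) {
    super(message);
  }
}
{code}

A client could then catch this specific exception around FileSystem.open() and 
decide to call recoverLease, instead of guessing from a generic IOException.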


> Can't read file after hdfs restart
> ----------------------------------
>
>                 Key: HDFS-9909
>                 URL: https://issues.apache.org/jira/browse/HDFS-9909
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.7.1
>            Reporter: Bogdan Raducanu
>         Attachments: Main.java
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
