[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13744144#comment-13744144 ]

Colin Patrick McCabe commented on HDFS-4504:
--------------------------------------------

bq. In the latest patch, some unnecessary (whitespace-only) file changes from 
the MapReduce, Tools, and YARN projects got added. I assume these were added 
by mistake. Can you please remove them?

I did that because I wanted a test run on all projects.  I will remove them in 
the next patch, since the other projects came up clean with this change.

bq. I think the above piece of code can be problematic in the case of an 
hflush failure followed by a close call.  On a sync failure, closeThreads is 
called and streamer becomes null there.  The closed flag will also be marked 
here.

Thanks, that's a good catch.  I will check that {{streamer}} is not null in 
{{closeThreads}}.
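
For concreteness, here is a sketch of the guard I have in mind; the field and 
method names follow {{DFSOutputStream}} from memory and may not match trunk 
exactly:

{code:java}
// Sketch only: make closeThreads() tolerate a streamer that was already
// torn down (and nulled out) by an earlier hflush/sync failure path.
private void closeThreads(boolean force) throws IOException {
  if (streamer == null) {
    return;  // already shut down by a previous failure; nothing to do
  }
  try {
    streamer.close(force);
    streamer.join();
  } catch (InterruptedException e) {
    throw new IOException("Failed to shutdown streamer");
  } finally {
    streamer = null;
    closed = true;
  }
}
{code}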

bq. How about simply informing the NN about the zombie situation for a file...

In many cases when close fails, the NameNode is not reachable.  The behavior I 
implemented in the patch is designed to let long-running clients handle these 
transient problems gracefully.  Currently, uncloseable files get created, and 
a client restart is needed to get rid of them.  For example, you might have to 
restart your Flume daemon, your NFS gateway, etc.  The client wants to be able 
to tell when the file is actually closed, in case it needs to reopen or move 
it.  Currently, there is no way to do that.  With this patch, the client can 
keep calling close until it no longer throws an exception, as sketched below.
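
To make that contract concrete, here is a rough sketch of what a long-running 
client such as Flume could do once this patch is in; {{closeWithRetries}}, the 
retry interval, and the backoff policy are illustrative, not part of the 
patch:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;

class CloseRetrier {
  // Keep calling close() until it stops throwing: each failed close()
  // leaves the file open, while a normal return means the lease has been
  // released and the file can safely be reopened or moved.
  static void closeWithRetries(FSDataOutputStream out, long retryIntervalMs)
      throws InterruptedException {
    while (true) {
      try {
        out.close();
        return;
      } catch (IOException e) {
        // transient failure (e.g. NameNode unreachable); back off, retry
        Thread.sleep(retryIntervalMs);
      }
    }
  }
}
{code}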

It seems like the case you are concerned about is the one where we fail to get 
the last block because of a streamer failure.  That is already a problem, and 
I don't think this patch makes it worse (although it doesn't make it better, 
either).  If you have ideas for how to improve that case, maybe we should file 
a follow-on JIRA?
                
> DFSOutputStream#close doesn't always release resources (such as leases)
> -----------------------------------------------------------------------
>
>                 Key: HDFS-4504
>                 URL: https://issues.apache.org/jira/browse/HDFS-4504
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, 
> HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, 
> HDFS-4504.010.patch, HDFS-4504.011.patch, HDFS-4504.014.patch, 
> HDFS-4504.015.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One 
> example is if there is a pipeline error and then pipeline recovery fails.  
> Unfortunately, in this case, some of the resources used by the 
> {{DFSOutputStream}} are leaked.  One particularly important resource is file 
> leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many 
> blocks to a file, but then fail to close it.  Unfortunately, the 
> {{LeaseRenewerThread}} inside the client will continue to renew the lease for 
> the "undead" file.  Future attempts to close the file will just rethrow the 
> previous exception, and no progress can be made by the client.
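
For readers following along, here is a minimal sketch of the failure mode 
described above.  The {{FileSystem}} setup is assumed, and the pipeline 
failure itself cannot be forced from client code; the comments mark where it 
would occur:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class UndeadFileSketch {
  // Illustrates the "undead file" scenario: close() fails once, the
  // client keeps renewing the lease, and every later close() rethrows.
  static void demonstrate(FileSystem fs, Path path, byte[] data)
      throws IOException {
    FSDataOutputStream out = fs.create(path);
    out.write(data);
    try {
      out.close();  // suppose pipeline recovery fails here: IOException
    } catch (IOException e) {
      // The stream is now "undead": the client-side lease renewer keeps
      // renewing the lease, so the NameNode never reclaims the file.
    }
    out.close();    // rethrows the same exception; no progress possible
  }
}
{code}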
