[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13745868#comment-13745868 ]
Vinay commented on HDFS-4504:
-----------------------------

{quote}A DFSOutputStream is a zombie for one of two reasons: 1. The client can't contact the NameNode (perhaps because of a network problem) 2. The client asked the NameNode to complete the file and it refused, because the NN does not (yet?) have a record that all of the file's blocks are present and complete.{quote}

You cannot treat a DataStreamer failure as a non-zombie case. It is the most likely cause, and it can happen frequently too. As already described in the test above, trying to close the file after a DataStreamer failure can lead to a real problem; the forced complete() call fails as well.

{quote}As I said before, the current code doesn't do anything special in the case of a data streamer failure in DFSOutputStream#close. It just throws up its hands and says "oh well, guess that data's gone!" After the hard-lease period expires, we will complete the file anyway. So it's exactly the same behavior with this patch as without it-- only the timeout is different.{quote}

Consider a DataStreamer failure caused by a pipeline failure:
* Without the patch, complete() is never called, so the block state on the NN side is unchanged and lease recovery of the file will succeed. *No data will be lost.*
* With the patch, the complete() call moves the last block to the COMMITTED state, which blocks recovery (and any forced complete) until the block is reported by a DN -- which will never happen, since the pipeline is gone. *So data is lost.* (A toy model of this COMMITTED-block behavior is sketched at the end of this message.)

{quote} This might be a good idea, but we should do it in a future JIRA. This patch is big enough, and changes enough things already.{quote}

Yes, that's correct. To avoid too many changes in this patch itself, we suggest just trying to report the zombie stream to the NN instead of forcing complete() (see the second sketch at the end of this message). This covers all the cases you mentioned.

> DFSOutputStream#close doesn't always release resources (such as leases)
> -----------------------------------------------------------------------
>
> Key: HDFS-4504
> URL: https://issues.apache.org/jira/browse/HDFS-4504
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch,
> HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch,
> HDFS-4504.010.patch, HDFS-4504.011.patch, HDFS-4504.014.patch,
> HDFS-4504.015.patch, HDFS-4504.016.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases. One
> example is if there is a pipeline error and then pipeline recovery fails.
> Unfortunately, in this case, some of the resources used by the
> {{DFSOutputStream}} are leaked. One particularly important resource is file
> leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many
> blocks to a file, but then fail to close it. Unfortunately, the
> {{LeaseRenewerThread}} inside the client will continue to renew the lease for
> the "undead" file. Future attempts to close the file will just rethrow the
> previous exception, and no progress can be made by the client.
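To make the data-loss argument concrete, here is a minimal, self-contained toy model (plain Java, not HDFS code; ToyBlock, clientComplete and tryRecoverLease are made-up names) of the block-state difference between the two cases:

{code:java}
// Toy model of the NN-side block-state machine discussed above.
// It only illustrates why a COMMITTED last block with no DataNode
// replica report stalls recovery, while an UNDER_CONSTRUCTION block
// can still be recovered from the pipeline DNs.
public class CommittedBlockModel {
    enum BlockState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

    static class ToyBlock {
        BlockState state = BlockState.UNDER_CONSTRUCTION;
        boolean reportedByDataNode = false;
    }

    // Roughly what a client complete() does to the last block: it becomes
    // COMMITTED, not COMPLETE, until a DN reports the finalized replica.
    static void clientComplete(ToyBlock lastBlock) {
        lastBlock.state = BlockState.COMMITTED;
    }

    // Lease recovery in this model: an UNDER_CONSTRUCTION block can be
    // recovered, but a COMMITTED block must wait for a block report that,
    // after a pipeline failure, never arrives.
    static boolean tryRecoverLease(ToyBlock lastBlock) {
        switch (lastBlock.state) {
            case UNDER_CONSTRUCTION:
                return true;                         // recovery succeeds, no data lost
            case COMMITTED:
                return lastBlock.reportedByDataNode; // stuck: no DN will ever report it
            default:
                return true;
        }
    }

    public static void main(String[] args) {
        ToyBlock withoutPatch = new ToyBlock();      // close() gave up, no complete()
        ToyBlock withPatch = new ToyBlock();
        clientComplete(withPatch);                   // patch forces complete()

        System.out.println("without patch, recovery succeeds: "
                + tryRecoverLease(withoutPatch));    // true
        System.out.println("with patch, recovery succeeds: "
                + tryRecoverLease(withPatch));       // false -> file stuck
    }
}
{code}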
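And a hedged sketch of the suggested alternative: on a dead DataStreamer, close() would release client-side state and report the zombie file to the NN instead of forcing complete(). Everything here (ZombieAwareOutputStream, NameNodeClient, reportZombieFile) is hypothetical and is not the HDFS-4504 patch; it only shows the shape of the idea.

{code:java}
import java.io.IOException;

public class ZombieAwareOutputStream {

    interface NameNodeClient {
        boolean complete(String src) throws IOException;       // normal close path
        void reportZombieFile(String src) throws IOException;  // hypothetical RPC
    }

    private final String src;
    private final NameNodeClient namenode;
    private volatile boolean streamerDead;  // set when the pipeline fails for good
    private boolean closed;

    ZombieAwareOutputStream(String src, NameNodeClient namenode) {
        this.src = src;
        this.namenode = namenode;
    }

    public void close() throws IOException {
        if (closed) {
            return;             // close() must stay idempotent
        }
        closed = true;          // release client-side state even on failure
        stopLeaseRenewal();     // never keep renewing an undead file's lease
        if (streamerDead) {
            // Do NOT call complete(): that would leave the last block
            // COMMITTED on the NN with no DN replica left to report it.
            namenode.reportZombieFile(src);  // let the NN run lease recovery
            return;
        }
        namenode.complete(src);
    }

    private void stopLeaseRenewal() {
        // In the real client this would remove the file from the
        // LeaseRenewer thread's list; omitted in this sketch.
    }
}
{code}

The point of the sketch is the design choice: because the client stops renewing and tells the NN right away, recovery can start immediately instead of waiting for the hard-lease period, and the last block is never pushed into a COMMITTED state that nothing can complete.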