[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13743586#comment-13743586 ]

Uma Maheswara Rao G commented on HDFS-4504:
-------------------------------------------

Hi Colin, Nice work on this issue.

{code}
List<IOException> ioExceptions = new LinkedList<IOException>();
if (!closed) {
  try {
    flushBuffer();       // flush from all upper layers

    if (currentPacket != null) {
      waitAndQueueCurrentPacket();
    }

    if (bytesCurBlock != 0) {
      // send an empty packet to mark the end of the block
      currentPacket = new Packet(0, 0, bytesCurBlock,
          currentSeqno++, this.checksum.getChecksumSize());
      currentPacket.lastPacketInBlock = true;
      currentPacket.syncBlock = shouldSyncBlock;
    }

    flushInternal();             // flush all data to Datanodes
  } catch (IOException e) {
    DFSClient.LOG.error("unable to flush buffers during file close " +
        "for " + src, e);
    ioExceptions.add(e);
  } finally {
    closed = true;
  }
}
// get last block before destroying the streamer
ExtendedBlock lastBlock = streamer.getBlock();
closeThreads(false);
{code}

I think the above piece of code can be problematic in the case of an hflush failure followed by a close call.
On a sync failure, closeThreads is called and streamer becomes null there; the closed flag is also set at that point.
When the user then calls close, we unconditionally try to call closeThreads again, and we also try to get the last block from the now-null streamer (see the sketch just below).
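
To make the failure sequence concrete, here is a rough illustration (the numbered comments just restate the scenario above; the two trailing statements are the lines from the excerpt that would then go wrong). This is illustrative only, not code from the patch:

{code}
// 1. hflush()/hsync() hits a pipeline error
//      -> closeThreads(...) runs, streamer is set to null,
//         and the closed flag is set to true
// 2. the user later calls close()
//      -> the "if (!closed)" block above is skipped, but we still fall through to:
ExtendedBlock lastBlock = streamer.getBlock();  // NPE: streamer is already null
closeThreads(false);                            // closing threads that were already closed
{code}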

Also, in the pipeline-failure case, if we cannot get the last block (because the streamer was already closed by the pipeline failure), force-closing the file may not be a good choice, since the client would not be reporting the last block correctly to the NameNode.
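
For context on why the last block matters: close finishes by asking the NN to complete the file with the last block the client knows about, roughly as in the sketch below. The exact ClientProtocol#complete signature has changed across versions, and the retry loop is omitted, so treat this as an approximation rather than the real code:

{code}
// Approximate sketch of the final step of close(), using DFSOutputStream's
// dfsClient and src fields; not the actual implementation.
private void completeFileSketch(ExtendedBlock lastBlock) throws IOException {
  // The NN records the last block and file length based on what the client
  // reports here.  If lastBlock is stale or missing (e.g. the streamer was
  // already torn down), force-closing would complete the file with the wrong
  // block information.
  boolean fileComplete =
      dfsClient.namenode.complete(src, dfsClient.clientName, lastBlock);
  if (!fileComplete) {
    throw new IOException("could not complete " + src);
  }
}
{code}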

Vinay and I have been thinking about this issue. How about simply informing the NN about the zombie situation for such a file and changing the client holder name to something like ZombieFile (the intention is just to make sure the client does not keep renewing unintended files)? That way renewLease would not renew such files, and the NN would close them normally once the hard limit expires, as it does today. Alternatively, the renewLease call could pass a list of zombie files that should be skipped from renewal for this client. A rough sketch of the second alternative follows.
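
This is only a hypothetical sketch of the idea; renewLease today takes just the client name, and the extra parameter and the zombie-file bookkeeping below are made up purely for illustration:

{code}
// Hypothetical sketch of the second alternative only; none of this exists today.
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

interface ClientProtocolSketch {
  // today: void renewLease(String clientName)
  // proposed: also tell the NN which paths this client no longer wants renewed
  void renewLease(String clientName, List<String> zombieFiles) throws IOException;
}

class LeaseRenewerSketch {
  private final Set<String> zombieFiles = new HashSet<String>();

  // DFSOutputStream#close could call this when it gives up on a file,
  // instead of silently leaving the lease to be renewed forever
  void markZombie(String src) {
    zombieFiles.add(src);
  }

  void renew(ClientProtocolSketch namenode, String clientName) throws IOException {
    // the NN would skip renewing the listed files, so their leases hit the
    // hard limit and get recovered/closed by the NN as it does today
    namenode.renewLease(clientName, new ArrayList<String>(zombieFiles));
  }
}
{code}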

> DFSOutputStream#close doesn't always release resources (such as leases)
> -----------------------------------------------------------------------
>
>                 Key: HDFS-4504
>                 URL: https://issues.apache.org/jira/browse/HDFS-4504
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, 
> HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, 
> HDFS-4504.010.patch, HDFS-4504.011.patch, HDFS-4504.014.patch, 
> HDFS-4504.015.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One 
> example is if there is a pipeline error and then pipeline recovery fails.  
> Unfortunately, in this case, some of the resources used by the 
> {{DFSOutputStream}} are leaked.  One particularly important resource is file 
> leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many 
> blocks to a file, but then fail to close it.  Unfortunately, the 
> {{LeaseRenewerThread}} inside the client will continue to renew the lease for 
> the "undead" file.  Future attempts to close the file will just rethrow the 
> previous exception, and no progress can be made by the client.
