[ https://issues.apache.org/jira/browse/FLUME-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829704#comment-13829704 ]

Juhani Connolly commented on FLUME-2245:
----------------------------------------

It appears to me that when an error occurs during an append,
BucketWriter.close() attempts to call BucketWriter.flush(); however, this
fails, so we never reach an attempt to actually close the backing HDFSWriter.
As a result, isOpen remains true, and the process repeats constantly.
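
To make the failure path concrete, here is a minimal self-contained sketch of
that shape (this is not the actual BucketWriter source; the stub writer and
field names are illustrative only):

    import java.io.IOException;

    class BucketWriterSketch {
        private boolean isOpen = true;
        private final StubHDFSWriter writer = new StubHDFSWriter();

        // Problematic shape: flush() sits before, and outside of, the close
        // attempt. Once the underlying stream is broken, flush() throws,
        // writer.close() is never reached, and isOpen stays true -- so the
        // sink keeps re-running this close() path forever.
        synchronized void close() throws IOException {
            flush();            // throws after a failed append
            writer.close();     // never reached
            isOpen = false;     // never reached either
        }

        void flush() throws IOException {
            writer.sync();      // fails once the pipeline is broken
        }

        // Stands in for the backing HDFSWriter after an append error.
        static class StubHDFSWriter {
            void sync() throws IOException {
                throw new IOException("write timed out");
            }
            void close() throws IOException { }
        }
    }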

Upon examination of the code, the flush() call seems entirely unnecessary,
since the HDFSWriter.close() implementations flush and sync the backing buffer
before closing it. Is there a reason for it being called separately, and
outside the try/catch?

Further, upon examination of HDFSDataStream: since we're going to be rolling
back anyway, couldn't we call closeHDFSOutputStream() regardless of whether
the flush and sync succeed? As long as we throw the exception it should be
propagated, and the rollback will occur.
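
Something like the following shape (a sketch only; the stream interface and
the closeHDFSOutputStream() signature here are assumptions, not the real
HDFSDataStream code):

    import java.io.IOException;

    class HDFSDataStreamSketch {
        // Stand-in for the real HDFS output stream.
        interface HDFSOutput {
            void flush() throws IOException;
            void sync() throws IOException;
            void close() throws IOException;
        }

        // Suggested shape: attempt the close whether or not flush/sync
        // succeed. An exception from flush()/sync() still propagates, so the
        // channel transaction is rolled back, but the stream is no longer
        // left dangling and un-closeable.
        void close(HDFSOutput outStream) throws IOException {
            try {
                outStream.flush();
                outStream.sync();   // may throw on a broken pipeline...
            } finally {
                // ...but we release the stream regardless
                closeHDFSOutputStream(outStream);
            }
        }

        private void closeHDFSOutputStream(HDFSOutput outStream)
                throws IOException {
            outStream.close();
        }
    }

One caveat with this shape: if both sync() and the close in the finally block
throw, the close's exception masks the original one, so real code may want to
preserve the first failure.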

There may be some deeper consequences I'm missing here due to only a passing
familiarity with the HDFSSink code. I'll put a fix up on Review Board and
would appreciate hearing from someone more familiar with the HDFS streams.

> HDFS files with errors unable to close
> --------------------------------------
>
>                 Key: FLUME-2245
>                 URL: https://issues.apache.org/jira/browse/FLUME-2245
>             Project: Flume
>          Issue Type: Bug
>            Reporter: Juhani Connolly
>         Attachments: flume.log.1133, flume.log.file
>
>
> This is running on a snapshot of Flume-1.5 with the git hash 
> 99db32ccd163daf9d7685f0e8485941701e1133d
> When a datanode goes unresponsive for a significant amount of time (for 
> example, a big GC), an append failure occurs, followed by repeated timeouts 
> appearing in the log and a failure to close the stream. The relevant section 
> of the logs is attached (from where it first starts appearing).
> The same log repeats periodically, consistently running into a 
> TimeoutException.
> Restarting Flume (or presumably just the HDFSSink) solves the issue.
> Probable cause in comments



--
This message was sent by Atlassian JIRA
(v6.1#6144)