[ https://issues.apache.org/jira/browse/FLUME-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997131#comment-13997131 ]

Hari Shreedharan commented on FLUME-2245:
-----------------------------------------

This patch does not really need the changes in the HDFSDataStream and HDFSCompressedStream classes. We should just catch the exception thrown by the flush and try to close. If the close fails, it will get rescheduled anyway.

> HDFS files with errors unable to close
> --------------------------------------
>
>                 Key: FLUME-2245
>                 URL: https://issues.apache.org/jira/browse/FLUME-2245
>             Project: Flume
>          Issue Type: Bug
>            Reporter: Juhani Connolly
>         Attachments: flume.log.1133, flume.log.file
>
>
> This is running on a snapshot of Flume-1.5 with the git hash
> 99db32ccd163daf9d7685f0e8485941701e1133d
> When a datanode goes unresponsive for a significant amount of time (for
> example, a big GC), an append failure will occur, followed by repeated timeouts
> appearing in the log and failure to close the stream. Relevant section of
> logs attached (where it first starts appearing).
> The same log repeats periodically, consistently running into a
> TimeoutException.
> Restarting Flume (or presumably just the HDFSSink) solves the issue.
> Probable cause in comments



--
This message was sent by Atlassian JIRA
(v6.2#6252)
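
The approach suggested in the comment above (catch the exception thrown by the flush, then still try to close) could look roughly like the following. This is only a minimal sketch, not the actual patch: the Writer interface, the flushAndClose method, and the use of IOException are hypothetical stand-ins and are not taken from the Flume code base. A false return value stands for "close failed, leave it to be rescheduled".

```java
import java.io.IOException;

public class CloseAfterFailedFlush {

    /** Hypothetical stand-in for the sink's HDFS writer; not Flume's real interface. */
    public interface Writer {
        void flush() throws IOException;
        void close() throws IOException;
    }

    /**
     * Flushes, then attempts to close even if the flush threw.
     * Returns true if close succeeded, false if the caller should retry later.
     */
    public static boolean flushAndClose(Writer writer) {
        try {
            writer.flush();
        } catch (IOException e) {
            // A failed flush must not prevent the close attempt.
            System.err.println("flush failed, still attempting close: " + e.getMessage());
        }
        try {
            writer.close();
            return true;
        } catch (IOException e) {
            // Assumption: the caller reschedules the close when this returns false.
            System.err.println("close failed, leaving it for a retry: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulates a writer whose flush times out but whose close still works.
        Writer flushFails = new Writer() {
            @Override
            public void flush() throws IOException {
                throw new IOException("simulated datanode timeout");
            }

            @Override
            public void close() {
                // close succeeds
            }
        };
        System.out.println("closed: " + flushAndClose(flushFails));
    }
}
```

The point of keeping the two try blocks separate is that an exception from flush never skips the close, which is the behaviour the comment asks for without touching the stream classes.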