[ 
https://issues.apache.org/jira/browse/FLUME-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478747#comment-13478747
 ] 

Juhani Connolly commented on FLUME-1350:
----------------------------------------

I had a more detailed look at the source, and I don't really think that, as it 
is, it will solve the problem without introducing a new one (the thrashing that 
Mike referred to).

An alternative solution I would suggest is tracking the last write to each open 
file, and having a watcher thread close any file that has received no writes 
for a configured timeout period. This would solve every case of unclosed idle 
files I can think of. If a file becomes active again because a temporarily 
out-of-commission source reactivates, it would not result in thrashing (the 
file would be reopened, and then closed again soon after all the backlog has 
been handled).
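
A minimal sketch of the idea, to make the proposal concrete. This is not taken 
from the Flume source or from the attached patch: IdleFileTracker, recordWrite, 
closeIdle, startWatcher and the closer callback are all hypothetical names, and 
a real sink would also need to synchronize the close against concurrent writes 
to the same file.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper, not part of Flume: remembers the last write time of each
// open file and closes files that have been idle for at least the timeout.
class IdleFileTracker {
    private final long idleTimeoutMillis;
    // Assumed callback that actually closes the file behind a path.
    private final Consumer<String> closer;
    private final Map<String, Long> lastWrite = new ConcurrentHashMap<>();

    IdleFileTracker(long idleTimeoutMillis, Consumer<String> closer) {
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.closer = closer;
    }

    // Call on every write. A file that was closed earlier is simply tracked again.
    void recordWrite(String path, long nowMillis) {
        lastWrite.put(path, nowMillis);
    }

    // One watcher pass: close every file whose last write is at least
    // idleTimeoutMillis old. Returns the number of files closed.
    int closeIdle(long nowMillis) {
        int closed = 0;
        for (Map.Entry<String, Long> e : lastWrite.entrySet()) {
            boolean idle = nowMillis - e.getValue() >= idleTimeoutMillis;
            // remove(key, value) only succeeds if no newer write replaced the timestamp.
            if (idle && lastWrite.remove(e.getKey(), e.getValue())) {
                closer.accept(e.getKey());
                closed++;
            }
        }
        return closed;
    }

    // Starts a daemon thread that runs closeIdle every periodMillis.
    ScheduledExecutorService startWatcher(long periodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-file-watcher");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(() -> closeIdle(System.currentTimeMillis()),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

With a timeout much shorter than the roll interval, a file in a stale date 
bucket would be closed after one idle period, while a file that starts 
receiving backlog again is only re-tracked and closed once more after the 
backlog drains.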

Yongcheng/Mike: What do you think? If no-one else wants to do it, I can put it 
together.
                
> HDFS file handle not closed properly when date bucketing 
> ---------------------------------------------------------
>
>                 Key: FLUME-1350
>                 URL: https://issues.apache.org/jira/browse/FLUME-1350
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.1.0, v1.2.0
>            Reporter: Robert Mroczkowski
>         Attachments: HDFSEventSink.java.patch
>
>
> With configuration:
> agent.sinks.hdfs-cafe-access.type = hdfs
> agent.sinks.hdfs-cafe-access.hdfs.path = hdfs://nga/nga/apache/access/%y-%m-%d/
> agent.sinks.hdfs-cafe-access.hdfs.fileType = DataStream
> agent.sinks.hdfs-cafe-access.hdfs.filePrefix = cafe_access
> agent.sinks.hdfs-cafe-access.hdfs.rollInterval = 21600
> agent.sinks.hdfs-cafe-access.hdfs.rollSize = 10485760
> agent.sinks.hdfs-cafe-access.hdfs.rollCount = 0
> agent.sinks.hdfs-cafe-access.hdfs.txnEventMax = 1000
> agent.sinks.hdfs-cafe-access.hdfs.batchSize = 1000
> #agent.sinks.hdfs-cafe-access.hdfs.codeC = snappy
> agent.sinks.hdfs-cafe-access.hdfs.hdfs.maxOpenFiles = 5000
> agent.sinks.hdfs-cafe-access.channel = memo-1
> When new directory is created previous file handle remains opened. 
> rollInterval setting is used only with files in current date bucket. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
