[ https://issues.apache.org/jira/browse/FLUME-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476371#comment-13476371 ]

Mike Percy commented on FLUME-1350:
-----------------------------------

In production deployments that I have seen, people often use a rollInterval of 
5 or 10 minutes. Depending on how many log files you are writing concurrently, 
the resulting number of files is usually fine for long-term storage, even if you 
are worried about NameNode memory usage. By the way, if you are concerned about 
it, rollInterval will not generate empty files during periods of inactivity.
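As a minimal sketch of that advice, the change below reuses the agent and sink names from the configuration quoted further down and assumes a 10-minute interval. hdfs.rollInterval is given in seconds, so 600 replaces the reporter's 21600 (6 hours); the other roll settings are left as the reporter had them.

```properties
# Sketch only: roll files every 10 minutes instead of every 6 hours.
# hdfs.rollInterval is in seconds (600 s = 10 min).
agent.sinks.hdfs-cafe-access.hdfs.rollInterval = 600
```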

If you only want one file open at a time, you can set maxOpenFiles = 1 and get 
that behavior. However, in a real-world deployment with concurrency and multiple 
tiers, events are likely to arrive out of order. In that case you almost 
certainly do not want only one file open at a time, because files would be 
closed and reopened over and over (thrashing).
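A sketch of that setting for the sink quoted below follows. The property key is hdfs.maxOpenFiles; the quoted configuration spells it with a doubled prefix (hdfs.hdfs.maxOpenFiles), which the sink would likely not recognize, leaving the default in effect. The comment assumes the sink evicts its least recently used writer when the limit is exceeded. Under that assumption, if late events for yesterday's %y-%m-%d bucket keep interleaving with today's, each switch closes one file and reopens the other, which is the thrashing described above.

```properties
# Sketch only: keep at most one HDFS file open for this sink.
# Assumption: opening a writer for a new path closes the least
# recently used writer once the limit is exceeded.
agent.sinks.hdfs-cafe-access.hdfs.maxOpenFiles = 1
```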

Does that help?
                
> HDFS file handle not closed properly when date bucketing 
> ---------------------------------------------------------
>
>                 Key: FLUME-1350
>                 URL: https://issues.apache.org/jira/browse/FLUME-1350
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.1.0, v1.2.0
>            Reporter: Robert Mroczkowski
>         Attachments: HDFSEventSink.java.patch
>
>
> With configuration:
> agent.sinks.hdfs-cafe-access.type = hdfs
> agent.sinks.hdfs-cafe-access.hdfs.path = hdfs://nga/nga/apache/access/%y-%m-%d/
> agent.sinks.hdfs-cafe-access.hdfs.fileType = DataStream
> agent.sinks.hdfs-cafe-access.hdfs.filePrefix = cafe_access
> agent.sinks.hdfs-cafe-access.hdfs.rollInterval = 21600
> agent.sinks.hdfs-cafe-access.hdfs.rollSize = 10485760
> agent.sinks.hdfs-cafe-access.hdfs.rollCount = 0
> agent.sinks.hdfs-cafe-access.hdfs.txnEventMax = 1000
> agent.sinks.hdfs-cafe-access.hdfs.batchSize = 1000
> #agent.sinks.hdfs-cafe-access.hdfs.codeC = snappy
> agent.sinks.hdfs-cafe-access.hdfs.hdfs.maxOpenFiles = 5000
> agent.sinks.hdfs-cafe-access.channel = memo-1
> When a new directory is created, the previous file handle remains open. 
> The rollInterval setting is applied only to files in the current date bucket. 
