[ https://issues.apache.org/jira/browse/FLUME-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476371#comment-13476371 ]
Mike Percy commented on FLUME-1350:
-----------------------------------
In the production deployments I have seen, people often use a rollInterval of
5 or 10 minutes (300 or 600 seconds) to generate the rolled log files.
Depending on how many log files you are writing concurrently, this is usually
fine for long-term data storage if you are worried about NameNode memory
usage. By the way, rollInterval will not generate empty files over periods of
inactivity, if you are worried about that.
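
As a rough sketch of that kind of setup, reusing the sink name from the report
quoted below (the 300-second value and the choice to disable the size and
count triggers are illustrative assumptions, not settings taken from this
issue):

```
# Roll each open file every 5 minutes (the value is in seconds)
agent.sinks.hdfs-cafe-access.hdfs.rollInterval = 300
# 0 disables size-based and count-based rolling, so only the timer applies
agent.sinks.hdfs-cafe-access.hdfs.rollSize = 0
agent.sinks.hdfs-cafe-access.hdfs.rollCount = 0
```
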
If you only want one file open at a time, you can set hdfs.maxOpenFiles = 1 to
get that behavior. However, in a real-world scenario with concurrency and
multiple tiers, which likely results in out-of-order delivery, it is highly
unlikely that you really want only one file open at a time, since that will
result in thrashing.
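
For reference, a sketch of the one-file-at-a-time setting, again assuming the
sink name from the report quoted below. It is shown only to illustrate the
property; as noted above, it is unlikely to be what you want when events can
arrive out of order:

```
# Allow only one open file; when the limit is exceeded, the sink closes the
# oldest open file before opening a new one
agent.sinks.hdfs-cafe-access.hdfs.maxOpenFiles = 1
```
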
Does that help?
> HDFS file handle not closed properly when date bucketing
> ---------------------------------------------------------
>
> Key: FLUME-1350
> URL: https://issues.apache.org/jira/browse/FLUME-1350
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.1.0, v1.2.0
> Reporter: Robert Mroczkowski
> Attachments: HDFSEventSink.java.patch
>
>
> With configuration:
> agent.sinks.hdfs-cafe-access.type = hdfs
> agent.sinks.hdfs-cafe-access.hdfs.path = hdfs://nga/nga/apache/access/%y-%m-%d/
> agent.sinks.hdfs-cafe-access.hdfs.fileType = DataStream
> agent.sinks.hdfs-cafe-access.hdfs.filePrefix = cafe_access
> agent.sinks.hdfs-cafe-access.hdfs.rollInterval = 21600
> agent.sinks.hdfs-cafe-access.hdfs.rollSize = 10485760
> agent.sinks.hdfs-cafe-access.hdfs.rollCount = 0
> agent.sinks.hdfs-cafe-access.hdfs.txnEventMax = 1000
> agent.sinks.hdfs-cafe-access.hdfs.batchSize = 1000
> #agent.sinks.hdfs-cafe-access.hdfs.codeC = snappy
> agent.sinks.hdfs-cafe-access.hdfs.hdfs.maxOpenFiles = 5000
> agent.sinks.hdfs-cafe-access.channel = memo-1
> When a new directory is created, the previous file handle remains open.
> The rollInterval setting is applied only to files in the current date bucket.