[ https://issues.apache.org/jira/browse/FLUME-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476101#comment-13476101 ]
Yongcheng Li commented on FLUME-1350:
-------------------------------------
Is it required to use rollInterval when using date bucketing? In the real
world, you don't want rollInterval generating many small files. Instead, it's
common to use rollSize combined with date bucketing.
When using rollSize with date bucketing, no more data is written into the old
file once the bucket changes, so the file may never be closed until the system
goes down or there are too many open files. That is not the desired or expected
behavior, and it's why so many people have complained about this bug.
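
To make the failure mode concrete, here is a small Python sketch of the behavior described above. It is a toy model, not Flume's HDFSEventSink code: the class names, the "data" file name, and the idle_timeout option are all assumptions made for illustration. It shows that a size-based roll is only checked when a bucket receives a write, so a file in an old date bucket stays open unless something else closes it.

```python
from dataclasses import dataclass, field


@dataclass
class BucketFile:
    """One open output file for a single date bucket (hypothetical model)."""
    path: str
    bytes_written: int = 0
    last_write: float = 0.0


@dataclass
class BucketedSink:
    """Toy model of a sink writing into date-bucketed directories."""
    roll_size: int              # close a file once it holds this many bytes
    idle_timeout: float = 0.0   # assumed option; 0 disables idle closing
    open_files: dict = field(default_factory=dict)
    closed: list = field(default_factory=list)

    def write(self, bucket: str, payload: bytes, now: float) -> None:
        # Reuse the open file for this bucket, or open a new one.
        f = self.open_files.setdefault(bucket, BucketFile(path=f"{bucket}/data"))
        f.bytes_written += len(payload)
        f.last_write = now
        # Size-based roll: evaluated only when this bucket gets a write.
        if f.bytes_written >= self.roll_size:
            self._close(bucket)

    def close_idle(self, now: float) -> None:
        """Close files that have had no writes for idle_timeout seconds."""
        if self.idle_timeout <= 0:
            return
        for bucket, f in list(self.open_files.items()):
            if now - f.last_write >= self.idle_timeout:
                self._close(bucket)

    def _close(self, bucket: str) -> None:
        self.closed.append(self.open_files.pop(bucket).path)


# Size-only rolling: the file from the previous day never reaches roll_size.
sink = BucketedSink(roll_size=100)
sink.write("12-10-14", b"x" * 40, now=0.0)
sink.write("12-10-15", b"x" * 40, now=86400.0)
sink.close_idle(now=200000.0)        # no-op, idle closing is disabled
print(sorted(sink.open_files))       # both buckets are still open
```

Constructing the same sink with a non-zero idle_timeout and calling close_idle periodically would close the stale file from the earlier bucket, which is the kind of behavior this comment is asking for.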
> HDFS file handle not closed properly when date bucketing
> ---------------------------------------------------------
>
> Key: FLUME-1350
> URL: https://issues.apache.org/jira/browse/FLUME-1350
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.1.0, v1.2.0
> Reporter: Robert Mroczkowski
> Attachments: HDFSEventSink.java.patch
>
>
> With configuration:
> agent.sinks.hdfs-cafe-access.type = hdfs
> agent.sinks.hdfs-cafe-access.hdfs.path = hdfs://nga/nga/apache/access/%y-%m-%d/
> agent.sinks.hdfs-cafe-access.hdfs.fileType = DataStream
> agent.sinks.hdfs-cafe-access.hdfs.filePrefix = cafe_access
> agent.sinks.hdfs-cafe-access.hdfs.rollInterval = 21600
> agent.sinks.hdfs-cafe-access.hdfs.rollSize = 10485760
> agent.sinks.hdfs-cafe-access.hdfs.rollCount = 0
> agent.sinks.hdfs-cafe-access.hdfs.txnEventMax = 1000
> agent.sinks.hdfs-cafe-access.hdfs.batchSize = 1000
> #agent.sinks.hdfs-cafe-access.hdfs.codeC = snappy
> agent.sinks.hdfs-cafe-access.hdfs.hdfs.maxOpenFiles = 5000
> agent.sinks.hdfs-cafe-access.channel = memo-1
> When a new directory is created, the previous file handle remains open.
> The rollInterval setting is applied only to files in the current date bucket.