[ https://issues.apache.org/jira/browse/FLUME-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475413#comment-13475413 ]
Mike Percy commented on FLUME-1350:
-----------------------------------
That path means that every Event that goes to the HDFS sink must have a header
called "timestamp" whose value is a stringified Long: a typical Java timestamp
in milliseconds. The year-month-day is generated from that timestamp, and the
event is stored in a file under the resulting directory.
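
As a rough illustration of that mapping, here is a minimal sketch. It is not
Flume's actual code: the class and method names are hypothetical, and it
assumes UTC for the date calculation, which may not match how a given agent
is set up.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Flume's implementation.
class BucketPathSketch {
    // Stands in for the %y-%m-%d escapes: two-digit year, month, day of month.
    // UTC is an assumption made here so the result is deterministic.
    private static final DateTimeFormatter DAY =
        DateTimeFormatter.ofPattern("yy-MM-dd").withZone(ZoneOffset.UTC);

    // Builds the bucket directory from the event's "timestamp" header.
    static String bucketPath(String baseDir, Map<String, String> headers) {
        String raw = headers.get("timestamp");
        if (raw == null) {
            throw new IllegalArgumentException("event has no \"timestamp\" header");
        }
        long millis = Long.parseLong(raw); // stringified Long, ms since epoch
        return baseDir + DAY.format(Instant.ofEpochMilli(millis)) + "/";
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("timestamp", "1350302400000"); // 2012-10-15T12:00:00Z
        // prints hdfs://nga/nga/apache/access/12-10-15/
        System.out.println(bucketPath("hdfs://nga/nga/apache/access/", headers));
    }
}
```

The point of the sketch is that the directory depends only on the header
value, not on the time at which the sink happens to process the event.
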
If there is already an open file in that directory, the event will be appended
to that file. If there is no open file in that directory, a new file will be
created.
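
The append-or-create rule can be pictured as a table of open files keyed by
directory. The sketch below is only a model of that rule under simplifying
assumptions: the class name is hypothetical, and an in-memory list stands in
for a real HDFS file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "one open file per bucket directory".
class OpenFileTableSketch {
    // directory -> events written to the open file in that directory
    private final Map<String, List<String>> openFiles = new HashMap<>();

    // Appends the event; returns true if a new file had to be created.
    boolean append(String directory, String eventBody) {
        boolean created = !openFiles.containsKey(directory);
        openFiles.computeIfAbsent(directory, d -> new ArrayList<>()).add(eventBody);
        return created;
    }

    // Number of files currently held open.
    int openFileCount() {
        return openFiles.size();
    }
}
```

In this model, an event for yesterday's directory arriving after midnight
simply reuses or reopens yesterday's entry, so several directories can have
open files at once.
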
The rules listed above are the only ones for closing a file, because when
events are collected from many hosts, old events may arrive at the same time
as new events, and we would not want to create too many small files. So the
length of time a file may remain open before it is closed automatically is
configurable using rollInterval.
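
To make the timing rule concrete, here is a minimal sketch of time-based
closing. It is an assumption-laden simplification, not the sink's real
mechanism: the class and method names are hypothetical, and the clock is
passed in explicitly instead of being driven by a timer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of closing files once they have been open too long.
class RollIntervalSketch {
    private final long rollIntervalMillis;
    // directory -> time (ms) at which its current file was opened
    private final Map<String, Long> openedAt = new HashMap<>();

    RollIntervalSketch(long rollIntervalSeconds) {
        this.rollIntervalMillis = rollIntervalSeconds * 1000L;
    }

    // Records a write; opens a file for the directory if none is open.
    void write(String directory, long nowMillis) {
        openedAt.putIfAbsent(directory, nowMillis);
    }

    // Closes every file open for at least rollInterval; returns their directories.
    List<String> closeExpired(long nowMillis) {
        List<String> closed = new ArrayList<>();
        if (rollIntervalMillis <= 0) {
            return closed; // treat 0 as "never roll based on time"
        }
        Iterator<Map.Entry<String, Long>> it = openedAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= rollIntervalMillis) {
                closed.add(entry.getKey());
                it.remove();
            }
        }
        return closed;
    }

    boolean isOpen(String directory) {
        return openedAt.containsKey(directory);
    }
}
```

With rollInterval = 21600 as in the quoted configuration, a file in this
model stays open for six hours after it was opened, regardless of whether a
newer date directory has been created in the meantime.
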
> HDFS file handle not closed properly when date bucketing
> ---------------------------------------------------------
>
> Key: FLUME-1350
> URL: https://issues.apache.org/jira/browse/FLUME-1350
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.1.0, v1.2.0
> Reporter: Robert Mroczkowski
> Attachments: HDFSEventSink.java.patch
>
>
> With configuration:
> agent.sinks.hdfs-cafe-access.type = hdfs
> agent.sinks.hdfs-cafe-access.hdfs.path = hdfs://nga/nga/apache/access/%y-%m-%d/
> agent.sinks.hdfs-cafe-access.hdfs.fileType = DataStream
> agent.sinks.hdfs-cafe-access.hdfs.filePrefix = cafe_access
> agent.sinks.hdfs-cafe-access.hdfs.rollInterval = 21600
> agent.sinks.hdfs-cafe-access.hdfs.rollSize = 10485760
> agent.sinks.hdfs-cafe-access.hdfs.rollCount = 0
> agent.sinks.hdfs-cafe-access.hdfs.txnEventMax = 1000
> agent.sinks.hdfs-cafe-access.hdfs.batchSize = 1000
> #agent.sinks.hdfs-cafe-access.hdfs.codeC = snappy
> agent.sinks.hdfs-cafe-access.hdfs.hdfs.maxOpenFiles = 5000
> agent.sinks.hdfs-cafe-access.channel = memo-1
> When a new directory is created, the previous file handle remains open.
> The rollInterval setting is applied only to files in the current date bucket.