I can confirm that we are seeing this issue as well. We are only using rollSize. When an event's timestamp indicates that it is time to create a new date bucket, the new path and file are created, but the existing file is never closed and renamed.
Applying this patch resolved the issue we were seeing: existing files are now closed when the new one is opened.

On Oct 12, 2012, at 4:41 PM, "Mike Percy (JIRA)" <[email protected]> wrote:

> [ https://issues.apache.org/jira/browse/FLUME-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475413#comment-13475413 ]
>
> Mike Percy commented on FLUME-1350:
> -----------------------------------
>
> That path means that any Event that goes to the HDFS sink must have a header
> called "timestamp" which is a stringified Long value, typical Java timestamp
> in milliseconds. The year-month-day will be generated from that timestamp,
> and the event will be stored in a file under that directory.
>
> If there is already an open file in that directory, the event will be
> appended to that file. If there is no open file in that directory, a new file
> will be created.
>
> The only rules for closing a file are listed above, because when events are
> collected from many hosts, there may be old events coming through at the same
> time as new events, and we would not want to create too many small files. So,
> the time to allow a file to remain open is configurable before automatically
> closing it using rollInterval.
>
>> HDFS file handle not closed properly when date bucketing
>> ---------------------------------------------------------
>>
>> Key: FLUME-1350
>> URL: https://issues.apache.org/jira/browse/FLUME-1350
>> Project: Flume
>> Issue Type: Bug
>> Components: Sinks+Sources
>> Affects Versions: v1.1.0, v1.2.0
>> Reporter: Robert Mroczkowski
>> Attachments: HDFSEventSink.java.patch
>>
>>
>> With configuration:
>> agent.sinks.hdfs-cafe-access.type = hdfs
>> agent.sinks.hdfs-cafe-access.hdfs.path = hdfs://nga/nga/apache/access/%y-%m-%d/
>> agent.sinks.hdfs-cafe-access.hdfs.fileType = DataStream
>> agent.sinks.hdfs-cafe-access.hdfs.filePrefix = cafe_access
>> agent.sinks.hdfs-cafe-access.hdfs.rollInterval = 21600
>> agent.sinks.hdfs-cafe-access.hdfs.rollSize = 10485760
>> agent.sinks.hdfs-cafe-access.hdfs.rollCount = 0
>> agent.sinks.hdfs-cafe-access.hdfs.txnEventMax = 1000
>> agent.sinks.hdfs-cafe-access.hdfs.batchSize = 1000
>> #agent.sinks.hdfs-cafe-access.hdfs.codeC = snappy
>> agent.sinks.hdfs-cafe-access.hdfs.hdfs.maxOpenFiles = 5000
>> agent.sinks.hdfs-cafe-access.channel = memo-1
>>
>> When new directory is created previous file handle remains opened.
>> rollInterval setting is used only with files in current date bucket.
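
For anyone trying to picture the bucketing described in the quoted comment, here is a rough Python sketch of how a "timestamp" header could map onto the %y-%m-%d path, and why the previous day's file is still open once a new bucket appears. This is not Flume's code: bucket_path, append_event and open_files are hypothetical names made up for illustration, and resolving the date in UTC is an assumption chosen only so the result is deterministic.

```python
from datetime import datetime, timezone

# Path template taken from the configuration quoted above.
PATH_TEMPLATE = "hdfs://nga/nga/apache/access/%y-%m-%d/"


def bucket_path(template, headers):
    """Expand date escapes using the event's "timestamp" header.

    The header is a stringified long: milliseconds since the epoch.
    UTC is an assumption made here to keep the output deterministic.
    """
    millis = int(headers["timestamp"])
    when = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
    return when.strftime(template)


# Hypothetical stand-in for the sink's table of open files:
# bucket directory -> number of events appended so far.
open_files = {}


def append_event(headers):
    """Append to the open file for the event's bucket, opening one if needed."""
    path = bucket_path(PATH_TEMPLATE, headers)
    open_files[path] = open_files.get(path, 0) + 1
    return path


# One event just before midnight UTC, one exactly at midnight.
append_event({"timestamp": "1349999999999"})
append_event({"timestamp": "1350000000000"})

# Both buckets are still listed as open: nothing here closes the older one.
for path, count in sorted(open_files.items()):
    print(path, count)
```

In this toy model the older bucket only goes away if something explicitly closes it, which matches what we observed until the patch was applied.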
