[ https://issues.apache.org/jira/browse/FLUME-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987636#comment-15987636 ]
Hudson commented on FLUME-3083:
-------------------------------

FAILURE: Integrated in Jenkins build Flume-trunk-hbase-1 #246 (See [https://builds.apache.org/job/Flume-trunk-hbase-1/246/])
FLUME-3083. Check byte position of file in update condition of Taildir (bessbd: [http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=dfa0627573b9a75a25dc7149a7d63c9bac953ff4])
* (edit) flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/ReliableTaildirEventReader.java
* (edit) flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirEventReader.java


> Taildir source can miss events if file updated in same second as file close
> ---------------------------------------------------------------------------
>
>                 Key: FLUME-3083
>                 URL: https://issues.apache.org/jira/browse/FLUME-3083
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: 1.7.0
>            Reporter: eskrm
>            Assignee: eskrm
>             Fix For: 1.8.0
>
>         Attachments: FLUME-3083-0.patch, FLUME-3083-1.patch
>
>
> I suspect there is a scenario where the taildir source can miss reading
> events from a log file due to how the source determines whether a file has
> been updated. In ReliableTaildirEventReader:
> {code}
> boolean updated = tf.getLastUpdated() < f.lastModified();
> ...
> tf.setNeedTail(updated);
> {code}
> Consider this sequence of events from TaildirSource.process(). Assume they
> all happen within the same second and there is just a single log file.
> # Call ReliableTaildirEventReader.updateTailFiles()
> #* This call will set ReliableTaildirEventReader.updateTime to the current
> time in milliseconds
> #* Assume the underlying file has not been updated within the last
> idleTimeout milliseconds
> # Due to idleness, the tail file's inode is added to TaildirSource.idleInodes
> in idleFileCheckerRunnable
> # tf.needTail is false. Skip reading file.
> # Underlying file is updated with events E1
> # TaildirSource.closeTailFiles()
> #* Call TaildirSource.tailFileProcess() before close to read any pending
> events
> #* Events E1 are read and processed
> #* Since events were read, call ReliableTaildirEventReader.commit(), which
> updates the tail file's position and sets its last updated time to
> ReliableTaildirEventReader.updateTime from 1.a
> #* Close file
> # Events E2 are written to underlying file. File's modification time is in
> the same second as the tail file's last updated time.
> # Since the time returned by File.lastModified() is the mtime in seconds
> converted to milliseconds, the file's last modified time is less than the
> tail file's last updated time and taildir won't reopen the file to read E2.
> #* This behaviour of File.lastModified() may be platform/JVM specific. I
> confirmed the behaviour using OpenJDK 8 on Ubuntu precise.
> Can someone confirm this?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
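
The timestamp comparison described in the report can be reproduced without touching a real file. The sketch below is only an illustration, not Flume code: the class name, the helper methods, and the sample timestamps, positions, and lengths are all hypothetical. It models a platform where the reported mtime has one-second resolution, applies the timestamp-only check quoted in the report, and then applies a variant that also compares the committed read position with the file length. That variant is an assumption based on the commit title ("Check byte position of file in update condition"); the exact condition in the patch is not quoted in this message.

```java
class MtimeGranularityDemo {

    // Hypothetical helper: models a platform whose File.lastModified()
    // reports the mtime in whole seconds converted to milliseconds.
    static long truncateToSeconds(long millis) {
        return (millis / 1000L) * 1000L;
    }

    // The timestamp-only check quoted in the report.
    static boolean updatedByTimeOnly(long lastUpdated, long lastModified) {
        return lastUpdated < lastModified;
    }

    // Assumed variant: also treat the file as updated when the committed
    // read position differs from the current file length.
    static boolean updatedByTimeOrPosition(long lastUpdated, long lastModified,
                                           long pos, long length) {
        return lastUpdated < lastModified || pos != length;
    }

    public static void main(String[] args) {
        // Illustrative values only; both timestamps fall in the same second.
        long updateTime = 1_493_300_000_200L;   // set in updateTailFiles()
        long e2WrittenAt = 1_493_300_000_800L;  // E2 appended 600 ms later
        long lastModified = truncateToSeconds(e2WrittenAt);

        long committedPos = 100L;  // bytes consumed after reading E1
        long fileLength = 150L;    // file size after E2 is appended

        // Timestamp-only check: the truncated mtime is not greater than
        // updateTime, so the file is not re-tailed and E2 is missed.
        System.out.println("time only:        "
                + updatedByTimeOnly(updateTime, lastModified));

        // Position-aware check: the unread bytes are detected.
        System.out.println("time or position: "
                + updatedByTimeOrPosition(updateTime, lastModified,
                                          committedPos, fileLength));
    }
}
```

With these sample values the first check evaluates to false and the second to true, which matches the missed-event scenario in steps 6 and 7 of the report.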