[ https://issues.apache.org/jira/browse/FLUME-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bessenyei Balázs Donát resolved FLUME-3083.
-------------------------------------------
       Resolution: Fixed
         Assignee: eskrm
    Fix Version/s: 1.8.0

> Taildir source can miss events if file updated in same second as file close
> ---------------------------------------------------------------------------
>
>                 Key: FLUME-3083
>                 URL: https://issues.apache.org/jira/browse/FLUME-3083
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: 1.7.0
>            Reporter: eskrm
>            Assignee: eskrm
>             Fix For: 1.8.0
>
>         Attachments: FLUME-3083-0.patch, FLUME-3083-1.patch
>
>
> I suspect there is a scenario where the taildir source can miss reading
> events from a log file due to how the source determines whether a file has
> been updated. In ReliableTaildirEventReader:
> {code}
> boolean updated = tf.getLastUpdated() < f.lastModified()
> ...
> tf.setNeedTail(updated);
> {code}
> Consider this sequence of events from TaildirSource.process(). Assume they
> all happen within the same second and there is just a single log file.
> # Call ReliableTaildirEventReader.updateTailFiles()
> #* This call will set ReliableTaildirEventReader.updateTime to current time
> in milliseconds
> #* Assume the underlying file has not been updated within the last
> idleTimeout milliseconds
> # Due to idleness, the tail file's inode is added to TaildirSource.idleInodes
> in idleFileCheckerRunnable
> # tf.needTail is false. Skip reading file.
> # Underlying file is updated with events E1
> # TaildirSource.closeTailFiles()
> #* Call TaildirSource.tailFileProcess() before close to read any pending
> events
> #* Events E1 are read and processed
> #* Since events were read, call ReliableTaildirEventReader.commit() which
> updates the tail file's position and sets its last updated time to
> ReliableTaildirEventReader.updateTime from 1.a
> #* Close file
> # Events E2 are written to underlying file. File's modification time is in
> the same second as the tail file's last updated time.
> # Since the time returned by File.lastModified() is the mtime in seconds
> converted to milliseconds the file's last modified time is less than the tail
> file's last updated time and taildir won't reopen the file to read E2.
> #* This behaviour of File.lastModified() may be platform/jvm specific. I
> confirmed the behavior using OpenJDK 8 on Ubuntu precise.
> Can someone confirm this?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
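
The comparison described in the last two steps of the report can be sketched in isolation. This is a minimal illustration, not Flume code: the class MtimeGranularityDemo, the helpers needTail and truncateToSeconds, and the timestamps are all hypothetical. Only the strict less-than check is taken from the snippet quoted in the report, and the sketch assumes a platform where File.lastModified() reports whole seconds converted to milliseconds.

```java
// Hypothetical, self-contained sketch of the timestamp comparison from the report.
class MtimeGranularityDemo {

    // Same shape as the quoted check: tail only if the recorded update time
    // is strictly older than the file's reported modification time.
    static boolean needTail(long lastUpdatedMillis, long fileLastModifiedMillis) {
        return lastUpdatedMillis < fileLastModifiedMillis;
    }

    // Models a File.lastModified() that only has second granularity:
    // the sub-second part of the timestamp is dropped.
    static long truncateToSeconds(long millis) {
        return (millis / 1000L) * 1000L;
    }

    public static void main(String[] args) {
        // Made-up timestamps that fall within the same wall-clock second.
        long updateTime = 1_500_000_000_250L;  // millisecond-precision update time
        long e2WriteTime = 1_500_000_000_900L; // E2 written 650 ms later

        // What a second-granularity platform would report for the E2 write.
        long reportedMtime = truncateToSeconds(e2WriteTime);

        System.out.println("truncated mtime:  needTail = "
                + needTail(updateTime, reportedMtime));
        System.out.println("full-precision:   needTail = "
                + needTail(updateTime, e2WriteTime));
    }
}
```

With these made-up values the truncated comparison evaluates to false, so the file would be skipped, while the same comparison at full millisecond precision evaluates to true. Whether a given JVM and filesystem actually truncate the modification time is platform specific, as the report itself notes.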