[ https://issues.apache.org/jira/browse/FLUME-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987302#comment-15987302 ]
ASF GitHub Bot commented on FLUME-3083:
---------------------------------------
Github user asfgit closed the pull request at:
https://github.com/apache/flume/pull/128
> Taildir source can miss events if file updated in same second as file close
> ---------------------------------------------------------------------------
>
> Key: FLUME-3083
> URL: https://issues.apache.org/jira/browse/FLUME-3083
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: 1.7.0
> Reporter: eskrm
> Attachments: FLUME-3083-0.patch, FLUME-3083-1.patch
>
>
> I suspect there is a scenario where the taildir source can miss reading
> events from a log file due to how the source determines whether a file has
> been updated. In ReliableTaildirEventReader:
> {code}
> boolean updated = tf.getLastUpdated() < f.lastModified();
> ...
> tf.setNeedTail(updated);
> {code}
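To make the comparison concrete, here is a minimal Java sketch. It uses a hypothetical UpdatedCheckDemo class (not Flume code) that applies the same strict less-than check to invented timestamps, once with full millisecond precision and once with the mtime truncated to whole seconds. The truncation is an assumption that models the File.lastModified() behavior reported below.

```java
// Hypothetical demo class; not part of Flume. Timestamps are invented.
public class UpdatedCheckDemo {

    // Same comparison as the snippet above: a file counts as updated only if
    // its reported mtime is strictly later than the recorded last-updated time.
    static boolean isUpdated(long lastUpdatedMillis, long lastModifiedMillis) {
        return lastUpdatedMillis < lastModifiedMillis;
    }

    // Models an mtime that only has whole-second precision.
    static long truncateToSecond(long millis) {
        return (millis / 1000L) * 1000L;
    }

    public static void main(String[] args) {
        long updateTime = 1_000_000_250L;              // last-updated time stored on commit
        long realMtime = 1_000_000_700L;               // real write time, 450 ms later
        long reported = truncateToSecond(realMtime);   // rounded down to 1_000_000_000

        System.out.println("with ms precision:     " + isUpdated(updateTime, realMtime));
        System.out.println("with second precision: " + isUpdated(updateTime, reported));
    }
}
```

Running main prints true for the millisecond case and false for the truncated case. A write that happens after the recorded update becomes invisible to the check once its mtime is rounded down to the start of its second.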
> Consider this sequence of events from TaildirSource.process(). Assume they
> all happen within the same second and there is just a single log file.
> # Call ReliableTaildirEventReader.updateTailFiles()
> #* This call sets ReliableTaildirEventReader.updateTime to the current time
> in milliseconds
> #* Assume the underlying file has not been updated within the last
> idleTimeout milliseconds
> # Due to idleness, the tail file's inode is added to TaildirSource.idleInodes
> in idleFileCheckerRunnable
> # tf.needTail is false, so reading the file is skipped.
> # The underlying file is updated with events E1
> # TaildirSource.closeTailFiles()
> #* Call TaildirSource.tailFileProcess() before closing to read any pending
> events
> #* Events E1 are read and processed
> #* Since events were read, call ReliableTaildirEventReader.commit(), which
> updates the tail file's position and sets its last updated time to
> ReliableTaildirEventReader.updateTime from step 1
> #* Close the file
> # Events E2 are written to the underlying file. The file's modification time
> is in the same second as the tail file's last updated time.
> # Since File.lastModified() returns the mtime in whole seconds converted to
> milliseconds, the file's reported last modified time is not greater than the
> tail file's last updated time. The check above therefore evaluates to false,
> and taildir won't reopen the file to read E2.
> #* This behavior of File.lastModified() may be platform/JVM specific. I
> confirmed it using OpenJDK 8 on Ubuntu Precise.
> Can someone confirm this?
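The sequence above can be modeled as a small, self-contained Java simulation. This is only a sketch under stated assumptions. TaildirRaceSketch, LogFile and TailFile are hypothetical stand-ins rather than Flume's classes. The timestamps and byte counts are invented, and the idle checker from step 2 is not modeled. Second-precision mtimes are simulated by truncation instead of being read from a real file.

```java
// Hypothetical simulation of the race described above; LogFile, TailFile and
// the timestamps are illustrative and are not Flume's actual classes or values.
public class TaildirRaceSketch {

    // Stand-in for the underlying log file.
    static final class LogFile {
        final boolean secondPrecision; // true: mtime reported in whole seconds
        long size;                     // bytes written so far
        long mtimeMillis;              // true modification time

        LogFile(boolean secondPrecision) {
            this.secondPrecision = secondPrecision;
        }

        // Models File.lastModified(), optionally dropping sub-second precision.
        long lastModified() {
            return secondPrecision ? (mtimeMillis / 1000L) * 1000L : mtimeMillis;
        }

        void append(long bytes, long nowMillis) {
            size += bytes;
            mtimeMillis = nowMillis;
        }
    }

    // Stand-in for the reader's per-file state.
    static final class TailFile {
        long position;    // bytes already read
        long lastUpdated; // set on commit
        boolean needTail;
    }

    // Runs steps 1-7 and returns the number of bytes left unread.
    static long run(boolean secondPrecision) {
        long second = 1_000_000_000L; // start of the second in which steps 1-7 occur
        LogFile f = new LogFile(secondPrecision);
        TailFile tf = new TailFile();
        f.append(100, second - 60_000); // old content, idle for a minute
        tf.position = f.size;
        tf.lastUpdated = f.mtimeMillis;

        // Step 1: record updateTime; the file has not changed, so needTail is false.
        long updateTime = second + 100;
        tf.needTail = tf.lastUpdated < f.lastModified();
        // Steps 2-3: the file is idle and needTail is false, so nothing is read.
        // Step 4: E1 is written.
        f.append(50, second + 200);
        // Step 5: closing drains E1, then the commit stores updateTime.
        tf.position = f.size;
        tf.lastUpdated = updateTime;
        // Step 6: E2 is written within the same second.
        f.append(30, second + 300);
        // Step 7: the next pass repeats the updated check.
        tf.needTail = tf.lastUpdated < f.lastModified();
        if (tf.needTail) {
            tf.position = f.size; // reopen and read everything pending
        }
        return f.size - tf.position;
    }

    public static void main(String[] args) {
        System.out.println("second precision, unread bytes: " + run(true));
        System.out.println("ms precision, unread bytes:     " + run(false));
    }
}
```

With truncated mtimes, run(true) leaves the 30 bytes standing in for E2 unread. With full millisecond mtimes, run(false) detects the write on the next pass and leaves nothing behind. This matches the analysis that only the loss of sub-second precision separates the two outcomes.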
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)