[ https://issues.apache.org/jira/browse/FLUME-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987301#comment-15987301 ]

Bessenyei Balázs Donát commented on FLUME-3083:
-----------------------------------------------

[~eskrm]: thank you for the patch!

> Taildir source can miss events if file updated in same second as file close
> ---------------------------------------------------------------------------
>
>                 Key: FLUME-3083
>                 URL: https://issues.apache.org/jira/browse/FLUME-3083
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: 1.7.0
>            Reporter: eskrm
>         Attachments: FLUME-3083-0.patch, FLUME-3083-1.patch
>
>
> I suspect there is a scenario where the taildir source can miss reading 
> events from a log file due to how the source determines whether a file has 
> been updated. In ReliableTaildirEventReader:
> {code}
> boolean updated = tf.getLastUpdated() < f.lastModified();
> ...
> tf.setNeedTail(updated);
> {code}
> Consider this sequence of events from TaildirSource.process(). Assume they 
> all happen within the same second and there is just a single log file.
> # Call ReliableTaildirEventReader.updateTailFiles()
> #* This call will set ReliableTaildirEventReader.updateTime to current time 
> in milliseconds
> #* Assume the underlying file has not been updated within the last 
> idleTimeout milliseconds
> # Due to idleness, the tail file's inode is added to TaildirSource.idleInodes 
> in idleFileCheckerRunnable
> # tf.needTail is false. Skip reading file.
> # Underlying file is updated with events E1
> # TaildirSource.closeTailFiles()
> #* Call TaildirSource.tailFileProcess() before close to read any pending 
> events
> #* Events E1 are read and processed
> #* Since events were read, call ReliableTaildirEventReader.commit(), which 
> updates the tail file's position and sets its last updated time to the 
> ReliableTaildirEventReader.updateTime value recorded in step 1
> #* Close file
> # Events E2 are written to underlying file. File's modification time is in 
> the same second as the tail file's last updated time.
> # Since File.lastModified() returns the mtime in whole seconds converted to 
> milliseconds, the file's last modified time is no later than the tail 
> file's last updated time, so taildir won't reopen the file to read E2.
> #* This behavior of File.lastModified() may be platform/JVM specific. I 
> confirmed the behavior using OpenJDK 8 on Ubuntu precise.
> Can someone confirm this?
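
The timing argument in the quoted description can be reproduced with plain arithmetic, without touching a real file. The following is a minimal sketch, not Flume code: the class and method names are hypothetical, the timestamps and byte counts are made up for illustration, and truncateToSeconds() only assumes a platform that reports mtime with one-second resolution. The second check, which also compares the read position with the file length, is one possible mitigation shown for contrast and is not claimed to be what the attached patches do.

```java
// Illustrative simulation only; none of these names come from Flume.
final class MtimeGranularityDemo {

    // Assumption: the platform reports mtime in whole seconds, expressed in milliseconds.
    static long truncateToSeconds(long epochMillis) {
        return (epochMillis / 1000L) * 1000L;
    }

    // Same shape as the check quoted in the issue: lastUpdated < lastModified.
    static boolean needTailByMtime(long lastUpdatedMillis, long reportedMtimeMillis) {
        return lastUpdatedMillis < reportedMtimeMillis;
    }

    // Hypothetical alternative: also treat "read position differs from file length" as an update.
    static boolean needTailByMtimeOrLength(long lastUpdatedMillis, long reportedMtimeMillis,
                                           long readPosition, long fileLength) {
        return lastUpdatedMillis < reportedMtimeMillis || readPosition != fileLength;
    }

    public static void main(String[] args) {
        long updateTime = 1_493_300_000_250L;   // recorded in step 1, 250 ms into the second
        long e2WrittenAt = 1_493_300_000_750L;  // E2 written 500 ms later, same second
        long reportedMtime = truncateToSeconds(e2WrittenAt);

        long readPosition = 100L;               // bytes consumed after reading E1 (made up)
        long fileLength = 180L;                 // file size after E2 was appended (made up)

        System.out.println("mtime-only check:      "
                + needTailByMtime(updateTime, reportedMtime));
        System.out.println("mtime-or-length check: "
                + needTailByMtimeOrLength(updateTime, reportedMtime, readPosition, fileLength));
    }
}
```

With these sample values the mtime-only comparison reports no update even though E2 was written after the last commit, while the combined check still notices the unread bytes. Whether a real File.lastModified() call truncates this way depends on the JVM and filesystem, as the reporter notes.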



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
