[ 
https://issues.apache.org/jira/browse/FLUME-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962249#comment-15962249
 ] 

ASF GitHub Bot commented on FLUME-3083:
---------------------------------------

GitHub user eskrm opened a pull request:

    https://github.com/apache/flume/pull/128

    FLUME-3083. Change file update condition in Taildir Source from mtime to byte position

    This is to resolve an edge case where events can be missed during tail file
    close. More details are provided at
    https://issues.apache.org/jira/browse/FLUME-3083.
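
For readers who have not opened the patch, the idea of a byte-position update condition can be sketched as follows. This is only an illustration under assumptions, not the code in the pull request: the class PositionCheckSketch and the helper hasUnreadBytes are hypothetical names, and the sketch simply compares the last committed read position with the current file length.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;

final class PositionCheckSketch {
    // Hypothetical helper (not a Flume API): the file needs tailing when it
    // holds bytes beyond the last committed read position.
    static boolean hasUnreadBytes(long committedPos, long fileLength) {
        return committedPos < fileLength;
    }

    public static void main(String[] args) throws IOException {
        File log = File.createTempFile("taildir-sketch", ".log");
        log.deleteOnExit();

        // Write E1, then pretend it was read and its end position committed.
        Files.write(log.toPath(), "E1\n".getBytes(StandardCharsets.UTF_8));
        long committedPos = log.length();
        System.out.println(hasUnreadBytes(committedPos, log.length())); // false

        // Append E2. The length grows regardless of mtime granularity.
        Files.write(log.toPath(), "E2\n".getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.APPEND);
        System.out.println(hasUnreadBytes(committedPos, log.length())); // true
    }
}
```

The sketch ignores truncation and rotation, where the file length can drop below the committed position; a real check would have to decide how to treat that case.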

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/eskrm/flume FLUME-3083

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flume/pull/128.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #128
    
----
commit 82a8b42c15a12bb4d7cc78b834bb2c6e198f5822
Author: eskrm <es...@users.noreply.github.com>
Date:   2017-04-09T20:02:54Z

    FLUME-3083. Change file update condition in Taildir Source from mtime to byte position

----


> Taildir source can miss events if last updated time in same second as file 
> mtime
> --------------------------------------------------------------------------------
>
>                 Key: FLUME-3083
>                 URL: https://issues.apache.org/jira/browse/FLUME-3083
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: 1.7.0
>            Reporter: eskrm
>         Attachments: FLUME-3083-0.patch
>
>
> I suspect there is a scenario where the taildir source can miss reading 
> events from a log file due to how the source determines whether a file has 
> been updated. In ReliableTaildirEventReader:
> {code}
> boolean updated = tf.getLastUpdated() < f.lastModified();
> ...
> tf.setNeedTail(updated);
> {code}
> Consider this sequence of events from TaildirSource.process(). Assume they 
> all happen within the same second and there is just a single log file.
> # Call ReliableTaildirEventReader.updateTailFiles()
> #* This call will set ReliableTaildirEventReader.updateTime to current time 
> in milliseconds
> #* Assume the underlying file has not been updated within the last 
> idleTimeout milliseconds
> # Due to idleness, the tail file's inode is added to TaildirSource.idleInodes 
> in idleFileCheckerRunnable
> # tf.needTail is false. Skip reading file.
> # Underlying file is updated with events E1
> # TaildirSource.closeTailFiles()
> #* Call TaildirSource.tailFileProcess() before close to read any pending 
> events
> #* Events E1 are read and processed
> #* Since events were read, call ReliableTaildirEventReader.commit() which 
> updates the tail file's position and sets its last updated time to 
> ReliableTaildirEventReader.updateTime from 1.a
> #* Close file
> # Events E2 are written to underlying file. File's modification time is in 
> the same second as the tail file's last updated time.
> # Since the time returned by File.lastModified() is the mtime in seconds 
> converted to milliseconds, the file's last modified time is less than the tail 
> file's last updated time, and taildir won't reopen the file to read E2.
> #* This behavior of File.lastModified() may be platform/JVM specific. I 
> confirmed the behavior using OpenJDK 8 on Ubuntu Precise.
> Can someone confirm this?
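
The comparison described in the last step can be reproduced without touching a real file. The sketch below is an assumption-laden model, not Flume code: truncateToSeconds stands in for a platform whose File.lastModified() reports whole seconds, updatedByMtime mirrors the shape of the quoted check, and the two timestamps are made-up values that fall within the same second.

```java
final class MtimeTruncationDemo {
    // Models a platform/JVM that reports mtime with one-second granularity
    // (an assumption; other platforms may report milliseconds).
    static long truncateToSeconds(long millis) {
        return (millis / 1000L) * 1000L;
    }

    // Same shape as the quoted check: lastUpdated < lastModified.
    static boolean updatedByMtime(long lastUpdatedMillis, long fileMtimeMillis) {
        return lastUpdatedMillis < fileMtimeMillis;
    }

    public static void main(String[] args) {
        // Made-up timestamps in the same wall-clock second.
        long updateTime = 1_491_768_174_250L;  // updateTime recorded at .250
        long e2WriteTime = 1_491_768_174_900L; // E2 written later, at .900

        long reportedMtime = truncateToSeconds(e2WriteTime);

        // With truncated mtime the later write looks older than the commit.
        System.out.println(updatedByMtime(updateTime, reportedMtime)); // false

        // With millisecond mtime the same write would be detected.
        System.out.println(updatedByMtime(updateTime, e2WriteTime));   // true
    }
}
```

Under these assumptions the file is reported as not updated even though E2 was written after the commit, which matches the scenario in the issue description.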



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
