[ https://issues.apache.org/jira/browse/FLUME-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johny Rufus updated FLUME-2777:
-------------------------------
    Attachment: FLUME-2777-1.patch

[~iijima_satoshi], I have attached a patch that addresses your concern 
about inodes being reused, in which case we need to read the file from the beginning.
I have used similar logic as before, but the patch now also handles two cases 
in which the file names do not match:
1) If it is a renamed file, we read from the last position recorded in the 
TailFile.
2) If the file was deleted or truncated and its inode was reused, we read the 
file from the beginning.
(The two cases are distinguished by the file's creation time.)
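The decision described above can be sketched in Java. This is a minimal illustration under stated assumptions and is not the code in FLUME-2777-1.patch. The Tracked record, its creationTime field and the startPosition method are hypothetical names. Creation times are passed in as plain longs. In practice they might come from Files.readAttributes(path, BasicFileAttributes.class).creationTime(). Note that on file systems without a birth time, that call may return an implementation-specific default, such as the last-modified time.

```java
class StartPositionSketch {

    // Hypothetical snapshot of what the reader remembered about a tailed file.
    // It is loosely modelled on Flume's TailFile (path, inode, position).
    // The creationTime field is an assumption added for this illustration.
    record Tracked(String path, long inode, long position, long creationTime) {}

    // Decides where to resume reading a file whose inode is already tracked.
    static long startPosition(Tracked tracked, String currentPath, long currentCreationTime) {
        if (tracked.path().equals(currentPath)) {
            // Same name, same inode: keep tailing from the saved offset.
            return tracked.position();
        }
        if (tracked.creationTime() == currentCreationTime) {
            // Name changed but creation time did not: treat it as a rename
            // and resume from the last saved position instead of re-reading.
            return tracked.position();
        }
        // Name and creation time both differ: assume the old file was deleted
        // or truncated and its inode reused, so read the new file from the start.
        return 0L;
    }

    public static void main(String[] args) {
        Tracked t = new Tracked("/var/log/logfile1", 42L, 2000L, 1000L);
        System.out.println(startPosition(t, "/var/log/logfile2", 1000L)); // renamed: 2000
        System.out.println(startPosition(t, "/var/log/logfile3", 9000L)); // inode reused: 0
    }
}
```

Comparing creation times, rather than only names, is what separates a rename from inode reuse. A rename keeps the same file, and therefore the same creation time. A reused inode belongs to a newly created file.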

> Tail Dir Source leads to duplicate events on rolling the tailed file
> --------------------------------------------------------------------
>
>                 Key: FLUME-2777
>                 URL: https://issues.apache.org/jira/browse/FLUME-2777
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: 1.7
>            Reporter: Johny Rufus
>            Assignee: Johny Rufus
>         Attachments: FLUME-2777-1.patch, FLUME-2777.patch
>
>
> I have a simple setup where I write 200 events to logfile1 (TailSrc is 
> watching for logfile*).
> Then I rename logfile1 to logfile2.
> I create a new logfile1 and write 100 events to it.
> I would expect to see 300 events in my channel, but I see 500.
> I traced the duplicates to ReliableTaildirEventReader.java, specifically 
> updateFiles(boolean). The problem is the way renamed files are handled: the 
> starting position is set to 0. [The starting position should instead be 
> obtained from tf.getPosition().]
> I am attaching a proposed fix. It would be great if one of you, 
> [~iijima_satoshi] / [~hshreedharan] / [~roshan_naik], could take a look at the 
> fix and validate the issue.
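The event counts in the report can be reproduced with a small simulation. This is a hypothetical, simplified model and not Flume's ReliableTaildirEventReader. Files are keyed by inode, and sizes and positions are counted in events rather than bytes. The class and method names are invented for illustration. Resetting the position to 0 on rename re-delivers the 200 events from the renamed file. Keeping the saved position delivers only the 100 new events.

```java
import java.util.HashMap;
import java.util.Map;

class RenameReplaySketch {

    // Hypothetical model of a tailing reader; all counts are in events, not bytes.
    private final Map<Long, String> nameOnDisk = new HashMap<>();    // inode -> current file name
    private final Map<Long, Integer> eventsOnDisk = new HashMap<>(); // inode -> events written
    private final Map<Long, String> lastSeenName = new HashMap<>();  // inode -> name at last poll
    private final Map<Long, Integer> position = new HashMap<>();     // inode -> events already read
    private final boolean resetOnRename;

    RenameReplaySketch(boolean resetOnRename) {
        this.resetOnRename = resetOnRename;
    }

    void write(long inode, String name, int events) {
        nameOnDisk.put(inode, name);
        eventsOnDisk.merge(inode, events, Integer::sum);
    }

    void rename(long inode, String newName) {
        nameOnDisk.put(inode, newName); // same inode, new name
    }

    // One polling pass; returns the number of events delivered to the channel.
    int poll() {
        int delivered = 0;
        for (Map.Entry<Long, String> e : nameOnDisk.entrySet()) {
            long inode = e.getKey();
            int pos = position.getOrDefault(inode, 0);
            String previous = lastSeenName.get(inode);
            if (resetOnRename && previous != null && !previous.equals(e.getValue())) {
                pos = 0; // behaviour reported in the issue: a renamed file restarts at 0
            }
            delivered += eventsOnDisk.get(inode) - pos;
            position.put(inode, eventsOnDisk.get(inode));
            lastSeenName.put(inode, e.getValue());
        }
        return delivered;
    }

    // Replays the scenario from the issue and returns the total events delivered.
    static int run(boolean resetOnRename) {
        RenameReplaySketch reader = new RenameReplaySketch(resetOnRename);
        int total = 0;
        reader.write(1L, "logfile1", 200); // 200 events to logfile1
        total += reader.poll();
        reader.rename(1L, "logfile2");     // logfile1 renamed to logfile2
        reader.write(2L, "logfile1", 100); // new logfile1 with 100 events
        total += reader.poll();
        return total;
    }

    public static void main(String[] args) {
        System.out.println("position reset to 0 on rename: " + run(true));  // 500
        System.out.println("position kept on rename:       " + run(false)); // 300
    }
}
```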



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
