[ https://issues.apache.org/jira/browse/FLUME-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962177#comment-15962177 ]
eskrm commented on FLUME-3083: ------------------------------ [~iijima_satoshi], Why wouldn't it be possible regardless of the value of idleTimeout? A tail file is closed from idleness based on its last updated at value. It's only reopened for reading if the file's mtime is strictly greater than the tail file's last updated at value. So even if the tail file was last updated in the current second after being idle for 2 minutes and then closed, if the file's mtime is within the same second of that value it won't reopen the file to read events written just after the last commit due to File.lastModified being in second granularity. That's the working theory anyway. A production system I work on is using the taildir source with the default idleTimeout value of 2 minutes. I observed missing events this past week on a few hosts each for a different 15m interval which is 1-1 with a log file. The thing they had in common was that they were closed and then written to in the same second and never reopened to read the remaining event(s). An example of one: The taildir position file: {noformat} { "file": "/var/lib/flume-ng/taildir/logs/data.0.BBL1.2017-04-05T033000Z.15m.csv", "inode": 11141305, "pos": 88317952 }, {noformat} The flume log line when this file was closed: {noformat} /var/log/flume-ng/flume.log.5:05 Apr 2017 03:46:15,322 INFO [PollableSourceRunner-TaildirSource-taildir] (org.apache.flume.source.taildir.TaildirSource.closeTailFiles:288) - Closed file: /var/lib/flume-ng/taildir/logs/data.0.BBL1.2017-04-05T033000Z.15m.csv, inode: 11141305, pos: 88317952 {noformat} File's size and mtime: {noformat} -rw-r--r-- 1 root root 88318908 2017-04-05 03:46:15.704907240 +0000 /var/lib/flume-ng/taildir/logs/data.0.BBL1.2017-04-05T033000Z.15m.csv {noformat} The mtime is milliseconds after the close time, but within the same second, and taildir hasn't read the last 956 bytes. > Taildir source can miss events if last updated time in same second as file > mtime > -------------------------------------------------------------------------------- > > Key: FLUME-3083 > URL: https://issues.apache.org/jira/browse/FLUME-3083 > Project: Flume > Issue Type: Bug > Components: Sinks+Sources > Affects Versions: 1.7.0 > Reporter: eskrm > Attachments: FLUME-3083-0.patch > > > I suspect there is a scenario where the taildir source can miss reading > events from a log file due to how the source determines whether a file has > been updated. In ReliableTaildirEventReader: > {code} > boolean updated = tf.getLastUpdated() < f.lastModified() > ... > tf.setNeedTail(updated); > {code} > Consider this sequence of events from TaildirSource.process(). Assume they > all happen within the same second and there is just a single log file. > # Call ReliableTaildirEventReader.updateTailFiles() > #* This call will set ReliableTaildirEventReader.updateTime to current time > in milliseconds > #* Assume the underlying file has not been updated within the last > idleTimeout milliseconds > # Due to idleness, the tail file's inode is added to TaildirSource.idleInodes > in idleFileCheckerRunnable > # tf.needTail is false. Skip reading file. > # Underlying file is updated with events E1 > # TaildirSource.closeTailFiles() > #* Call TaildirSource.tailFileProcess() before close to read any pending > events > #* Events E1 are read and processed > #* Since events were read, call ReliableTaildirEventReader.commit() which > updates the tail file's position and sets its last updated time to > ReliableTaildirEventReader.updateTime from 1.a > #* Close file > # Events E2 are written to underlying file. File's modification time is in > the same second as the tail file's last updated time. > # Since the time returned by File.lastModified() is the mtime in seconds > converted to milliseconds the file's last modified time is less than the tail > file's last updated time and taildir won't reopen the file to read E2. > #* This behaviour of File.lastModified() may be platform/jvm specific. I > confirmed the behavior using OpenJDK 8 on Ubuntu precise. > Can someone confirm this? -- This message was sent by Atlassian JIRA (v6.3.15#6346)