pphust opened a new pull request #336:
URL: https://github.com/apache/flume/pull/336


   The issue:
   When a file is renamed or rotated, as log4j and similar logging systems do, 
Flume's TaildirSource currently treats it as a new file and collects all of its 
contents again. This duplicates data, as described in 
[FLUME-3233](https://issues.apache.org/jira/browse/FLUME-3233), 
[FLUME-3219](https://issues.apache.org/jira/browse/FLUME-3219), [FLUME-3094](https://issues.apache.org/jira/browse/FLUME-3094), [FLUME-3216](https://issues.apache.org/jira/browse/FLUME-3216)
 and [FLUME-2777](https://issues.apache.org/jira/browse/FLUME-2777).
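   A rename changes a file's path but not its inode. The minimal sketch below, 
which assumes a Unix-like OS where the JDK exposes the `unix:ino` file attribute 
(it is not available on Windows), writes `app.log`, renames it to `app.log.1` and 
compares the inode numbers. A tracker that identifies files by inode *and* path 
therefore sees `app.log.1` as a file it has never read. The class and file names 
are illustrative only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RenameKeepsInode {

  // Reads the inode number through the "unix" attribute view
  // (supported by the JDK on Linux and macOS, not on Windows).
  static long inode(Path p) throws IOException {
    return (Long) Files.getAttribute(p, "unix:ino");
  }

  // Creates app.log, renames it the way a rolling appender would,
  // and returns {inode before rename, inode after rename}.
  static long[] inodesAcrossRename() throws IOException {
    Path dir = Files.createTempDirectory("taildir-demo");
    Path log = dir.resolve("app.log");
    Files.write(log, "line 1\n".getBytes(StandardCharsets.UTF_8));
    long before = inode(log);

    Path rotated = dir.resolve("app.log.1");
    Files.move(log, rotated); // same directory, so this is a plain rename
    long after = inode(rotated);

    // Clean up the temporary files.
    Files.delete(rotated);
    Files.delete(dir);
    return new long[] {before, after};
  }

  public static void main(String[] args) throws IOException {
    long[] ino = inodesAcrossRename();
    System.out.println("inode of app.log before rename:  " + ino[0]);
    System.out.println("inode of app.log.1 after rename: " + ino[1]);
    System.out.println("same file? " + (ino[0] == ino[1]));
  }
}
```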
   The usual workaround is to monitor only the original *.log and NOT the 
renamed *.log.xxx. However, for the two reasons below, we must monitor both *.log 
and the renamed *.log.xxx:
   1. Some logging systems write asynchronously, so content may be flushed to 
disk after the file has been renamed. If we do not monitor the renamed *.log.xxx, 
that content is only sent out when Flume closes the inactive file. Flume does 
send it eventually, but with a delay governed by _idleTimeout_, which defaults 
to 120 seconds. In many cases that delay is unacceptable.
   2. Sometimes both the service and Flume are shut down. The service restarts 
first, writes something to *.log and renames it to *.log.xxx. If we do not 
monitor the renamed *.log.xxx, that data will certainly be lost.
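   Reason 1 can be reproduced on a Unix-like system. In this minimal sketch a 
`BufferedWriter` stands in for an asynchronous appender (an assumption; no real 
log4j appender is used): one line is flushed to `app.log`, the file is rotated to 
`app.log.1`, a fresh `app.log` is created, and the buffered line is only flushed 
on `close()`. Because the writer's open handle still refers to the renamed file, 
the late line lands in `app.log.1`, while the new `app.log` stays empty. Renaming 
an open file this way fails on Windows.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LateFlushAfterRename {

  // Returns {contents of app.log.1, contents of the new app.log}.
  static String[] lateFlush() throws IOException {
    Path dir = Files.createTempDirectory("late-flush");
    Path log = dir.resolve("app.log");
    Path rotated = dir.resolve("app.log.1");

    try (Writer w = Files.newBufferedWriter(log, StandardCharsets.UTF_8)) {
      w.write("written before rotation\n");
      w.flush();                  // this line reaches disk while still app.log
      Files.move(log, rotated);   // rotation: app.log -> app.log.1
      Files.createFile(log);      // the appender's new, empty app.log
      w.write("buffered, flushed after rotation\n");
    } // close() flushes the buffer into the file the handle points at: app.log.1

    String[] result = {
        new String(Files.readAllBytes(rotated), StandardCharsets.UTF_8),
        new String(Files.readAllBytes(log), StandardCharsets.UTF_8)};

    // Clean up the temporary files.
    Files.delete(rotated);
    Files.delete(log);
    Files.delete(dir);
    return result;
  }

  public static void main(String[] args) throws IOException {
    String[] r = lateFlush();
    System.out.print("app.log.1:\n" + r[0]);
    System.out.println("app.log is empty? " + r[1].isEmpty());
  }
}
```

A source that only watches `app.log` does not pick up the late line through the 
path it monitors, which is why the renamed *.log.xxx files have to be watched too.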
   
   The solution:
   This PR adds a new _inodeOnly_ parameter that lets TaildirSource handle file 
rename/rotation. By default, _inodeOnly_ is false and Flume behaves exactly as it 
does now. When _inodeOnly_ is set to true in the configuration, Flume identifies 
files by inode only, so TaildirSource supports file rename/rotation and both 
problems above are solved.
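   The effect of _inodeOnly_ on matching can be modelled in a few lines. This is 
a simplified, hypothetical sketch, not the code in this PR or in Flume: the class 
`InodeOnlyMatching` and its methods are invented for illustration, and it assumes, 
as the description implies, that the default check compares the stored path as 
well as the inode. A file already read up to offset 1024 is rediscovered under a 
new name with the same inode.

```java
import java.util.HashMap;
import java.util.Map;

public class InodeOnlyMatching {

  // Last known path and committed read offset for one tailed file.
  static final class Tracked {
    final String path;
    final long offset;

    Tracked(String path, long offset) {
      this.path = path;
      this.offset = offset;
    }
  }

  // Hypothetical stand-in for the reader's file table, keyed by inode.
  private final Map<Long, Tracked> byInode = new HashMap<>();
  private final boolean inodeOnly;

  InodeOnlyMatching(boolean inodeOnly) {
    this.inodeOnly = inodeOnly;
  }

  // Records how far a file has been read.
  void commit(long inode, String path, long offset) {
    byInode.put(inode, new Tracked(path, offset));
  }

  // Offset to resume from when a file is (re)discovered.
  long resumeOffset(long inode, String path) {
    Tracked t = byInode.get(inode);
    if (t == null) {
      return 0L; // unknown inode: genuinely new file
    }
    if (!inodeOnly && !t.path.equals(path)) {
      return 0L; // default: same inode, new path -> read again from the start
    }
    return t.offset; // same file: continue where reading stopped
  }

  public static void main(String[] args) {
    for (boolean inodeOnly : new boolean[] {false, true}) {
      InodeOnlyMatching m = new InodeOnlyMatching(inodeOnly);
      m.commit(42L, "/var/log/app/app.log", 1024L);
      long start = m.resumeOffset(42L, "/var/log/app/app.log.1");
      System.out.println("inodeOnly=" + inodeOnly + " -> resume at " + start);
    }
  }
}
```

Running `main` prints `inodeOnly=false -> resume at 0` and then 
`inodeOnly=true -> resume at 1024`: with the default check the rotated file is 
read again from the start (the duplication above), while matching on inode alone 
continues after the bytes already sent.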
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

