[ https://issues.apache.org/jira/browse/FLUME-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488532#comment-13488532 ]

Mike Percy commented on FLUME-1665:
-----------------------------------

Yes, Flume may create duplicates, but the goal is not to create any under normal conditions. Fewer duplicates is definitely better, but correctness and reliability are more important.

Examples of slowness: maybe you have a 50-megabyte data transfer transaction over a slow network link, or you are operating a file channel on an overwhelmed disk with a large batch of large events, or you hit a Hadoop GC when writing to HDFS. In such cases, a multi-second delay is not difficult to reach.

> Data from FileChannel will be duplicated when restarting configuration
> ----------------------------------------------------------------------
>
>                 Key: FLUME-1665
>                 URL: https://issues.apache.org/jira/browse/FLUME-1665
>             Project: Flume
>          Issue Type: Bug
>          Components: Channel
>    Affects Versions: v1.2.0, v1.3.0
>            Reporter: Denny Ye
>              Labels: FileChannel
>
> While the Flume process was running, I changed a configuration property and
> Flume reloaded without the process restarting. Events are duplicated in the
> next loop even though they had already been consumed before all components
> stopped.
> I found the root cause: when FileChannel stops, it should save
> 'inflightPuts' and 'inflightTakes' to disk in preparation for the next loop.