[ https://issues.apache.org/jira/browse/FLUME-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791115#comment-13791115 ]
Phil Scala commented on FLUME-2066:
-----------------------------------

@Hari - Shutting down could be an option; however, I'd rather not get a call at 3AM. I like Roshan's idea of a dead-letter queue. This is very much like a queueing system where you do not poison the queue, but move the offending message to a safe place and move on. Renaming the file to ERROR_lastLine#processed.COMPLETED is an easy solution.

I linked this to FLUME-2119, which touches the same area of code. I have a patch for that, but it does not implement any of this renaming discussion.

> Spool directory source can get stuck in a "Serializer has been closed" loop
> when retireCurrentFile throws an exception
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLUME-2066
>                 URL: https://issues.apache.org/jira/browse/FLUME-2066
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.4.0, v1.3.1
>            Reporter: Phil Scala
>            Assignee: Phil Scala
>
> The following 2 Java files have similar code and are affected by this
> issue:
> 1.3.1: SpoolingFileLineReader.java
> 1.4: ReliableSpoolingFileEventReader.java
> retireCurrentFile is called by a single caller (readLines in 1.3.1 and
> readEvents in 1.4):
> {code:java}
> retireCurrentFile();
> currentFile = getNextFile();
> if (!currentFile.isPresent()) {
>   return Collections.emptyList();
> }
> {code}
> If retireCurrentFile throws an exception after closing the reader (there are
> a few causes for an exception to be raised, which are described below), then
> currentFile still points to the file that failed to retire. This causes
> subsequent calls to readLines/readEvents to raise a "Serializer has been
> closed" exception. At this point the application needs to be shut down in
> order to rectify the problem.
> If Flume is left running for a while, the logs
> are littered with the error, so you have to go back to the initial error
> logged to understand what happened.
> *Exceptions raised in "retireCurrentFile()"*
> IllegalStateException when the file modified date changes
> IllegalStateException when the size changes
> IllegalStateException when renaming the current file and the target file
> already exists (with different sizes)
> IllegalStateException when renaming the current file and the target file
> already exists [non windows]
> FlumeException when renameTo does not return true.
> The documentation does say:
> *Warning This channel expects that only immutable, uniquely named files are
> dropped in the spooling directory. If duplicate names are used, or files are
> modified while being read, the source will fail with an error message*
> I am not sure, however, whether the intention was to get caught in the
> "Serializer has been closed" loop. 3 possible solutions:
> 1. Re-spool the retired file. This will cause duplicates and could get caught
> in a loop of constantly spooling this file.
> 2. Log an error and continue spooling the next files.
> 3. Shut down.
> I like option 2.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
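
For illustration only, option 2 combined with the dead-letter rename might look roughly like the sketch below. This is not the Flume code and not the FLUME-2119 patch: `SpoolSketch`, `retire`, `quarantine`, and `advance` are hypothetical names, `java.util.Optional` stands in for whatever Optional type the real reader uses, and the `ERROR_<name>.COMPLETED` target is a simplified version of the naming discussed above (it leaves out the last-line-processed part). The idea is to drop the reference to the finished file before attempting to retire it, so a failed retire cannot leave `currentFile` pointing at a closed reader.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

/** Hypothetical sketch; not the actual ReliableSpoolingFileEventReader. */
class SpoolSketch {
    private final Deque<Path> pending = new ArrayDeque<>();
    private Optional<Path> currentFile = Optional.empty();

    SpoolSketch(Iterable<Path> files) {
        files.forEach(pending::add);
    }

    /** Stand-in for retireCurrentFile(): fails if the target name is taken. */
    static void retire(Path file) throws IOException {
        Path target = file.resolveSibling(file.getFileName() + ".COMPLETED");
        if (Files.exists(target)) {
            throw new IllegalStateException("target already exists: " + target);
        }
        Files.move(file, target);
    }

    /** Dead-letter step (assumed naming): move the offending file aside. */
    static Path quarantine(Path file) throws IOException {
        Path target = file.resolveSibling("ERROR_" + file.getFileName() + ".COMPLETED");
        return Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Retire the current file, if any, then move on to the next one. */
    Optional<Path> advance() throws IOException {
        if (currentFile.isPresent()) {
            Path finished = currentFile.get();
            // Clear the reference first so a failure below cannot leave
            // currentFile pointing at a file whose reader is already closed.
            currentFile = Optional.empty();
            try {
                retire(finished);
            } catch (IllegalStateException | IOException e) {
                // Option 2: log the error, set the file aside, keep spooling.
                System.err.println("Could not retire " + finished + ": " + e.getMessage());
                quarantine(finished);
            }
        }
        currentFile = Optional.ofNullable(pending.poll());
        return currentFile;
    }
}
```

In this sketch a failure inside `quarantine` itself still propagates to the caller, but `currentFile` has already been cleared by then, so the next call to `advance()` simply picks up the next pending file instead of revisiting the broken one.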