Juhani Connolly created FLUME-1257:
--------------------------------------

             Summary: Components that fail to start can put flume into a state 
which it can't shutdown from
                 Key: FLUME-1257
                 URL: https://issues.apache.org/jira/browse/FLUME-1257
             Project: Flume
          Issue Type: Bug
            Reporter: Juhani Connolly


Clean shutdown of a Flume agent does not work when a component fails to start.

One way to confirm this is to use a FileChannel without the Hadoop IO jars on 
the classpath.

My understanding is that the first Ctrl+C tries to stop the supervisor, which 
in turn should take everything down. However, 
AbstractFileConfigurationProvider#stop tries to gently stop the local 
executor, which is stuck in an endless loop trying to start a channel 
(DefaultLogicalNodeManager#startAllComponents). This loop can only be broken 
by an interrupt, but none ever reaches it or the higher level (which would 
try shutdownNow on the executors).

The interrupts never come, because each level relies on something further up 
the chain to issue them, and nothing up there ever does.
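
The behaviour described above can be reproduced outside Flume with plain 
java.util.concurrent. The following is only a sketch, not Flume code: 
StuckStartupDemo, startComponent and retryingStarter are hypothetical 
stand-ins for a component that always fails to start and for the retry loop. 
It shows that ExecutorService#shutdown() does not interrupt a task that is 
already running, so a retry loop that exits only on interrupt keeps the 
executor alive.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class StuckStartupDemo {

    // Hypothetical stand-in for a component whose start always fails,
    // e.g. because a required jar is missing from the classpath.
    static void startComponent() {
        throw new IllegalStateException("component failed to start");
    }

    // Retry loop that exits only on success or when the thread is interrupted.
    static Runnable retryingStarter() {
        return () -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    startComponent();
                    return;
                } catch (RuntimeException e) {
                    try {
                        Thread.sleep(50); // back off before retrying
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt(); // restore flag, loop exits
                    }
                }
            }
        };
    }

    // Returns true if a gentle shutdown() alone terminated the executor
    // within waitMillis.
    static boolean gracefulStopTerminates(long waitMillis) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.submit(retryingStarter());
        executor.shutdown(); // no interrupt is sent to the running task
        boolean terminated = executor.awaitTermination(waitMillis, TimeUnit.MILLISECONDS);
        executor.shutdownNow(); // clean up the demo thread with an interrupt
        executor.awaitTermination(1, TimeUnit.SECONDS);
        return terminated;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("terminated after shutdown(): " + gracefulStopTerminates(300));
    }
}
```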

One solution is to be less merciful in AbstractFileConfigurationProvider#stop(): 
give the executor a moderate timeout to finish, then call 
executorService.shutdownNow().
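
A minimal sketch of that idea, using only standard java.util.concurrent 
calls. BoundedStop and stopWithTimeout are hypothetical names, this is not 
the actual Flume patch, and the timeout value is left to the caller. It tries 
a gentle shutdown first and falls back to shutdownNow(), which interrupts the 
running tasks, once the timeout expires.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class BoundedStop {

    // Gentle shutdown first; if the executor has not terminated within the
    // timeout, interrupt its tasks with shutdownNow() and wait once more.
    // Returns true if the executor terminated.
    static boolean stopWithTimeout(ExecutorService executor, long timeout, TimeUnit unit)
            throws InterruptedException {
        executor.shutdown();
        if (executor.awaitTermination(timeout, unit)) {
            return true;
        }
        executor.shutdownNow(); // delivers the interrupt the stuck loop needs
        return executor.awaitTermination(timeout, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        // Task that only ends when interrupted, like a start-up retry loop.
        executor.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        boolean stopped = stopWithTimeout(executor, 200, TimeUnit.MILLISECONDS);
        System.out.println("executor stopped: " + stopped);
    }
}
```

This only helps if the task reacts to interruption. A loop that swallows 
InterruptedException without restoring the flag or returning would still hang.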


        
