I've been burrowing into the lifecycle code recently and posted
https://issues.apache.org/jira/browse/FLUME-1257 regarding it (more
specifically, regarding the inability to shut down after a component
gets stuck in a starting loop).
Right now, to put it bluntly, the shutdown model feels incorrect. There
are a lot of places with a loop waiting for changes in state that have
an exception handler for InterruptedException, but nothing will ever
trigger that interrupt. SIGINT is handled by the shutdown hook, which
requests a clean shutdown. If a clean shutdown doesn't succeed,
everything just hangs in limbo until you forcibly kill it.
I'd like to start some kind of discussion on how to go about handling
this. One argument may be that it's just not a problem... so long as
everything starts and stops in a timely manner. Other simple kludges
would be setting a limit on the number of times through each loop. Or we
could just start sending interrupts after a predetermined delay.
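
To make the "interrupt after a delay" idea concrete, here is a rough
sketch of what I have in mind. It is not taken from the Flume code base:
BoundedStop, stopWithDeadline and the requestStop callback are all
made-up names, and the timeout value is arbitrary. It asks for a clean
stop, waits a bounded time, and only then escalates to an interrupt.

```java
class BoundedStop {
    /**
     * Hypothetical helper: request a clean stop, wait up to timeoutMillis
     * for the worker thread to exit, then fall back to interrupting it.
     * timeoutMillis must be > 0, because Thread.join(0) waits forever.
     * Returns true if the worker has exited by the time we return.
     */
    static boolean stopWithDeadline(Thread worker, Runnable requestStop,
                                    long timeoutMillis) throws InterruptedException {
        requestStop.run();              // ask nicely first
        worker.join(timeoutMillis);     // bounded wait for a clean exit
        if (worker.isAlive()) {
            worker.interrupt();         // escalate: wake any blocked wait/sleep
            worker.join(timeoutMillis); // give the handler time to unwind
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a component stuck in a wait loop that ignores
        // the clean stop request and only reacts to an interrupt.
        Thread stuck = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(50);
                }
            } catch (InterruptedException e) {
                // interrupted: fall through and let the thread end
            }
        });
        stuck.start();
        boolean stopped = stopWithDeadline(stuck, () -> { }, 200);
        System.out.println("stopped=" + stopped);
    }
}
```

This only helps for loops that actually block in something
interruptible (sleep, wait, join and so on). A loop that spins without
blocking would also have to check Thread.interrupted() itself.
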
Another possibility is adding more lifecycle states to better
communicate state (starting/stopping?) so that actions can be aborted as
necessary. This would likely require all existing components to
communicate these new states.
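
As a strawman for the extra-states idea, something along these lines
might work. Everything here is hypothetical: LifecyclePhase,
AbortableComponent and the state names are invented for illustration
and are not the existing Flume lifecycle types. The point is only that
a retry loop keyed on a STARTING state can be aborted by a stop request
without needing an interrupt.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

// Hypothetical states; STARTING and STOPPING are the proposed additions.
enum LifecyclePhase { IDLE, STARTING, STARTED, STOPPING, STOPPED, ERROR }

class AbortableComponent {
    private final AtomicReference<LifecyclePhase> phase =
            new AtomicReference<>(LifecyclePhase.IDLE);

    LifecyclePhase getPhase() {
        return phase.get();
    }

    /** Retries the start attempt until it succeeds or stop() is called. */
    void start(BooleanSupplier attempt) throws InterruptedException {
        if (!phase.compareAndSet(LifecyclePhase.IDLE, LifecyclePhase.STARTING)) {
            return; // already started, starting, or stopped
        }
        while (phase.get() == LifecyclePhase.STARTING) {
            // Only move to STARTED if nobody asked for a stop meanwhile.
            if (attempt.getAsBoolean()
                    && phase.compareAndSet(LifecyclePhase.STARTING, LifecyclePhase.STARTED)) {
                return;
            }
            Thread.sleep(10); // back off between attempts
        }
        // The loop ended because stop() moved us to STOPPING: finish the abort.
        phase.compareAndSet(LifecyclePhase.STOPPING, LifecyclePhase.STOPPED);
    }

    void stop() {
        // Abort a start in progress; the start loop completes the transition.
        if (phase.compareAndSet(LifecyclePhase.STARTING, LifecyclePhase.STOPPING)) {
            return;
        }
        if (phase.compareAndSet(LifecyclePhase.STARTED, LifecyclePhase.STOPPING)) {
            // a real component would release its resources here
            phase.set(LifecyclePhase.STOPPED);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AbortableComponent c = new AbortableComponent();
        Thread starter = new Thread(() -> {
            try {
                c.start(() -> false); // a start attempt that never succeeds
            } catch (InterruptedException e) {
                // not expected in this demo
            }
        });
        starter.start();
        while (c.getPhase() != LifecyclePhase.STARTING) {
            Thread.sleep(1); // wait until the start loop is actually running
        }
        c.stop();
        starter.join();
        System.out.println(c.getPhase());
    }
}
```

The cost is the one mentioned above: every component with its own start
or stop loop would have to check the new states for this to be useful.
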
Feedback/opinions would be appreciated.