Hi Eric and all,

It's nice to see input from people using and considering Flume. Three sub-features coming out of this discussion appear to be:

1. Ability to hot-modify the configuration of a single component while it's running;
2. Ability to add/remove components without affecting other parts of the system;
   - To me it seems that doing #2 would get us ~80% of the uptime improvement of doing #1 correctly, but would involve ~20% of the complexity.
3. Ability to trigger a reconfiguration manually, instead of using the current file modification polling system
   - This looks a little less prone to human error vs. how we reconfigure now. I have a couple of ideas for ways to implement this simply.
   - Side note: right now, Flume 1.x "reconfiguration" means that, whenever the poller thread detects a change to the configuration file, we:
     a. stop all components
     b. configure each component with the latest settings in the file
     c. start all components

Best,
Mike

On Thursday, June 7, 2012 at 3:18 PM, Eric Sammer wrote:
> Flumers:
>
> Flume 0.9.x supported online reconfiguration and the intention was for the
> 1.x branch to do so as well (it doesn't yet). I wanted to start a discussion
> around whether people are interested in this kind of functionality or if
> simply restarting the daemon(s) was sufficient for your deployment. There are
> two ways of thinking about it:
>
> * Support reconfiguration. Agents may have multiple flows passing through
> them and, ideally, adding new ones shouldn't interrupt existing flows. Agent
> restarts interrupt collection and, for non-durable channels (i.e.
> MemoryChannel), data *may* be lost. Reconfiguration will add significant
> complexity and ultimately does not get around host level maintenance,
> software upgrades, and the like.
>
> * Do not support reconfiguration. Accept the fact that agents may go down
> eventually, so it should be supported as a first class case. In other words,
> embrace the idea of failure / maintenance and handle it by recommending
> topologies of agents that include multiple agents at each tier and simply
> round-robin / failover where necessary.
> The only downside is the agent tier
> closest to the originating source (e.g. a log4j client); restarting that
> agent means the client application needs to be able to find another agent or
> buffer (which impacts durability or blocks the application).
>
> We can optionally support some subset of online reconfiguration such as only
> allowing new flows to be introduced or existing flows to be "decommissioned,"
> but not allow alteration of existing flows. Ultimately this feature is a ton
> of work and adds a ton of complexity so if it's not something folks are
> clamoring for, we should spend our time worrying about more pressing issues.
>
> Thoughts? Comments?
> --
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com (http://www.cloudera.com)
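
P.S. To make the comparison concrete, here is a rough sketch of the stop / configure / start cycle described in the side note next to an add/remove-only pass in the spirit of sub-feature #2. This is a toy model and not Flume code: the Component class, both function names, and the dict-of-settings config shape are all made up for illustration, and how new or deleted components are handled in the full-restart path is my assumption rather than something stated above.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical stand-in for a source, channel, or sink (not a Flume class)."""
    name: str
    settings: dict = field(default_factory=dict)
    running: bool = False
    starts: int = 0  # counts how often this component has been (re)started

    def configure(self, settings):
        self.settings = dict(settings)

    def start(self):
        self.running = True
        self.starts += 1

    def stop(self):
        self.running = False


def reconfigure_all(components, new_config):
    """Full cycle from the side note: stop all, configure all, start all.

    Assumption: components missing from new_config are dropped and new
    names are created; the side note does not spell this out.
    """
    for comp in components.values():                 # a. stop all components
        comp.stop()
    updated = {name: components.get(name) or Component(name) for name in new_config}
    for name, comp in updated.items():               # b. configure each one
        comp.configure(new_config[name])
    for comp in updated.values():                    # c. start all components
        comp.start()
    return updated


def reconfigure_add_remove(components, new_config):
    """Sub-feature #2 sketch: add or remove components, leave the rest running.

    Changing the settings of an existing component is rejected here,
    because that would be hot-modification (sub-feature #1).
    """
    changed = [name for name in components.keys() & new_config.keys()
               if components[name].settings != new_config[name]]
    if changed:
        raise ValueError(f"hot-modify not supported for: {sorted(changed)}")

    updated = dict(components)
    for name in components.keys() - new_config.keys():   # decommission removed ones
        updated.pop(name).stop()
    for name in new_config.keys() - components.keys():   # bring up new ones
        comp = Component(name)
        comp.configure(new_config[name])
        comp.start()
        updated[name] = comp
    return updated
```

In this model, reconfigure_all bumps the start counter of every component on every config change, while reconfigure_add_remove leaves untouched components running, which is the uptime difference I was getting at with the 80/20 estimate.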
