Thanks for the nice summary, Mike. How about we add an option to Flume to dictate the behavior?

auto_reconfigure = on or off

It has to be a command-line option.

On Tue, Jun 12, 2012 at 10:55 PM, Mike Percy <[email protected]> wrote:

> Hi Eric and all,
> It's nice to see input from people using and considering Flume. Three
> sub-features coming out of this discussion appear to be:
>
> 1. Ability to hot-modify the configuration of a single component while
>    it's running;
> 2. Ability to add/remove components without affecting other parts of the
>    system;
>    - To me it seems that doing #2 would get us ~80% of the uptime
>      improvement of doing #1 correctly, but would involve ~20% of the
>      complexity.
> 3. Ability to trigger a reconfiguration manually, instead of using the
>    current file modification polling system
>    - This looks a little less prone to human error vs. how we reconfigure
>      now. I have a couple of ideas for ways to implement this simply.
>
> - Side note: right now, Flume 1.x "reconfiguration" means that, whenever
>   the poller thread detects a change to the configuration file, we:
>   a. stop all components
>   b. configure each component with the latest settings in the file
>   c. start all components
>
> Best,
> Mike
>
> On Thursday, June 7, 2012 at 3:18 PM, Eric Sammer wrote:
>
> > Flumers:
> >
> > Flume 0.9.x supported online reconfiguration and the intention was for
> > the 1.x branch to do so as well (it doesn't yet). I wanted to start a
> > discussion around whether people are interested in this kind of
> > functionality or if simply restarting the daemon(s) was sufficient for
> > your deployment. There are two ways of thinking about it:
> >
> > * Support reconfiguration. Agents may have multiple flows passing
> >   through them and, ideally, adding new ones shouldn't interrupt existing
> >   flows. Agent restarts interrupt collection and, for non-durable
> >   channels (i.e. MemoryChannel), data *may* be lost. Reconfiguration will
> >   add significant complexity and ultimately does not get around host
> >   level maintenance, software upgrades, and the like.
> >
> > * Do not support reconfiguration. Accept the fact that agents may go
> >   down eventually, so it should be supported as a first class case. In
> >   other words, embrace the idea of failure / maintenance and handle it by
> >   recommending topologies of agents that include multiple agents at each
> >   tier and simply round-robin / failover where necessary. The only
> >   downside is the agent tier closest to the originating source (e.g. a
> >   log4j client); restarting that agent means the client application needs
> >   to be able to find another agent or buffer (which impacts durability or
> >   blocks the application).
> >
> > We can optionally support some subset of online reconfiguration such as
> > only allowing new flows to be introduced or existing flows to be
> > "decommissioned," but not allow alteration of existing flows. Ultimately
> > this feature is a ton of work and adds a ton of complexity so if it's not
> > something folks are clamoring for, we should spend our time worrying
> > about more pressing issues.
> >
> > Thoughts? Comments?
> > --
> > Eric Sammer
> > twitter: esammer
> > data: www.cloudera.com (http://www.cloudera.com)

--
..Senthil
"If there's anything more important than my ego around, I want it caught
and shot now."
- Douglas Adams.
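
For anyone following the thread, here is a rough sketch of how the stop / configure / start cycle from Mike's side note could be gated by the proposed auto_reconfigure switch, with a manual trigger still available when it is off. This is not Flume code: it is written in Python for brevity, and `Component`, `Reconfigurer`, `parse_properties`, `poll_once`, and the property key used in the usage note are all hypothetical names chosen for illustration.

```python
import os


class Component:
    """Hypothetical stand-in for a source, channel, or sink."""

    def __init__(self, name):
        self.name = name
        self.settings = {}
        self.running = False

    def stop(self):
        self.running = False

    def configure(self, settings):
        self.settings = dict(settings)

    def start(self):
        self.running = True


def parse_properties(path):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    settings = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings


class Reconfigurer:
    """Assumed design: polling is optional, manual reload always works."""

    def __init__(self, path, components, auto_reconfigure=True):
        self.path = path
        self.components = components
        self.auto_reconfigure = auto_reconfigure
        self.last_mtime = None
        self.reload_count = 0

    def reconfigure(self):
        """Full cycle: (a) stop all, (b) configure all, (c) start all."""
        mtime = os.stat(self.path).st_mtime_ns
        settings = parse_properties(self.path)
        for c in self.components:
            c.stop()
        for c in self.components:
            c.configure(settings)
        for c in self.components:
            c.start()
        self.last_mtime = mtime
        self.reload_count += 1

    def poll_once(self):
        """One poller tick; returns True if a reconfiguration ran."""
        if not self.auto_reconfigure:
            return False  # polling disabled; only manual reloads apply
        if os.stat(self.path).st_mtime_ns == self.last_mtime:
            return False  # file unchanged since the last reload
        self.reconfigure()
        return True
```

With `auto_reconfigure=False`, `poll_once()` never touches the components, so an accidental save of the file does nothing until someone calls `reconfigure()` explicitly. Note that every reload here still restarts all components, which is the behavior the side note describes, not the per-component reconfiguration discussed in items 1 and 2.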
