Hi Eric and all,
It's nice to see input from people using and considering Flume. Three 
sub-features coming out of this discussion appear to be:

1. Ability to hot-modify the configuration of a single component while it's 
running;
2. Ability to add/remove components without affecting other parts of the system;
 - To me it seems that doing #2 would get us ~80% of the uptime improvement of 
doing #1 correctly, with only ~20% of the complexity.

3. Ability to trigger a reconfiguration manually, instead of using the current 
file modification polling system.
 - This looks a little less prone to human error vs. how we reconfigure now. I 
have a couple of ideas for ways to implement this simply.

 - Side note: right now, Flume 1.x "reconfiguration" means that, whenever the 
poller thread detects a change to the configuration file, we:
   a. stop all components
   b. configure each component with the latest settings in the file
   c. start all components
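
To make the contrast between the current behavior and sub-feature #2 concrete, 
here is a rough Java sketch. It is not Flume code: ReconfigSketch, Component, 
fullRestart and addRemoveOnly are hypothetical names made up for illustration, 
and a component's settings are reduced to a single string. fullRestart mirrors 
steps a-c above; addRemoveOnly only stops components that disappeared from the 
file and only starts components that are new, leaving everything else running.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReconfigSketch {

    /** Hypothetical stand-in for a source, channel, or sink. */
    static final class Component {
        final String name;
        final String settings;
        boolean running;

        Component(String name, String settings) {
            this.name = name;
            this.settings = settings;
        }
    }

    /** Current 1.x behavior (steps a-c): stop all, configure all, start all. */
    static List<String> fullRestart(Map<String, Component> live,
                                    Map<String, String> fileConfig) {
        List<String> log = new ArrayList<>();
        for (Component c : live.values()) {          // a. stop all components
            c.running = false;
            log.add("stop " + c.name);
        }
        live.clear();
        for (Map.Entry<String, String> e : fileConfig.entrySet()) {
            // b. configure each component with the latest settings in the file
            live.put(e.getKey(), new Component(e.getKey(), e.getValue()));
            log.add("configure " + e.getKey());
        }
        for (Component c : live.values()) {          // c. start all components
            c.running = true;
            log.add("start " + c.name);
        }
        return log;
    }

    /**
     * Sub-feature #2: touch only components that were added or removed.
     * Changed settings on an existing component are deliberately ignored
     * here; handling those would be sub-feature #1.
     */
    static List<String> addRemoveOnly(Map<String, Component> live,
                                      Map<String, String> fileConfig) {
        List<String> log = new ArrayList<>();
        for (String name : new ArrayList<>(live.keySet())) {
            if (!fileConfig.containsKey(name)) {     // removed from the file
                live.remove(name).running = false;
                log.add("stop " + name);
            }
        }
        for (Map.Entry<String, String> e : fileConfig.entrySet()) {
            if (!live.containsKey(e.getKey())) {     // newly added in the file
                Component c = new Component(e.getKey(), e.getValue());
                c.running = true;
                live.put(e.getKey(), c);
                log.add("start " + e.getKey());
            }
        }
        return log;
    }

    public static void main(String[] args) {
        Map<String, Component> live = new LinkedHashMap<>();
        Component src = new Component("src1", "port=4141");
        src.running = true;
        live.put("src1", src);

        Map<String, String> fileConfig = new LinkedHashMap<>();
        fileConfig.put("src1", "port=4141");
        fileConfig.put("sink2", "path=/tmp/out");

        // Adding sink2 does not disturb the already-running src1.
        System.out.println(addRemoveOnly(live, fileConfig));
        // The same file applied the current way bounces every component.
        System.out.println(fullRestart(live, fileConfig));
    }
}
```

The point of the sketch is only that the add/remove path never touches a 
component whose name is in both the old and new configuration, which is where 
most of the uptime win in #2 would come from.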

Best,
Mike




On Thursday, June 7, 2012 at 3:18 PM, Eric Sammer wrote:

> Flumers:
> 
> Flume 0.9.x supported online reconfiguration and the intention was for the 
> 1.x branch to do so as well (it doesn't yet). I wanted to start a discussion 
> around whether people are interested in this kind of functionality or if 
> simply restarting the daemon(s) was sufficient for your deployment. There are 
> two ways of thinking about it: 
> 
> * Support reconfiguration. Agents may have multiple flows passing through 
> them and, ideally, adding new ones shouldn't interrupt existing flows. Agent 
> restarts interrupt collection and, for non-durable channels (i.e. 
> MemoryChannel), data *may* be lost. Reconfiguration will add significant 
> complexity and ultimately does not get around host level maintenance, 
> software upgrades, and the like. 
> 
> * Do not support reconfiguration. Accept the fact that agents may go down 
> eventually, so it should be supported as a first class case. In other words, 
> embrace the idea of failure / maintenance and handle it by recommending 
> topologies of agents that include multiple agents at each tier and simply 
> round-robin / failover where necessary. The only downside is the agent tier 
> closest to the originating source (e.g. a log4j client); restarting that 
> agent means the client application needs to be able to find another agent or 
> buffer (which impacts durability or blocks the application).
> 
> We can optionally support some subset of online reconfiguration such as only 
> allowing new flows to be introduced or existing flows to be "decommissioned," 
> but not allow alteration of existing flows. Ultimately this feature is a ton 
> of work and adds a ton of complexity so if it's not something folks are 
> clamoring for, we should spend our time worrying about more pressing issues. 
> 
> Thoughts? Comments?
> 
> -- 
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com (http://www.cloudera.com)


