Re: [DISCUSS] Should we support hot reconfiguration?

Eric Sammer Tue, 12 Jun 2012 11:18:44 -0700

Eran:

On Tue, Jun 12, 2012 at 3:00 AM, Eran Kutner <[email protected]> wrote:


> As a long time flume user in a large production environment I'd like to
> say the main reason I preferred flume over scribe and the reason I'm
> avoiding switching to flume NG is the dynamic configuration capability.
> From my experience it is much much easier to manage the node configurations
> in a single text file, then deploy the whole thing at once to all the
> servers than dealing with individual configurations on each server. In
> particular it makes it much easier in cases where the configuration is NOT
> identical between the servers.
> For example, if you have a tier of servers producing events and a tier of
> collectors, you probably want to load balance the collectors so some
> servers write to certain collectors while others write to other collectors,
> which means their failover options are also different, so you have multiple
> configurations to deal with. Now imagine you're adding more collector nodes
> and need to rebalance the whole thing, so some servers will now have to
> write to other collectors. Managing individual configuration files becomes
> a nightmare very quickly, while having one configuration source that can
> easily deploy the configuration to all the servers makes it very easy.
> Using Puppet or Chef will not help in this case either.
>
> Now that I think about it, there might be a good enough middle-ground here.
> If there was one "master" configuration file that could contain
> configurations for multiple servers, and each server knew its own identity
> and could load its own configuration out of this file, then it would be
> enough to deploy this one file to all the servers at once and gain the same
> behavior as flume OG without having a real "master" server. I'd still like
> to see this configuration being auto-loaded when deployed without having to
> restart the flume service. This alternative should be much simpler to
> develop and give 90% of the important functionality (the other 10% is
> having one location to see the status of all the nodes).
>

This last paragraph is exactly what 1.x does: it supports defining multiple
agents within a config file. When the agent starts, it knows its name and
only loads the slice of the file that matters to that agent. The question
is whether we should poll this file for changes and load them dynamically. It
sounds like that's valuable to you (but correct me if I'm wrong). Either
way, the ability to define all config in a single file is still there. The
one thing 1.x doesn't do is get it out to all the machines for you (in other
words, you should switch to 1.x! :)).
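
To make the idea concrete, here is a rough sketch in Python (not Flume's
actual Java implementation) of an agent loading only its own slice of a
shared file and polling that file for changes. The function and class
names, the agent names, and the use of the file's modification time as the
change signal are all assumptions made for illustration.

```python
import os


def load_agent_slice(path, agent_name):
    """Return only the properties whose key starts with '<agent_name>.'.

    Assumes a simple 'key = value' properties-style file in which every
    key is prefixed with the name of the agent it belongs to.
    """
    prefix = agent_name + "."
    props = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, sep, value = line.partition("=")
            if not sep:
                continue  # not a key = value line
            key = key.strip()
            if key.startswith(prefix):
                # Store the key without the agent-name prefix.
                props[key[len(prefix):]] = value.strip()
    return props


class ConfigPoller:
    """Hypothetical poller: reloads the slice when the file's mtime changes."""

    def __init__(self, path, agent_name):
        self.path = path
        self.agent_name = agent_name
        self._mtime = None
        self.config = {}

    def poll(self):
        """Check the file once; return True if the config was reloaded."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self._mtime:
            return False
        self._mtime = mtime
        self.config = load_agent_slice(self.path, self.agent_name)
        return True
```

An agent would call poll() on a timer and rebuild its flows whenever it
returns True. A signal-triggered reload would call the same loader from a
HUP handler instead of from a timer.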


> Hope this makes sense.
>

Thanks for your input. Definitely makes sense. I'd love to hear your
thoughts on my comments above.


>
>
> -eran
>
>
>
> On Tue, Jun 12, 2012 at 10:16 AM, Jorn Argelo - Ephorus <
> [email protected]> wrote:
>
>> Hi all,
>>
>> Not sure if a regular user should participate in this sort of
>> discussion, but here’s my opinion nevertheless ;-)
>>
>> I think one of the biggest flaws of OG Flume is that it’s too complex and
>> maybe even over-engineered. For example the centralized configuration in
>> the Flume Master sounds good on paper but in practice it doesn’t make
>> things easier at all (fortunately this was fixed with NG Flume). So IMO
>> stick to the KISS principle and keep things down-to-earth. I have no
>> problem restarting agents, and a HUP construction to reload the
>> configuration as Senthilvel suggested is even better.
>>
>> I guess most users are using a config management system like Puppet or
>> Chef to deploy and configure their agents. If you keep that in mind in
>> terms of configuration it makes things a whole lot easier for those users.
>>
>> Regards,
>>
>> Jorn
>>
>> *From:* Senthilvel Rangaswamy [mailto:[email protected]]
>> *Sent:* Saturday, June 9, 2012 1:32
>> *To:* [email protected]
>> *CC:* Flume Development
>> *Subject:* Re: [DISCUSS] Should we support hot reconfiguration?
>>
>> IMHO, online reconfiguration is dangerous. A typical use case for flume
>> is to be deployed at the very
>> beginning of the data source, like web servers. These are typically in
>> large numbers. Say you push out
>> a bad config and that gets picked up, it will wreak havoc on the
>> infrastructure.
>>
>> I would like flume to pick up the new config when it is HUP'ed. This
>> way, it is a controlled deployment,
>> but at the same time not a full restart.
>>
>> On Thu, Jun 7, 2012 at 3:18 PM, Eric Sammer <[email protected]> wrote:
>>
>> Flumers:
>>
>> Flume 0.9.x supported online reconfiguration and the intention was for
>> the 1.x branch to do so as well (it doesn't yet). I wanted to start a
>> discussion around whether people are interested in this kind of
>> functionality or if simply restarting the daemon(s) was sufficient for your
>> deployment. There are two ways of thinking about it:
>>
>> * Support reconfiguration. Agents may have multiple flows passing through
>> them and, ideally, adding new ones shouldn't interrupt existing flows.
>> Agent restarts interrupt collection and, for non-durable channels (i.e.
>> MemoryChannel), data *may* be lost. Reconfiguration will add significant
>> complexity and ultimately does not get around host level maintenance,
>> software upgrades, and the like.
>>
>> * Do not support reconfiguration. Accept the fact that agents may go down
>> eventually, so it should be supported as a first class case. In other
>> words, embrace the idea of failure / maintenance and handle it by
>> recommending topologies of agents that include multiple agents at each tier
>> and simply round-robin / failover where necessary. The only downside is the
>> agent tier closest to the originating source (e.g. a log4j client);
>> restarting that agent means the client application needs to be able to find
>> another agent or buffer (which impacts durability or blocks the
>> application).
>>
>> We can optionally support some subset of online reconfiguration such as
>> only allowing new flows to be introduced or existing flows to be
>> "decommissioned," but not allow alteration of existing flows. Ultimately
>> this feature is a ton of work and adds a ton of complexity so if it's not
>> something folks are clamoring for, we should spend our time worrying about
>> more pressing issues.
>>
>> Thoughts? Comments?
>>
>> --
>> Eric Sammer
>> twitter: esammer
>> data: www.cloudera.com
>>
>>
>>
>>
>> --
>> ..Senthil
>>
>> "If there's anything more important than my ego around, I want it
>>  caught and shot now."
>>                                                     - Douglas Adams.
>>
>
>


-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com
