Hi Eric,

 

Not sure if a regular user should participate in this sort of
discussion, but here's my opinion nevertheless ;-)

 

>You should *absolutely* share your opinion on this. That's the point of
the thread. To be clear, there is no such thing as a "developer" or
"committer-only" discussion in Apache projects. All development is open
and user feedback is just as much a contribution as a patch!

 

 

Thanks for the clarification, good to hear.

         

        I think one of the biggest flaws of OG Flume is that it's too
complex and maybe even over-engineered. For example, the centralized
configuration in the Flume Master sounds good on paper, but in practice
it doesn't make things easier at all (fortunately this was fixed with NG
Flume). So IMO stick to the KISS principle and keep things
down-to-earth. I have no problem restarting agents, and a HUP
construction to reload the configuration, as Senthilvel suggested, is even
better.

 

>Just to clarify, when I talk about online reconfiguration, that is
orthogonal to a centralized master that controls configuration. In other
words, if we were to do it, it would be reloading changes to the
property config file, not via ZK and a master (unless someone wanted to
work on that - the plugin interfaces are there - it's just that I'm not
interested in doing that at this time for the KISS reason you mention).

 

>As an aside, Java does not make signal handling easy[1] (at all)
because it's a platform-specific feature, so responding to SIGHUP
specifically is unlikely. Trivially, we could poll the file for
mtime changes and reload; that's the most likely candidate. Atomicity
of changes could be achieved by making changes to a temp file and
renaming it into place (which most conf management tools support).
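
A minimal sketch of the mtime-polling and rename-into-place idea described above, using only the standard java.nio.file API. The class name ConfigPoller and its methods are hypothetical and are not part of Flume; the actual reload logic is left out. File modification times are available on Windows as well as Linux, although the timestamp resolution depends on the file system.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;

/** Hypothetical helper (not a Flume class): detects config changes by polling mtime. */
final class ConfigPoller {
    private final Path configFile;
    private FileTime lastSeen;

    ConfigPoller(Path configFile) throws IOException {
        this.configFile = configFile;
        this.lastSeen = Files.getLastModifiedTime(configFile);
    }

    /** Returns true once for each observed change of the file's modification time. */
    boolean changed() throws IOException {
        FileTime now = Files.getLastModifiedTime(configFile);
        if (now.equals(lastSeen)) {
            return false;
        }
        lastSeen = now;
        return true;
    }

    /**
     * Writes the new contents to a temp file in the same directory and then
     * renames it over the target, so a poller never reads a half-written file.
     * Whether an existing target is replaced by ATOMIC_MOVE is implementation
     * specific according to the JDK documentation.
     */
    static void writeAtomically(Path target, String contents) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "conf", ".tmp");
        Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

An agent would call changed() from a timer thread every few seconds and trigger a reload when it returns true. Two writes that land within the file system's timestamp resolution could go unnoticed, so comparing file size or a checksum as well may be worth considering.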

 

>[1] SIGTERM handling is a special case in Java, as the JVM supports
termination callbacks. That's how we handle that. Technically, there's a
(not so) secret signal handling class in the com.sun namespace in the
Oracle JDK, but it's entirely unsupported, so depending on it could be
dangerous. I'd rather not make it harder to support non-Linux platforms
if someone wanted to pick up that ball and run with it.
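
For reference, the JVM termination callbacks mentioned in the footnote correspond to shutdown hooks registered through java.lang.Runtime. The sketch below shows the standard call; the class name ShutdownHookDemo and the thread name are hypothetical, and this is not a description of how Flume itself registers its hook.

```java
/** Hypothetical example (not a Flume class): run cleanup when the JVM shuts down. */
final class ShutdownHookDemo {

    /**
     * Registers the cleanup action as a JVM shutdown hook and returns the
     * hook thread so the caller can remove it again if needed.
     * Hooks run on normal exit and on catchable termination signals such as
     * SIGTERM or Ctrl-C; they do not run when the process is killed with SIGKILL.
     */
    static Thread registerCleanup(Runnable cleanup) {
        Thread hook = new Thread(cleanup, "agent-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

A caller might use it as `ShutdownHookDemo.registerCleanup(() -> System.out.println("stopping agent"));` early in main.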

 

That makes sense, yes. Out of curiosity, does Windows also support
mtime-like polling? For the record, I'm no Java programmer, so I may be
asking something silly.

 

         

        I guess most users are using a config management system like
Puppet or Chef to deploy and configure their agents. If you keep that in
mind when designing configuration, it makes things a whole lot easier for
those users.

 

> Many of the users I've talked to are doing exactly this, I agree. The
question is only if we should support reloading changes to the files
(for now).

 

Those users can simply restart / reload their agents when the config
file is changed in the config management tool. I guess a question which
remains is what you do with the sources during the restart (in the case
of a syslog UDP source, the packets will be dropped and you can end up
losing events). Ideally you keep your sources alive so you can still
receive syslog packets while the agent restarts (so you get more of a
reload functionality rather than a full restart of the agent). Not sure
how easy or difficult that is ...

 

Regards,

Jorn

 

         

        From: Senthilvel Rangaswamy [mailto:senthil...@gmail.com]
        Sent: Saturday, June 9, 2012 1:32
        To: flume-user@incubator.apache.org
        CC: Flume Development
        Subject: Re: [DISCUSS] Should we support hot reconfiguration?

         

        IMHO, online reconfiguration is dangerous. A typical use case
for Flume is to be deployed at the very
        beginning of the data source, like web servers. These are
typically in large numbers. Say you push out
        a bad config and that gets picked up; it will wreak havoc on the
infrastructure.
        
        I would like Flume to pick up the new config when it is HUP'ed.
This way, it is a controlled deployment,
        but at the same time not a full restart.

        On Thu, Jun 7, 2012 at 3:18 PM, Eric Sammer
<esam...@cloudera.com> wrote:

        Flumers:

         

        Flume 0.9.x supported online reconfiguration and the intention
was for the 1.x branch to do so as well (it doesn't yet). I wanted to
start a discussion around whether people are interested in this kind of
functionality or if simply restarting the daemon(s) was sufficient for
your deployment. There are two ways of thinking about it:

         

        * Support reconfiguration. Agents may have multiple flows
passing through them and, ideally, adding new ones shouldn't interrupt
existing flows. Agent restarts interrupt collection and, for non-durable
channels (i.e. MemoryChannel), data *may* be lost. Reconfiguration will
add significant complexity and ultimately does not get around host level
maintenance, software upgrades, and the like.

         

        * Do not support reconfiguration. Accept the fact that agents may
go down eventually, so it should be supported as a first-class case. In
other words, embrace the idea of failure / maintenance and handle it by
recommending topologies of agents that include multiple agents at each
tier and simply round-robin / fail over where necessary. The only downside
is the agent tier closest to the originating source (e.g. a log4j
client); restarting that agent means the client application needs to be
able to find another agent or buffer (which impacts durability or blocks
the application).
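
To make the round-robin / failover option above concrete, here is a minimal sketch of client-side agent selection. The class FailoverSelector, its send method, and the trySend callback are all hypothetical and do not describe an existing Flume client API; the callback stands in for whatever actually delivers an event to an agent address.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical example (not a Flume class): round-robin with failover across agents. */
final class FailoverSelector {
    private final List<String> agents;
    private int next = 0;

    FailoverSelector(List<String> agents) {
        if (agents.isEmpty()) {
            throw new IllegalArgumentException("need at least one agent");
        }
        this.agents = new ArrayList<>(agents);
    }

    /**
     * Tries each agent at most once, starting at the current round-robin
     * position. Returns the agent that accepted the event, or null if every
     * agent failed, in which case the caller has to buffer or block.
     */
    synchronized String send(Predicate<String> trySend) {
        for (int i = 0; i < agents.size(); i++) {
            int index = (next + i) % agents.size();
            String agent = agents.get(index);
            if (trySend.test(agent)) {
                // Advance past the agent that succeeded so load is spread out.
                next = (index + 1) % agents.size();
                return agent;
            }
        }
        return null;
    }
}
```

The null return is where the durability trade-off mentioned above shows up: with no agent reachable, the client application either buffers events or blocks.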
        

         

        We can optionally support some subset of online reconfiguration,
such as only allowing new flows to be introduced or existing flows to be
"decommissioned," but not allowing alteration of existing flows. Ultimately
this feature is a ton of work and adds a ton of complexity, so if it's
not something folks are clamoring for, we should spend our time
worrying about more pressing issues.

         

        Thoughts? Comments?

        -- 
        Eric Sammer
        twitter: esammer
        data: www.cloudera.com

        
        
        
        -- 
        ..Senthil
        
        "If there's anything more important than my ego around, I want
it 
         caught and shot now."
                                                            - Douglas
Adams.





 

-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com
