Re: adding HA monitoring to bigtop

Jos Backus Fri, 12 Oct 2012 15:37:41 -0700

On Fri, Oct 12, 2012 at 3:17 PM, Roman Shaposhnik <[email protected]> wrote:

> On Thu, Oct 11, 2012 at 4:27 PM, Jos Backus <[email protected]> wrote:
> > I'm still working on this, now in the context of HDP. My goal at work
> > is to ultimately use the notify functionality in daemontools-encore to
> > send program crash/restart alerts, and change the various wrapper
> > scripts (start-dfs.sh, etc.) to use daemontools-encore (and shmux), per
> > the ticket.
>
> Jos, is there any chance to see glimpses of your code? It doesn't have
> to be complete but it would be very useful as a starting point. Just like
> what is happening on BIGTOP-713 right now.
>

Sure, I just added
https://issues.apache.org/jira/secure/attachment/12548978/daemontools-services-0.0.1-1.el6.src.rpm
This will be reworked for HDP, which I'm in the process of installing in my
test cluster.

>
> > It would be great to be able to deprecate the init scripts completely
> > long-term and replace them with hooks for process supervision systems
> > such as daemontools-encore, which is very portable, or for the most
> > popular process supervision tools out there (Upstart, SMF, launchd).
>
> I think the unfortunate complication here is that we don't have the luxury
> of converging on a single service management system. We will always be
> supporting a number of them in Bigtop. And most likely init.d scripts
> will stick around for at least as long as RHEL5 is sticking around.
>

Fair enough. It's not blocking anything I'm doing.

>
> So the question then becomes -- do we strive for the lowest common
> denominator that works across all the distros we care about or do
> we provide hooks for *all* of these systems?
>
> Thoughts?
>

It would be good to have support for the most common ones (Upstart,
daemontools, systemd, SMF, runit) and it should not be too hard as they are
generally very similar, as long as we don't try to support complex service
dependencies in the service definitions themselves. One reason I like
daemontools (and don't like Upstart and systemd) is that it is very
portable and fairly small. You could even include it in Bigtop. ;-)

>
> > Also, right now a major source of trouble is the mess that are the
> > startup/wrapper scripts for Hadoop, because of the plethora of global
> > environment variables and shell code that sets/reads those variables.
>
> While working on Bigtop I've come to realize that the needs of upstream
> developers might actually be very different from the needs of downstream
> DevOPS. IOW, it may not be out of the question for us to completely bypass
> upstream service management scripts and replace them with our own
> implementations. Essentially for things like systemd we'd have to do that
> anyway.
>

Yes.

>
> If we embark on this project we have a chance of completely unifying
> how things are done -- at the end of the day all services in Hadoop
> ecosystem end up in java/jsvc invocation with certain env. vars
> and arguments passed to the JVM.
>

Agreed. All that's needed is a way to model and store the java/jsvc
commandline and environment. The service controller takes care of all the
complexity of watching the process, checking its status, handling stale
pidfiles and making sure the permissions are set correctly, runlevels
(chkconfig), etc. Basically all that code is factored out into the process
supervisor. The only sticky issue remains logging. I personally like the
convenience of being able to type `tail -F /service/foo/log/main/current'
for any service A LOT because I don't have to hunt for where the log is
stored. And multilog also takes care of log rotation and never fills up my
disks. But sadly many people just don't get the benefits of this approach,
stuck as they are in the traditional, cumbersome way of doing things. :-(
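
As a rough sketch of what "model and store the commandline and environment"
could look like, here is a small Python helper that writes a
daemontools-style service directory. This is not the code from the attached
RPM: the function name write_service_dir, the example command and the
example variable values are all assumptions made up for illustration. It
only relies on the standard daemontools programs envdir and multilog.

```python
import os
import shlex
import tempfile


def write_service_dir(root, name, command, env):
    """Write <root>/<name> with run, env/ and log/run (hypothetical helper)."""
    svc = os.path.join(root, name)
    env_dir = os.path.join(svc, "env")
    log_dir = os.path.join(svc, "log")
    os.makedirs(env_dir, exist_ok=True)
    os.makedirs(log_dir, exist_ok=True)

    # One file per variable; envdir uses the first line of each file as the value.
    for key, value in env.items():
        with open(os.path.join(env_dir, key), "w") as f:
            f.write(value + "\n")

    # The run script execs the command in the foreground so the supervisor
    # can watch the process directly; no pidfile is involved.
    quoted = " ".join(shlex.quote(arg) for arg in command)
    run = "#!/bin/sh\nexec 2>&1\nexec envdir ./env " + quoted + "\n"

    # multilog writes timestamped, rotated logs under log/main/.
    log_run = "#!/bin/sh\nexec multilog t ./main\n"

    for path, body in ((os.path.join(svc, "run"), run),
                       (os.path.join(log_dir, "run"), log_run)):
        with open(path, "w") as f:
            f.write(body)
        os.chmod(path, 0o755)
    return svc


if __name__ == "__main__":
    # Example values only; the real command and variables depend on the distro.
    root = tempfile.mkdtemp()
    svc = write_service_dir(
        root,
        "namenode",
        ["hdfs", "namenode"],
        {"JAVA_HOME": "/usr/lib/jvm/java", "HADOOP_HEAPSIZE": "1024"},
    )
    print(open(os.path.join(svc, "run")).read())
```

With a layout like this, every setting lives in its own file under env/,
and the log for a service installed under /service ends up in
/service/<name>/log/main/current.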

> > There's also the continuing issue that the various organizations/vendors
> > can't seem to make their minds up about the script UIs; there are the
> > hadoop, mapred, hdfs, yarn, hadoop-daemon.sh and hadoop-daemons.sh
> > commands, all of which source various config files and use a ton of
> > global variables. It's unclear which ones to use such that the right
> > configuration is applied. So my plan is to use the most low-level
> > interface and stick all needed environment variables in
> > /service/foo/env/... so they are easy to find, query and set in a
> > platform-independent manner.
>
> Bigtop is the place for this to be resolved. I think if we do a reasonable
> job of unifying these things, vendors will follow.
>
> Basically, at this point the real issue is having enough folks interested
> in improving the situation and willing to post patches to the JIRAs.
> Personally, I plan to work on:
>     https://issues.apache.org/jira/browse/BIGTOP-460
>     https://issues.apache.org/jira/browse/BIGTOP-263
> but I would surely benefit from any help I can get.
>

Thanks, Roman. I will try to contribute more code and experience soon. My
goal is to make running Hadoop under process supervision my contribution to
the Hadoop community.

Cheers,
Jos

>
> Thanks,
> Roman.
>

-- 
Jos Backus
jos at catnook.com
