Bradford Stephens wrote:
Greetings,
I'm compiling a list of (free/OSS) tools commonly used to administer Linux
clusters to help my company transition away from Win solutions.
I use Ganglia for monitoring the general stats of the machines (although I
didn't get the Hadoop metrics to work). I also use ntop to check out network
performance (especially with Nutch).
Once you move to larger farms, you have to move away from running stuff
by hand to even more automation. You don't really want to work with
individual machines; you want some central configuration that you
adjust and let propagate out. The management tools can detect
machines refusing to play, and Hadoop should stop sticking data and work
on them.
-LinuxCOE is how we build images; InstaLinux: http://www.instalinux.com/
is a public instance of this. It can create .iso kickstart images that
pull RPM or deb packages down off local/remote servers.
-Configuration Management becomes your next problem. A lot of the CM
tools let you declare the state of the machines. They then work to keep
the machines in that state, detect when they are out of it, and push
your machines back into the desired state or, failing that, start
paging you. The line between CM and monitoring tools gets kind of blurred.
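
To make the declare/detect/push-back/page cycle concrete, here is a minimal
sketch in Python. It is not how any of the tools below actually work; the
names (find_drift, reconcile, the fix and page callbacks) and the flat
"setting -> value" state model are assumptions made up for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical state model: one machine's settings as name -> value.
State = Dict[str, str]


def find_drift(desired: State, observed: State) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {setting: (wanted, actual)} for every setting out of state."""
    return {
        key: (wanted, observed.get(key))
        for key, wanted in desired.items()
        if observed.get(key) != wanted
    }


def reconcile(
    desired: State,
    observed: State,
    fix: Callable[[str, str], bool],
    page: Callable[[List[str]], None],
) -> List[str]:
    """Try to push each drifted setting back; page about whatever is left.

    fix(setting, wanted) is a caller-supplied callback that returns True
    if it managed to restore the setting. page(settings) is called once
    with the settings that could not be restored.
    """
    unresolved: List[str] = []
    for key, (wanted, _actual) in find_drift(desired, observed).items():
        if fix(key, wanted):
            observed[key] = wanted  # record the repaired state
        else:
            unresolved.append(key)
    if unresolved:
        page(unresolved)
    return unresolved


if __name__ == "__main__":
    desired = {"ntp": "running", "datanode": "running"}
    observed = {"ntp": "stopped", "datanode": "crashed"}
    # Pretend only ntp can be restarted automatically.
    left = reconcile(
        desired,
        observed,
        fix=lambda key, wanted: key == "ntp",
        page=lambda keys: print("paging admin about:", keys),
    )
    print("still broken:", left)
```

A real tool would run something like this on a schedule against every
machine, which is where the overlap with monitoring comes from: the drift
check is effectively a health check.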
There are a few open source tools that can do this:
http://en.wikipedia.org/wiki/Comparison_of_open_source_configuration_management_software
I'd point you at
-SmartFrog (personal bias there, as I work on it)
-Puppet
-bcfg2
-LCFG
-Quattor
Then I'd go search the LISA archives to see what other people are up to;
there are some good papers there. Like this one, "On Designing and
Deploying Internet-Scale Services":
http://research.microsoft.com/~jamesrh/TalksAndPapers/JamesRH_Lisa.pdf
-steve
--
Steve Loughran http://www.1060.org/blogxter/publish/5
Author: Ant in Action http://antbook.org/