Bradford Stephens wrote:
Greetings,

I'm compiling a list of (free/OSS) tools commonly used to administer Linux
clusters to help my company transition away from Win solutions.

I use Ganglia for monitoring the general stats of the machines (although I
didn't get the Hadoop metrics to work). I also use ntop to check out network
performance (especially with Nutch).

Once you move to larger farms, you have to move away from running stuff by hand to even more automation. You don't really want to work with individual machines; you want some central configuration that you adjust and let propagate out. The management tools can detect machines refusing to play, and Hadoop should stop placing data and work on them.

-LinuxCOE is how we build images; InstaLinux (http://www.instalinux.com/) is a public instance of this. It can create .iso kickstart images that pull RPM or deb packages down off local/remote servers.

-Configuration Management becomes your next problem. A lot of the CM tools let you declare the state of the machines; they then work to keep the machines in that state, detect when they are out of it, and push them back into the desired state or, failing that, start paging you. The line between CM and monitoring tools gets kind of blurred.
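
To make the declare/detect/repair/page cycle concrete, here is a rough Python sketch. It is not how any particular CM tool works; the Machine class, the drift() and converge() helpers, and the node and service names are all made up for illustration, and "applying" a setting is just a dictionary update.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Hypothetical stand-in for a managed node."""
    name: str
    state: dict = field(default_factory=dict)
    reachable: bool = True

    def apply(self, key, value):
        """Try to set one setting; an unreachable machine ignores the change."""
        if self.reachable:
            self.state[key] = value


def drift(desired, machine):
    """Return the settings whose actual value differs from the desired one."""
    return {k: v for k, v in desired.items() if machine.state.get(k) != v}


def converge(desired, machines, max_attempts=3):
    """Push every machine towards `desired`; return names that need a human."""
    needs_paging = []
    for m in machines:
        for _ in range(max_attempts):
            pending = drift(desired, m)
            if not pending:
                break  # machine is already in the declared state
            for key, value in pending.items():
                m.apply(key, value)
        if drift(desired, m):
            # Still out of state after every attempt: escalate.
            needs_paging.append(m.name)
    return needs_paging


# The central, declared configuration (service names are assumptions).
desired = {"ntp": "running", "hadoop-datanode": "running"}
farm = [
    Machine("node01", {"ntp": "running", "hadoop-datanode": "running"}),
    Machine("node02", {"ntp": "stopped"}),
    Machine("node03", {"ntp": "stopped"}, reachable=False),
]
failed = converge(desired, farm)
print("page the operator about:", failed)
```

The point of the sketch is that you only ever edit the `desired` dictionary; the loop works out per-machine differences, and the list it returns is where CM starts to look like monitoring.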

There are a few open source tools that can do this:
http://en.wikipedia.org/wiki/Comparison_of_open_source_configuration_management_software

I'd point you at
 -SmartFrog (personal bias there, as I work on it)
 -puppet
 -bcfg2
 -LCFG
 -Quattor

Then I'd go search the LISA archives to see what other people are up to; there are some good papers there. Like this one, "On Designing and Deploying Internet-Scale Services":
http://research.microsoft.com/~jamesrh/TalksAndPapers/JamesRH_Lisa.pdf

-steve

--
Steve Loughran                  http://www.1060.org/blogxter/publish/5
Author: Ant in Action           http://antbook.org/
