Thanks Mr. Steve, and everyone. I actually have just 16 machines (ordinary P4 PCs), so when I need to do things manually it takes half an hour (for example, when installing sun-java I had to type 'yes' for each .bin install). For now I'm OK with pssh or just a simple custom script; however, I'm afraid things will get complicated soon enough...
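For reference, the kind of "simple custom script" I mean could look roughly like the sketch below. It is only an illustration: it assumes password-less ssh keys are already set up on every node, and the function names (build_ssh_command, run_on_hosts) and the host names in the usage note are made up for the example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_ssh_command(host, remote_command):
    """Build the argv for running one command on one host over ssh."""
    # BatchMode=yes makes ssh fail instead of stopping to ask for a password.
    return ["ssh", "-o", "BatchMode=yes", host, remote_command]


def run_on_hosts(hosts, remote_command, runner=None, max_workers=16):
    """Run remote_command on every host in parallel.

    Returns a dict mapping host -> (exit_code, combined_output).
    `runner` is a hypothetical hook so the function can be tried
    without a real cluster; by default it calls the local ssh client.
    """
    hosts = list(hosts)

    def default_runner(argv):
        proc = subprocess.run(argv, capture_output=True, text=True)
        return proc.returncode, proc.stdout + proc.stderr

    run = runner or default_runner
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(
            lambda host: run(build_ssh_command(host, remote_command)), hosts
        )
        return dict(zip(hosts, results))
```

Something like run_on_hosts(["node01", "node02"], "uptime") would then return one (exit code, output) pair per node. For an installer that asks for 'yes', piping the answer in on the remote side (for example "yes | sh ./installer.bin", with a placeholder installer name) may work, but only if the installer reads its answer from standard input.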
You said: "you can automate rpm install using pure "rpm" command, and check for installed artifacts yourself"

Could you please explain more? I understand you run the same rpm against all machines, provided the cluster is homogeneous.

K. Honsali

2008/4/30 Steve Loughran <[EMAIL PROTECTED]>:

> Bradford Stephens wrote:
>
> > Greetings,
> >
> > I'm compiling a list of (free/OSS) tools commonly used to administer
> > Linux clusters to help my company transition away from Win solutions.
> >
> > I use Ganglia for monitoring the general stats of the machines (Although
> > I didn't get the hadoop metrics to work). I also use ntop to check out
> > network performance (especially with Nutch).
> >
>
> Once you move to larger farms, you have to move away from running stuff by
> hand to even more automation. You don't really want to work with individual
> machines, just have some central configuration that you adjust and let it
> propagate out. The management tools can detect machines refusing to play and
> hadoop should stop sticking data and work on them.
>
> -LinuxCOE is how we build images; InstaLinux: http://www.instalinux.com/ is a
> public instance of this. It can create .iso kickstart images that pull
> RPM or deb packages down off local/remote servers
>
> -Configuration Management becomes your next problem. A lot of the CM tools
> let you declare the state of the machines, they then work to keep the
> machines in that state, detect when they are out of it, and push your
> machines back in to the desired state, or, failing that, start paging you.
> The line between CM and monitoring tools gets kind of blurred.
>
> There are a few open source tools that can do this
>
> http://en.wikipedia.org/wiki/Comparison_of_open_source_configuration_management_software
>
> I'd point you at
> -Smartfrog (personal bias there, as I work on it)
> -puppet
> -bcfg2
> -LCFG
> -Quattor
>
> Then I'd go search the LISA archives to see what other people are up to;
> there are some good papers there. Like this one, "On Designing and Deploying
> Internet-Scale Services":
> http://research.microsoft.com/~jamesrh/TalksAndPapers/JamesRH_Lisa.pdf
>
> -steve
>
> --
> Steve Loughran http://www.1060.org/blogxter/publish/5
> Author: Ant in Action http://antbook.org/
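
To make the "declare the state of the machines, detect when they are out of it" idea from the quoted message concrete for packages, here is a rough sketch of how I picture the check. It is only my reading, not how any of the tools listed above actually work: it compares a declared package list with the names printed by rpm -qa --queryformat '%{NAME}\n' on a node. The function names and the package names in the example are invented for illustration.

```python
def parse_rpm_names(rpm_output):
    """Turn the text printed by `rpm -qa --queryformat '%{NAME}\\n'`
    into a set of installed package names (one name per line)."""
    return {line.strip() for line in rpm_output.splitlines() if line.strip()}


def plan_changes(desired, installed):
    """Compare the declared package list with what a node reports.

    Returns a dict with the packages that still need installing and
    the ones already in the desired state. Both names are hypothetical;
    this only computes the difference and changes nothing on the node.
    """
    desired, installed = set(desired), set(installed)
    return {
        "install": sorted(desired - installed),
        "ok": sorted(desired & installed),
    }


if __name__ == "__main__":
    # Sample output as it might come back from one node (made-up data).
    sample = "bash\nntp\n\nopenssh-server\n"
    plan = plan_changes(["ntp", "ganglia-gmond"], parse_rpm_names(sample))
    print(plan)
```

Running the check on every node and installing only what shows up under "install" is the manual version of what the CM tools do continuously.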