I've noticed how much one's perspective on managing systems changes when the distance between machines grows. Managing 600+ systems in a single datacenter is much easier than managing 600+ systems spread across the country. Though many of the fundamentals are the same (package management, security patches, service management, etc.), performing these functions becomes much more difficult, especially critical system upgrades.
All the code we're writing has fairly serious error handling, and we're working on a rollback mechanism for our package management system (application packages, as distinct from system packages). Nothing ultra fancy, but it works for now. I'll have to take a look at those projects and see if they fit my needs. It's getting to the point where, given the work I've put into this system so far, if I can't find any reasonable utilities I'll have to clean up the bubble-gum-and-popsicle-sticks solution I've got right now.

Regards,

Mike Lockhart

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Mike Lockhart [Systems Engineering & Operations]
StayOnline, Inc
http://www.stayonline.net/
mailto: [EMAIL PROTECTED]
GPG: 8714 6F73 3FC8 E0A4 0663 3AFF 9F5C 888D 0767 1550
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Will Maier
Sent: Thursday, November 02, 2006 8:46 PM
To: OpenBSD Misc
Subject: Re: Large scale deployments

On Thu, Nov 02, 2006 at 08:10:50PM -0500, Michael Lockhart wrote:
> 2. Command and Control. What projects or capabilities are
> available for performing remote command and control over services,
> packages, and system health? Currently, all push/pull is done
> with perl/sh scripts to bring files over, sanity check, install,
> update, etc. I've been leaning towards creating a daemon that
> runs on each system and has a secure connection back to a
> centralized location for determining if updates are available. My
> proof of concept works, but thoughts on how to do this right are
> GREATLY appreciated.

I've used cfengine on large (500+ node) Linux clusters. There are lots of things I wish were better in cfengine, but I haven't found a more capable tool. For one-time mass administration tasks, I use dsh from sysutils/clusterit, though the scenario you describe above seems cfengine-y to me.

> 3. Remote upgrading.
> Going from 3.2 -> 3.8 or 4.0 is going to be
> very difficult, and the approach that I am taking right now is
> creating a bsd.rd-based kernel/image that will boot fully into
> memory, and contain the appropriate scripts to re-initialize the
> disks, rsync/scp/ftp/get/whatever the new base image and kernel
> over, then reboot into the new image, and perform the rest
> of the upgrade from there. Has anyone done something similar to
> this or know of any projects along these lines?

Upgrading from 3.2 to 4.0 is going to be a headache. The clusters I've worked in have all used network filesystems (mostly AFS) for most data storage, so reimaging a node has never cost much. Combined with a well-thought-out configuration management system, major upgrades seem like less of a problem. Of course, you need to vet your new system image with your applications first.

I sure wish I had 600 OpenBSD boxes to worry about... Scientific Linux is a headache.

--
o--------------------------{ Will Maier }--------------------------o
| web:.......http://www.lfod.us/ | [EMAIL PROTECTED] |
*------------------[ BSD Unix: Live Free or Die ]------------------*
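For one-time mass administration of the dsh sort mentioned above, a plain ssh loop over a host list gets you much of the way before reaching for a cluster tool. A minimal sketch follows; the hostnames, the host-list path, and the DRYRUN convention are all illustrative, not taken from any real deployment:

```shell
#!/bin/sh
# run_all: run one command on every host listed in a file, serially.
# With DRYRUN=1 the ssh invocations are printed instead of executed,
# which is worth doing before touching 600 machines at once.

run_all() {
    hostfile=$1; shift
    while read -r host; do
        # skip blank lines and comments in the host list
        case $host in ''|\#*) continue ;; esac
        if [ "${DRYRUN:-0}" -eq 1 ]; then
            echo "ssh $host $*"
        else
            # BatchMode avoids hanging on a password prompt mid-run
            ssh -o BatchMode=yes "$host" "$@"
        fi
    done < "$hostfile"
}

# Example host list and a dry run:
printf 'node1.example.net\nnode2.example.net\n' > /tmp/hosts.txt
DRYRUN=1 run_all /tmp/hosts.txt 'pkg_info -u'
# prints: ssh node1.example.net pkg_info -u
#         ssh node2.example.net pkg_info -u
```

A serial loop is slow at 600 nodes; the same function can be backgrounded per host, which is essentially what dsh's parallel mode buys you with less bookkeeping.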
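The reimage-from-ramdisk approach described above (boot a custom bsd.rd fully into memory, wipe the disk, pull the new base over the network, reboot into it) can be sketched as a script skeleton. Everything here is a placeholder, not a working upgrade procedure: the disk device, mirror URL, and image name are invented, the disklabel step is elided, and DRYRUN defaults to printing the destructive steps rather than running them:

```shell
#!/bin/sh
# Skeleton of the post-bsd.rd-boot reimage steps. DRYRUN defaults to 1,
# so running this prints the would-be commands; only on the real
# ramdisk would you set DRYRUN=0. All names below are illustrative.

DISK=${DISK:-sd0}
MIRROR=${MIRROR:-http://mirror.example.net/images}
IMAGE=${IMAGE:-base40.tgz}
DRYRUN=${DRYRUN:-1}

run() {
    if [ "$DRYRUN" -eq 1 ]; then
        echo "WOULD RUN: $*"
    else
        # abort the whole reimage on any failed step
        "$@" || { echo "FAILED: $*" >&2; exit 1; }
    fi
}

run fdisk -iy "$DISK"                      # initialize the MBR (disklabel step elided)
run newfs "/dev/r${DISK}a"                 # re-create the root filesystem
run mount "/dev/${DISK}a" /mnt
run ftp -o "/tmp/$IMAGE" "$MIRROR/$IMAGE"  # fetch the prebuilt base image
run tar -xzphf "/tmp/$IMAGE" -C /mnt       # unpack it onto the new root
run reboot
```

The fail-and-abort behavior in run() matters more than the individual steps: a half-reimaged remote box with no hands nearby is worse than one that stopped early and stayed reachable over the ramdisk.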