I've noticed how much one's perspective on managing systems changes when the distance between machines grows. Managing 600+ systems in a single datacenter is much easier than managing 600+ systems spread across the country. Though many of the fundamentals are the same (package management, security patches, service management, etc.), performing these functions becomes much more difficult, especially critical system upgrades.
All the code we're writing has fairly serious error handling, and we're working on a rollback mechanism for our package management system (application packages, as distinct from system packages). Nothing ultra fancy, but it works for now. I'll have to take a look at those projects and see if they fit my needs. It's getting to the point where, given the work I've put into this system so far, if I can't find any reasonable utilities I'll have to clean up the bubble-gum-and-popsicle-sticks solution I've got right now.

Regards,

Mike Lockhart

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Mike Lockhart [Systems Engineering & Operations]
StayOnline, Inc
http://www.stayonline.net/
mailto: [EMAIL PROTECTED]
GPG: 8714 6F73 3FC8 E0A4 0663 3AFF 9F5C 888D 0767 1550
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Will Maier
Sent: Thursday, November 02, 2006 8:46 PM
To: OpenBSD Misc
Subject: Re: Large scale deployments

On Thu, Nov 02, 2006 at 08:10:50PM -0500, Michael Lockhart wrote:
> 2. Command and Control. What projects or capabilities are
> available for performing remote command and control over services,
> packages, and system health? Currently, all push/pull is done
> with perl/sh scripts to bring files over, sanity check, install,
> update, etc. I've been leaning towards creating a daemon that
> runs on each system and has a secure connection back to a
> centralized location for determining if updates are available. My
> proof of concept works, but thoughts on how to do this right are
> GREATLY appreciated.

I've used cfengine on large (500+ node) Linux clusters. There are lots of things I wish were better in cfengine, but I haven't found a more capable tool. For one-time mass administration tasks, I use dsh from sysutils/clusterit, though the scenario you describe above seems cfengine-y to me.

> 3. Remote upgrading.
> Going from 3.2 -> 3.8 or 4.0 is going to be
> very difficult, and the approach that I am taking right now is
> creating a bsd.rd-based kernel/image that will boot fully into
> memory, and contain the appropriate scripts to re-initialize the
> disks, rsync/scp/ftp/get/whatever the new base image and kernel
> over, then reboot into the new image, and perform the rest
> of the upgrade from there. Has anyone done something similar to
> this or know of any projects along these lines?

Upgrading from 3.2 to 4.0 is going to be a headache. The clusters I've worked in have all used network filesystems (mostly AFS) for most data storage, so reimaging a node has never cost much. Combined with a well-thought-out configuration management system, major upgrades seem like less of a problem. Of course, you need to vet your new system image with your applications first.

I sure wish I had 600 OpenBSD boxes to worry about... Scientific Linux is a headache.

--
o--------------------------{ Will Maier }--------------------------o
| web:.......http://www.lfod.us/ | [EMAIL PROTECTED] |
*------------------[ BSD Unix: Live Free or Die ]------------------*
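For one-time mass administration of the dsh sort mentioned above, a plain ssh loop over a host list gets you much of the way before reaching for a cluster tool. A minimal sketch follows; the hostnames, the host-list path, and the DRYRUN convention are all illustrative, not taken from any real deployment:

```shell
#!/bin/sh
# run_all: run one command on every host listed in a file, serially.
# With DRYRUN=1 the ssh invocations are printed instead of executed,
# which is worth doing before touching 600 machines at once.

run_all() {
    hostfile=$1; shift
    while read -r host; do
        # skip blank lines and comments in the host list
        case $host in ''|\#*) continue ;; esac
        if [ "${DRYRUN:-0}" -eq 1 ]; then
            echo "ssh $host $*"
        else
            # BatchMode avoids hanging on a password prompt mid-run
            ssh -o BatchMode=yes "$host" "$@"
        fi
    done < "$hostfile"
}

# Example host list and a dry run:
printf 'node1.example.net\nnode2.example.net\n' > /tmp/hosts.txt
DRYRUN=1 run_all /tmp/hosts.txt 'pkg_info -u'
# prints: ssh node1.example.net pkg_info -u
#         ssh node2.example.net pkg_info -u
```

A serial loop is slow at 600 nodes; the same function can be backgrounded per host, which is essentially what dsh's parallel mode buys you with less bookkeeping.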
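The reimage-from-ramdisk approach described above (boot a custom bsd.rd fully into memory, wipe the disk, pull the new base over the network, reboot into it) can be sketched as a script skeleton. Everything here is a placeholder, not a working upgrade procedure: the disk device, mirror URL, and image name are invented, the disklabel step is elided, and DRYRUN defaults to printing the destructive steps rather than running them:

```shell
#!/bin/sh
# Skeleton of the post-bsd.rd-boot reimage steps. DRYRUN defaults to 1,
# so running this prints the would-be commands; only on the real
# ramdisk would you set DRYRUN=0. All names below are illustrative.

DISK=${DISK:-sd0}
MIRROR=${MIRROR:-http://mirror.example.net/images}
IMAGE=${IMAGE:-base40.tgz}
DRYRUN=${DRYRUN:-1}

run() {
    if [ "$DRYRUN" -eq 1 ]; then
        echo "WOULD RUN: $*"
    else
        # abort the whole reimage on any failed step
        "$@" || { echo "FAILED: $*" >&2; exit 1; }
    fi
}

run fdisk -iy "$DISK"                      # initialize the MBR (disklabel step elided)
run newfs "/dev/r${DISK}a"                 # re-create the root filesystem
run mount "/dev/${DISK}a" /mnt
run ftp -o "/tmp/$IMAGE" "$MIRROR/$IMAGE"  # fetch the prebuilt base image
run tar -xzphf "/tmp/$IMAGE" -C /mnt       # unpack it onto the new root
run reboot
```

The fail-and-abort behavior in run() matters more than the individual steps: a half-reimaged remote box with no hands nearby is worse than one that stopped early and stayed reachable over the ramdisk.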