On 8/7/06, Christopher D. Clausen <[EMAIL PROTECTED]> wrote:
Umm, am I missing something?  One of the major reasons I use AFS is the
"vos move" command.  And it was my understanding that AFS can handle
server outages without breaking.  Do you all have different experiences?
If AFS can't handle a server outage (especially a planned one) there is
no point in using it.

Don't be silly.  No system can handle all outages "without breaking."
RO replication is great, but it doesn't help users, whose home
volumes are read-write.

I have three machines, A, B, C.  Users distributed among them.
Machine B coughs up a lung due to hardware failure.  Suddenly 1/3 of
my users don't have accounts.

"vos move" isn't going to help volumes that aren't there.

I mean, there were [and are] things we do to try to limit the downtime
-- hot spare hardware, RAID-5 disks, and we improved the ability to
plug a RAID set into an existing server and get it going ASAP [we
named the AFS partitions on each machine differently so there won't be
conflicts with, for example, two partitions called /vicepa].
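
To illustrate the partition-naming point: a quick way to check whether
a RAID set can be plugged into any other server is to look for
partition names used on more than one machine. This is only a sketch;
the server names, the layout dictionary and the helper
shared_partition_names are made up for illustration and are not part
of OpenAFS.

```python
from collections import defaultdict


def shared_partition_names(layout):
    """Return {partition_name: [servers]} for names used on 2+ servers.

    `layout` is a hypothetical mapping of server name to the list of
    AFS partition names (e.g. "/vicepa") mounted on that server.
    """
    owners = defaultdict(list)
    for server, partitions in layout.items():
        for name in partitions:
            owners[name].append(server)
    # Keep only names that would collide if disks moved between servers.
    return {name: sorted(servers)
            for name, servers in owners.items()
            if len(servers) > 1}


if __name__ == "__main__":
    # Hypothetical layout: A and B both use /vicepa, so their RAID
    # sets could not be swapped onto each other's hardware as-is.
    example = {
        "A": ["/vicepa", "/vicepb"],
        "B": ["/vicepa", "/vicepc"],
        "C": ["/vicepd"],
    }
    print(shared_partition_names(example))
```

An empty result means every partition name is unique across the
servers, which is the state described above.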

But in the end, hardware failure is hardware failure and there's
nothing you can do to stop it.

I patch and reboot all of our AFS servers about once a month to ensure
that they have the latest operating system patches.  I usually also
upgrade to the latest 1.4.x release (just installed 1.4.2b3 on a system
today.)


I also run with fast-restart.  Have not had any reported problems with
volumes crapping out.  And I generally vos move everything off of a
fileserver before planned restarts, so there is nothing there for the
salvager to keep offline.
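
A drain like the one just described could be scripted roughly as
follows. This is only a sketch: the host names, volume names and the
helper drain_commands are made up for illustration, and the volume list
is assumed to come from somewhere else (for example, "vos listvol"
output). The "vos move" options used are the standard ones. The code
only builds the command lines; it does not run anything.

```python
import shlex


def drain_commands(volumes, from_server, from_partition, targets):
    """Build one `vos move` command line per volume.

    Volumes are spread round-robin across `targets`, a list of
    (server, partition) pairs. All names here are caller-supplied;
    nothing is looked up from a live cell.
    """
    if not targets:
        raise ValueError("need at least one (server, partition) target")
    commands = []
    for i, volume in enumerate(volumes):
        to_server, to_partition = targets[i % len(targets)]
        argv = [
            "vos", "move",
            "-id", volume,
            "-fromserver", from_server,
            "-frompartition", from_partition,
            "-toserver", to_server,
            "-topartition", to_partition,
        ]
        # Quote each argument so the line is safe to paste into a shell.
        commands.append(" ".join(shlex.quote(arg) for arg in argv))
    return commands


if __name__ == "__main__":
    # Hypothetical example: empty /vicepa on server B before a reboot.
    for line in drain_commands(
        ["user.alice", "user.bob", "user.carol"],
        "afs-b.example.org", "vicepa",
        [("afs-a.example.org", "vicepa"), ("afs-c.example.org", "vicepa")],
    ):
        print(line)
```

Printing the commands instead of executing them leaves room to review
the plan (and the free space on the targets) before anything moves.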

Eventually volumes will kick offline if the fileserver detects they're
damaged and in need of a salvage.  Worse, sometimes the fileserver
hasn't yet figured it out and the users get freaked out because files
seem to be "missing".

Salvages are *important* to the integrity of AFS volumes, just like
fsck is important to (non-journaled) disks.

> We're starting a routine of monthly salvages for each server to try to
> combat this.

Do salvages touch the volumes themselves, or is it just a partition level
thing?  I.e. if I vos move volumes off of the partitions and mkfs them
monthly, do I still need to worry about salvaging periodically?

YES!  The salvager is talking to the volumes themselves, checking
actual structure.  It tries to put things back together when it can.
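
Since the salvager works on volumes, a salvage can be requested for a
whole server, one partition, or a single volume. A rough sketch of
building the matching "bos salvage" invocations is below. The helper
salvage_command and the host and volume names are made up for
illustration; check the bos salvage man page for your release before
relying on the exact options.

```python
def salvage_command(server, partition=None, volume=None):
    """Return the argv list for a `bos salvage` request.

    - no partition:          salvage every partition on the server (-all)
    - partition only:        salvage that partition
    - partition and volume:  salvage just that volume
    """
    if volume is not None and partition is None:
        # bos salvage needs the partition when a single volume is named.
        raise ValueError("a volume requires its partition to be given")
    argv = ["bos", "salvage", "-server", server]
    if partition is None:
        argv.append("-all")
    else:
        argv += ["-partition", partition]
        if volume is not None:
            argv += ["-volume", volume]
    return argv


if __name__ == "__main__":
    # Hypothetical monthly pass over three servers.
    for host in ["afs-a.example.org", "afs-b.example.org", "afs-c.example.org"]:
        print(" ".join(salvage_command(host)))
```

Returning an argv list rather than a string means the result can be
handed straight to something like subprocess.run without shell quoting
concerns.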


Oh yes.  I don't run anything else on my AFS servers or KDCs.  I'd hate
to see a flaw in OpenAFS compromise a KDC and thus I keep them separate.
Although our (currently nonexistent) DR plans might have a KDC and AFS
server on the same machine, possibly in a Solaris zone.

I am far less worried about OpenAFS compromising my servers than all
the other cruft out there.