On Thursday, April 13, 2006 09:41:49 AM -0700 Renata Maria Dart <[EMAIL PROTECTED]> wrote:

Hi, we recently had a fileserver crash because of an ecache error. When the server came back up, it had the further misfortune of a Fibre Channel adapter error which prevented the drives containing the vice partitions from coming back online. Once those issues were dealt with, the system was rebooted again and came up with its vice partitions, but did not salvage on its own...we had to run bos salvage manually to bring the volumes online. This is a Solaris 9 system running OpenAFS 1.4.1-rc10. There are 2 partitions on it, and the fs process specifies 2 parallel salvage processes. Unfortunately I was not there to see all the details when the system came back online, and the admin who restored the system ran separate salvager commands for the 3 200 GB volumes that live on the system and didn't preserve the original salvage logs. Is it to be expected that the salvager won't run automatically after such a sequence of events? A couple more pieces of information: I recently converted this system from inode to namei, it does not have '--enable-fast-restart' configured into it, and here are the entries from BosLog:

Under the circumstances you describe, yes, this is normal.

The bosserver forces a whole-server salvage any time the fileserver exits abnormally, or on startup if the fileserver was not shut down cleanly (there is a file in /usr/afs/local or wherever indicating the fileserver is "running"; if that file is present when the bosserver starts, it assumes an unclean shutdown).
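(If you want to check for that yourself before a restart, just look for the sentinel file. The name SALVAGE.fs below is my assumption based on the usual server layout; confirm it against your installation's local directory.)

    # Look for the bosserver's "fileserver running" sentinel.
    # SALVAGE.fs is an assumed name; verify it for your build.
    if [ -e /usr/afs/local/SALVAGE.fs ]; then
        echo "unclean fileserver shutdown; bosserver will force a salvage"
    fi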

On an inode-based server, fsck sets a flag if it makes any changes to a partition. When the fileserver starts up, if it sees this flag on any partitions, it immediately exits with an error, which causes the bosserver to force a salvage. Since the needs-salvage flag is stored on the partition, it is set and reset only when the partition is actually fsck'd and salvaged, respectively. It moves around if you move the disk, and doesn't get touched if the disk is "missing", as with your Fibre Channel disk problem.

However, you switched to namei, which doesn't have that feature (and can't, since it doesn't use a modified fsck). So on your first start, the bosserver forced a salvage, but there weren't any partitions to salvage, so nothing interesting happened. Then the fileserver started up, and someone noticed there were no partitions, and rebooted. That involved a clean shutdown of the fileserver, which meant there was no forced salvage on the next boot.
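For the record, once you're in that state, the recovery is what your admin did, just done server-wide in one step. Something along these lines, with fs1.example.com standing in for your fileserver:

    # Force a salvage of every vice partition on the server; bos takes
    # the fs instance down for the duration of a whole-server salvage.
    bos salvage -server fs1.example.com -all -localauth

    # Or salvage a single partition and show the salvager log afterward:
    bos salvage -server fs1.example.com -partition /vicepa -showlog

Both forms use the standard bos salvage switches; if you're not running them on the server itself, use your normal admin credentials instead of -localauth.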


I have to admit I'm a little curious why you switched from inode to namei on a Solaris server...

-- Jeffrey T. Hutzelman (N3NHS) <[EMAIL PROTECTED]>
  Sr. Research Systems Programmer
  School of Computer Science - Research Computing Facility
  Carnegie Mellon University - Pittsburgh, PA
