On Thursday, April 13, 2006 09:41:49 AM -0700 Renata Maria Dart <[EMAIL PROTECTED]> wrote:

Hi, we recently had a fileserver crash because of an ecache error. When the server came back up, it had the further misfortune of a Fibre Channel adapter error which prevented the drives containing the vice partitions from coming back online. Once those issues were dealt with, the system was rebooted again and came up with its vice partitions, but did not salvage on its own...we had to run bos salvage manually to bring the volumes online. This is a Solaris 9 system running OpenAFS 1.4.1-rc10. There are 2 partitions on it, and the fs process specifies 2 parallel salvage processes. Unfortunately I was not there to see all the details when the system came back online, and the admin who restored the system ran separate salvager commands for the 3 200 GB volumes that live on the system and didn't preserve the original salvage logs. Is it to be expected that the salvager won't run automatically after such a sequence of events? A couple more pieces of information: I recently converted this system from inode to namei, it does not have '--enable-fast-restart' configured into it, and here are the entries from BosLog:

Under the circumstances you describe, yes, this is normal.

The bosserver forces a whole-server salvage any time the fileserver exits abnormally, or on startup if the fileserver was not shut down cleanly (there is a file in /usr/afs/local or wherever indicating the fileserver is "running"; if that file is present when the bosserver starts, it assumes an unclean shutdown).
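(If you want to check for that yourself before a restart, just look for the sentinel file. The name SALVAGE.fs below is my assumption based on the usual server layout; confirm it against your installation's local directory.)

    # Look for the bosserver's "fileserver running" sentinel.
    # SALVAGE.fs is an assumed name; verify it for your build.
    if [ -e /usr/afs/local/SALVAGE.fs ]; then
        echo "unclean fileserver shutdown; bosserver will force a salvage"
    fi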

On an inode-based server, fsck sets a flag if it makes any changes to a partition. When the fileserver starts up, if it sees this flag on any partitions, it immediately exits with an error, which causes the bosserver to force a salvage. Since the needs-salvage flag is stored on the partition, it is set and reset only when the partition is actually fsck'd and salvaged, respectively. It moves around if you move the disk, and doesn't get touched if the disk is "missing", as with your Fibre Channel disk problem.

However, you switched to namei, which doesn't have that feature (and can't, since it doesn't use a modified fsck). So on your first start, the bosserver forced a salvage, but there weren't any partitions to salvage, so nothing interesting happened. Then the fileserver started up, and someone noticed there were no partitions, and rebooted. That involved a clean shutdown of the fileserver, which meant there was no forced salvage on the next boot.
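For the record, once you're in that state, the recovery is what your admin did, just done server-wide in one step. Something along these lines, with fs1.example.com standing in for your fileserver:

    # Force a salvage of every vice partition on the server; bos takes
    # the fs instance down for the duration of a whole-server salvage.
    bos salvage -server fs1.example.com -all -localauth

    # Or salvage a single partition and show the salvager log afterward:
    bos salvage -server fs1.example.com -partition /vicepa -showlog

Both forms use the standard bos salvage switches; if you're not running them on the server itself, use your normal admin credentials instead of -localauth.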


I have to admit I'm a little curious why you switched from inode to namei on a Solaris server...

-- Jeffrey T. Hutzelman (N3NHS) <[EMAIL PROTECTED]>
  Sr. Research Systems Programmer
  School of Computer Science - Research Computing Facility
  Carnegie Mellon University - Pittsburgh, PA
