tape server crash recovery

Brandon D. Valentine Mon, 01 Apr 2002 12:10:47 -0800

So early Saturday morning our tape server suffered a kernel panic, and
it appears to have happened smack in the middle of the amdump run.  When
I brought the machine back up I looked and found that the run had not
completed, so I ran amcleanup, which appears to have taken care of the
logfiles left behind.  Then I looked and found a directory still in the
holding disk from that night, which contained 4 incomplete dumps left
behind by the 4 active dumpers at crash time.  I removed this directory.
Then I ran amcheck to see if anything else was odd.  Where it should
have told me which tape was needed for the next run, it only said that
it was expecting a new tape.  I went and looked at the tapelist file for
this config and it was empty!  Amdump must have been accessing it in
some way when the crash occurred.  Luckily tapelist.yesterday had not
been overwritten so I looked at it and found that it was exactly what I
needed:


20020329 Daily03 reuse
20020328 Daily02 reuse
20020327 Daily01 reuse
20020326 Daily10 reuse
20020323 Daily09 reuse
20020322 Daily08 reuse
20020321 Daily07 reuse
20020320 Daily06 reuse
20020319 Daily05 reuse
20020316 Daily04 reuse
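
For anyone wanting to sanity-check a recovered tapelist, here is a minimal sketch in Python. It assumes each line is "datestamp label status", as in the listing above, and that the next tape is simply the oldest entry marked reuse. That is a simplification: it ignores tapecycle and any other policy Amanda applies. The function names are my own, not part of Amanda.

```python
# Hypothetical helpers, not part of Amanda: parse a tapelist and pick
# the oldest tape marked "reuse" (a simplified stand-in for Amanda's choice).

def parse_tapelist(text):
    """Return a list of (datestamp, label, status) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        entries.append((fields[0], fields[1], fields[2]))
    return entries

def oldest_reusable(entries):
    """Return the label of the oldest entry marked 'reuse', or None."""
    reusable = [e for e in entries if e[2] == "reuse"]
    if not reusable:
        return None
    # Datestamps are YYYYMMDD, so string order matches date order.
    return min(reusable, key=lambda e: e[0])[1]

SAMPLE = """\
20020329 Daily03 reuse
20020328 Daily02 reuse
20020327 Daily01 reuse
20020326 Daily10 reuse
20020323 Daily09 reuse
20020322 Daily08 reuse
20020321 Daily07 reuse
20020320 Daily06 reuse
20020319 Daily05 reuse
20020316 Daily04 reuse
"""

if __name__ == "__main__":
    print(oldest_reusable(parse_tapelist(SAMPLE)))  # prints Daily04
```

An empty file yields an empty list and a result of None, which is roughly the situation amcheck was reporting when it asked for a new tape.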

The 20020330 run (Saturday morning) was set to reuse Daily04, and that
was what was in the drive when the crash occurred.  I copied
tapelist.yesterday to tapelist, and ran amcheck again, which told me
what I expected to see:

Tape Daily04 label ok
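
The restore step I did by hand could be scripted roughly as follows. This is only a sketch: the function name and the idea of passing both paths in are my own assumptions, and it copies the backup only when the live tapelist is missing or empty, so it will not clobber a good file.

```python
import os
import shutil

def restore_tapelist_if_empty(tapelist, backup):
    """Copy backup over tapelist only if tapelist is missing or empty.

    Returns True if a copy was made, False if tapelist was left alone.
    Both paths are supplied by the caller; nothing here is Amanda-specific.
    """
    if os.path.exists(tapelist) and os.path.getsize(tapelist) > 0:
        return False  # live file has content, leave it alone
    if not os.path.exists(backup) or os.path.getsize(backup) == 0:
        raise FileNotFoundError("no usable backup at %s" % backup)
    shutil.copy2(backup, tapelist)  # preserves timestamps and mode
    return True
```

Running amcheck afterwards is still the real test that the restored file is sane.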

I started up amdump by hand.  It didn't background itself as I expected
it to, given that the other amanda programs do, so I had to hit ^Z and
run 'bg' from the shell to get my terminal back.  It's running now and
appears to be going well, except that some of the disks are listed in
the amstatus output like this:

[dumps too big, but cannot incremental dump new disk]

This seems to be contingent on some variable that is currently escaping
me though.  The amstatus output lists plenty of disks that have already
dumped & taped level 1 and level 2 incrementals since I started up
amdump.  Amdump doesn't seem to be ignorant of the run history, so I
wonder why it is singling out these certain volumes as 'new disks'.
Perhaps these are the volumes in the process of being dumped or taped
when the crash occurred?  Thoughts?  Suggestions?

If I force full dumps of the afflicted partitions for tonight should I
be safe?

-- 
Brandon D. Valentine <[EMAIL PROTECTED]>
Computer Geek, Center for Structural Biology

"This isn't rocket science -- but it _is_ computer science."
        - Terry Lambert on [EMAIL PROTECTED]
