On 2011-03-04 at 16:30, Andrew Deason ( adea...@sinenomine.net ) said:
On Fri, 4 Mar 2011 17:20:34 -0500 (EST)
Andy Cobaugh <phale...@gmail.com> wrote:

The first issue you reported had problems much earlier before the
log messages you gave. Did anything happen to the backup volume
before that?  No messages referencing that volume id? Did you or
someone/thing else remove the backup clone or anything?

Nope. We don't even access the backup volume when doing the file-level
backups anymore.

Well, _something_ deleted it, unless it didn't exist before 1 mar 2011.
This message

It certainly did exist before that, and nothing I did and no part of our backup system would have deleted it.

Tue Mar  1 00:02:12 2011 VReadVolumeDiskHeader: Couldn't open header for volume 536871061 (errno 2)

means the volume doesn't exist. It's not that it's corrupt or anything;
the volume was completely deleted. (or something just deleted the .vol
header, but the other messages suggest it was deleted normally)
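
For anyone reading along, the "errno 2" in that log line is ENOENT ("No such file or directory"), which is why it reads as "the header file is not there" rather than "the header is damaged". A minimal sketch of decoding it follows; the regular expression and the function name decode_header_error are illustrative assumptions, not anything from OpenAFS itself, and the log line is the one quoted above.

```python
import errno
import os
import re

# The fileserver log line quoted in this thread.
LOG_LINE = ("Tue Mar  1 00:02:12 2011 VReadVolumeDiskHeader: "
            "Couldn't open header for volume 536871061 (errno 2)")

# Hypothetical pattern written for this one message format.
PATTERN = re.compile(r"Couldn't open header for volume (\d+) \(errno (\d+)\)")


def decode_header_error(line):
    """Return (volume_id, errno, symbolic name, description), or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    volume_id = int(match.group(1))
    code = int(match.group(2))
    # errno.errorcode maps numeric codes to names such as "ENOENT".
    name = errno.errorcode.get(code, "UNKNOWN")
    return volume_id, code, name, os.strerror(code)


print(decode_header_error(LOG_LINE))
```

The description string comes from the local C library, so its exact wording can vary by platform; the symbolic name ENOENT is the part to rely on.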

What does 'deleted normally' mean in this context? Nothing touched the volume since the previous night, when the .backup volume was created just fine. Unfortunately, those logs have since rolled over, so I don't have anything older than when I restarted the fileserver at 16:12 on Mar 1.

Yes, the zaps were me trying to get the .backup into a usable state.
Though, the first string of salvages started in the middle of the
afternoon without any intervention - I think the event that caused
them is what's missing from the picture.

Well, do you have the messages from around then?

Ugh, no. Hopefully I will if it happens again.

I'm still a little hesitant to bos salvage that server - whole reason
we're trying to switch to DAFS is to avoid the multi-hour fileserver
outages.

Salvaging a single volume is the same as a demand-salvage; it is no
slower and no more impactful than an automatically-triggered one. But
you can manually trigger the salvage of a single volume group in cases
like this (e.g. when the fileserver refuses to because it's been
salvaged too many times).

Ok, I had to bos salvage the .backup volume directly with -forceDAFS. When the same thing happened on my machine at home, it wasn't so easy. In that case, it was with an RO clone. I think I had to remsite, then remove or zap or some combination, along with manually deleting the .vol. I wish I had paid closer attention then.
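
A rough sketch of what that single-volume salvage invocation might look like, built as an argument list rather than executed. The server name fs1.example.org and partition /vicepa are placeholders (neither appears in this thread), the helper build_salvage_command is hypothetical, and the option spelling should be checked against the bos_salvage man page for the installed OpenAFS version before running anything.

```python
import shlex


def build_salvage_command(server, partition, volume, force_dafs=True):
    """Build (but do not run) a 'bos salvage' command for one volume."""
    argv = ["bos", "salvage",
            "-server", server,
            "-partition", partition,
            "-volume", volume]
    if force_dafs:
        # Flag mentioned in this thread for salvaging on a DAFS fileserver.
        argv.append("-forceDAFS")
    return argv


# Placeholder server and partition; the volume name is the one from this thread.
cmd = build_salvage_command("fs1.example.org", "/vicepa", "home.gsong.backup")
print(shlex.join(cmd))
```

Printing the command first, instead of handing it straight to subprocess, leaves room to review it before touching a production fileserver.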

I still have no idea what caused the volume to spontaneously need salvaging Tuesday afternoon. I did notice that until I fixed the BK volume, if I did a 'vos exam home.gsong.backup', that triggered a salvage.

Wish I had more to go on. I'll be working on standardizing our logging configuration across servers next week, logging via syslog, etc., so we don't lose valuable logs like this.

--andy
_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
