On 6/13/13 12:33 PM, Andrew Deason wrote:
> On Wed, 12 Jun 2013 21:24:49 -0400
> Garance A Drosihn <dro...@rpi.edu> wrote:
>
>> On May 29th, "something happened" to two AFS volumes which are
>> both on the same vice partition.  The volumes had been mounted
>> fine, but suddenly they could not be attached.
>
> I'm not quite sure what you mean by this; there isn't really any
> "mounting" operation for volumes on the server side; the volume
> either attaches or it fails to attach.  We do mount the /vicep*
> partitions containing volumes, though, of course.
A bad choice of words on my part.  I meant 'attached', not 'mounted'.
We last restarted these fileservers back in January, and these volumes
attached fine at that time.  And we've been recreating the backup
volumes three times a week ever since then, so I assume the volumes
remained attached and uncorrupted.

There's also a report which runs every day to determine how much disk
space people are using (so we can charge them for the space).  One of
those volumes appeared on all reports up to May 28th, and was gone on
May 29th.

>> This happened at around 9:30pm, and as far as I know nothing
>> interesting was happening at that time.

It turns out that the daily report runs between 9pm and 10pm, so that
report is almost certainly what triggered the error messages.

> How are you determining these times? From this description, maybe
> it sounds like the problems with the backup run alerted you to a
> problem, and you looked in FileLog/VolserLog/etc, and saw errors
> around those times. Is that what happened?

The first errors I noticed in any logs were these, in FileLog:

Wed May 29 21:27:53 2013 Volume 537480983: couldn't reread volume header
Wed May 29 21:27:53 2013 VAttachVolume: Error reading diskDataHandle \
    vol header /vicepb//V0537480983.vol; error=101
Wed May 29 21:27:02 2013 Volume 537444438: couldn't reread volume header
Wed May 29 21:27:02 2013 VAttachVolume: Error reading diskDataHandle \
    vol header /vicepb//V0537444438.vol; error=101

And I knew our backups run at 5am, so I assumed something else must
have happened at 9:30pm.  But now I see that's just when we first ran
into the problem due to our own procedure.

>> So, before I get myself into too much trouble, what's the prudent
>> thing to do here? Should I just redo the salvage, with '-oktozap'?

> So, I assume you want to remove those volumes, and 'vos restore'
> them from a previous volume dump. You can try to 'vos zap' the
> volumes, to just remove them from disk. If that complains about
> the volume needing salvage or whatnot, you can try to force it
> with 'vos zap -force'. If that fails to remove the volume (it
> shouldn't, but I'm not sure about older versions...), you may
> need to directly tinker with the vicepb contents. But, we can
> deal with that just as it becomes necessary.

A plain 'vos zap' complained that the volumes needed to be salvaged.
Adding -force resulted in:

[root]# vos zap -server afsfs14 -partition vicepb \
    -id 537444436 -backup -localauth -force
vos: forcibly removing all traces of volume 537444436, \
    please wait...failed with code 2.

[root]# vos zap -server afsfs14 -partition vicepb \
    -id 537480981 -backup -localauth -force
vos: forcibly removing all traces of volume 537480981, \
    please wait...failed with code 30.

I should note that we don't care at all about the contents of these
volumes.  I just want to make sure I don't trigger damage to *other*
volumes while trying to fix this.  And the guy who is responsible for
backups is anxious about this, as apparently these damaged volumes
cause the backup to hang in some way.  I had intended to take this
week as vacation, but Murphy's law seems determined to prevent that!

At this point I'm tempted to try 'salvager -oktozap' based on the
documentation for it (a rough sketch of what I have in mind is below),
but I'll wait to hear if that's the right thing to do in this
situation.

I should also note that this week we did finish a full backup of the
entire AFS cell, except for these two volumes.  So it should be true
that everything else is reasonably okay.  I hope to keep it that way!
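In case it helps to be concrete about what I mean by 'salvager
-oktozap', this is roughly the invocation I have in mind, pieced
together from the salvager man page.  The binary path, the bos
instance name ('fs'), and exactly which numeric volume ID to name
(the one from FileLog vs. the parent RW ID) are guesses on my part,
so please tell me if this is the wrong approach:

  # stop the fileserver instance first (instance name assumed to be 'fs')
  bos shutdown afsfs14 fs -wait -localauth

  # remove just this one damaged volume (ID taken from FileLog above)
  /usr/afs/bin/salvager -partition /vicepb -volumeid 537444438 -oktozap

  # and bring the fileserver back up
  bos start afsfs14 fs -localauth

... and then the same again for the other damaged volume.  Or, if it
would be better to let bos drive the salvage rather than running the
salvager binary by hand, I can certainly do that instead.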
-- 
Garance Alistair Drosehn          =     dro...@rpi.edu
Senior Systems Programmer         or    g...@freebsd.org
Rensselaer Polytechnic Institute; Troy, NY; USA

_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info