It looks like the salvager crashed. The SalvageLog should end differently, with
the summary line for the RW volume. Are there any core files in /usr/afs/logs?
If not, make sure the ulimit for core file size isn't set to 0 and retry.
You could also run the salvager by hand under gdb to see why it crashes. In
that case you need to add the -debug flag to prevent it from forking, e.g.
gdb /usr/afs/bin/salvager
...
(gdb) run /vicepb 536871656 -debug
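The core-file check suggested above can be sketched as follows, assuming a Bourne-style shell on the file server. LOGDIR is a variable introduced here for illustration only. Resource limits are inherited from the parent process, so a salvager started through bos takes its limit from the bosserver, and raising the limit in a login shell only affects a run started by hand from that shell.

```shell
# LOGDIR is an illustrative variable; the path is the one named above.
LOGDIR="${LOGDIR:-/usr/afs/logs}"

# Soft limit on core file size for this shell; 0 means no core is written.
core_limit="$(ulimit -c)"
echo "core file size limit: ${core_limit}"

# Show any core files already present, or say so if there are none.
ls -l "${LOGDIR}"/core* 2>/dev/null || echo "no core files in ${LOGDIR}"

# Try to lift the limit for this shell and the commands started from it.
ulimit -c unlimited 2>/dev/null || echo "could not raise the core file limit"
```

After this, the gdb session shown above can be started from the same shell so that the raised limit applies to it.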
Good luck,
Hartmut
McKee, Shawn wrote:
Hi Everyone,
I am having a problem with one of my OpenAFS file servers. About ½ of
the volumes are “Off-line” and I am unable to bring them online. First
some system info and then I will list problem details and what I have tried.
The system is running Scientific Linux 5.5/x86_64 (basically CentOS 5.5
64-bit). The openafs rpms are:
[atums2:~]# rpm -qa | grep openafs
openafs-kpasswd-1.4.12-6.cern
openafs-client-1.4.12-6.cern
kernel-module-openafs-2.6.18-194.3.1.el5-1.4.12-5.cern
openafs-1.4.12-6.cern
kernel-module-openafs-2.6.18-194.8.1.el5-1.4.12-5.cern
openafs-krb5-1.4.12-6.cern
kernel-module-openafs-2.6.18-238.1.1.el5-1.4.12-6.cern
openafs-server-1.4.12-6.cern
The version of ‘e2fsprogs’ is 1.39
The system has an ext3 1TB partition for AFS:
[atums2:~]# df /vicepb
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 1007931664 635382472 321349196 67% /vicepb
The system has 931 volumes and only 470 are On-line while 461 are Off-line:
[atums2:~]# vos listvol atums2
Total number of volumes on server atums2 partition /vicepb: 931
chamber.OLD_eml4a07 536872814 RW 8634169 K Off-line
chamber.OLD_eml4a07.readonly 536872815 RO 8634169 K On-line
chamber.OLD_eml4a09 536872817 RW 702642 K Off-line
chamber.OLD_eml4a09.readonly 536872818 RO 702642 K On-line
…
Total volumes onLine 470 ; Total volumes offLine 461 ; Total busy 0
I have run ‘bos salvage’ on the partition multiple times. I have
restarted the system. I have run a forced fsck.ext3 check on the
underlying partition (no problems found). Only RW volumes are Off-line;
all RO volumes are On-line. A few RW volumes are On-line (8 out of
469), but the rest won’t come On-line.
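The affected volumes can be pulled out of the listing with a short filter. This is a minimal sketch, assuming the six-column layout shown in the listing above (name, ID, type, size, "K", status); the function name offline_rw is made up for illustration.

```shell
# Print the names of RW volumes that the listing reports as Off-line.
# Reads the listing on stdin; fields are: name, ID, type, size, "K", status.
offline_rw() {
    awk '$3 == "RW" && $6 == "Off-line" { print $1 }'
}

# Sample taken from the listing above; on the server one would pipe
# "vos listvol atums2" into offline_rw instead.
sample='chamber.OLD_eml4a07 536872814 RW 8634169 K Off-line
chamber.OLD_eml4a07.readonly 536872815 RO 8634169 K On-line
chamber.OLD_eml4a09 536872817 RW 702642 K Off-line
chamber.OLD_eml4a09.readonly 536872818 RO 702642 K On-line'

printf '%s\n' "$sample" | offline_rw
```

The header and summary lines of the listing are skipped because their third field is not "RW".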
Here is a particular volume which is Off-line:
[atums2:~]# vos examine chdata.sn
chdata.sn 536871656 RW 598 K Off-line
atums2.cern.ch /vicepb
RWrite 536871656 ROnly 0 Backup 0
MaxQuota 10000000 K
Creation Fri May 26 04:02:49 2006
Copy Wed Oct 11 12:35:42 2006
Backup Sun Jun 11 00:30:10 2006
Last Access Fri Jan 7 16:38:32 2011
Last Update Wed Apr 4 15:29:42 2007
0 accesses in the past day (i.e., vnode references)
RWrite: 536871656 ROnly: 536871657 RClone: 536871657
number of sites -> 3
server atums1.cern.ch partition /vicepi RO Site -- Old release
server atums2.cern.ch partition /vicepb RW Site -- New release
server atums2.cern.ch partition /vicepb RO Site -- New release
Try to bring online:
[atums2:~]# vos online -server atums2 -partition /vicepb -id chdata.sn
The FileLog shows:
Sun Jan 23 22:57:03 2011 GetBitmap: addled vnode index in volume chdata.sn; volume needs salvage
Sun Jan 23 22:57:03 2011 VAttachVolume: error getting bitmap for volume (/vicepb//V0536871656.vol)
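To see which volumes the fileserver flags in this way, the log can be filtered for that message. This is a minimal sketch that relies only on the wording of the GetBitmap line quoted above; the function name needs_salvage is made up, and the FileLog path in the comment is an assumption (the same directory as the SalvageLog).

```shell
# Print each volume name that a FileLog-style stream flags with
# "volume needs salvage", once per name. Reads the log on stdin.
needs_salvage() {
    sed -n 's/.*in volume \([^;]*\); volume needs salvage.*/\1/p' | sort -u
}

# Sample line taken from the excerpt above; on the server one would run
#   needs_salvage < /usr/afs/logs/FileLog   (path is an assumption)
printf '%s\n' \
  'Sun Jan 23 22:57:03 2011 GetBitmap: addled vnode index in volume chdata.sn; volume needs salvage' \
  | needs_salvage
```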
Try to Salvage:
[atums2:~]# bos salvage atums2 /vicepb chdata.sn
Starting salvage.
bos: salvage completed
The SalvageLog shows:
[atums2:~]# tail /usr/afs/logs/SalvageLog
@(#) OpenAFS 1.4.12 built 2010-12-13 1928681 19919656
01/23/2011 22:58:19 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepb 536871656)
01/23/2011 22:58:19 2 nVolumesInInodeFile 64
01/23/2011 22:58:19 CHECKING CLONED VOLUME 536871657.
01/23/2011 22:58:19 chdata.sn.readonly (536871657) updated 04/04/2007 15:29
01/23/2011 22:58:19 Partially allocated vnode 2 deleted.
Try again:
[atums2:~]# vos online -server atums2 -partition /vicepb -id chdata.sn
FileLog has the same message:
Sun Jan 23 22:59:05 2011 GetBitmap: addled vnode index in volume chdata.sn; volume needs salvage
Sun Jan 23 22:59:05 2011 VAttachVolume: error getting bitmap for volume (/vicepb//V0536871656.vol)
Salvage attempt again:
[atums2:~]# bos salvage atums2 /vicepb chdata.sn
Starting salvage.
bos: salvage completed
[atums2:~]# tail /usr/afs/logs/SalvageLog
@(#) OpenAFS 1.4.12 built 2010-12-13 1928681 19919656
01/23/2011 23:00:07 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepb 536871656)
01/23/2011 23:00:07 2 nVolumesInInodeFile 64
01/23/2011 23:00:07 CHECKING CLONED VOLUME 536871657.
01/23/2011 23:00:07 chdata.sn.readonly (536871657) updated 04/04/2007 15:29
01/23/2011 23:00:07 Partially allocated vnode 2 deleted.
The result is the same, as if the prior salvage hadn’t done anything. This is
exactly what happens on other volumes I have tried to bring online.
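A rough way to tell whether a salvage run stopped before reaching the RW volume is to look at the last line of the log. This is only a sketch under a stated assumption: that a completed run ends with a line naming the RW volume ID. The exact wording of such a line does not appear in the excerpts above, so the check matches on the ID alone. The function name is made up for illustration.

```shell
# Return success if the last line of a salvage log mentions the given
# volume ID. ASSUMPTION: a completed run ends with a line that names the
# RW volume ID; the exact wording of that line is not shown above.
salvage_log_ends_with_volume() {
    tail -n 1 "$1" | grep -qF -- "$2"
}

# Demo on the excerpt quoted above, written to a temporary file.
tmp_log="$(mktemp)"
cat > "$tmp_log" <<'EOF'
01/23/2011 23:00:07 CHECKING CLONED VOLUME 536871657.
01/23/2011 23:00:07 chdata.sn.readonly (536871657) updated 04/04/2007 15:29
01/23/2011 23:00:07 Partially allocated vnode 2 deleted.
EOF

if salvage_log_ends_with_volume "$tmp_log" 536871656; then
    result="complete"
else
    result="ended early"
fi
echo "salvage of 536871656: $result"
rm -f "$tmp_log"
```

On the server the first argument would be /usr/afs/logs/SalvageLog, the file shown in the tail commands above.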
So how would I fix this? Any suggestions for how to get the rest of
these volumes On-line?
Let me know if you need further details. Thanks,
Shawn
--
-----------------------------------------------------------------
Hartmut Reuter e-mail reu...@rzg.mpg.de
phone +49-89-3299-1328
fax +49-89-3299-1301
RZG (Rechenzentrum Garching) web http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------
_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info