This looks like a crash of the salvager. The SalvageLog should not stop after "Partially allocated vnode 2 deleted."; it should end with the summary line for the RW volume. Are there any core files in /usr/afs/logs? If not, make sure the ulimit for core file size isn't set to 0 and retry.
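
A quick way to check both things could look roughly like this. It is only a sketch: the function name check_cores is made up, and it assumes core files carry a name starting with "core". Note that it reports the limit of the shell it runs in; the limit that counts is the one inherited by the salvager process, which "bos salvage" starts through the bosserver, so that may differ from your login shell.

```shell
# Sketch: look for core files and report the core-size limit of this shell.
# check_cores is a hypothetical helper; the directory defaults to the
# path mentioned in the message.
check_cores() {
    logdir="${1:-/usr/afs/logs}"
    # Assumption: core files are named "core" or "core.<something>".
    found=$(find "$logdir" -maxdepth 1 -type f -name 'core*' 2>/dev/null)
    if [ -n "$found" ]; then
        echo "core files found:"
        echo "$found"
    else
        echo "no core files in $logdir"
    fi
    # "0" here means no core file would be written by a process
    # started from this shell.
    echo "core size limit: $(ulimit -c)"
}
```

Run it as "check_cores" on the file server, or pass another directory as the first argument.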

You could also run the salvager by hand under gdb to see why it crashes. In that case, add the -debug flag to prevent it from forking. For example:

gdb /usr/afs/bin/salvager
...
(gdb) run /vicepb 536871656 -debug
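
If an interactive session is inconvenient, the same run could probably be scripted so that gdb prints a backtrace as soon as the salvager stops. This is a sketch only: the function name and the GDB override variable are made up for illustration, while -batch, -ex and --args are ordinary gdb options.

```shell
# Sketch: run the salvager under gdb non-interactively and print a
# backtrace once it stops. salvager_backtrace and the GDB variable are
# hypothetical; the salvager path and -debug flag are from the message.
salvager_backtrace() {
    part="$1"    # partition, e.g. /vicepb
    volid="$2"   # RW volume id, e.g. 536871656
    "${GDB:-gdb}" -batch -ex run -ex bt \
        --args /usr/afs/bin/salvager "$part" "$volid" -debug
}
```

Called as "salvager_backtrace /vicepb 536871656", this should leave the stack trace at the end of the output if the salvager dies on a signal; if it exits normally, gdb has no stack to show.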


Good luck,
Hartmut

McKee, Shawn wrote:
Hi Everyone,

I am having a problem with one of my OpenAFS file servers. About ½ of
the volumes are “Off-line” and I am unable to bring them online. First
some system info and then I will list problem details and what I have tried.

The system is running Scientific Linux 5.5/x86_64 (basically CentOS 5.5
64-bit). The openafs rpms are:

[atums2:~]# rpm -qa | grep openafs
openafs-kpasswd-1.4.12-6.cern
openafs-client-1.4.12-6.cern
kernel-module-openafs-2.6.18-194.3.1.el5-1.4.12-5.cern
openafs-1.4.12-6.cern
kernel-module-openafs-2.6.18-194.8.1.el5-1.4.12-5.cern
openafs-krb5-1.4.12-6.cern
kernel-module-openafs-2.6.18-238.1.1.el5-1.4.12-6.cern
openafs-server-1.4.12-6.cern

The version of ‘e2fsprogs’ is 1.39

The system has an ext3 1TB partition for AFS:

[atums2:~]# df /vicepb
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1           1007931664 635382472 321349196  67% /vicepb

The system has 931 volumes and only 470 are On-line while 461 are Off-line:

[atums2:~]# vos listvol atums2
Total number of volumes on server atums2 partition /vicepb: 931
chamber.OLD_eml4a07 536872814 RW 8634169 K Off-line
chamber.OLD_eml4a07.readonly 536872815 RO 8634169 K On-line
chamber.OLD_eml4a09 536872817 RW 702642 K Off-line
chamber.OLD_eml4a09.readonly 536872818 RO 702642 K On-line
…
Total volumes onLine 470 ; Total volumes offLine 461 ; Total busy 0

I have run ‘bos salvage’ on the partition multiple times, restarted the
system, and run a forced fsck.ext3 check on the underlying partition
(no problems found). Only RW volumes are Off-line; all RO volumes are
On-line. A few RW volumes are On-line (8 out of 469), but the rest
won’t come On-line.
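
For reference, the affected RW volumes can be pulled out of the listing with a small filter. This is a sketch that assumes the column layout of the ‘vos listvol’ output shown above (name, id, type, size, "K", status); the function name is made up.

```shell
# Sketch: print the names of RW volumes that "vos listvol" reports as
# Off-line. offline_rw_volumes is a hypothetical helper; it reads the
# listing on stdin and relies on the column layout shown above.
offline_rw_volumes() {
    # Field 3 is the volume type, the last field is the status.
    awk '$3 == "RW" && $NF == "Off-line" { print $1 }'
}
```

For example: vos listvol atums2 | offline_rw_volumes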

Here is a particular volume which is Off-line:

[atums2:~]# vos examine chdata.sn
chdata.sn 536871656 RW 598 K Off-line
atums2.cern.ch /vicepb
RWrite 536871656 ROnly 0 Backup 0
MaxQuota 10000000 K
Creation Fri May 26 04:02:49 2006
Copy Wed Oct 11 12:35:42 2006
Backup Sun Jun 11 00:30:10 2006
Last Access Fri Jan 7 16:38:32 2011
Last Update Wed Apr 4 15:29:42 2007
0 accesses in the past day (i.e., vnode references)

RWrite: 536871656 ROnly: 536871657 RClone: 536871657
number of sites -> 3
server atums1.cern.ch partition /vicepi RO Site -- Old release
server atums2.cern.ch partition /vicepb RW Site -- New release
server atums2.cern.ch partition /vicepb RO Site -- New release

Try to bring online:

[atums2:~]# vos online -server atums2 -partition /vicepb -id chdata.sn

The FileLog shows:

Sun Jan 23 22:57:03 2011 GetBitmap: addled vnode index in volume chdata.sn; volume needs salvage
Sun Jan 23 22:57:03 2011 VAttachVolume: error getting bitmap for volume (/vicepb//V0536871656.vol)

Try to Salvage:

[atums2:~]# bos salvage atums2 /vicepb chdata.sn
Starting salvage.
bos: salvage completed

The SalvageLog shows:

[atums2:~]# tail /usr/afs/logs/SalvageLog
@(#) OpenAFS 1.4.12 built 2010-12-13 1928681 19919656
01/23/2011 22:58:19 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepb 536871656)
01/23/2011 22:58:19 2 nVolumesInInodeFile 64
01/23/2011 22:58:19 CHECKING CLONED VOLUME 536871657.
01/23/2011 22:58:19 chdata.sn.readonly (536871657) updated 04/04/2007 15:29
01/23/2011 22:58:19 Partially allocated vnode 2 deleted.

Try again:

[atums2:~]# vos online -server atums2 -partition /vicepb -id chdata.sn


FileLog has the same message:

Sun Jan 23 22:59:05 2011 GetBitmap: addled vnode index in volume chdata.sn; volume needs salvage
Sun Jan 23 22:59:05 2011 VAttachVolume: error getting bitmap for volume (/vicepb//V0536871656.vol)

Salvage attempt again:

[atums2:~]# bos salvage atums2 /vicepb chdata.sn
Starting salvage.
bos: salvage completed

[atums2:~]# tail /usr/afs/logs/SalvageLog
@(#) OpenAFS 1.4.12 built 2010-12-13 1928681 19919656
01/23/2011 23:00:07 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepb 536871656)
01/23/2011 23:00:07 2 nVolumesInInodeFile 64
01/23/2011 23:00:07 CHECKING CLONED VOLUME 536871657.
01/23/2011 23:00:07 chdata.sn.readonly (536871657) updated 04/04/2007 15:29
01/23/2011 23:00:07 Partially allocated vnode 2 deleted.

The result is the same, as if the prior salvage didn’t do anything. This
is exactly what happens on the other volumes I have tried to bring online.

So how would I fix this? Any suggestions for how to get the rest of
these volumes On-line?

Let me know if you need further details. Thanks,

Shawn



--
-----------------------------------------------------------------
Hartmut Reuter                  e-mail          reu...@rzg.mpg.de
                                phone            +49-89-3299-1328
                                fax              +49-89-3299-1301
RZG (Rechenzentrum Garching)    web    http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------
_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info