Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-04 Thread Derrick Brashear
>  I just saw that -nowrite also exists in OpenAFS; it is only the bos command
> that claims it is available only in MR-AFS. So one could at least run the
> salvager under the debugger with -nowrite.

That's because for some reason the command parser in bos salvage doesn't
know about that flag and so assumes it's MR-AFS only.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-04 Thread Hartmut Reuter

Jeffrey Altman wrote:

Hartmut Reuter wrote:


Jeffrey Altman wrote:


Hartmut Reuter wrote:


So what is the value of 'class' if not vLarge?


As you can see from that line above it's vSmall:

 >>   [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino =
 >> 21977313U, maxu = 0x8046bc4), line 3175 in "vol-salvage.c"

So there might really be something wrong with the SmallVnodeFile, 
but failing an assertion is not the best way to repair it!




What the AssertionFailed means is that no one has written code to
deal with a case where this error has occurred.   It can't be
fixed with Salvager until someone writes the missing code.



Of course, but for the user it might be better to skip handling of 
this error and to continue with the next vnode. So he could get back 
at least the damaged volume and copy whatever is still accessible.


So John, ifdef line 3175 and recompile. If this was a single bad vnode 
your volume may come online again, otherwise it's probably lost anyway.


Hartmut



I disagree.   The reason that assert is there is that continuing
will cause more damage to the data.  We do not know based upon
the available data whether this is a single bad vnode or whether
perhaps the wrong file is being referenced for the SmallVnodeFile.

What is known is that one vnode, perhaps the first vnode examined
has completely valid data except for the fact that it is in the
wrong file.

There are several issues that are worth pursuing here.  Especially 
because whatever the problem is has begun occurring on multiple machines:


1. what is the actual damage that has taken place?

2. can the damage be corrected?

3. can the damage be avoided in the first place?  What is the cause?

Jeffrey Altman


Of course we should not remove the assert() forever, but just for the
test of this volume which otherwise probably will be lost anyway.

In MR-AFS we had a -nowrite option to do just a dry run. I admit that
it's a lot of work to implement this, but sometimes it is very helpful.

I just saw that -nowrite also exists in OpenAFS; it is only the bos command 
that claims it is available only in MR-AFS. So one could at least run 
the salvager under the debugger with -nowrite.


Hartmut






--
Hartmut Reuter                  e-mail  [EMAIL PROTECTED]
                                phone   +49-89-3299-1328
                                fax     +49-89-3299-1301
RZG (Rechenzentrum Garching)    web     http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)



Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-04 Thread Hartmut Reuter

Jeffrey Altman wrote:

Hartmut Reuter wrote:


Jeffrey Altman wrote:


Hartmut Reuter wrote:


So what is the value of 'class' if not vLarge?


As you can see from that line above it's vSmall:

 >>   [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino =
 >> 21977313U, maxu = 0x8046bc4), line 3175 in "vol-salvage.c"

So there might really be something wrong with the SmallVnodeFile, 
but failing an assertion is not the best way to repair it!




What the AssertionFailed means is that no one has written code to
deal with a case where this error has occurred.   It can't be
fixed with Salvager until someone writes the missing code.



Of course, but for the user it might be better to skip handling of 
this error and to continue with the next vnode. So he could get back 
at least the damaged volume and copy whatever is still accessible.


So John, ifdef line 3175 and recompile. If this was a single bad vnode 
your volume may come online again, otherwise it's probably lost anyway.


Hartmut



I disagree.   The reason that assert is there is that continuing
will cause more damage to the data.  We do not know based upon
the available data whether this is a single bad vnode or whether
perhaps the wrong file is being referenced for the SmallVnodeFile.

What is known is that one vnode, perhaps the first vnode examined
has completely valid data except for the fact that it is in the
wrong file.

There are several issues that are worth pursuing here.  Especially 
because whatever the problem is has begun occurring on multiple machines:


1. what is the actual damage that has taken place?

2. can the damage be corrected?

3. can the damage be avoided in the first place?  What is the cause?

Jeffrey Altman


Of course we should not remove the assert() forever, but just for the 
test of this volume which otherwise probably will be lost anyway.


In MR-AFS we had a -nowrite option to do just a dry run. I admit that 
it's a lot of work to implement this, but sometimes it is very helpful.


Hartmut





Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-04 Thread Derrick Brashear
>  There are several issues that are worth pursuing here.  Especially because
> whatever the problem is has begun occurring on multiple machines:
>
>  1. what is the actual damage that has taken place?
>
>  2. can the damage be corrected?
>
>  3. can the damage be avoided in the first place?  What is the cause?
>

If it's a reproducible problem, it should be easy to change the volume
package to assert when writing a "wrong" type vnode for a class to its
index. But I think there's more to it than that.
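
A minimal sketch of the kind of write-side check described above, assuming the rule that directory vnodes belong in the large index and everything else in the small one. The function names and the point where such a check would be called are hypothetical, not taken from the OpenAFS volume package; only the numeric values (type 2 for a directory, class 1 for the small index) come from the debugger output in this thread, and class 0 for the large index is an assumption.

```c
#include <stdio.h>
#include <stdlib.h>

/* Numeric values: directory type 2 and small class 1 appear in the thread's
 * dbx output; large class 0 is an assumption. */
enum { CLASS_LARGE = 0, CLASS_SMALL = 1 };
enum { TYPE_DIRECTORY = 2 };

/* Which index a vnode of this type is expected to live in. */
int expected_class(unsigned type)
{
    return type == TYPE_DIRECTORY ? CLASS_LARGE : CLASS_SMALL;
}

/* Hypothetical guard, meant to run just before a vnode is written to an
 * index: abort at the moment of the bad write instead of at salvage time. */
void check_vnode_before_write(unsigned type, int vclass)
{
    if (expected_class(type) != vclass) {
        fprintf(stderr,
                "refusing to write vnode of type %u into class %d index\n",
                type, vclass);
        abort();
    }
}
```

If the problem is reproducible, a guard like this would fail at the write that creates the mismatch, which is closer to the cause than the salvager's assert.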


Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-04 Thread Jeffrey Altman

Hartmut Reuter wrote:

Jeffrey Altman wrote:

Hartmut Reuter wrote:


So what is the value of 'class' if not vLarge?


As you can see from that line above it's vSmall:

 >>   [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino =
 >> 21977313U, maxu = 0x8046bc4), line 3175 in "vol-salvage.c"

So there might really be something wrong with the SmallVnodeFile, 
but failing an assertion is not the best way to repair it!



What the AssertionFailed means is that no one has written code to
deal with a case where this error has occurred.   It can't be
fixed with Salvager until someone writes the missing code.


Of course, but for the user it might be better to skip handling of this 
error and to continue with the next vnode. So he could get back at least 
the damaged volume and copy whatever is still accessible.


So John, ifdef line 3175 and recompile. If this was a single bad vnode 
your volume may come online again, otherwise it's probably lost anyway.


Hartmut


I disagree.   The reason that assert is there is that continuing
will cause more damage to the data.  We do not know based upon
the available data whether this is a single bad vnode or whether
perhaps the wrong file is being referenced for the SmallVnodeFile.

What is known is that one vnode, perhaps the first vnode examined
has completely valid data except for the fact that it is in the
wrong file.

There are several issues that are worth pursuing here.  Especially 
because whatever the problem is has begun occurring on multiple machines:


1. what is the actual damage that has taken place?

2. can the damage be corrected?

3. can the damage be avoided in the first place?  What is the cause?

Jeffrey Altman


Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-04 Thread Hartmut Reuter

Jeffrey Altman wrote:

Hartmut Reuter wrote:


So what is the value of 'class' if not vLarge?


As you can see from that line above it's vSmall:

 >>   [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino =
 >> 21977313U, maxu = 0x8046bc4), line 3175 in "vol-salvage.c"

So there might really be something wrong with the SmallVnodeFile, but 
failing an assertion is not the best way to repair it!



What the AssertionFailed means is that no one has written code to
deal with a case where this error has occurred.   It can't be
fixed with Salvager until someone writes the missing code.


Of course, but for the user it might be better to skip handling of this 
error and to continue with the next vnode. So he could get back at least 
the damaged volume and copy whatever is still accessible.


So John, ifdef line 3175 and recompile. If this was a single bad vnode 
your volume may come online again, otherwise it's probably lost anyway.
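
A rough sketch of what "skip and continue" could look like as a read-only scan, assuming the vnode entries of one index have already been read into memory. The struct and function names are hypothetical and this is not the salvager's code; it only counts and reports directory vnodes found outside the large index and writes nothing. As discussed elsewhere in this thread, continuing past the assert in the real salvager may cause further damage, so a report-only pass like this is the cautious variant.

```c
#include <stddef.h>
#include <stdio.h>

/* Values as seen in the thread's dbx output (type 2 = directory,
 * class 1 = small index); class 0 = large index is an assumption. */
#define CLASS_LARGE     0
#define CLASS_SMALL     1
#define TYPE_DIRECTORY  2

/* Hypothetical in-memory summary of one index entry. */
struct vnode_summary {
    unsigned type;        /* vnode->type */
    unsigned long ino;    /* inode number stored in the vnode */
};

/* Report directory vnodes that sit in the wrong index and return how many
 * were found. Nothing is repaired or written. */
size_t count_misfiled_directories(const struct vnode_summary *v, size_t n,
                                  int vclass)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i].type == TYPE_DIRECTORY && vclass != CLASS_LARGE) {
            fprintf(stderr,
                    "entry %lu: directory vnode (inode %lu) in class %d index\n",
                    (unsigned long)i, v[i].ino, vclass);
            bad++;
        }
    }
    return bad;
}
```

A count of one would point to a single bad vnode; a count close to the number of entries would suggest that the index being read is the wrong file altogether.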


Hartmut




Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-03 Thread Jeffrey Altman

John Tang Boyland wrote:


(dbx) print class
class = 1

If it's useful, here's some more info:

(dbx) print *vnode
*vnode = {
type = 2U
cloned   = 1U
modeBits = 493U
linkCount= 2
length   = 8192U
uniquifier   = 1U
dataVersion  = 166U
vn_ino_lo= 21977315
unixModifyTime   = 1134748419U
author   = 1U
owner= 0
parent   = 0
vnodeMagic   = 2911331838U
lock = {
lockCount = 0
lockTime  = 0
}
serverModifyTime = 1134748419U
group= 0
vn_ino_hi= 0
reserved6= 0
}


This entry is clearly a vDirectory (type == 2) and it is a vLarge vnode 
(vnodeMagic == LARGEVNODEMAGIC) but it is showing up in the file 
referenced by rwIsp->volSummary->header.smallVnodeIndex in 
SalvageVolume().  This header might be corrupted thereby referring to 
the wrong indexfile.  I won't be able to do more without closer 
inspection of the volume data.
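
For anyone following along, here is a small sketch that decodes the fields from the dbx dump above. The magic constants are as I recall them from OpenAFS src/vol/vnode.h and should be checked against your own source tree (the large one does equal the dumped 2911331838U); the function names are made up for this example.

```c
#include <stdint.h>
#include <stdio.h>

/* Recalled from src/vol/vnode.h; treat as assumptions and verify locally. */
#define SMALLVNODEMAGIC 0xda8c041fU
#define LARGEVNODEMAGIC 0xad8765feU

/* The thread shows type == 2 for a directory vnode. */
#define VNODE_TYPE_DIRECTORY 2U

/* Name the vnode class implied by the magic number. */
const char *class_from_magic(uint32_t magic)
{
    if (magic == LARGEVNODEMAGIC)
        return "vLarge";
    if (magic == SMALLVNODEMAGIC)
        return "vSmall";
    return "unknown";
}

/* Print a one-line summary of the fields that matter for this problem. */
void describe_vnode(uint32_t type, uint32_t mode_bits, uint32_t magic)
{
    printf("type=%u (%s) mode=%04o magic=0x%08x (%s)\n",
           (unsigned)type,
           type == VNODE_TYPE_DIRECTORY ? "directory" : "not a directory",
           (unsigned)mode_bits,
           (unsigned)magic,
           class_from_magic(magic));
}
```

Calling describe_vnode(2, 493, 2911331838U) with the dumped values prints "type=2 (directory) mode=0755 magic=0xad8765fe (vLarge)", which is the mismatch described above: a large, directory vnode read from the small index.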


Jeffrey Altman



Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-03 Thread Jeffrey Altman

Hartmut Reuter wrote:

So what is the value of 'class' if not vLarge?


As you can see from that line above it's vSmall:

 >>   [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino =
 >> 21977313U, maxu = 0x8046bc4), line 3175 in "vol-salvage.c"

So there might really be something wrong with the SmallVnodeFile, but 
failing an assertion is not the best way to repair it!


What the AssertionFailed means is that no one has written code to
deal with a case where this error has occurred.   It can't be
fixed with Salvager until someone writes the missing code.



Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-03 Thread John Tang Boyland
] > (dbx) up
] > Current function is DistilVnodeEssence
] >  3175   assert(class == vLarge);
] > (dbx) list 3170,3180
] >  3170   vep->type = vnode->type;
] >  3171   vep->author = vnode->author;
] >  3172   vep->owner = vnode->owner;
] >  3173   vep->group = vnode->group;
] >  3174   if (vnode->type == vDirectory) {
] >  3175   assert(class == vLarge);
] >  3176   vip->inodes[vnodeIndex] = VNDISK_GET_INO(vnode);
] >  3177   }
] >  3178   }
] >  3179   }
] >  3180   STREAM_CLOSE(file);
] 
] So what is the value of 'class' if not vLarge?

Oops.  Sorry.  Yes I should have included that in my email:

(dbx) print class
class = 1

If it's useful, here's some more info:

(dbx) print *vnode
*vnode = {
type = 2U
cloned   = 1U
modeBits = 493U
linkCount= 2
length   = 8192U
uniquifier   = 1U
dataVersion  = 166U
vn_ino_lo= 21977315
unixModifyTime   = 1134748419U
author   = 1U
owner= 0
parent   = 0
vnodeMagic   = 2911331838U
lock = {
lockCount = 0
lockTime  = 0
}
serverModifyTime = 1134748419U
group= 0
vn_ino_hi= 0
reserved6= 0
}

John Boyland


Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-03 Thread Hartmut Reuter

Jeffrey Altman wrote:

John Tang Boyland wrote:


OK I compiled the salvager with debugging and without optimization.

filip# /opt/SUNWspro/bin/dbx salvager.debug
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.5' in 
your .dbxrc

Reading salvager.debug
Reading ld.so.1
Reading libresolv.so.2
Reading libsocket.so.1
Reading libnsl.so.1
Reading libintl.so.1
Reading libdl.so.1
Reading libc.so.1
(dbx) run /vicepa -debug -parallel 1
Running: salvager.debug /vicepa -debug -parallel 1 (process id 3491)

[after three hours, I pressed return]

Thu Apr  3 14:14:20 2008: Assertion failed! file vol-salvage.c, line 3175.

signal ABRT (Abort) in __lwp_kill at 0xfee21157
0xfee21157: __lwp_kill+0x0007:  jae  __lwp_kill+0x15[ 0xfee21165, .+0xe ]

Current function is AssertionFailed
   48   abort();
(dbx) where
  [1] __lwp_kill(0x1, 0x6), at 0xfee21157
  [2] _thr_kill(0x1, 0x6), at 0xfee1e8c9
  [3] raise(0x6), at 0xfedcd163
  [4] abort(0x804694a, 0x47f52c8c, 0x6854, 0x70412075, 0x33202072, 0x3a343120), at 0xfedb0ba9
=>[5] AssertionFailed(file = 0x808b724 "vol-salvage.c", line = 3175), line 48 in "assert.c"
  [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino = 21977313U, maxu = 0x8046bc4), line 3175 in "vol-salvage.c"
  [7] SalvageVolume(rwIsp = 0x9ab0130, alinkH = 0x9ac0de8), line 3346 in "vol-salvage.c"
  [8] DoSalvageVolumeGroup(isp = 0x9ab0130, nVols = 1), line 2104 in "vol-salvage.c"
  [9] SalvageFileSys1(partP = 0x80bacd8, singleVolumeNumber = 0), line 1357 in "vol-salvage.c"
  [10] SalvageFileSys(partP = 0x80bacd8, singleVolumeNumber = 0), line 1192 in "vol-salvage.c"

  [11] handleit(as = 0x80a9340), line 687 in "vol-salvage.c"
  [12] cmd_Dispatch(argc = 6, argv = 0x80aaba8), line 902 in "cmd.c"
  [13] main(argc = 5, argv = 0x8047650), line 845 in "vol-salvage.c"
(dbx) up
Current function is DistilVnodeEssence
 3175   assert(class == vLarge);
(dbx) list 3170,3180
 3170   vep->type = vnode->type;
 3171   vep->author = vnode->author;
 3172   vep->owner = vnode->owner;
 3173   vep->group = vnode->group;
 3174   if (vnode->type == vDirectory) {
 3175   assert(class == vLarge);
 3176   vip->inodes[vnodeIndex] = VNDISK_GET_INO(vnode);
 3177   }
 3178   }
 3179   }
 3180   STREAM_CLOSE(file);



So what is the value of 'class' if not vLarge?


As you can see from that line above it's vSmall:

>>   [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino =
>> 21977313U, maxu = 0x8046bc4), line 3175 in "vol-salvage.c"

So there might really be something wrong with the SmallVnodeFile, but 
failing an assertion is not the best way to repair it!


Hartmut


Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-03 Thread Jeffrey Altman

John Tang Boyland wrote:

OK I compiled the salvager with debugging and without optimization.

filip# /opt/SUNWspro/bin/dbx salvager.debug
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.5' in your .dbxrc
Reading salvager.debug
Reading ld.so.1
Reading libresolv.so.2
Reading libsocket.so.1
Reading libnsl.so.1
Reading libintl.so.1
Reading libdl.so.1
Reading libc.so.1
(dbx) run /vicepa -debug -parallel 1
Running: salvager.debug /vicepa -debug -parallel 1 
(process id 3491)


[after three hours, I pressed return]

Thu Apr  3 14:14:20 2008: Assertion failed! file vol-salvage.c, line 3175.
signal ABRT (Abort) in __lwp_kill at 0xfee21157
0xfee21157: __lwp_kill+0x0007:  jae  __lwp_kill+0x15[ 0xfee21165, 
.+0xe ]
Current function is AssertionFailed
   48   abort();
(dbx) where
  [1] __lwp_kill(0x1, 0x6), at 0xfee21157 
  [2] _thr_kill(0x1, 0x6), at 0xfee1e8c9 
  [3] raise(0x6), at 0xfedcd163 
  [4] abort(0x804694a, 0x47f52c8c, 0x6854, 0x70412075, 0x33202072, 0x3a343120), at 0xfedb0ba9 
=>[5] AssertionFailed(file = 0x808b724 "vol-salvage.c", line = 3175), line 48 in "assert.c"

  [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino = 21977313U, maxu = 
0x8046bc4), line 3175 in "vol-salvage.c"
  [7] SalvageVolume(rwIsp = 0x9ab0130, alinkH = 0x9ac0de8), line 3346 in 
"vol-salvage.c"
  [8] DoSalvageVolumeGroup(isp = 0x9ab0130, nVols = 1), line 2104 in 
"vol-salvage.c"
  [9] SalvageFileSys1(partP = 0x80bacd8, singleVolumeNumber = 0), line 1357 in 
"vol-salvage.c"
  [10] SalvageFileSys(partP = 0x80bacd8, singleVolumeNumber = 0), line 1192 in 
"vol-salvage.c"
  [11] handleit(as = 0x80a9340), line 687 in "vol-salvage.c"
  [12] cmd_Dispatch(argc = 6, argv = 0x80aaba8), line 902 in "cmd.c"
  [13] main(argc = 5, argv = 0x8047650), line 845 in "vol-salvage.c"
(dbx) up
Current function is DistilVnodeEssence
 3175   assert(class == vLarge);
(dbx) list 3170,3180
 3170   vep->type = vnode->type;
 3171   vep->author = vnode->author;
 3172   vep->owner = vnode->owner;
 3173   vep->group = vnode->group;
 3174   if (vnode->type == vDirectory) {
 3175   assert(class == vLarge);
 3176   vip->inodes[vnodeIndex] = VNDISK_GET_INO(vnode);
 3177   }
 3178   }
 3179   }
 3180   STREAM_CLOSE(file);


So what is the value of 'class' if not vLarge?





Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-03 Thread John Tang Boyland
OK I compiled the salvager with debugging and without optimization.

filip# /opt/SUNWspro/bin/dbx salvager.debug
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.5' in your .dbxrc
Reading salvager.debug
Reading ld.so.1
Reading libresolv.so.2
Reading libsocket.so.1
Reading libnsl.so.1
Reading libintl.so.1
Reading libdl.so.1
Reading libc.so.1
(dbx) run /vicepa -debug -parallel 1
Running: salvager.debug /vicepa -debug -parallel 1 
(process id 3491)

[after three hours, I pressed return]

Thu Apr  3 14:14:20 2008: Assertion failed! file vol-salvage.c, line 3175.
signal ABRT (Abort) in __lwp_kill at 0xfee21157
0xfee21157: __lwp_kill+0x0007:  jae  __lwp_kill+0x15[ 0xfee21165, 
.+0xe ]
Current function is AssertionFailed
   48   abort();
(dbx) where
  [1] __lwp_kill(0x1, 0x6), at 0xfee21157 
  [2] _thr_kill(0x1, 0x6), at 0xfee1e8c9 
  [3] raise(0x6), at 0xfedcd163 
  [4] abort(0x804694a, 0x47f52c8c, 0x6854, 0x70412075, 0x33202072, 
0x3a343120), at 0xfedb0ba9 
=>[5] AssertionFailed(file = 0x808b724 "vol-salvage.c", line = 3175), line 48 
in "assert.c"
  [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino = 21977313U, maxu = 
0x8046bc4), line 3175 in "vol-salvage.c"
  [7] SalvageVolume(rwIsp = 0x9ab0130, alinkH = 0x9ac0de8), line 3346 in 
"vol-salvage.c"
  [8] DoSalvageVolumeGroup(isp = 0x9ab0130, nVols = 1), line 2104 in 
"vol-salvage.c"
  [9] SalvageFileSys1(partP = 0x80bacd8, singleVolumeNumber = 0), line 1357 in 
"vol-salvage.c"
  [10] SalvageFileSys(partP = 0x80bacd8, singleVolumeNumber = 0), line 1192 in 
"vol-salvage.c"
  [11] handleit(as = 0x80a9340), line 687 in "vol-salvage.c"
  [12] cmd_Dispatch(argc = 6, argv = 0x80aaba8), line 902 in "cmd.c"
  [13] main(argc = 5, argv = 0x8047650), line 845 in "vol-salvage.c"
(dbx) up
Current function is DistilVnodeEssence
 3175   assert(class == vLarge);
(dbx) list 3170,3180
 3170   vep->type = vnode->type;
 3171   vep->author = vnode->author;
 3172   vep->owner = vnode->owner;
 3173   vep->group = vnode->group;
 3174   if (vnode->type == vDirectory) {
 3175   assert(class == vLarge);
 3176   vip->inodes[vnodeIndex] = VNDISK_GET_INO(vnode);
 3177   }
 3178   }
 3179   }
 3180   STREAM_CLOSE(file);

The log says:

@(#) OpenAFS 1.4.7pre2 built  2008-04-03 
04/03/2008 11:21:46 STARTING AFS SALVAGER 2.4 (/usr/openafs-1.4.7pre2/bin/salvager.debug /vicepa -debug -parallel 1)
04/03/2008 11:21:46 SALVAGING FILE SYSTEM PARTITION /vicepa (device=c1t1d0s6)
04/03/2008 11:21:56 Scanning inodes on device /dev/rdsk/c1t1d0s6...
04/03/2008 11:24:06 242 nVolumesInInodeFile 6776 
04/03/2008 14:14:20 SALVAGING VOLUME 536870912.
04/03/2008 14:14:20 root.afs (536870912) updated 12/16/2005 09:53
04/03/2008 14:14:20 totalInodes 165

and then it breaks off

John.


Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-03 Thread Derrick Brashear
On Thu, Apr 3, 2008 at 11:17 AM, Jeffrey Altman
<[EMAIL PROTECTED]> wrote:
> John Tang Boyland wrote:
>
> > ] There may be patches in 1.4.7-pre2 that might help.
> >
> >
>   04/03/2008 08:24:41 SALVAGING VOLUME 536870912.
>
> > 04/03/2008 08:24:41 Part of the header (Volume information) is corrupted
> > 04/03/2008 08:24:41 totalInodes 165
> > 04/03/2008 08:24:41 "Salvage volume group" core dumped!
> >
>
>
> > Is there a way for the core files to be found?  It seems
> > that bos/salvager deletes them?  "ulimit" says "unlimited".
> > There's no core files in /usr/afs/logs, /tmp or /var/tmp
> >
>
>  There are two things you can try.
>
>  1. Force the salvager to run without forking by running it with "-parallel
> 1".  That might get you a core file.
>
>  2. Run the salvager manually under gdb.

and if you do 2, you don't care: set it to follow forked children, and
then collect the core.


Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-03 Thread Jeffrey Altman

John Tang Boyland wrote:

] There may be patches in 1.4.7-pre2 that might help.


 04/03/2008 08:24:41 SALVAGING VOLUME 536870912.

04/03/2008 08:24:41 Part of the header (Volume information) is corrupted
04/03/2008 08:24:41 totalInodes 165
04/03/2008 08:24:41 "Salvage volume group" core dumped!



Is there a way for the core files to be found?  It seems
that bos/salvager deletes them?  "ulimit" says "unlimited".
There's no core files in /usr/afs/logs, /tmp or /var/tmp


There are two things you can try.

1. Force the salvager to run without forking by running it with 
"-parallel 1".  That might get you a core file.


2. Run the salvager manually under gdb.






Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-03 Thread John Tang Boyland
] There may be patches in 1.4.7-pre2 that might help.

I tried it, but got no better results:

@(#) OpenAFS 1.4.7pre2 built  2008-04-03 
04/03/2008 08:22:21 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager -f)
04/03/2008 08:22:21 Starting salvage of file system partition /vicepa
04/03/2008 08:22:21 SALVAGING FILE SYSTEM PARTITION /vicepa (device=c1t1d0s6)
04/03/2008 08:22:21 ***Forced salvage of all volumes on this partition***
04/03/2008 08:22:31 Scanning inodes on device /dev/rdsk/c1t1d0s6...
04/03/2008 08:24:41 246 nVolumesInInodeFile 6888 
04/03/2008 08:24:41 /vicepa/V0536871296.vol is not a legitimate volume header file; deleted
04/03/2008 08:24:41 SALVAGING VOLUME 536870912.
04/03/2008 08:24:41 Part of the header (Volume information) is corrupted
04/03/2008 08:24:41 totalInodes 165
04/03/2008 08:24:41 "Salvage volume group" core dumped!
04/03/2008 08:24:41 SALVAGING VOLUME 536870915.
04/03/2008 08:24:41 Part of the header (Volume information) is corrupted
04/03/2008 08:24:41 totalInodes 30
04/03/2008 08:24:42 "Salvage volume group" core dumped!
04/03/2008 08:24:42 SALVAGING VOLUME 536870918.
04/03/2008 08:24:42 Part of the header (Volume information) is corrupted
04/03/2008 08:24:42 totalInodes 5
04/03/2008 08:24:42 "Salvage volume group" core dumped!
04/03/2008 08:24:42 SALVAGING VOLUME 536870921.
04/03/2008 08:24:42 Part of the header (Volume information) is corrupted
04/03/2008 08:24:42 totalInodes 7
04/03/2008 08:24:43 "Salvage volume group" core dumped!
04/03/2008 08:24:43 SALVAGING VOLUME 536870924.
04/03/2008 08:24:43 Part of the header (Volume information) is corrupted
04/03/2008 08:24:43 totalInodes 526
04/03/2008 08:24:43 "Salvage volume group" core dumped!
04/03/2008 08:24:43 SALVAGING VOLUME 536870927.
04/03/2008 08:24:43 Part of the header (Volume information) is corrupted
04/03/2008 08:24:43 Vnode 2050 (unique 1505): corresponding inode 22233575 is missing; vnode deleted, vnode mod time=Tue Nov 24 08:23:17 1998
04/03/2008 08:24:43 Vnode 2052 (unique 1506): corresponding inode 22233576 is missing; vnode deleted, vnode mod time=Tue Nov 24 08:23:17 1998

... etc. ad nauseam (almost 2000 lines of this)

04/03/2008 08:26:44 Vnode 1600 (unique 2574): corresponding inode 22768239 is 
missing; vnode deleted, vnode mod time=Wed Feb 13 15:49:27 2008
04/03/2008 08:26:44 totalInodes 701
04/03/2008 08:26:44 "Salvage volume group" core dumped!
04/03/2008 08:26:44 SALVAGING OF PARTITION /vicepa COMPLETED

# fsck /vicepa
Open AFS (R) openafs 1.4.7pre2 fsck
** /dev/rdsk/c1t1d0s6
** Last Mounted on /vicepa
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
849926 files, 18516648 used, 39053355 free, 849677 AFS files (89091 frags, 
4870533 blocks, 0.2% fragmentation)

Is there a way for the core files to be found?  It seems
that bos/salvager deletes them?  "ulimit" says "unlimited".
There's no core files in /usr/afs/logs, /tmp or /var/tmp

John Boyland


Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-02 Thread Jeffrey Altman

John Tang Boyland wrote:


] There may be patches in 1.4.7-pre2 that might help.

Let me know how I can use them. (I assume I get the 1.4.6 source
and then find patches on the openafs site somewhere and apply?)


Install 1.4.7-pre2.  It is after all a release candidate.






Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-02 Thread John Tang Boyland
] Do you have logging turned on in this partition? (It should be off.)

Yes, we've known about the problems with logging:

/etc/vfstab has:

/dev/dsk/c1t0d0s5   /dev/rdsk/c1t0d0s5  /usr/vice   ufs 2   yes nologging
/dev/dsk/c1t1d0s6   /dev/rdsk/c1t1d0s6  /vicepa     afs 3   yes nologging

John 


Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-02 Thread John Tang Boyland
] > SalvageLog starts:
] > @(#) OpenAFS 1.4.6 built  2007-12-17 
] > 04/01/2008 17:15:21 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager -f)
] > 04/01/2008 17:15:21 Starting salvage of file system partition /vicepa
] > 04/01/2008 17:15:21 SALVAGING FILE SYSTEM PARTITION /vicepa (device=c1t1d0s6)
] > 04/01/2008 17:15:21 ***Forced salvage of all volumes on this partition***
] > 04/01/2008 17:15:31 Scanning inodes on device /dev/rdsk/c1t1d0s6...
] > 04/01/2008 17:17:40 242 nVolumesInInodeFile 6776 
] > 04/01/2008 17:17:40 SALVAGING VOLUME 536870912.
] > 04/01/2008 17:17:40 Part of the header (Volume information) is corrupted
] 
] Would need to try to fix header by hand if at all possible.  Data on the 
] disk is corrupted.
] 
] > 04/01/2008 17:17:40 totalInodes 165
] > 04/01/2008 17:17:41 "Salvage volume group" core dumped!
] 
] Badly enough that the tools fail.  If you have a core perhaps the tool 
] can be patched to not crash.

/usr/afs/logs includes a corefile.fs from 3/31 but no (recent) core.salv
and no cores from yesterday when the salvager ran.  Any idea where it
might be stashing cores?  or how to force it not to delete them?

] There may be patches in 1.4.7-pre2 that might help.

Let me know how I can use them. (I assume I get the 1.4.6 source
and then find patches on the openafs site somewhere and apply?)

John Boyland


Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-02 Thread Douglas E. Engert

Do you have logging turned on in this partition? (It should be off.)
See man mount_ufs.
There were some issues in the past where the log might use some fields
in the inode that AFS also uses.

You might try turning off logging and then running the AFS fsck. (Use at your own risk!)


John Tang Boyland wrote:

I mentioned a few weeks ago about an OpenAFS 1.4.6 Solaris 10 x86 inode
fileserver that refused to attach any volumes and for which the salvager
just coredumps (without leaving any core files?)  Repeated salvaging does
nothing except remove a few more vnodes.  On the other hand, umounting and 
fsck shows that everything is fine.


Is there any way to get some of the data out of the inode fileserver
partitions?  It's very frustrating because some things weren't
backed up (yes, yes, ...).  The -ForceOnLine option to bos salvage
looked promising but that's only available for MR AFS.

The only thing I can think of is labeling the partition as UFS and
then have the built-in fsck drop everything into lost+found,
but I'm not sure if any of the structure will be recoverable.
Is there any way to just get the data out of the inode partition?
(since fsck is happy with it.)

I even tried reinstalling openafs-1.4.1 to see if some horrible new
bug was introduced, but the same result happened.

John

documentation: 


SalvageLog starts:
@(#) OpenAFS 1.4.6 built  2007-12-17 
04/01/2008 17:15:21 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager -f)

04/01/2008 17:15:21 Starting salvage of file system partition /vicepa
04/01/2008 17:15:21 SALVAGING FILE SYSTEM PARTITION /vicepa (device=c1t1d0s6)
04/01/2008 17:15:21 ***Forced salvage of all volumes on this partition***
04/01/2008 17:15:31 Scanning inodes on device /dev/rdsk/c1t1d0s6...
04/01/2008 17:17:40 242 nVolumesInInodeFile 6776 
04/01/2008 17:17:40 SALVAGING VOLUME 536870912.

04/01/2008 17:17:40 Part of the header (Volume information) is corrupted
04/01/2008 17:17:40 totalInodes 165
04/01/2008 17:17:41 "Salvage volume group" core dumped!
04/01/2008 17:17:41 SALVAGING VOLUME 536870915.
04/01/2008 17:17:41 Part of the header (Volume information) is corrupted
04/01/2008 17:17:41 totalInodes 30
04/01/2008 17:17:41 "Salvage volume group" core dumped!
04/01/2008 17:17:41 SALVAGING 
04/01/2008 17:17:42 totalInodes 526

04/01/2008 17:17:43 "Salvage volume group" core dumped!
04/01/2008 17:17:43 SALVAGING VOLUME 536870927.
04/01/2008 17:17:43 Part of the header (Volume information) is corrupted
04/01/2008 17:17:43 Vnode 1922 (unique 1437): corresponding inode 22233511 is 
missing; vnode deleted, vnode mod time=Tue Nov 24 08:23:15 1998
04/01/2008 17:17:43 Vnode 1924 (unique 1438): corresponding inode 22233512 is 
missing; vnode deleted, vnode mod time=Tue Nov 24 08:23:15 1998
04/01/2008 17:17:43 Vnode 1926 (unique 431951): corresponding inode 22233513 is 
missing; vnode deleted, vnode mod time=Sun Mar 30 22:33:54 2003
04/01/2008 17:17:43 Vnode 1928 (unique 434559): corresponding inode 22233514 is 
missing; vnode deleted, vnode mod time=Sun Apr  6 23:02:50 2003
04/01/2008 17:17:43 Vnode 1930 (unique 431952): corresponding inode 22233515 is 
missing; vnode deleted, vnode mod time=Sun Mar 30 22:34:15 2003
...

# fsck /vicepa
Open AFS (R) openafs 1.4.1 fsck
** /dev/rdsk/c1t1d0s6
** Last Mounted on /vicepa
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
849922 files, 18516628 used, 39053375 free, 849677 AFS files (89095 frags, 
4870535 blocks, 0.2% fragmentation)





--

 Douglas E. Engert  <[EMAIL PROTECTED]>
 Argonne National Laboratory
 9700 South Cass Avenue
 Argonne, Illinois  60439
 (630) 252-5444


Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-02 Thread Jeffrey Altman

John Tang Boyland wrote:
documentation: 


SalvageLog starts:
@(#) OpenAFS 1.4.6 built  2007-12-17 
04/01/2008 17:15:21 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager -f)

04/01/2008 17:15:21 Starting salvage of file system partition /vicepa
04/01/2008 17:15:21 SALVAGING FILE SYSTEM PARTITION /vicepa (device=c1t1d0s6)
04/01/2008 17:15:21 ***Forced salvage of all volumes on this partition***
04/01/2008 17:15:31 Scanning inodes on device /dev/rdsk/c1t1d0s6...
04/01/2008 17:17:40 242 nVolumesInInodeFile 6776 
04/01/2008 17:17:40 SALVAGING VOLUME 536870912.

04/01/2008 17:17:40 Part of the header (Volume information) is corrupted


You would need to try to fix the header by hand, if that is possible at all.
The data on the disk is corrupted.



04/01/2008 17:17:40 totalInodes 165
04/01/2008 17:17:41 "Salvage volume group" core dumped!


It is corrupted badly enough that the tools fail.  If you have a core file,
perhaps the tool can be patched so that it does not crash.



04/01/2008 17:17:41 SALVAGING VOLUME 536870915.
04/01/2008 17:17:41 Part of the header (Volume information) is corrupted
04/01/2008 17:17:41 totalInodes 30
04/01/2008 17:17:41 "Salvage volume group" core dumped!
04/01/2008 17:17:41 SALVAGING 
04/01/2008 17:17:42 totalInodes 526

04/01/2008 17:17:43 "Salvage volume group" core dumped!
04/01/2008 17:17:43 SALVAGING VOLUME 536870927.
04/01/2008 17:17:43 Part of the header (Volume information) is corrupted
04/01/2008 17:17:43 Vnode 1922 (unique 1437): corresponding inode 22233511 is 
missing; vnode deleted, vnode mod time=Tue Nov 24 08:23:15 1998
04/01/2008 17:17:43 Vnode 1924 (unique 1438): corresponding inode 22233512 is 
missing; vnode deleted, vnode mod time=Tue Nov 24 08:23:15 1998
04/01/2008 17:17:43 Vnode 1926 (unique 431951): corresponding inode 22233513 is 
missing; vnode deleted, vnode mod time=Sun Mar 30 22:33:54 2003
04/01/2008 17:17:43 Vnode 1928 (unique 434559): corresponding inode 22233514 is 
missing; vnode deleted, vnode mod time=Sun Apr  6 23:02:50 2003
04/01/2008 17:17:43 Vnode 1930 (unique 431952): corresponding inode 22233515 is 
missing; vnode deleted, vnode mod time=Sun Mar 30 22:34:15 2003
...


There may be patches in 1.4.7-pre2 that help.

Jeffrey Altman





[OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)

2008-04-02 Thread John Tang Boyland
A few weeks ago I mentioned an OpenAFS 1.4.6 Solaris 10 x86 inode
fileserver that refused to attach any volumes and for which the salvager
just core dumps (without leaving any core files?).  Repeated salvaging does
nothing except remove a few more vnodes.  On the other hand, unmounting and
running fsck shows that everything is fine.

Is there any way to get some of the data out of the inode fileserver
partitions?  It's very frustrating because some things weren't
backed up (yes, yes, ...).  The -ForceOnLine option to bos salvage
looked promising but that's only available for MR AFS.

The only thing I can think of is labeling the partition as UFS and
then having the built-in fsck drop everything into lost+found,
but I'm not sure if any of the structure will be recoverable.
Is there any way to just get the data out of the inode partition?
(since fsck is happy with it.)

I even tried reinstalling openafs-1.4.1 to see if some horrible new
bug had been introduced, but the result was the same.

John

documentation: 

SalvageLog starts:
@(#) OpenAFS 1.4.6 built  2007-12-17 
04/01/2008 17:15:21 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager -f)
04/01/2008 17:15:21 Starting salvage of file system partition /vicepa
04/01/2008 17:15:21 SALVAGING FILE SYSTEM PARTITION /vicepa (device=c1t1d0s6)
04/01/2008 17:15:21 ***Forced salvage of all volumes on this partition***
04/01/2008 17:15:31 Scanning inodes on device /dev/rdsk/c1t1d0s6...
04/01/2008 17:17:40 242 nVolumesInInodeFile 6776 
04/01/2008 17:17:40 SALVAGING VOLUME 536870912.
04/01/2008 17:17:40 Part of the header (Volume information) is corrupted
04/01/2008 17:17:40 totalInodes 165
04/01/2008 17:17:41 "Salvage volume group" core dumped!
04/01/2008 17:17:41 SALVAGING VOLUME 536870915.
04/01/2008 17:17:41 Part of the header (Volume information) is corrupted
04/01/2008 17:17:41 totalInodes 30
04/01/2008 17:17:41 "Salvage volume group" core dumped!
04/01/2008 17:17:41 SALVAGING 
04/01/2008 17:17:42 totalInodes 526
04/01/2008 17:17:43 "Salvage volume group" core dumped!
04/01/2008 17:17:43 SALVAGING VOLUME 536870927.
04/01/2008 17:17:43 Part of the header (Volume information) is corrupted
04/01/2008 17:17:43 Vnode 1922 (unique 1437): corresponding inode 22233511 is 
missing; vnode deleted, vnode mod time=Tue Nov 24 08:23:15 1998
04/01/2008 17:17:43 Vnode 1924 (unique 1438): corresponding inode 22233512 is 
missing; vnode deleted, vnode mod time=Tue Nov 24 08:23:15 1998
04/01/2008 17:17:43 Vnode 1926 (unique 431951): corresponding inode 22233513 is 
missing; vnode deleted, vnode mod time=Sun Mar 30 22:33:54 2003
04/01/2008 17:17:43 Vnode 1928 (unique 434559): corresponding inode 22233514 is 
missing; vnode deleted, vnode mod time=Sun Apr  6 23:02:50 2003
04/01/2008 17:17:43 Vnode 1930 (unique 431952): corresponding inode 22233515 is 
missing; vnode deleted, vnode mod time=Sun Mar 30 22:34:15 2003
...

# fsck /vicepa
Open AFS (R) openafs 1.4.1 fsck
** /dev/rdsk/c1t1d0s6
** Last Mounted on /vicepa
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
849922 files, 18516628 used, 39053375 free, 849677 AFS files (89095 frags, 
4870535 blocks, 0.2% fragmentation)
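
Since the salvager only dies on some volume groups, it can help to tabulate the SalvageLog before deciding which volumes to chase. The following is a rough Python sketch, not part of OpenAFS: it assumes the log layout shown above (a MM/DD/YYYY HH:MM:SS prefix, with long entries wrapped onto a second line by the mailer), and the names join_wrapped and summarize are made up for illustration. An entry whose volume ID was lost, like the bare "SALVAGING" line above, is filed under the placeholder key "<truncated>".

```python
import re

TIMESTAMP = re.compile(r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} ")
VOLUME = re.compile(r"^SALVAGING VOLUME (\d+)\.")
VNODE_DELETED = re.compile(
    r"^Vnode (\d+) \(unique \d+\): corresponding inode \d+ is\s+missing; vnode deleted"
)


def join_wrapped(lines):
    """Re-join entries that were wrapped: a line without a timestamp
    is treated as the continuation of the previous entry."""
    entries = []
    for line in lines:
        line = line.rstrip()
        if not line:
            continue
        if TIMESTAMP.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def summarize(log_text):
    """Map each volume ID (as a string) to what the salvager reported for it."""
    summary = {}
    current = None
    for entry in join_wrapped(log_text.splitlines()):
        message = TIMESTAMP.sub("", entry).strip()
        volume = VOLUME.match(message)
        if volume or message == "SALVAGING":
            # A bare "SALVAGING" means the volume ID was cut off in the log.
            current = volume.group(1) if volume else "<truncated>"
            summary.setdefault(
                current,
                {"header_corrupt": False, "core_dumped": False, "deleted_vnodes": []},
            )
            continue
        if current is None:
            continue  # partition-level lines before the first volume
        record = summary[current]
        if "Part of the header" in message and "is corrupted" in message:
            record["header_corrupt"] = True
        elif "core dumped" in message:
            record["core_dumped"] = True
        else:
            vnode = VNODE_DELETED.match(message)
            if vnode:
                record["deleted_vnodes"].append(int(vnode.group(1)))
    return summary
```

To use it, pass in the text of the log, for example summarize(open(path_to_SalvageLog).read()), and look for volumes with core_dumped set. Volumes where the salvager only deleted vnodes and did not crash are the ones most likely to come back online.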
