Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
> I just saw -nowrite exists also in OpenAFS only that the bos command claims
> it would be possible only in MR-AFS. So one could at least run the salvager
> under the debugger with -nowrite

That's because, for some reason, the command parser in bos salvage doesn't know about that flag and so assumes it's MR-AFS only.

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
Jeffrey Altman wrote:
> Hartmut Reuter wrote:
>> Jeffrey Altman wrote:
>>> Hartmut Reuter wrote:
>>>>> So what is the value of 'class' if not vLarge?
>>>>
>>>> As you can see from that line above, it's vSmall:
>>>>
>>>>>> [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino =
>>>>>> 21977313U, maxu = 0x8046bc4), line 3175 in "vol-salvage.c"
>>>>
>>>> So there might really be something wrong with the SmallVnodeFile,
>>>> but calling AssertionFailed is not the best way to repair it!
>>>
>>> What the AssertionFailed means is that no one has written code to
>>> deal with a case where this error has occurred. It can't be fixed
>>> by the salvager until someone writes the missing code.
>>
>> Of course, but for the user it might be better to skip handling of
>> this error and to continue with the next vnode. That way he could at
>> least get back the damaged volume and copy whatever is still
>> accessible.
>>
>> So John, #ifdef out line 3175 and recompile. If this was a single bad
>> vnode, your volume may come online again; otherwise it's probably lost
>> anyway.
>>
>> Hartmut
>
> I disagree. The reason that assert is there is that continuing will
> cause more damage to the data. We do not know, based upon the
> available data, whether this is a single bad vnode or whether perhaps
> the wrong file is being referenced as the SmallVnodeFile. What is known
> is that one vnode, perhaps the first vnode examined, has completely
> valid data except for the fact that it is in the wrong file.
>
> There are several issues that are worth pursuing here, especially
> because whatever the problem is has begun occurring on multiple
> machines:
>
> 1. What is the actual damage that has taken place?
>
> 2. Can the damage be corrected?
>
> 3. Can the damage be avoided in the first place? What is the cause?
>
> Jeffrey Altman

Of course we should not remove the assert() forever, but just for the test of this volume, which otherwise will probably be lost anyway. In MR-AFS we had a -nowrite option to do just a dry run. I admit that it's a lot of work to implement this, but sometimes it is very helpful.

I just saw -nowrite exists also in OpenAFS, only that the bos command claims it would be possible only in MR-AFS. So one could at least run the salvager under the debugger with -nowrite.

Hartmut

-- 
Hartmut Reuter                  e-mail [EMAIL PROTECTED]
                                phone  +49-89-3299-1328
                                fax    +49-89-3299-1301
RZG (Rechenzentrum Garching)    web    http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
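The -nowrite idea above is essentially a dry-run switch. As a hypothetical sketch (none of these names come from the OpenAFS or MR-AFS salvager, and nothing here reflects how their -nowrite is actually implemented), one way to structure such a switch is to funnel every on-disk modification through a single helper that only reports the change when the flag is set:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical dry-run switch, in the spirit of a -nowrite option. */
static int dry_run = 0;
static unsigned long suppressed_writes = 0;

/* Every modification goes through this one helper. In dry-run mode it
 * only reports what would change and counts the suppressed write. */
static void guarded_write(void *dst, const void *src, size_t len,
                          const char *what)
{
    assert(dst != NULL && src != NULL && what != NULL);
    if (dry_run) {
        suppressed_writes++;
        printf("nowrite: would update %s (%zu bytes)\n", what, len);
        return;
    }
    memcpy(dst, src, len);
}
```

The design point is that the flag is checked in exactly one place: a dry run is only trustworthy if no write path bypasses the helper, which is also why retrofitting it onto an existing salvager is a lot of work.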
Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
> There are several issues that are worth pursuing here, especially because
> whatever the problem is has begun occurring on multiple machines:
>
> 1. What is the actual damage that has taken place?
>
> 2. Can the damage be corrected?
>
> 3. Can the damage be avoided in the first place? What is the cause?

If it's a reproducible problem, it should be easy to change the volume package to assert when writing a "wrong" type vnode for a class to its index. But I think there's more to it than that.
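A minimal sketch of the write-side check suggested here, assuming the vnode class and magic constants as we recall them from OpenAFS src/vol/vnode.h (verify them against your tree). struct vnode_hdr and check_vnode_for_index() are hypothetical: the real on-disk vnode has many more fields, and a real patch would assert() at the point where the volume package writes an index entry, rather than return an error:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Constants as recalled from OpenAFS src/vol/vnode.h; verify locally. */
#define vLarge 0
#define vSmall 1
#define vDirectory 2
#define LARGEVNODEMAGIC 0xad8765feU
#define SMALLVNODEMAGIC 0xda8765feU

/* Hypothetical stand-in for the on-disk vnode: only the two fields the
 * check needs. */
struct vnode_hdr {
    unsigned type;
    uint32_t vnodeMagic;
};

/* Return 0 if v may be written to the index of class index_class, or -1
 * (after logging why) if the write would store a record in the wrong
 * index. */
static int check_vnode_for_index(int index_class, const struct vnode_hdr *v)
{
    uint32_t want;

    assert(v != NULL);
    assert(index_class == vLarge || index_class == vSmall);
    want = (index_class == vLarge) ? LARGEVNODEMAGIC : SMALLVNODEMAGIC;
    if (v->vnodeMagic != want) {
        fprintf(stderr, "magic 0x%08x does not belong in the class %d index\n",
                (unsigned)v->vnodeMagic, index_class);
        return -1;
    }
    if (v->type == vDirectory && index_class != vLarge) {
        fprintf(stderr, "directory vnode headed for the class %d index\n",
                index_class);
        return -1;
    }
    return 0;
}
```

Catching the bad write at its source would show which code path produces the misplaced record, instead of only discovering it later when the salvager trips over it.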
Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
Hartmut Reuter wrote:
> Jeffrey Altman wrote:
>> Hartmut Reuter wrote:
>>>> So what is the value of 'class' if not vLarge?
>>>
>>> As you can see from that line above, it's vSmall:
>>>
>>>>> [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino =
>>>>> 21977313U, maxu = 0x8046bc4), line 3175 in "vol-salvage.c"
>>>
>>> So there might really be something wrong with the SmallVnodeFile,
>>> but calling AssertionFailed is not the best way to repair it!
>>
>> What the AssertionFailed means is that no one has written code to
>> deal with a case where this error has occurred. It can't be fixed
>> by the salvager until someone writes the missing code.
>
> Of course, but for the user it might be better to skip handling of
> this error and to continue with the next vnode. That way he could at
> least get back the damaged volume and copy whatever is still
> accessible.
>
> So John, #ifdef out line 3175 and recompile. If this was a single bad
> vnode, your volume may come online again; otherwise it's probably lost
> anyway.
>
> Hartmut

I disagree. The reason that assert is there is that continuing will cause more damage to the data. We do not know, based upon the available data, whether this is a single bad vnode or whether perhaps the wrong file is being referenced as the SmallVnodeFile. What is known is that one vnode, perhaps the first vnode examined, has completely valid data except for the fact that it is in the wrong file.

There are several issues that are worth pursuing here, especially because whatever the problem is has begun occurring on multiple machines:

1. What is the actual damage that has taken place?

2. Can the damage be corrected?

3. Can the damage be avoided in the first place? What is the cause?

Jeffrey Altman
Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
Jeffrey Altman wrote:
> Hartmut Reuter wrote:
>>> So what is the value of 'class' if not vLarge?
>>
>> As you can see from that line above, it's vSmall:
>>
>>>> [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino =
>>>> 21977313U, maxu = 0x8046bc4), line 3175 in "vol-salvage.c"
>>
>> So there might really be something wrong with the SmallVnodeFile,
>> but calling AssertionFailed is not the best way to repair it!
>
> What the AssertionFailed means is that no one has written code to
> deal with a case where this error has occurred. It can't be fixed
> by the salvager until someone writes the missing code.

Of course, but for the user it might be better to skip handling of this error and to continue with the next vnode. That way he could at least get back the damaged volume and copy whatever is still accessible.

So John, #ifdef out line 3175 and recompile. If this was a single bad vnode, your volume may come online again; otherwise it's probably lost anyway.

Hartmut
Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
John Tang Boyland wrote:
> (dbx) print class
> class = 1
>
> If it's useful, here's some more info:
>
> (dbx) print *vnode
> *vnode = {
>     type             = 2U
>     cloned           = 1U
>     modeBits         = 493U
>     linkCount        = 2
>     length           = 8192U
>     uniquifier       = 1U
>     dataVersion      = 166U
>     vn_ino_lo        = 21977315
>     unixModifyTime   = 1134748419U
>     author           = 1U
>     owner            = 0
>     parent           = 0
>     vnodeMagic       = 2911331838U
>     lock             = {
>         lockCount = 0
>         lockTime  = 0
>     }
>     serverModifyTime = 1134748419U
>     group            = 0
>     vn_ino_hi        = 0
>     reserved6        = 0
> }

This entry is clearly a vDirectory (type == 2) and it is a vLarge vnode (vnodeMagic == LARGEVNODEMAGIC), but it is showing up in the file referenced by rwIsp->volSummary->header.smallVnodeIndex in SalvageVolume(). This header might be corrupted, thereby referring to the wrong index file. I won't be able to do more without closer inspection of the volume data.

Jeffrey Altman
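To make the reading of the dbx values above concrete, here is a small sketch. The constants are as we recall them from OpenAFS src/vol/vnode.h and should be checked against your source; the compile-time check confirms that the decimal vnodeMagic printed by dbx, 2911331838, is 0xad8765fe, and vSmall == 1 matches the reading of class = 1 as vSmall. misplaced() is a hypothetical helper that only tests the direction seen here (a large or directory record found in the small index):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Constants as recalled from OpenAFS src/vol/vnode.h (1.4 series);
 * check them against your own source tree. */
#define vLarge 0
#define vSmall 1
#define vDirectory 2
#define LARGEVNODEMAGIC 0xad8765feU
#define SMALLVNODEMAGIC 0xda8765feU

/* The decimal vnodeMagic that dbx printed is the large-vnode magic. */
_Static_assert(LARGEVNODEMAGIC == 2911331838U, "2911331838 == 0xad8765fe");

static const char *magic_name(uint32_t magic)
{
    if (magic == LARGEVNODEMAGIC)
        return "large";
    if (magic == SMALLVNODEMAGIC)
        return "small";
    return "unknown";
}

/* Hypothetical helper: return 1 if a record read from the small index
 * looks like it belongs in the large one, either because its magic says
 * so or because it is a directory (directories live only in the large
 * index, which is what assert(class == vLarge) checks). The reverse case
 * is not examined. */
static int misplaced(int index_class, unsigned type, uint32_t magic)
{
    int large_record;
    int result;

    assert(index_class == vLarge || index_class == vSmall);
    large_record = (magic == LARGEVNODEMAGIC) || (type == vDirectory);
    result = (index_class == vSmall) && large_record;
    printf("class=%d type=%u magic=0x%08x (%s): %s\n", index_class, type,
           (unsigned)magic, magic_name(magic),
           result ? "misplaced" : "consistent");
    return result;
}
```

Calling misplaced(vSmall, 2, 2911331838U) with the values from this thread reports the record as misplaced; that case includes the condition that makes assert(class == vLarge) fire at line 3175.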
Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
Hartmut Reuter wrote:
>> So what is the value of 'class' if not vLarge?
>
> As you can see from that line above, it's vSmall:
>
>>> [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino =
>>> 21977313U, maxu = 0x8046bc4), line 3175 in "vol-salvage.c"
>
> So there might really be something wrong with the SmallVnodeFile,
> but calling AssertionFailed is not the best way to repair it!

What the AssertionFailed means is that no one has written code to deal with a case where this error has occurred. It can't be fixed by the salvager until someone writes the missing code.
Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
] > (dbx) up
] > Current function is DistilVnodeEssence
] >  3175                   assert(class == vLarge);
] > (dbx) list 3170,3180
] >  3170               vep->type = vnode->type;
] >  3171               vep->author = vnode->author;
] >  3172               vep->owner = vnode->owner;
] >  3173               vep->group = vnode->group;
] >  3174               if (vnode->type == vDirectory) {
] >  3175                   assert(class == vLarge);
] >  3176                   vip->inodes[vnodeIndex] = VNDISK_GET_INO(vnode);
] >  3177               }
] >  3178           }
] >  3179       }
] >  3180       STREAM_CLOSE(file);
]
] So what is the value of 'class' if not vLarge?

Oops. Sorry. Yes, I should have included that in my email:

(dbx) print class
class = 1

If it's useful, here's some more info:

(dbx) print *vnode
*vnode = {
    type             = 2U
    cloned           = 1U
    modeBits         = 493U
    linkCount        = 2
    length           = 8192U
    uniquifier       = 1U
    dataVersion      = 166U
    vn_ino_lo        = 21977315
    unixModifyTime   = 1134748419U
    author           = 1U
    owner            = 0
    parent           = 0
    vnodeMagic       = 2911331838U
    lock             = {
        lockCount = 0
        lockTime  = 0
    }
    serverModifyTime = 1134748419U
    group            = 0
    vn_ino_hi        = 0
    reserved6        = 0
}

John Boyland
Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
Jeffrey Altman wrote:
> John Tang Boyland wrote:
>> OK I compiled the salvager with debugging and without optimization.
>> [...]
>> (dbx) where
>> [...]
>>   [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino = 21977313U, maxu = 0x8046bc4), line 3175 in "vol-salvage.c"
>> [...]
>> (dbx) up
>> Current function is DistilVnodeEssence
>>  3175                   assert(class == vLarge);
>> (dbx) list 3170,3180
>>  3170               vep->type = vnode->type;
>>  3171               vep->author = vnode->author;
>>  3172               vep->owner = vnode->owner;
>>  3173               vep->group = vnode->group;
>>  3174               if (vnode->type == vDirectory) {
>>  3175                   assert(class == vLarge);
>>  3176                   vip->inodes[vnodeIndex] = VNDISK_GET_INO(vnode);
>>  3177               }
>>  3178           }
>>  3179       }
>>  3180       STREAM_CLOSE(file);
>
> So what is the value of 'class' if not vLarge?

As you can see from that line above, it's vSmall:

>> [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino =
>> 21977313U, maxu = 0x8046bc4), line 3175 in "vol-salvage.c"

So there might really be something wrong with the SmallVnodeFile, but calling AssertionFailed is not the best way to repair it!

Hartmut
Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
John Tang Boyland wrote:
> OK I compiled the salvager with debugging and without optimization.
> [...]
> (dbx) where
> [...]
>   [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino = 21977313U, maxu = 0x8046bc4), line 3175 in "vol-salvage.c"
> [...]
> (dbx) up
> Current function is DistilVnodeEssence
>  3175                   assert(class == vLarge);
> (dbx) list 3170,3180
>  3170               vep->type = vnode->type;
>  3171               vep->author = vnode->author;
>  3172               vep->owner = vnode->owner;
>  3173               vep->group = vnode->group;
>  3174               if (vnode->type == vDirectory) {
>  3175                   assert(class == vLarge);
>  3176                   vip->inodes[vnodeIndex] = VNDISK_GET_INO(vnode);
>  3177               }
>  3178           }
>  3179       }
>  3180       STREAM_CLOSE(file);

So what is the value of 'class' if not vLarge?
Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
OK I compiled the salvager with debugging and without optimization.

filip# /opt/SUNWspro/bin/dbx salvager.debug
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.5' in your .dbxrc
Reading salvager.debug
Reading ld.so.1
Reading libresolv.so.2
Reading libsocket.so.1
Reading libnsl.so.1
Reading libintl.so.1
Reading libdl.so.1
Reading libc.so.1
(dbx) run /vicepa -debug -parallel 1
Running: salvager.debug /vicepa -debug -parallel 1
(process id 3491)
[after three hours, I pressed return]
Thu Apr 3 14:14:20 2008: Assertion failed! file vol-salvage.c, line 3175.
signal ABRT (Abort) in __lwp_kill at 0xfee21157
0xfee21157: __lwp_kill+0x0007:  jae      __lwp_kill+0x15        [ 0xfee21165, .+0xe ]
Current function is AssertionFailed
   48       abort();
(dbx) where
  [1] __lwp_kill(0x1, 0x6), at 0xfee21157
  [2] _thr_kill(0x1, 0x6), at 0xfee1e8c9
  [3] raise(0x6), at 0xfedcd163
  [4] abort(0x804694a, 0x47f52c8c, 0x6854, 0x70412075, 0x33202072, 0x3a343120), at 0xfedb0ba9
=>[5] AssertionFailed(file = 0x808b724 "vol-salvage.c", line = 3175), line 48 in "assert.c"
  [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino = 21977313U, maxu = 0x8046bc4), line 3175 in "vol-salvage.c"
  [7] SalvageVolume(rwIsp = 0x9ab0130, alinkH = 0x9ac0de8), line 3346 in "vol-salvage.c"
  [8] DoSalvageVolumeGroup(isp = 0x9ab0130, nVols = 1), line 2104 in "vol-salvage.c"
  [9] SalvageFileSys1(partP = 0x80bacd8, singleVolumeNumber = 0), line 1357 in "vol-salvage.c"
  [10] SalvageFileSys(partP = 0x80bacd8, singleVolumeNumber = 0), line 1192 in "vol-salvage.c"
  [11] handleit(as = 0x80a9340), line 687 in "vol-salvage.c"
  [12] cmd_Dispatch(argc = 6, argv = 0x80aaba8), line 902 in "cmd.c"
  [13] main(argc = 5, argv = 0x8047650), line 845 in "vol-salvage.c"
(dbx) up
Current function is DistilVnodeEssence
 3175                   assert(class == vLarge);
(dbx) list 3170,3180
 3170               vep->type = vnode->type;
 3171               vep->author = vnode->author;
 3172               vep->owner = vnode->owner;
 3173               vep->group = vnode->group;
 3174               if (vnode->type == vDirectory) {
 3175                   assert(class == vLarge);
 3176                   vip->inodes[vnodeIndex] = VNDISK_GET_INO(vnode);
 3177               }
 3178           }
 3179       }
 3180       STREAM_CLOSE(file);

The log says:

@(#) OpenAFS 1.4.7pre2 built 2008-04-03
04/03/2008 11:21:46 STARTING AFS SALVAGER 2.4 (/usr/openafs-1.4.7pre2/bin/salvager.debug /vicepa -debug -parallel 1)
04/03/2008 11:21:46 SALVAGING FILE SYSTEM PARTITION /vicepa (device=c1t1d0s6)
04/03/2008 11:21:56 Scanning inodes on device /dev/rdsk/c1t1d0s6...
04/03/2008 11:24:06 242 nVolumesInInodeFile 6776
04/03/2008 14:14:20 SALVAGING VOLUME 536870912.
04/03/2008 14:14:20 root.afs (536870912) updated 12/16/2005 09:53
04/03/2008 14:14:20 totalInodes 165

and then it breaks off.

John.
Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
On Thu, Apr 3, 2008 at 11:17 AM, Jeffrey Altman <[EMAIL PROTECTED]> wrote:
> John Tang Boyland wrote:
>
>> ] There may be patches in 1.4.7-pre2 that might help.
>>
>> 04/03/2008 08:24:41 SALVAGING VOLUME 536870912.
>> 04/03/2008 08:24:41 Part of the header (Volume information) is corrupted
>> 04/03/2008 08:24:41 totalInodes 165
>> 04/03/2008 08:24:41 "Salvage volume group" core dumped!
>>
>> Is there a way for the core files to be found? It seems
>> that bos/salvager deletes them? "ulimit" says "unlimited".
>> There are no core files in /usr/afs/logs, /tmp or /var/tmp.
>
> There are two things you can try.
>
> 1. Force the salvager to run without forking by running it with
> "-parallel 1". That might get you a core file.
>
> 2. Run the salvager manually under gdb.

And if you do 2, you don't care: set it to follow forked children, and then collect the core.
Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
John Tang Boyland wrote:
> ] There may be patches in 1.4.7-pre2 that might help.
>
> 04/03/2008 08:24:41 SALVAGING VOLUME 536870912.
> 04/03/2008 08:24:41 Part of the header (Volume information) is corrupted
> 04/03/2008 08:24:41 totalInodes 165
> 04/03/2008 08:24:41 "Salvage volume group" core dumped!
>
> Is there a way for the core files to be found? It seems
> that bos/salvager deletes them? "ulimit" says "unlimited".
> There are no core files in /usr/afs/logs, /tmp or /var/tmp.

There are two things you can try.

1. Force the salvager to run without forking by running it with "-parallel 1". That might get you a core file.

2. Run the salvager manually under gdb.
Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
] There may be patches in 1.4.7-pre2 that might help.

I tried it, but got no better results:

@(#) OpenAFS 1.4.7pre2 built 2008-04-03
04/03/2008 08:22:21 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager -f)
04/03/2008 08:22:21 Starting salvage of file system partition /vicepa
04/03/2008 08:22:21 SALVAGING FILE SYSTEM PARTITION /vicepa (device=c1t1d0s6)
04/03/2008 08:22:21 ***Forced salvage of all volumes on this partition***
04/03/2008 08:22:31 Scanning inodes on device /dev/rdsk/c1t1d0s6...
04/03/2008 08:24:41 246 nVolumesInInodeFile 6888
04/03/2008 08:24:41 /vicepa/V0536871296.vol is not a legitimate volume header file; deleted
04/03/2008 08:24:41 SALVAGING VOLUME 536870912.
04/03/2008 08:24:41 Part of the header (Volume information) is corrupted
04/03/2008 08:24:41 totalInodes 165
04/03/2008 08:24:41 "Salvage volume group" core dumped!
04/03/2008 08:24:41 SALVAGING VOLUME 536870915.
04/03/2008 08:24:41 Part of the header (Volume information) is corrupted
04/03/2008 08:24:41 totalInodes 30
04/03/2008 08:24:42 "Salvage volume group" core dumped!
04/03/2008 08:24:42 SALVAGING VOLUME 536870918.
04/03/2008 08:24:42 Part of the header (Volume information) is corrupted
04/03/2008 08:24:42 totalInodes 5
04/03/2008 08:24:42 "Salvage volume group" core dumped!
04/03/2008 08:24:42 SALVAGING VOLUME 536870921.
04/03/2008 08:24:42 Part of the header (Volume information) is corrupted
04/03/2008 08:24:42 totalInodes 7
04/03/2008 08:24:43 "Salvage volume group" core dumped!
04/03/2008 08:24:43 SALVAGING VOLUME 536870924.
04/03/2008 08:24:43 Part of the header (Volume information) is corrupted
04/03/2008 08:24:43 totalInodes 526
04/03/2008 08:24:43 "Salvage volume group" core dumped!
04/03/2008 08:24:43 SALVAGING VOLUME 536870927.
04/03/2008 08:24:43 Part of the header (Volume information) is corrupted
04/03/2008 08:24:43 Vnode 2050 (unique 1505): corresponding inode 22233575 is missing; vnode deleted, vnode mod time=Tue Nov 24 08:23:17 1998
04/03/2008 08:24:43 Vnode 2052 (unique 1506): corresponding inode 22233576 is missing; vnode deleted, vnode mod time=Tue Nov 24 08:23:17 1998
... etc. ad nauseam (almost 2000 lines of this)
04/03/2008 08:26:44 Vnode 1600 (unique 2574): corresponding inode 22768239 is missing; vnode deleted, vnode mod time=Wed Feb 13 15:49:27 2008
04/03/2008 08:26:44 totalInodes 701
04/03/2008 08:26:44 "Salvage volume group" core dumped!
04/03/2008 08:26:44 SALVAGING OF PARTITION /vicepa COMPLETED

# fsck /vicepa
Open AFS (R) openafs 1.4.7pre2 fsck
** /dev/rdsk/c1t1d0s6
** Last Mounted on /vicepa
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
849926 files, 18516648 used, 39053355 free, 849677 AFS files (89091 frags, 4870533 blocks, 0.2% fragmentation)

Is there a way for the core files to be found? It seems that bos/salvager deletes them? "ulimit" says "unlimited". There are no core files in /usr/afs/logs, /tmp or /var/tmp.

John Boyland
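On the missing core files: a parent process only learns from a child's wait status that the child dumped core; the kernel writes the core file itself on the child's behalf, by default in the child's current working directory, with name and location governed by coreadm(1M) on Solaris (core_pattern on Linux). The sketch below shows that mechanism in isolation. We have not checked that the salvager's '"Salvage volume group" core dumped!' message is produced exactly this way, and run_worker() is a hypothetical name:

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a worker that dies the way a failed assert() does, then report
 * how it ended. Returns the terminating signal, 0 if the worker exited
 * normally, or -1 on error. With allow_core == 0 the child sets its core
 * size limit to zero first, so no core file is written. */
static int run_worker(int allow_core)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {                     /* child */
        if (!allow_core) {
            struct rlimit rl;
            rl.rlim_cur = 0;
            rl.rlim_max = 0;
            setrlimit(RLIMIT_CORE, &rl);
        }
        abort();                        /* stand-in for AssertionFailed() */
    }
    if (waitpid(pid, &status, 0) < 0)   /* parent */
        return -1;
    assert(WIFEXITED(status) || WIFSIGNALED(status));
    if (!WIFSIGNALED(status))
        return 0;
#ifdef WCOREDUMP   /* not POSIX, but provided by Solaris, Linux and the BSDs */
    if (WCOREDUMP(status))
        printf("worker %ld: core dumped!\n", (long)pid);
#endif
    return WTERMSIG(status);
}
```

run_worker(0) returns SIGABRT without leaving a core file; run_worker(1) keeps the inherited limit, so with "ulimit" at "unlimited" the core lands wherever the system's core settings put it for that child, which is not necessarily /usr/afs/logs, /tmp or /var/tmp.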
Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
John Tang Boyland wrote:
> ] There may be patches in 1.4.7-pre2 that might help.
>
> Let me know how I can use them. (I assume I get the 1.4.6 source and
> then find patches on the openafs site somewhere and apply?)

Install 1.4.7-pre2. It is, after all, a release candidate.
Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
] Do you have logging turned on in this partition? (It should be off.)

Yes, we've known about the problems with logging. /etc/vfstab has:

/dev/dsk/c1t0d0s5  /dev/rdsk/c1t0d0s5  /usr/vice  ufs  2  yes  nologging
/dev/dsk/c1t1d0s6  /dev/rdsk/c1t1d0s6  /vicepa    afs  3  yes  nologging

John
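To check a configuration like the one above mechanically, a hypothetical helper can split a Solaris /etc/vfstab line into its seven whitespace-separated fields and look for nologging in the comma-separated mount-options field. This is a sketch only: comment lines get no special treatment, and it reads the configured options, not the ones actually in effect (those are recorded in /etc/mnttab):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: does this /etc/vfstab line list "nologging" in its
 * mount-options field? Solaris vfstab lines have seven whitespace-separated
 * fields: device to mount, device to fsck, mount point, FS type, fsck pass,
 * mount at boot, mount options (comma-separated, or "-" for none).
 * Returns 1 if present, 0 if absent, -1 if there are fewer than 7 fields.
 * Uses strtok(), so it is not thread-safe. */
static int vfstab_has_nologging(const char *line)
{
    char buf[1024];
    char *fields[7];
    char *tok;
    int n = 0;

    assert(line != NULL);
    snprintf(buf, sizeof buf, "%s", line);
    for (tok = strtok(buf, " \t\n"); tok != NULL && n < 7;
         tok = strtok(NULL, " \t\n"))
        fields[n++] = tok;
    if (n < 7)
        return -1;
    /* Restart tokenizing on the options field, splitting on commas. */
    for (tok = strtok(fields[6], ","); tok != NULL; tok = strtok(NULL, ","))
        if (strcmp(tok, "nologging") == 0)
            return 1;
    return 0;
}
```

Both lines quoted above return 1; a "-" in the options field returns 0.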
Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
] > SalvageLog starts:
] > @(#) OpenAFS 1.4.6 built 2007-12-17
] > 04/01/2008 17:15:21 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager -f)
] > 04/01/2008 17:15:21 Starting salvage of file system partition /vicepa
] > 04/01/2008 17:15:21 SALVAGING FILE SYSTEM PARTITION /vicepa (device=c1t1d0s6)
] > 04/01/2008 17:15:21 ***Forced salvage of all volumes on this partition***
] > 04/01/2008 17:15:31 Scanning inodes on device /dev/rdsk/c1t1d0s6...
] > 04/01/2008 17:17:40 242 nVolumesInInodeFile 6776
] > 04/01/2008 17:17:40 SALVAGING VOLUME 536870912.
] > 04/01/2008 17:17:40 Part of the header (Volume information) is corrupted
]
] Would need to try to fix header by hand if at all possible. Data on the
] disk is corrupted.
]
] > 04/01/2008 17:17:40 totalInodes 165
] > 04/01/2008 17:17:41 "Salvage volume group" core dumped!
]
] Badly enough that the tools fail. If you have a core, perhaps the tool
] can be patched to not crash.

/usr/afs/logs includes a corefile.fs from 3/31 but no (recent) core.salv and no cores from yesterday when the salvager ran. Any idea where it might be stashing cores? Or how to force it not to delete them?

] There may be patches in 1.4.7-pre2 that might help.

Let me know how I can use them. (I assume I get the 1.4.6 source and then find patches on the openafs site somewhere and apply?)

John Boyland
Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
Do you have logging turned on in this partition? (It should be off.) See man mount_ufs. There were some issues in the past where the log might use some fields in the inode that AFS also uses.

You might try to turn off logging, then run the AFS fsck. (Use at your own risk!)

John Tang Boyland wrote:
> I mentioned a few weeks ago an OpenAFS 1.4.6 Solaris 10 x86 inode
> fileserver that refused to attach any volumes and for which the
> salvager just coredumps (without leaving any core files?). Repeated
> salvaging does nothing except remove a few more vnodes. On the other
> hand, umounting and running fsck shows that everything is fine.
>
> Is there any way to get some of the data out of the inode fileserver
> partitions? It's very frustrating because some things weren't backed
> up (yes, yes, ...). The -ForceOnLine option to bos salvage looked
> promising, but that's only available for MR-AFS.
>
> The only thing I can think of is labeling the partition as UFS and
> then having the built-in fsck drop everything into lost+found, but I'm
> not sure if any of the structure will be recoverable.
>
> Is there any way to just get the data out of the inode partition
> (since fsck is happy with it)? I even tried reinstalling openafs-1.4.1
> to see if some horrible new bug was introduced, but the same result
> happened.
>
> John
>
> documentation:
>
> SalvageLog starts:
> @(#) OpenAFS 1.4.6 built 2007-12-17
> 04/01/2008 17:15:21 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager -f)
> 04/01/2008 17:15:21 Starting salvage of file system partition /vicepa
> 04/01/2008 17:15:21 SALVAGING FILE SYSTEM PARTITION /vicepa (device=c1t1d0s6)
> 04/01/2008 17:15:21 ***Forced salvage of all volumes on this partition***
> 04/01/2008 17:15:31 Scanning inodes on device /dev/rdsk/c1t1d0s6...
> 04/01/2008 17:17:40 242 nVolumesInInodeFile 6776
> 04/01/2008 17:17:40 SALVAGING VOLUME 536870912.
> 04/01/2008 17:17:40 Part of the header (Volume information) is corrupted
> 04/01/2008 17:17:40 totalInodes 165
> 04/01/2008 17:17:41 "Salvage volume group" core dumped!
> 04/01/2008 17:17:41 SALVAGING VOLUME 536870915.
> 04/01/2008 17:17:41 Part of the header (Volume information) is corrupted
> 04/01/2008 17:17:41 totalInodes 30
> 04/01/2008 17:17:41 "Salvage volume group" core dumped!
> 04/01/2008 17:17:41 SALVAGING
> 04/01/2008 17:17:42 totalInodes 526
> 04/01/2008 17:17:43 "Salvage volume group" core dumped!
> 04/01/2008 17:17:43 SALVAGING VOLUME 536870927.
> 04/01/2008 17:17:43 Part of the header (Volume information) is corrupted
> 04/01/2008 17:17:43 Vnode 1922 (unique 1437): corresponding inode 22233511 is missing; vnode deleted, vnode mod time=Tue Nov 24 08:23:15 1998
> 04/01/2008 17:17:43 Vnode 1924 (unique 1438): corresponding inode 22233512 is missing; vnode deleted, vnode mod time=Tue Nov 24 08:23:15 1998
> 04/01/2008 17:17:43 Vnode 1926 (unique 431951): corresponding inode 22233513 is missing; vnode deleted, vnode mod time=Sun Mar 30 22:33:54 2003
> 04/01/2008 17:17:43 Vnode 1928 (unique 434559): corresponding inode 22233514 is missing; vnode deleted, vnode mod time=Sun Apr 6 23:02:50 2003
> 04/01/2008 17:17:43 Vnode 1930 (unique 431952): corresponding inode 22233515 is missing; vnode deleted, vnode mod time=Sun Mar 30 22:34:15 2003
> ...
>
> # fsck /vicepa
> Open AFS (R) openafs 1.4.1 fsck
> ** /dev/rdsk/c1t1d0s6
> ** Last Mounted on /vicepa
> ** Phase 1 - Check Blocks and Sizes
> ** Phase 2 - Check Pathnames
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> ** Phase 5 - Check Cyl groups
> 849922 files, 18516628 used, 39053375 free, 849677 AFS files (89095 frags, 4870535 blocks, 0.2% fragmentation)

-- 
Douglas E. Engert  <[EMAIL PROTECTED]>
Argonne National Laboratory
9700 South Cass Avenue
Argonne, Illinois 60439
(630) 252-5444
Re: [OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
John Tang Boyland wrote:
> documentation:
> SalvageLog starts:
> @(#) OpenAFS 1.4.6 built 2007-12-17
> 04/01/2008 17:15:21 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager -f)
> 04/01/2008 17:15:21 Starting salvage of file system partition /vicepa
> 04/01/2008 17:15:21 SALVAGING FILE SYSTEM PARTITION /vicepa (device=c1t1d0s6)
> 04/01/2008 17:15:21 ***Forced salvage of all volumes on this partition***
> 04/01/2008 17:15:31 Scanning inodes on device /dev/rdsk/c1t1d0s6...
> 04/01/2008 17:17:40 242 nVolumesInInodeFile 6776
> 04/01/2008 17:17:40 SALVAGING VOLUME 536870912.
> 04/01/2008 17:17:40 Part of the header (Volume information) is corrupted

You would need to try to fix the header by hand, if that is possible at all. The data on the disk is corrupted.

> 04/01/2008 17:17:40 totalInodes 165
> 04/01/2008 17:17:41 "Salvage volume group" core dumped!

It is corrupted badly enough that the tools fail. If you have a core, perhaps the tool can be patched not to crash.

> 04/01/2008 17:17:41 SALVAGING VOLUME 536870915.
> 04/01/2008 17:17:41 Part of the header (Volume information) is corrupted
> 04/01/2008 17:17:41 totalInodes 30
> 04/01/2008 17:17:41 "Salvage volume group" core dumped!
> 04/01/2008 17:17:41 SALVAGING
> 04/01/2008 17:17:42 totalInodes 526
> 04/01/2008 17:17:43 "Salvage volume group" core dumped!
> 04/01/2008 17:17:43 SALVAGING VOLUME 536870927.
> 04/01/2008 17:17:43 Part of the header (Volume information) is corrupted
> 04/01/2008 17:17:43 Vnode 1922 (unique 1437): corresponding inode 22233511 is missing; vnode deleted, vnode mod time=Tue Nov 24 08:23:15 1998
> 04/01/2008 17:17:43 Vnode 1924 (unique 1438): corresponding inode 22233512 is missing; vnode deleted, vnode mod time=Tue Nov 24 08:23:15 1998
> 04/01/2008 17:17:43 Vnode 1926 (unique 431951): corresponding inode 22233513 is missing; vnode deleted, vnode mod time=Sun Mar 30 22:33:54 2003
> 04/01/2008 17:17:43 Vnode 1928 (unique 434559): corresponding inode 22233514 is missing; vnode deleted, vnode mod time=Sun Apr 6 23:02:50 2003
> 04/01/2008 17:17:43 Vnode 1930 (unique 431952): corresponding inode 22233515 is missing; vnode deleted, vnode mod time=Sun Mar 30 22:34:15 2003
> ...

1.4.7-pre2 may contain patches that help.

Jeffrey Altman
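Jeffrey's triage goes volume by volume. For each volume he looks at three things: a corrupted header, a salvage child that dumped core, and vnodes deleted because their inodes are missing. A small tool that tallies those three messages per volume might help decide which volumes are worth attempting by hand.

This is a minimal sketch with two assumptions. It assumes the message texts are exactly as they appear in the SalvageLog quoted above. It also assumes a serial salvage, so that each `core dumped!` line belongs to the most recent `SALVAGING VOLUME` line; with parallel salvaging that attribution would be unreliable. The struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Per-volume tally of the SalvageLog messages discussed above. */
struct vol_summary {
    unsigned long volid;          /* from "SALVAGING VOLUME <id>." */
    int header_corrupt;           /* "Part of the header ... is corrupted" */
    int core_dumped;              /* "\"Salvage volume group\" core dumped!" */
    unsigned long vnodes_deleted; /* "... is missing; vnode deleted ..." */
};

/* Read a SalvageLog from `log` and fill at most `max` entries of `out`,
 * one per "SALVAGING VOLUME" line, in log order.  Returns the number of
 * entries filled.  Messages are attributed to the most recent volume,
 * which is only correct for a serial (non-parallel) salvage. */
size_t summarize_salvage_log(FILE *log, struct vol_summary *out, size_t max)
{
    char line[2048];
    size_t n = 0;
    struct vol_summary *cur = NULL;

    while (fgets(line, sizeof line, log) != NULL) {
        const char *p = strstr(line, "SALVAGING VOLUME ");
        unsigned long volid;

        if (p != NULL && sscanf(p, "SALVAGING VOLUME %lu", &volid) == 1) {
            /* Start a new entry; once the table is full, ignore the rest. */
            cur = (n < max) ? &out[n++] : NULL;
            if (cur != NULL) {
                memset(cur, 0, sizeof *cur);
                cur->volid = volid;
            }
        } else if (cur == NULL) {
            continue; /* no current volume: partition-level line or table full */
        } else if (strstr(line, "Part of the header") != NULL &&
                   strstr(line, "is corrupted") != NULL) {
            cur->header_corrupt = 1;
        } else if (strstr(line, "core dumped!") != NULL) {
            cur->core_dumped = 1;
        } else if (strstr(line, "is missing; vnode deleted") != NULL) {
            cur->vnodes_deleted++;
        }
    }
    return n;
}
```

To use it, open /usr/afs/logs/SalvageLog with `fopen` and pass an array of a few hundred `struct vol_summary` entries. Then print the volumes with `header_corrupt` or `core_dumped` set, together with how many vnodes each run deleted.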
[OpenAFS] Help needed for receovery of data of inode fileserver (Solaris 10 x86)
A few weeks ago I mentioned an OpenAFS 1.4.6 Solaris 10 x86 inode fileserver that refused to attach any volumes and for which the salvager just coredumps (without leaving any core files?). Repeated salvaging does nothing except remove a few more vnodes. On the other hand, unmounting and running fsck shows that everything is fine. Is there any way to get some of the data out of the inode fileserver partitions? It's very frustrating because some things weren't backed up (yes, yes, ...).

The -ForceOnLine option to bos salvage looked promising, but that's only available for MR-AFS. The only thing I can think of is labeling the partition as UFS and then having the built-in fsck drop everything into lost+found, but I'm not sure if any of the structure will be recoverable. Is there any way to just get the data out of the inode partition (since fsck is happy with it)? I even tried reinstalling openafs-1.4.1 to see if some horrible new bug had been introduced, but the same result happened.

John

documentation:
SalvageLog starts:
@(#) OpenAFS 1.4.6 built 2007-12-17
04/01/2008 17:15:21 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager -f)
04/01/2008 17:15:21 Starting salvage of file system partition /vicepa
04/01/2008 17:15:21 SALVAGING FILE SYSTEM PARTITION /vicepa (device=c1t1d0s6)
04/01/2008 17:15:21 ***Forced salvage of all volumes on this partition***
04/01/2008 17:15:31 Scanning inodes on device /dev/rdsk/c1t1d0s6...
04/01/2008 17:17:40 242 nVolumesInInodeFile 6776
04/01/2008 17:17:40 SALVAGING VOLUME 536870912.
04/01/2008 17:17:40 Part of the header (Volume information) is corrupted
04/01/2008 17:17:40 totalInodes 165
04/01/2008 17:17:41 "Salvage volume group" core dumped!
04/01/2008 17:17:41 SALVAGING VOLUME 536870915.
04/01/2008 17:17:41 Part of the header (Volume information) is corrupted
04/01/2008 17:17:41 totalInodes 30
04/01/2008 17:17:41 "Salvage volume group" core dumped!
04/01/2008 17:17:41 SALVAGING
04/01/2008 17:17:42 totalInodes 526
04/01/2008 17:17:43 "Salvage volume group" core dumped!
04/01/2008 17:17:43 SALVAGING VOLUME 536870927.
04/01/2008 17:17:43 Part of the header (Volume information) is corrupted
04/01/2008 17:17:43 Vnode 1922 (unique 1437): corresponding inode 22233511 is missing; vnode deleted, vnode mod time=Tue Nov 24 08:23:15 1998
04/01/2008 17:17:43 Vnode 1924 (unique 1438): corresponding inode 22233512 is missing; vnode deleted, vnode mod time=Tue Nov 24 08:23:15 1998
04/01/2008 17:17:43 Vnode 1926 (unique 431951): corresponding inode 22233513 is missing; vnode deleted, vnode mod time=Sun Mar 30 22:33:54 2003
04/01/2008 17:17:43 Vnode 1928 (unique 434559): corresponding inode 22233514 is missing; vnode deleted, vnode mod time=Sun Apr 6 23:02:50 2003
04/01/2008 17:17:43 Vnode 1930 (unique 431952): corresponding inode 22233515 is missing; vnode deleted, vnode mod time=Sun Mar 30 22:34:15 2003
...

# fsck /vicepa
Open AFS (R) openafs 1.4.1 fsck
** /dev/rdsk/c1t1d0s6
** Last Mounted on /vicepa
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
849922 files, 18516628 used, 39053375 free, 849677 AFS files (89095 frags, 4870535 blocks, 0.2% fragmentation)
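On the question of the missing core files, here is my reading of vol-salvage.c, which may be wrong in detail. The `"Salvage volume group" core dumped!` line appears to be written by the parent salvager. It does so after reaping a child process and finding the core-dump flag set in that child's wait status. If that is right, the kernel did report writing a core. On Solaris, where that core lands is governed by coreadm; by default it is a file named `core` in the process's current directory. The sketch below shows the same mechanism with the standard wait macros. `simulate_child` is a hypothetical stand-in for a salvage child. `WCOREDUMP` is provided by both Solaris and Linux but is not part of POSIX, hence the `#ifdef`.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Non-zero if `status` (from wait/waitpid) says the child died from a
 * signal and the kernel reported a core dump.  Where WCOREDUMP is not
 * defined, conservatively report 0. */
int status_dumped_core(int status)
{
#ifdef WCOREDUMP
    return WIFSIGNALED(status) && WCOREDUMP(status);
#else
    (void)status;
    return 0;
#endif
}

/* Hypothetical stand-in for one salvage child: the child raises `sig`
 * (0 means exit normally), and the parent waits and logs the outcome.
 * Returns the terminating signal, 0 for a normal exit, -1 on error. */
int simulate_child(int sig)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (sig != 0)
            raise(sig);
        _exit(0); /* _exit avoids flushing the parent's stdio buffers twice */
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFSIGNALED(status))
        return 0;
    if (status_dumped_core(status))
        fprintf(stderr, "child %ld core dumped (signal %d)\n",
                (long)pid, WTERMSIG(status));
    else
        fprintf(stderr, "child %ld killed by signal %d\n",
                (long)pid, WTERMSIG(status));
    return WTERMSIG(status);
}
```

Signals such as SIGTERM terminate without a core, while SIGSEGV or SIGABRT (which a failed `assert()` raises) normally produce one. So if the salvager really is dying on an assertion, it is worth running `coreadm` on the fileserver to see where the cores went.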