Re: [OpenAFS] Crash in volserver when restoring volume from backup.
Derrick Brashear wrote: looks like you already lost by the time it crashes. Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x4aafd940 (LWP 30735)] 0x003c13078d60 in strlen () from /lib64/libc.so.6 (gdb) bt #0 0x003c13078d60 in strlen () from /lib64/libc.so.6 #1 0x00430092 in afs_vsnprintf (p=0x4aafc3ba 4BF+0, avail=999, fmt=value optimized out, ap=0x4aafc7c0) at ../util/snprintf.c:395 #2 0x00416a60 in vFSLog ( format=0x467838 1 Volser: ReadVnodes: IH_CREATE: %s - restore aborted\n, Just for the record, the original cause of this was due to a bug in TSM. IBM is currently working on it. -- Ragge ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Crash in volserver when restoring volume from backup.
Hartmut Reuter wrote: Anders Magnusson wrote: Hi, I have a problem that I need some advice on how to go on with. I have a volume dump file, but when trying to read it back volserver crashes. The dump was generated under 1.4.8, and the volserver segv appears with both 1.4.8 and 1.4.11. VolserLog.old says: Wed Aug 26 14:56:34 2009 Starting AFS Volserver 2.0 (/usr/afs/bin/volserver -p 16) Wed Aug 26 15:00:12 2009 1 Volser: CreateVolume: volume 537998421 (students.waqazi-4) created BosLog says: Wed Aug 26 15:00:14 2009: fs:vol exited on signal 11 Any hints where to go from here? I can provide the dump file on request, but since it's a student home directory I don't want it to be public. -- Ragge Attach the volserver with gdb before running the command. Then you may be able to see where it crashes and why. Done: # gdb /usr/afs/bin/volserver GNU gdb Fedora (6.8-27.el5) Copyright (C) 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type show copying and show warranty for details. This GDB was configured as x86_64-redhat-linux-gnu... (gdb) symb Discard symbol table from `/usr/afs/bin/volserver'? (y or n) y No symbol file now. (gdb) help file Use FILE as program to be debugged. It is read for its symbols, for getting the contents of pure memory, and it is the program executed when you use the `run' command. If FILE cannot be found as specified, your execution directory path ($PATH) is searched for a command of that name. No arg means to have no executable file and no symbols. (gdb) help symb Load symbol table from executable file FILE. The `file' command can also load symbol tables, as well as setting the file to execute. (gdb) symb volserver.debug Reading symbols from /usr/afs/bin/volserver.debug...done. (gdb) attach 30720 Attaching to program: /usr/afs/bin/volserver, process 30720 Reading symbols from /lib64/libpthread.so.0...done. [Thread debugging using libthread_db enabled] [New Thread 0x2ab6a9d3be90 (LWP 30720)] [New Thread 0x4dd02940 (LWP 30740)] [New Thread 0x4d301940 (LWP 30739)] [New Thread 0x4c900940 (LWP 30738)] [New Thread 0x4beff940 (LWP 30737)] [New Thread 0x4b4fe940 (LWP 30736)] [New Thread 0x4aafd940 (LWP 30735)] [New Thread 0x4a0fc940 (LWP 30734)] [New Thread 0x496fb940 (LWP 30733)] [New Thread 0x48cfa940 (LWP 30732)] [New Thread 0x482f9940 (LWP 30731)] [New Thread 0x478f8940 (LWP 30730)] [New Thread 0x46ef7940 (LWP 30729)] [New Thread 0x464f6940 (LWP 30728)] [New Thread 0x45af5940 (LWP 30727)] [New Thread 0x450f4940 (LWP 30726)] [New Thread 0x446f3940 (LWP 30725)] [New Thread 0x43cf2940 (LWP 30724)] [New Thread 0x432f1940 (LWP 30723)] [New Thread 0x428f0940 (LWP 30722)] [New Thread 0x41eef940 (LWP 30721)] Loaded symbols for /lib64/libpthread.so.0 Reading symbols from /lib64/libresolv.so.2...done. Loaded symbols for /lib64/libresolv.so.2 Reading symbols from /lib64/libc.so.6...done. Loaded symbols for /lib64/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 0x003c13c0a899 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (gdb) c Continuing. [New Thread 0x4e703940 (LWP 30748)] Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x4aafd940 (LWP 30735)] 0x003c13078d60 in strlen () from /lib64/libc.so.6 (gdb) bt #0 0x003c13078d60 in strlen () from /lib64/libc.so.6 #1 0x00430092 in afs_vsnprintf (p=0x4aafc3ba 4BF+0, avail=999, fmt=value optimized out, ap=0x4aafc7c0) at ../util/snprintf.c:395 #2 0x00416a60 in vFSLog ( format=0x467838 1 Volser: ReadVnodes: IH_CREATE: %s - restore aborted\n, args=0x4) at ../util/serverLog.c:135 #3 0x0042550e in Log ( format=0x1311e7ec Address 0x1311e7ec out of bounds) at ../vol/common.c:41 #4 0x0040e60c in RestoreVolume (call=value optimized out, avp=0x2c03e780, incremental=value optimized out, cookie=value optimized out) at ../volser/dumpstuff.c:1214 #5 0x004067f4 in VolRestore (acid=0x41ff510, atrans=value optimized out, aflags=1, cookie=0x4aafd000) at ../volser/volprocs.c:1406 #6 0x00406850 in SAFSVolRestore (acid=0x1311e7ec, atrans=999, aflags=319940588, cookie=0x4) at ../volser/volprocs.c:1378 #7 0x00413822 in AFSVolExecuteRequest (z_call=0x41ff510) at ../volser/volint.ss.c:104 #8 0x0044d0de in rxi_ServerProc (threadID=5, newcall=0x3e7, socketp=0x4aafd10c) at ../rx/rx.c:1445 #9 0x0042afe4 in rx_ServerProc (dummy=value optimized out) at ../rx/rx_pthread.c:303 #10 0x0042a6a8 in server_entry (argp=0x1311e7ec) at ../rx/rx_pthread.c:101 #11 0x003c13c06367 in start_thread () from /lib64/libpthread.so.0 #12 0x003c130d30ad in clone () from /lib64/libc.so.6 (gdb) You also could try to analyze the dump file
Re: [OpenAFS] Crash in volserver when restoring volume from backup.
Anders Magnusson wrote: Hi, I have a problem that I need some advice on how to go on with. I have a volume dump file, but when trying to read it back volserver crashes. The dump was generated under 1.4.8, and the volserver segv appears with both 1.4.8 and 1.4.11. VolserLog.old says: Wed Aug 26 14:56:34 2009 Starting AFS Volserver 2.0 (/usr/afs/bin/volserver -p 16) Wed Aug 26 15:00:12 2009 1 Volser: CreateVolume: volume 537998421 (students.waqazi-4) created BosLog says: Wed Aug 26 15:00:14 2009: fs:vol exited on signal 11 Any hints where to go from here? I can provide the dump file on request, but since it's a student home directory I don't want it to be public. -- Ragge Attach the volserver with gdb before running the command. Then you may be able to see where it crashes and why. You also could try to analyze the dump file with dumptool which is built under sudirectory src/tests. Hartmut ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info -- - Hartmut Reuter e-mail reu...@rzg.mpg.de phone+49-89-3299-1328 fax +49-89-3299-1301 RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr Computing Center of the Max-Planck-Gesellschaft (MPG) and the Institut fuer Plasmaphysik (IPP) - ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Crash in volserver when restoring volume from backup.
Anders Magnusson wrote: Hartmut Reuter wrote: Anders Magnusson wrote: Hi, I have a problem that I need some advice on how to go on with. I have a volume dump file, but when trying to read it back volserver crashes. The dump was generated under 1.4.8, and the volserver segv appears with both 1.4.8 and 1.4.11. VolserLog.old says: Wed Aug 26 14:56:34 2009 Starting AFS Volserver 2.0 (/usr/afs/bin/volserver -p 16) Wed Aug 26 15:00:12 2009 1 Volser: CreateVolume: volume 537998421 (students.waqazi-4) created BosLog says: Wed Aug 26 15:00:14 2009: fs:vol exited on signal 11 Any hints where to go from here? I can provide the dump file on request, but since it's a student home directory I don't want it to be public. -- Ragge Attach the volserver with gdb before running the command. Then you may be able to see where it crashes and why. Done: # gdb /usr/afs/bin/volserver GNU gdb Fedora (6.8-27.el5) Copyright (C) 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type show copying and show warranty for details. This GDB was configured as x86_64-redhat-linux-gnu... (gdb) symb Discard symbol table from `/usr/afs/bin/volserver'? (y or n) y No symbol file now. (gdb) help file Use FILE as program to be debugged. It is read for its symbols, for getting the contents of pure memory, and it is the program executed when you use the `run' command. If FILE cannot be found as specified, your execution directory path ($PATH) is searched for a command of that name. No arg means to have no executable file and no symbols. (gdb) help symb Load symbol table from executable file FILE. The `file' command can also load symbol tables, as well as setting the file to execute. (gdb) symb volserver.debug Reading symbols from /usr/afs/bin/volserver.debug...done. (gdb) attach 30720 Attaching to program: /usr/afs/bin/volserver, process 30720 Reading symbols from /lib64/libpthread.so.0...done. [Thread debugging using libthread_db enabled] [New Thread 0x2ab6a9d3be90 (LWP 30720)] [New Thread 0x4dd02940 (LWP 30740)] [New Thread 0x4d301940 (LWP 30739)] [New Thread 0x4c900940 (LWP 30738)] [New Thread 0x4beff940 (LWP 30737)] [New Thread 0x4b4fe940 (LWP 30736)] [New Thread 0x4aafd940 (LWP 30735)] [New Thread 0x4a0fc940 (LWP 30734)] [New Thread 0x496fb940 (LWP 30733)] [New Thread 0x48cfa940 (LWP 30732)] [New Thread 0x482f9940 (LWP 30731)] [New Thread 0x478f8940 (LWP 30730)] [New Thread 0x46ef7940 (LWP 30729)] [New Thread 0x464f6940 (LWP 30728)] [New Thread 0x45af5940 (LWP 30727)] [New Thread 0x450f4940 (LWP 30726)] [New Thread 0x446f3940 (LWP 30725)] [New Thread 0x43cf2940 (LWP 30724)] [New Thread 0x432f1940 (LWP 30723)] [New Thread 0x428f0940 (LWP 30722)] [New Thread 0x41eef940 (LWP 30721)] Loaded symbols for /lib64/libpthread.so.0 Reading symbols from /lib64/libresolv.so.2...done. Loaded symbols for /lib64/libresolv.so.2 Reading symbols from /lib64/libc.so.6...done. Loaded symbols for /lib64/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 0x003c13c0a899 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (gdb) c Continuing. [New Thread 0x4e703940 (LWP 30748)] Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x4aafd940 (LWP 30735)] 0x003c13078d60 in strlen () from /lib64/libc.so.6 (gdb) bt #0 0x003c13078d60 in strlen () from /lib64/libc.so.6 #1 0x00430092 in afs_vsnprintf (p=0x4aafc3ba 4BF+0, avail=999, fmt=value optimized out, ap=0x4aafc7c0) at ../util/snprintf.c:395 #2 0x00416a60 in vFSLog ( format=0x467838 1 Volser: ReadVnodes: IH_CREATE: %s - restore aborted\n, This is the message the volserver wanted to write into the VolserLog. Unfortunately afs_error_message(errno) didn't return a usable string so it came to the crash. However, if you repeat that experiment you may say up 4 to get to dumpstuff.c:1214 and then you can do a print *vnode and print vnodeNumber to see which vnode it is. A possible reason why IH_CREATE could fail is that you already tried this so many times that in the linktable of the volume all tags for this vnode number are already in use. Because of the crashes the volserver didn't remove the remains of the unsuccessful restores. args=0x4) at ../util/serverLog.c:135 #3 0x0042550e in Log ( format=0x1311e7ec Address 0x1311e7ec out of bounds) at ../vol/common.c:41 #4 0x0040e60c in RestoreVolume (call=value optimized out, avp=0x2c03e780, incremental=value optimized out, cookie=value optimized out) at ../volser/dumpstuff.c:1214 #5 0x004067f4 in VolRestore (acid=0x41ff510, atrans=value optimized out, aflags=1, cookie=0x4aafd000) at
Re: [OpenAFS] Crash in volserver when restoring volume from backup.
looks like you already lost by the time it crashes. Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x4aafd940 (LWP 30735)] 0x003c13078d60 in strlen () from /lib64/libc.so.6 (gdb) bt #0 0x003c13078d60 in strlen () from /lib64/libc.so.6 #1 0x00430092 in afs_vsnprintf (p=0x4aafc3ba 4BF+0, avail=999, fmt=value optimized out, ap=0x4aafc7c0) at ../util/snprintf.c:395 #2 0x00416a60 in vFSLog ( format=0x467838 1 Volser: ReadVnodes: IH_CREATE: %s - restore aborted\n, args=0x4) at ../util/serverLog.c:135 #3 0x0042550e in Log ( format=0x1311e7ec Address 0x1311e7ec out of bounds) at ../vol/common.c:41 [] (gdb) up up up up print errno. Now, why Log(1 Volser: ReadVnodes: IH_CREATE: %s - restore aborted\n, afs_error_message(errno)); is SEGVing is a different question. -- Derrick ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Crash in volserver when restoring volume from backup.
Derrick Brashear wrote: looks like you already lost by the time it crashes. Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x4aafd940 (LWP 30735)] 0x003c13078d60 in strlen () from /lib64/libc.so.6 (gdb) bt #0 0x003c13078d60 in strlen () from /lib64/libc.so.6 #1 0x00430092 in afs_vsnprintf (p=0x4aafc3ba 4BF+0, avail=999, fmt=value optimized out, ap=0x4aafc7c0) at ../util/snprintf.c:395 #2 0x00416a60 in vFSLog ( format=0x467838 1 Volser: ReadVnodes: IH_CREATE: %s - restore aborted\n, args=0x4) at ../util/serverLog.c:135 #3 0x0042550e in Log ( format=0x1311e7ec Address 0x1311e7ec out of bounds) at ../vol/common.c:41 [] (gdb) up up up up print errno. (gdb) up #1 0x00430092 in afs_vsnprintf (p=0x4489c3ba 4BF+0, avail=999, fmt=value optimized out, ap=0x4489c7c0) at ../util/snprintf.c:395 395 ../util/snprintf.c: No such file or directory. in ../util/snprintf.c (gdb) up #2 0x00416a60 in vFSLog ( format=0x467838 1 Volser: ReadVnodes: IH_CREATE: %s - restore aborted\n, args=0x4) at ../util/serverLog.c:135 135 ../util/serverLog.c: No such file or directory. in ../util/serverLog.c (gdb) up #3 0x0042550e in Log ( format=0x1311e7ec Address 0x1311e7ec out of bounds) at ../vol/common.c:41 41 ../vol/common.c: No such file or directory. in ../vol/common.c (gdb) up #4 0x0040e60c in RestoreVolume (call=value optimized out, avp=0x169410f0, incremental=value optimized out, cookie=value optimized out) at ../volser/dumpstuff.c:1214 1214../volser/dumpstuff.c: No such file or directory. in ../volser/dumpstuff.c (gdb) print errno $1 = 22 Is this the system errno? RHEL 5.3, 22 = EINVAL, doesn't sound so good. Now, why Log(1 Volser: ReadVnodes: IH_CREATE: %s - restore aborted\n, afs_error_message(errno)); is SEGVing is a different question. Yes... :-) Hartmud asked about vnode: (gdb) print *vnode No symbol vnode in current context. (gdb) print vnodeNumber $3 = 613646341 (gdb) Unfortunately it doesn't tell me much... -- Ragge ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info