Re: [OpenAFS] Crash in volserver when restoring volume from backup.

2009-10-16 Thread Anders Magnusson

Derrick Brashear wrote:

looks like you already lost by the time it crashes.
  

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x4aafd940 (LWP 30735)]
0x003c13078d60 in strlen () from /lib64/libc.so.6
(gdb) bt
#0  0x003c13078d60 in strlen () from /lib64/libc.so.6
#1  0x00430092 in afs_vsnprintf (p=0x4aafc3ba 4BF+0, avail=999,
  fmt=value optimized out, ap=0x4aafc7c0) at ../util/snprintf.c:395
#2  0x00416a60 in vFSLog (
  format=0x467838 1 Volser: ReadVnodes: IH_CREATE: %s - restore aborted\n,

Just for the record, the original cause of this was due to a bug in 
TSM.  IBM is currently working on it.


-- Ragge
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Crash in volserver when restoring volume from backup.

2009-08-26 Thread Anders Magnusson

Hartmut Reuter wrote:

Anders Magnusson wrote:
  

Hi,

I have a problem that I need some advice on how to go on with.

I have a volume dump file, but when trying to read it back volserver
crashes.

The dump was generated under 1.4.8, and the volserver segv appears with
both 1.4.8 and 1.4.11.

VolserLog.old says:
Wed Aug 26 14:56:34 2009 Starting AFS Volserver 2.0
(/usr/afs/bin/volserver -p 16)
Wed Aug 26 15:00:12 2009 1 Volser: CreateVolume: volume 537998421
(students.waqazi-4) created

BosLog says:
Wed Aug 26 15:00:14 2009: fs:vol exited on signal 11

Any hints where to go from here?  I can provide the dump file on
request, but since it's a
student home directory I don't want it to be public.

-- Ragge



Attach the volserver with gdb before running the command. Then you may
be able to see where it crashes and why.

  

Done:
# gdb /usr/afs/bin/volserver
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
http://gnu.org/licenses/gpl.html

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type show copying
and show warranty for details.
This GDB was configured as x86_64-redhat-linux-gnu...
(gdb) symb
Discard symbol table from `/usr/afs/bin/volserver'? (y or n) y
No symbol file now.
(gdb) help file
Use FILE as program to be debugged.
It is read for its symbols, for getting the contents of pure memory,
and it is the program executed when you use the `run' command.
If FILE cannot be found as specified, your execution directory path
($PATH) is searched for a command of that name.
No arg means to have no executable file and no symbols.
(gdb) help symb
Load symbol table from executable file FILE.
The `file' command can also load symbol tables, as well as setting the file
to execute.
(gdb) symb volserver.debug
Reading symbols from /usr/afs/bin/volserver.debug...done.
(gdb) attach 30720
Attaching to program: /usr/afs/bin/volserver, process 30720
Reading symbols from /lib64/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 0x2ab6a9d3be90 (LWP 30720)]
[New Thread 0x4dd02940 (LWP 30740)]
[New Thread 0x4d301940 (LWP 30739)]
[New Thread 0x4c900940 (LWP 30738)]
[New Thread 0x4beff940 (LWP 30737)]
[New Thread 0x4b4fe940 (LWP 30736)]
[New Thread 0x4aafd940 (LWP 30735)]
[New Thread 0x4a0fc940 (LWP 30734)]
[New Thread 0x496fb940 (LWP 30733)]
[New Thread 0x48cfa940 (LWP 30732)]
[New Thread 0x482f9940 (LWP 30731)]
[New Thread 0x478f8940 (LWP 30730)]
[New Thread 0x46ef7940 (LWP 30729)]
[New Thread 0x464f6940 (LWP 30728)]
[New Thread 0x45af5940 (LWP 30727)]
[New Thread 0x450f4940 (LWP 30726)]
[New Thread 0x446f3940 (LWP 30725)]
[New Thread 0x43cf2940 (LWP 30724)]
[New Thread 0x432f1940 (LWP 30723)]
[New Thread 0x428f0940 (LWP 30722)]
[New Thread 0x41eef940 (LWP 30721)]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libresolv.so.2...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x003c13c0a899 in pthread_cond_wait@@GLIBC_2.3.2 ()
  from /lib64/libpthread.so.0
(gdb) c
Continuing.
[New Thread 0x4e703940 (LWP 30748)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x4aafd940 (LWP 30735)]
0x003c13078d60 in strlen () from /lib64/libc.so.6
(gdb) bt
#0  0x003c13078d60 in strlen () from /lib64/libc.so.6
#1  0x00430092 in afs_vsnprintf (p=0x4aafc3ba 4BF+0, avail=999,
   fmt=value optimized out, ap=0x4aafc7c0) at ../util/snprintf.c:395
#2  0x00416a60 in vFSLog (
   format=0x467838 1 Volser: ReadVnodes: IH_CREATE: %s - restore 
aborted\n,

   args=0x4) at ../util/serverLog.c:135
#3  0x0042550e in Log (
   format=0x1311e7ec Address 0x1311e7ec out of bounds) at 
../vol/common.c:41

#4  0x0040e60c in RestoreVolume (call=value optimized out,
   avp=0x2c03e780, incremental=value optimized out,
   cookie=value optimized out) at ../volser/dumpstuff.c:1214
#5  0x004067f4 in VolRestore (acid=0x41ff510,
   atrans=value optimized out, aflags=1, cookie=0x4aafd000)
   at ../volser/volprocs.c:1406
#6  0x00406850 in SAFSVolRestore (acid=0x1311e7ec, atrans=999,
   aflags=319940588, cookie=0x4) at ../volser/volprocs.c:1378
#7  0x00413822 in AFSVolExecuteRequest (z_call=0x41ff510)
   at ../volser/volint.ss.c:104
#8  0x0044d0de in rxi_ServerProc (threadID=5, newcall=0x3e7,
   socketp=0x4aafd10c) at ../rx/rx.c:1445
#9  0x0042afe4 in rx_ServerProc (dummy=value optimized out)
   at ../rx/rx_pthread.c:303
#10 0x0042a6a8 in server_entry (argp=0x1311e7ec)
   at ../rx/rx_pthread.c:101
#11 0x003c13c06367 in start_thread () from /lib64/libpthread.so.0
#12 0x003c130d30ad in clone () from /lib64/libc.so.6
(gdb)



You also could try to analyze the dump file 

Re: [OpenAFS] Crash in volserver when restoring volume from backup.

2009-08-26 Thread Hartmut Reuter
Anders Magnusson wrote:
 Hi,
 
 I have a problem that I need some advice on how to go on with.
 
 I have a volume dump file, but when trying to read it back volserver
 crashes.
 
 The dump was generated under 1.4.8, and the volserver segv appears with
 both 1.4.8 and 1.4.11.
 
 VolserLog.old says:
 Wed Aug 26 14:56:34 2009 Starting AFS Volserver 2.0
 (/usr/afs/bin/volserver -p 16)
 Wed Aug 26 15:00:12 2009 1 Volser: CreateVolume: volume 537998421
 (students.waqazi-4) created
 
 BosLog says:
 Wed Aug 26 15:00:14 2009: fs:vol exited on signal 11
 
 Any hints where to go from here?  I can provide the dump file on
 request, but since it's a
 student home directory I don't want it to be public.
 
 -- Ragge

Attach the volserver with gdb before running the command. Then you may
be able to see where it crashes and why.

You also could try to analyze the dump file with dumptool which is built
under sudirectory src/tests.

Hartmut

 
 
 ___
 OpenAFS-info mailing list
 OpenAFS-info@openafs.org
 https://lists.openafs.org/mailman/listinfo/openafs-info


-- 
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Crash in volserver when restoring volume from backup.

2009-08-26 Thread Hartmut Reuter
Anders Magnusson wrote:
 Hartmut Reuter wrote:
 Anders Magnusson wrote:
  
 Hi,

 I have a problem that I need some advice on how to go on with.

 I have a volume dump file, but when trying to read it back volserver
 crashes.

 The dump was generated under 1.4.8, and the volserver segv appears with
 both 1.4.8 and 1.4.11.

 VolserLog.old says:
 Wed Aug 26 14:56:34 2009 Starting AFS Volserver 2.0
 (/usr/afs/bin/volserver -p 16)
 Wed Aug 26 15:00:12 2009 1 Volser: CreateVolume: volume 537998421
 (students.waqazi-4) created

 BosLog says:
 Wed Aug 26 15:00:14 2009: fs:vol exited on signal 11

 Any hints where to go from here?  I can provide the dump file on
 request, but since it's a
 student home directory I don't want it to be public.

 -- Ragge
 

 Attach the volserver with gdb before running the command. Then you may
 be able to see where it crashes and why.

   
 Done:
 # gdb /usr/afs/bin/volserver
 GNU gdb Fedora (6.8-27.el5)
 Copyright (C) 2008 Free Software Foundation, Inc.
 License GPLv3+: GNU GPL version 3 or later
 http://gnu.org/licenses/gpl.html
 This is free software: you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law.  Type show copying
 and show warranty for details.
 This GDB was configured as x86_64-redhat-linux-gnu...
 (gdb) symb
 Discard symbol table from `/usr/afs/bin/volserver'? (y or n) y
 No symbol file now.
 (gdb) help file
 Use FILE as program to be debugged.
 It is read for its symbols, for getting the contents of pure memory,
 and it is the program executed when you use the `run' command.
 If FILE cannot be found as specified, your execution directory path
 ($PATH) is searched for a command of that name.
 No arg means to have no executable file and no symbols.
 (gdb) help symb
 Load symbol table from executable file FILE.
 The `file' command can also load symbol tables, as well as setting the file
 to execute.
 (gdb) symb volserver.debug
 Reading symbols from /usr/afs/bin/volserver.debug...done.
 (gdb) attach 30720
 Attaching to program: /usr/afs/bin/volserver, process 30720
 Reading symbols from /lib64/libpthread.so.0...done.
 [Thread debugging using libthread_db enabled]
 [New Thread 0x2ab6a9d3be90 (LWP 30720)]
 [New Thread 0x4dd02940 (LWP 30740)]
 [New Thread 0x4d301940 (LWP 30739)]
 [New Thread 0x4c900940 (LWP 30738)]
 [New Thread 0x4beff940 (LWP 30737)]
 [New Thread 0x4b4fe940 (LWP 30736)]
 [New Thread 0x4aafd940 (LWP 30735)]
 [New Thread 0x4a0fc940 (LWP 30734)]
 [New Thread 0x496fb940 (LWP 30733)]
 [New Thread 0x48cfa940 (LWP 30732)]
 [New Thread 0x482f9940 (LWP 30731)]
 [New Thread 0x478f8940 (LWP 30730)]
 [New Thread 0x46ef7940 (LWP 30729)]
 [New Thread 0x464f6940 (LWP 30728)]
 [New Thread 0x45af5940 (LWP 30727)]
 [New Thread 0x450f4940 (LWP 30726)]
 [New Thread 0x446f3940 (LWP 30725)]
 [New Thread 0x43cf2940 (LWP 30724)]
 [New Thread 0x432f1940 (LWP 30723)]
 [New Thread 0x428f0940 (LWP 30722)]
 [New Thread 0x41eef940 (LWP 30721)]
 Loaded symbols for /lib64/libpthread.so.0
 Reading symbols from /lib64/libresolv.so.2...done.
 Loaded symbols for /lib64/libresolv.so.2
 Reading symbols from /lib64/libc.so.6...done.
 Loaded symbols for /lib64/libc.so.6
 Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
 Loaded symbols for /lib64/ld-linux-x86-64.so.2
 0x003c13c0a899 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
 (gdb) c
 Continuing.
 [New Thread 0x4e703940 (LWP 30748)]
 
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x4aafd940 (LWP 30735)]
 0x003c13078d60 in strlen () from /lib64/libc.so.6
 (gdb) bt
 #0  0x003c13078d60 in strlen () from /lib64/libc.so.6
 #1  0x00430092 in afs_vsnprintf (p=0x4aafc3ba 4BF+0, avail=999,
fmt=value optimized out, ap=0x4aafc7c0) at ../util/snprintf.c:395
 #2  0x00416a60 in vFSLog (
format=0x467838 1 Volser: ReadVnodes: IH_CREATE: %s - restore
 aborted\n,

This is the message the volserver wanted to write into the VolserLog.
Unfortunately afs_error_message(errno) didn't return a usable string so
it came to the crash.

However, if you repeat that experiment you may say up 4 to get to
dumpstuff.c:1214 and then you can do a

print *vnode
and
print vnodeNumber

to see which vnode it is.

A possible reason why IH_CREATE could fail is that you already tried
this so many times that in the linktable of the volume all tags for this
vnode number are already in use.

Because of the crashes the volserver didn't remove the remains of the
unsuccessful restores.

args=0x4) at ../util/serverLog.c:135
 #3  0x0042550e in Log (
format=0x1311e7ec Address 0x1311e7ec out of bounds) at
 ../vol/common.c:41
 #4  0x0040e60c in RestoreVolume (call=value optimized out,
avp=0x2c03e780, incremental=value optimized out,
cookie=value optimized out) at ../volser/dumpstuff.c:1214
 #5  0x004067f4 in VolRestore (acid=0x41ff510,
atrans=value optimized out, aflags=1, cookie=0x4aafd000)
at 

Re: [OpenAFS] Crash in volserver when restoring volume from backup.

2009-08-26 Thread Derrick Brashear
looks like you already lost by the time it crashes.


 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x4aafd940 (LWP 30735)]
 0x003c13078d60 in strlen () from /lib64/libc.so.6
 (gdb) bt
 #0  0x003c13078d60 in strlen () from /lib64/libc.so.6
 #1  0x00430092 in afs_vsnprintf (p=0x4aafc3ba 4BF+0, avail=999,
   fmt=value optimized out, ap=0x4aafc7c0) at ../util/snprintf.c:395
 #2  0x00416a60 in vFSLog (
   format=0x467838 1 Volser: ReadVnodes: IH_CREATE: %s - restore aborted\n,
   args=0x4) at ../util/serverLog.c:135
 #3  0x0042550e in Log (
   format=0x1311e7ec Address 0x1311e7ec out of bounds) at
 ../vol/common.c:41

[]

 (gdb)
up
up
up
up
print errno.

Now, why
Log(1 Volser: ReadVnodes: IH_CREATE: %s - restore aborted\n,
afs_error_message(errno));
is SEGVing is a different question.


-- 
Derrick
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Crash in volserver when restoring volume from backup.

2009-08-26 Thread Anders Magnusson
Derrick Brashear wrote:
 looks like you already lost by the time it crashes.


   
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x4aafd940 (LWP 30735)]
 0x003c13078d60 in strlen () from /lib64/libc.so.6
 (gdb) bt
 #0  0x003c13078d60 in strlen () from /lib64/libc.so.6
 #1  0x00430092 in afs_vsnprintf (p=0x4aafc3ba 4BF+0, avail=999,
   fmt=value optimized out, ap=0x4aafc7c0) at ../util/snprintf.c:395
 #2  0x00416a60 in vFSLog (
   format=0x467838 1 Volser: ReadVnodes: IH_CREATE: %s - restore aborted\n,
   args=0x4) at ../util/serverLog.c:135
 #3  0x0042550e in Log (
   format=0x1311e7ec Address 0x1311e7ec out of bounds) at
 ../vol/common.c:41
 

 []

   
 (gdb)
 
 up
 up
 up
 up
 print errno.
   
(gdb) up
#1  0x00430092 in afs_vsnprintf (p=0x4489c3ba 4BF+0, avail=999,
fmt=value optimized out, ap=0x4489c7c0) at ../util/snprintf.c:395
395 ../util/snprintf.c: No such file or directory.
in ../util/snprintf.c
(gdb) up
#2  0x00416a60 in vFSLog (
format=0x467838 1 Volser: ReadVnodes: IH_CREATE: %s - restore
aborted\n,
args=0x4) at ../util/serverLog.c:135
135 ../util/serverLog.c: No such file or directory.
in ../util/serverLog.c
(gdb) up
#3  0x0042550e in Log (
format=0x1311e7ec Address 0x1311e7ec out of bounds) at
../vol/common.c:41
41  ../vol/common.c: No such file or directory.
in ../vol/common.c
(gdb) up
#4  0x0040e60c in RestoreVolume (call=value optimized out,
avp=0x169410f0, incremental=value optimized out,
cookie=value optimized out) at ../volser/dumpstuff.c:1214
1214../volser/dumpstuff.c: No such file or directory.
in ../volser/dumpstuff.c
(gdb) print errno
$1 = 22

Is this the system errno? RHEL 5.3, 22 = EINVAL, doesn't sound so good.

 Now, why
 Log(1 Volser: ReadVnodes: IH_CREATE: %s - restore aborted\n,
 afs_error_message(errno));
 is SEGVing is a different question.
   
Yes... :-)

Hartmud asked about vnode:

(gdb) print *vnode
No symbol vnode in current context.
(gdb) print vnodeNumber
$3 = 613646341
(gdb)

Unfortunately it doesn't tell me much...

-- Ragge

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info