Please read the Kaboom chapter of the manual.  It explains how to run the
program manually under the debugger.  I believe you left off the -s -f options
when running it, so the traceback doesn't contain any useful information.
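Assuming the SD behaves like a typical daemon, the reason -s matters can be sketched in C. Unless -s is given, the daemon installs its own SIGSEGV handler (the "Kaboom" handler), which reports the crash and exits. Once that handler has run, the process is gone and gdb has nothing left to inspect. That fits the "Program exited with code 01" and "No stack." lines in the session quoted below. kaboom_handler, no_signals_requested() and setup_crash_handler() are hypothetical names for illustration, not Bacula's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical crash handler, loosely modelled on the "Kaboom!" output
   quoted below: print a message and exit.  After _exit() the process is
   gone, so there is no stack left for a debugger to print. */
static void kaboom_handler(int sig)
{
    static const char msg[] = "Kaboom! got a fatal signal\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1); /* async-signal-safe */
    (void)n;
    (void)sig;
    _exit(1);
}

/* True if "-s" (no signals) appears among the command-line options. */
bool no_signals_requested(int argc, char *argv[])
{
    assert(argv != NULL);
    for (int i = 1; i < argc; i++)
        if (strcmp(argv[i], "-s") == 0)
            return true;
    return false;
}

/* Installs the SIGSEGV handler unless -s was given; returns whether it did. */
bool setup_crash_handler(int argc, char *argv[])
{
    if (no_signals_requested(argc, argv))
        return false;
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = kaboom_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL) == 0;
}
```

When running by hand, a session along the lines of "gdb --args /usr/sbin/bacula-sd -s -f -c /etc/bacula/bacula-sd.conf", then "run", then "bt" (or "thread apply all bt") as soon as gdb reports the SIGSEGV and before continuing, should capture the stack.  -f keeps the daemon in the foreground, so gdb stays with the process that crashes.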

For valgrind, I don't know which options to use; please read its
documentation.

To improve the output from smartalloc, find all the sm_check() calls in the
SD and change them to have true as the last argument.  You can also add more
sm_check() calls at strategic places where you think the code breaks, to get
closer to the problem.
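To illustrate what a check with a dump flag buys you, here is a toy guard-byte allocator in C.  It is not Bacula's smartalloc: guarded_buf, gb_alloc(), gb_check(), first_bad_step() and the two fill steps are hypothetical names.  The assumption, taken from the advice above, is that passing true as the last argument makes the check dump the damaged buffer instead of only reporting it.  first_bad_step() shows the second idea: a check after each step narrows the search to the step that did the overrun.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy guard-byte allocator (hypothetical, NOT Bacula's smartalloc).
   Each buffer gets one extra trailing byte set to GUARD; writing past
   the end of the buffer clobbers it. */
#define GUARD 0xAA

typedef struct {
    unsigned char *data;
    size_t len;        /* usable size, excluding the guard byte */
    const char *file;  /* allocation site, like "bnet.c:674" in the SD dump */
    int line;
} guarded_buf;

guarded_buf gb_alloc(size_t len, const char *file, int line)
{
    guarded_buf b = { malloc(len + 1), len, file, line };
    assert(b.data != NULL);
    b.data[len] = GUARD;
    return b;
}

/* Returns true if the guard byte is intact.  On failure it reports where
   the check ran and where the buffer was allocated; with bufdump set it
   also hex-dumps the buffer, including the damaged guard byte. */
bool gb_check(const guarded_buf *b, const char *fname, int lineno, bool bufdump)
{
    if (b->data[b->len] == GUARD)
        return true;
    fprintf(stderr, "%s:%d: overrun of %zu-byte buffer allocated at %s:%d\n",
            fname, lineno, b->len, b->file, b->line);
    if (bufdump) {
        for (size_t i = 0; i <= b->len; i++)
            fprintf(stderr, "%02x%c", b->data[i], (i % 16 == 15) ? '\n' : ' ');
        fputc('\n', stderr);
    }
    return false;
}

/* Hypothetical processing steps: one stays in bounds, one writes a byte too many. */
typedef void (*step_fn)(guarded_buf *);

static void fill_ok(guarded_buf *b) { memset(b->data, 'a', b->len); }
static void fill_one_too_many(guarded_buf *b) { memset(b->data, 'b', b->len + 1); }

/* Runs the steps in order with a check after each one, the way extra
   sm_check() calls bracket suspect code.  Returns the index of the first
   step after which the guard is broken, or -1 if all steps are clean. */
int first_bad_step(guarded_buf *b, step_fn steps[], int nsteps)
{
    for (int i = 0; i < nsteps; i++) {
        steps[i](b);
        if (!gb_check(b, __FILE__, __LINE__, true))
            return i;
    }
    return -1;
}
```

In the real SD the equivalent is to place sm_check() calls between the suspect operations and see which pair of checks brackets the first failure.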

On Sunday 16 September 2007 02:36, Marc Schiffbauer wrote:
> * Kern Sibbald schrieb am 15.09.07 um 22:28 Uhr:
> > On Saturday 15 September 2007 21:09, Marc Schiffbauer wrote:
> > > What can I do/try/test now?
>
> I played with "Minimum Block Size" a bit. When set this seems to
> make the SD crash even on labeling tapes. However I now removed that
> setting again and labeling works fine so far.
>
> I now erased a tape (mt erase), then labeled it using bconsole.
>
> > - Run the debugger on it, and make it crash in some different way, maybe
> > that will tell us something.
>
> I have a little incremental job (about 700MB). It runs just fine, but
> at the very end of the job the SD crashed. I ran the SD with gdb
> attached.
>
> This is the output on the console:
>
> In "Terminated Jobs" the job is "OK"
>
> Terminated Jobs:
>  JobId  Level    Files      Bytes   Status   Finished        Name
> ======================================================================
> [...]
>    535  Incr        114    740.2 M  OK       16-Sep-07 01:31 lisa-ImportantData
>
> but the summary is not:
>
> 16-Sep 01:31 lisa-sd: Job write elapsed time = 00:02:36, Transfer rate = 4.745 M bytes/second
> 16-Sep 01:31 lisa-sd: Sending spooled attrs to the Director.  Despooling 29,191 bytes ...
> 16-Sep 01:31 lisa-dir: lisa-ImportantData.2007-09-16_01.28.17 Error: Bacula lisa-dir 2.2.4 (14Sep07): 16-Sep-2007 01:31:31
>   Build OS:               i386-pc-linux-gnu debian 3.1
>   JobId:                  535
>   Job:                    lisa-ImportantData.2007-09-16_01.28.17
>   Backup Level:           Incremental, since=2007-09-14 03:06:16
>   Client:                 "lisa-fd" 2.2.4 (14Sep07) i386-pc-linux-gnu,debian,3.1
>   FileSet:                "lisa ImportantData FileSet" 2007-03-29 13:23:26
>   Pool:                   "Default" (From Job resource)
>   Storage:                "lisa-sd" (From command line)
>   Scheduled time:         16-Sep-2007 01:27:55
>   Start time:             16-Sep-2007 01:28:50
>   End time:               16-Sep-2007 01:31:31
>   Elapsed time:           2 mins 41 secs
>   Priority:               10
>   FD Files Written:       114
>   SD Files Written:       0
>   FD Bytes Written:       740,281,859 (740.2 MB)
>   SD Bytes Written:       0 (0 B)
>   Rate:                   4598.0 KB/s
>   Software Compression:   None
>   VSS:                    no
>   Encryption:             no
>   Volume name(s):         Tape_13
>   Volume Session Id:      2
>   Volume Session Time:    1189898149
>   Last Volume Bytes:      740,920,320 (740.9 MB)
>   Non-fatal FD errors:    0
>   SD Errors:              0
>   FD termination status:  OK
>   SD termination status:  Error
>   Termination:            *** Backup Error ***
>
>
> Maybe I did something wrong, but I got no backtrace... :
>
> [EMAIL PROTECTED]:~# gdb /usr/sbin/bacula-sd 7791
> GNU gdb 6.3-debian
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i386-linux"...Using host libthread_db library "/lib/libthread_db.so.1".
>
> Attaching to program: /usr/sbin/bacula-sd, process 7791
> Reading symbols from /lib/libacl.so.1...done.
> Loaded symbols for /lib/libacl.so.1
> Reading symbols from /usr/lib/libz.so.1...done.
> Loaded symbols for /usr/lib/libz.so.1
> Reading symbols from /usr/lib/libpython2.3.so.1.0...done.
> Loaded symbols for /usr/lib/libpython2.3.so.1.0
> Reading symbols from /lib/libutil.so.1...done.
> Loaded symbols for /lib/libutil.so.1
> Reading symbols from /lib/librt.so.1...done.
> Loaded symbols for /lib/librt.so.1
> Reading symbols from /lib/libpthread.so.0...done.
> [Thread debugging using libthread_db enabled]
> [New Thread 16384 (LWP 7791)]
> [New Thread 32769 (LWP 7793)]
> [New Thread 16386 (LWP 7794)]
>
>
> [New Thread 32771 (LWP 7795)]
> Loaded symbols for /lib/libpthread.so.0
> Reading symbols from /lib/libdl.so.2...done.
> Loaded symbols for /lib/libdl.so.2
> Reading symbols from /lib/libwrap.so.0...done.
> Loaded symbols for /lib/libwrap.so.0
> Reading symbols from /usr/lib/i686/cmov/libssl.so.0.9.7...done.
> Loaded symbols for /usr/lib/i686/cmov/libssl.so.0.9.7
> Reading symbols from /usr/lib/i686/cmov/libcrypto.so.0.9.7...done.
> Loaded symbols for /usr/lib/i686/cmov/libcrypto.so.0.9.7
> Reading symbols from /usr/lib/libstdc++.so.5...done.
> Loaded symbols for /usr/lib/libstdc++.so.5
> Reading symbols from /lib/libm.so.6...done.
> Loaded symbols for /lib/libm.so.6
> Reading symbols from /lib/libgcc_s.so.1...done.
> Loaded symbols for /lib/libgcc_s.so.1
> Reading symbols from /lib/libc.so.6...done.
> Loaded symbols for /lib/libc.so.6
> Reading symbols from /lib/libattr.so.1...done.
> Loaded symbols for /lib/libattr.so.1
> Reading symbols from /lib/ld-linux.so.2...done.
> Loaded symbols for /lib/ld-linux.so.2
> Reading symbols from /lib/libnsl.so.1...done.
> Loaded symbols for /lib/libnsl.so.1
> Reading symbols from /lib/libnss_compat.so.2...done.
> Loaded symbols for /lib/libnss_compat.so.2
> Reading symbols from /lib/libnss_nis.so.2...done.
> Loaded symbols for /lib/libnss_nis.so.2
> Reading symbols from /lib/libnss_files.so.2...done.
> Loaded symbols for /lib/libnss_files.so.2
> 0x404a0001 in select () from /lib/libc.so.6
> (gdb)
> (gdb)
> (gdb) run
> The program being debugged has been started already.
> Start it from the beginning? (y or n) n
> Program not restarted.
> (gdb) cont
> Continuing.
> [New Thread 49156 (LWP 7817)]
> [Thread 16386 (LWP 7794) exited]
> [Thread 49156 (LWP 7817) exited]
> [New Thread 65541 (LWP 7852)]
> [Thread 65541 (LWP 7852) exited]
> [New Thread 81926 (LWP 7855)]
> [New Thread 98311 (LWP 7862)]
> [Thread 98311 (LWP 7862) exited]
> [New Thread 114696 (LWP 7865)]
> [Thread 114696 (LWP 7865) exited]
> [New Thread 131081 (LWP 7867)]
> [Thread 81926 (LWP 7855) exited]
> [Thread 131081 (LWP 7867) exited]
> [New Thread 147466 (LWP 7871)]
> [Thread 147466 (LWP 7871) exited]
> [New Thread 163851 (LWP 7873)]
> [Thread 163851 (LWP 7873) exited]
> [New Thread 180236 (LWP 7874)]
> [Thread 180236 (LWP 7874) exited]
> [New Thread 196621 (LWP 7876)]
>
> [New Thread 213006 (LWP 7878)]
> [New Thread 229391 (LWP 7915)]
> [New Thread 245776 (LWP 7916)]
> [Thread 229391 (LWP 7915) exited]
> [Thread 213006 (LWP 7878) exited]
> [Thread 245776 (LWP 7916) exited]
> [Thread 196621 (LWP 7876) exited]
>
> [New Thread 262161 (LWP 7934)]
> [Thread 262161 (LWP 7934) exited]
> [New Thread 278546 (LWP 7942)]
> [New Thread 294931 (LWP 7950)]
> [Thread 294931 (LWP 7950) exited]
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 278546 (LWP 7942)]
> 0x080d2050 in ?? ()
> (gdb) Quit
> (gdb)
> Continuing.
> Kaboom! bacula-sd, lisa-sd got signal 11 - Segmentation violation.
> Attempting traceback.
> Kaboom! exepath=/usr/sbin/
> Calling: /usr/sbin/btraceback /usr/sbin/bacula-sd 7791
> Traceback complete, attempting cleanup ...
> [Thread 32771 (LWP 7795) exited]
> Orphaned buffer:  lisa-sd    528 bytes buf=80c2f08 allocated at bnet.c:674
> Orphaned buffer:  lisa-sd    272 bytes buf=80cc3f8 allocated at jcr.c:253
> Orphaned buffer:  lisa-sd    528 bytes buf=80e1518 allocated at bnet.c:673
> Orphaned buffer:  lisa-sd    528 bytes buf=80e1748 allocated at jcr.c:255
> Orphaned buffer:  lisa-sd    528 bytes buf=80e1a90 allocated at bnet.c:673
> Orphaned buffer:  lisa-sd    528 bytes buf=80e2618 allocated at bnet.c:674
> Orphaned buffer:  lisa-sd    146 bytes buf=80e1f70 allocated at job.c:114
> Orphaned buffer:  lisa-sd    146 bytes buf=80e2020 allocated at job.c:117
> Orphaned buffer:  lisa-sd    146 bytes buf=80e20d0 allocated at job.c:120
> Orphaned buffer:  lisa-sd    146 bytes buf=80e2180 allocated at job.c:128
> Orphaned buffer:  lisa-sd    128 bytes buf=80d0938 allocated at bnet.c:667
> Orphaned buffer:  lisa-sd      7 bytes buf=80d1f30 allocated at bnet.c:675
> Orphaned buffer:  lisa-sd     12 bytes buf=80d09d8 allocated at bnet.c:676
> Orphaned buffer:  lisa-sd    528 bytes buf=80d19d8 allocated at bnet.c:674
> Orphaned buffer:  lisa-sd    128 bytes buf=80d2050 allocated at bnet.c:667
> Orphaned buffer:  lisa-sd      7 bytes buf=80d20f0 allocated at bnet.c:675
> Orphaned buffer:  lisa-sd     12 bytes buf=80d2118 allocated at bnet.c:676
> Orphaned buffer:  lisa-sd      8 bytes buf=80d2148 allocated at workq.c:167
> Orphaned buffer:  lisa-sd     16 bytes buf=80e1f40 allocated at jcr.c:247
> Orphaned buffer:  lisa-sd     24 bytes buf=80d1698 allocated at dircmd.c:185
> Orphaned buffer:  lisa-sd     40 bytes buf=80d0a08 allocated at job.c:140
> Orphaned buffer:  lisa-sd     24 bytes buf=80d1f68 allocated at reserve.c:583
> Orphaned buffer:  lisa-sd     40 bytes buf=80e25c0 allocated at alist.c:53
> Orphaned buffer:  lisa-sd     24 bytes buf=80e2e20 allocated at reserve.c:606
> Orphaned buffer:  lisa-sd      8 bytes buf=80e1ef0 allocated at reserve.c:621
> Orphaned buffer:  lisa-sd     40 bytes buf=80cc528 allocated at alist.c:53
> Orphaned buffer:  lisa-sd    128 bytes buf=80d2de0 allocated at bnet.c:667
> Orphaned buffer:  lisa-sd      7 bytes buf=80d2d50 allocated at bnet.c:675
> Orphaned buffer:  lisa-sd     12 bytes buf=80cc570 allocated at bnet.c:676
> Orphaned buffer:  lisa-sd  65652 bytes buf=80f2dc8 allocated at bsock.c:583
>
> Program exited with code 01.
> (gdb) backtrace
> No stack.
> (gdb) print my_name
> $1 = '\0' <repeats 29 times>
> (gdb) bt
> No stack.
> (gdb) thread apply all bt
> (gdb) f 0
> No stack.
> (gdb) info locals
> No registers.
> (gdb) bt
> No stack.
> (gdb) f 1
> No stack.
> (gdb) info locals
> No registers.
> (gdb) f 2
> No stack.
> (gdb) info locals
> No registers.
> (gdb) f 3
> No stack.
> (gdb) info locals
> No registers.
> (gdb) f 4
> No stack.
> (gdb) info locals
> No registers.
> (gdb) f 5
> No stack.
> (gdb) info locals
> No registers.
> (gdb) f 6
> No stack.
> (gdb) info locals
> No registers.
> (gdb) f 7
> No stack.
> (gdb) info locals
> No registers.
> (gdb) detach
> (gdb) quit
> [EMAIL PROTECTED]:~#
>
> > - Make the smartalloc routine dump the *full* information it has on the
> > buffer that was overrun.
>
> How can I do this?
>
> > - Run the SD with valgrind, maybe it will point out what is overrunning
> > the buffer.
>
> Do I need some special options to make this work?
>
> [EMAIL PROTECTED]:~# valgrind /usr/sbin/bacula-sd -f -c /etc/bacula/bacula-sd.conf
> ==8314== Memcheck, a memory error detector for x86-linux.
> ==8314== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
> ==8314== Using valgrind-2.4.0, a program supervision framework for x86-linux.
> ==8314== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
> ==8314== For more details, rerun with: -v
> ==8314==
> ==8314== Signal 11 (SIGSEGV) appears to have lost its siginfo; I can't go on.
> ==8314==   This may be because one of your programs has consumed your
> ==8314==   ration of siginfo structures.
> ==8314== Signal 11 (SIGSEGV) appears to have lost its siginfo; I can't go on.
> ==8314==   This may be because one of your programs has consumed your
> ==8314==   ration of siginfo structures.
> ==8314==
> ==8314== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
> ==8314== malloc/free: in use at exit: 0 bytes in 0 blocks.
> ==8314== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
> ==8314== For counts of detected errors, rerun with: -v
> ==8314== No malloc'd blocks -- no leaks are possible.
> Segmentation fault
> [EMAIL PROTECTED]:~#
>
> > - Figure out why it is the only SD crashing (at least that is what I
> > deduce from "one of my SD ... crashes ...", paraphrased).
>
> The other one is on debian etch amd64 and is only storing to disk.
> The crashing SD is on the same machine as the Dir.
>
> > - Back up to 2.0.3, rebuild it and see if it crashes too in the same way.
>
> I built a 2.0.3 SD and dir and did the same job again (still 2.2.4 FD):
>
> *Worked perfectly*, no crash. So I think we can eliminate the case
> of any hardware failure.
>
> 16-Sep 02:29 lisa-sd: Job write elapsed time = 00:02:52, Transfer rate = 4.397 M bytes/second
> 16-Sep 02:29 lisa-sd: Sending spooled attrs to the Director.  Despooling 29,772 bytes ...
> 16-Sep 02:29 lisa-dir: Bacula 2.0.3 (06Mar07): 16-Sep-2007 02:29:25
>   JobId:                  536
>   Job:                    lisa-ImportantData.2007-09-16_02.24.18
>   Backup Level:           Incremental, since=2007-09-14 03:06:16
>   Client:                 "lisa-fd" 2.2.4 (14Sep07) i386-pc-linux-gnu,debian,3.1
>   FileSet:                "lisa ImportantData FileSet" 2007-03-29 13:23:26
>   Pool:                   "Default" (From Job resource)
>   Storage:                "lisa-sd" (From Job resource)
>   Scheduled time:         16-Sep-2007 02:24:12
>   Start time:             16-Sep-2007 02:25:36
>   End time:               16-Sep-2007 02:29:25
>   Elapsed time:           3 mins 49 secs
>   Priority:               500
>   FD Files Written:       116
>   SD Files Written:       116
>   FD Bytes Written:       756,371,227 (756.3 MB)
>   SD Bytes Written:       756,383,199 (756.3 MB)
>   Rate:                   3302.9 KB/s
>   Software Compression:   None
>   VSS:                    no
>   Encryption:             no
>   Volume name(s):         Tape_13
>   Volume Session Id:      1
>   Volume Session Time:    1189902192
>   Last Volume Bytes:      1,497,904,128 (1.497 GB)
>   Non-fatal FD errors:    0
>   SD Errors:              0
>   FD termination status:  OK
>   SD termination status:  OK
>   Termination:            Backup OK
>
> 16-Sep 02:29 lisa-dir: Begin pruning Jobs.
> 16-Sep 02:29 lisa-dir: No Jobs found to prune.
> 16-Sep 02:29 lisa-dir: Begin pruning Files.
> 16-Sep 02:29 lisa-dir: No Files found to prune.
> 16-Sep 02:29 lisa-dir: End auto prune.
>
> *status client=lisa-fd
> Connecting to Client lisa-fd at lisa:9102
>
> lisa-fd Version: 2.2.4 (14 September 2007)  i386-pc-linux-gnu debian 3.1
> Daemon started 15-Sep-07 13:23, 6 Jobs run since started.
>  Heap: heap=694,224 smbytes=215,620 max_bytes=317,828 bufs=80 max_bufs=200
>  Sizeof: boffset_t=8 size_t=4 debug=1 trace=0
>
> Running Jobs:
> Director connected at: 16-Sep-07 02:29
> No Jobs running.
> ====
>
> Terminated Jobs:
>  JobId  Level    Files      Bytes   Status   Finished        Name
> ======================================================================
> [...]
>    535  Incr        114    740.2 M  OK       16-Sep-07 01:31 lisa-ImportantData
>    536  Incr        116    756.3 M  OK       16-Sep-07 02:29 lisa-ImportantData
> ====
> *
>
>
> Any more hints?

-------------------------------------------------------------------------
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel
