Re: [Bacula-devel] (Fwd) Re: [Bacula-users] bacula-sd always crashing on job star

Kern Sibbald Sun, 16 Sep 2007 11:59:21 -0700

Hello,

It appears to be dying in the posix_fadvise() OS call.  The call is in 
<bacula-source>/src/lib/bsock.c at line 492.  All the arguments to the call 
are valid, so it would appear that something in posix_fadvise() itself 
(the OS, I think) is broken.


You can test this theory by disabling the posix_fadvise() call by putting // 
at the beginning of the line and rebuilding.

You might also want to disable the posix_fadvise() calls in src/findlib/bfile.c 
(2 of them) and in src/stored/spool.c (one).

You can disable them all *after* doing a ./configure, by editing src/config.h 
and commenting out the #define HAVE_POSIX_FADVISE line.
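
To illustrate why commenting out that one #define removes every call at once, 
here is a minimal sketch of a call guarded by the configure macro.  It is not 
the Bacula source: advise_sequential() is a hypothetical helper, and the 
POSIX_FADV_SEQUENTIAL advice is only an assumption about what a despooling 
read might ask for.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose posix_fadvise() and fileno() */
#endif
#include <fcntl.h>
#include <stdio.h>

/* ./configure normally writes this define into src/config.h.
 * Commenting it out, as here, compiles the OS call away entirely. */
/* #define HAVE_POSIX_FADVISE 1 */

/* Hypothetical helper (not a Bacula function): hint that fd will be read
 * sequentially.  Returns 0 on success or when the hint is compiled out,
 * otherwise the error number reported by posix_fadvise(). */
static int advise_sequential(int fd)
{
#ifdef HAVE_POSIX_FADVISE
    return posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
#else
    (void)fd;                     /* nothing to do; the call is disabled */
    return 0;
#endif
}
```

With the define commented out, the helper never enters the OS call, so if the 
crash goes away after a rebuild, that points at posix_fadvise() rather than at 
the caller.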

Actually the crash is happening at the beginning of despooling attribute data.

Regards,

Kern

On Sunday 16 September 2007 13:53, Marc Schiffbauer wrote:
> * Kern Sibbald schrieb am 16.09.07 um 08:59 Uhr:
> > Please read the Kaboom chapter of the manual.  It will explain how to
> > manually run the program under the debugger.  I believe you left off the
> > -s -f options when running it so the traceback doesn't contain any useful
> > information.
>
> Ok thanks for the hint. Here is the traceback.
>
> Crash happens right after a job:
>
> *m
> 16-Sep 13:41 lisa-sd: Job write elapsed time = 00:02:37, Transfer rate = 4.489 M bytes/second
> 16-Sep 13:41 lisa-sd: Sending spooled attrs to the Director.  Despooling 12,242 bytes ...
> *
>
> ... then kaboom
>
> [EMAIL PROTECTED]:/usr/sbin# gdb /usr/sbin/bacula-sd
> GNU gdb 6.3-debian
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and
> you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for
> details.
> This GDB was configured as "i386-linux"...Using host libthread_db
> library "/lib/libthread_db.so.1".
>
> (gdb) run -s -f -c /etc/bacula/bacula-sd.conf
> Starting program: /usr/sbin/bacula-sd -s -f -c
> /etc/bacula/bacula-sd.conf
> [Thread debugging using libthread_db enabled]
> [New Thread 16384 (LWP 14910)]
> [New Thread 32769 (LWP 14912)]
> [New Thread 16386 (LWP 14913)]
> [New Thread 32771 (LWP 14914)]
> [New Thread 49156 (LWP 14965)]
> [Thread 16386 (LWP 14913) exited]
> [New Thread 65541 (LWP 14975)]
> [Thread 65541 (LWP 14975) exited]
> [New Thread 81926 (LWP 14976)]
> [Thread 81926 (LWP 14976) exited]
> [New Thread 98311 (LWP 14994)]
> [Thread 98311 (LWP 14994) exited]
> [New Thread 114696 (LWP 14996)]
> [Thread 114696 (LWP 14996) exited]
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 49156 (LWP 14965)]
> 0x080e0018 in ?? ()
> (gdb) thread apply all bt
>
> Thread 5 (Thread 49156 (LWP 14965)):
> #0  0x080e0018 in ?? ()
> #1  0x00000000 in ?? ()
> #2  0x00000000 in ?? ()
> #3  0x0808751b in BSOCK::despool (this=0x80e0018, update_attr_spool_size=0x807df90 <update_attr_spool_size>, tsize=12242) at bsock.c:492
> #4  0x0807e15b in commit_attribute_spool (jcr=0x80e0368) at spool.c:636
> #5  0x08053dff in do_append_data (jcr=0x80e0368) at append.c:334
> #6  0x080691f8 in append_data_cmd (jcr=0x80e0368) at fd_cmds.c:194
> #7  0x08069131 in do_fd_commands (jcr=0x80e0368) at fd_cmds.c:165
> #8  0x08068f40 in run_job (jcr=0x80e0368) at fd_cmds.c:128
> #9  0x0806a517 in run_cmd (jcr=0x80e0368) at job.c:192
> #10 0x080636ba in handle_connection_request (arg=0x80e0018) at dircmd.c:224
> #11 0x080a2219 in workq_server (arg=0x80c0be0) at workq.c:357
> #12 0x40161e51 in pthread_start_thread () from /lib/libpthread.so.0
> #13 0x40161ecf in pthread_start_thread_event () from /lib/libpthread.so.0
> #14 0x404a68aa in clone () from /lib/libc.so.6
>
> Thread 4 (Thread 32771 (LWP 14914)):
> #0  0x40168456 in nanosleep () from /lib/libpthread.so.0
> #1  0x00000001 in ?? ()
> #2  0x4016452a in __pthread_timedsuspend_new () from /lib/libpthread.so.0
> #3  0x40161122 in pthread_cond_timedwait_relative () from /lib/libpthread.so.0
> #4  0x080a183d in watchdog_thread (arg=0x0) at watchdog.c:307
> #5  0x40161e51 in pthread_start_thread () from /lib/libpthread.so.0
> #6  0x40161ecf in pthread_start_thread_event () from /lib/libpthread.so.0
> #7  0x404a68aa in clone () from /lib/libc.so.6
>
> Thread 2 (Thread 32769 (LWP 14912)):
> #0  0x4049da5a in poll () from /lib/libc.so.6
> #1  0x40161b50 in __pthread_manager () from /lib/libpthread.so.0
> #2  0x40161d57 in __pthread_manager_event () from /lib/libpthread.so.0
> #3  0x404a68aa in clone () from /lib/libc.so.6
>
> Thread 1 (Thread 16384 (LWP 14910)):
> #0  0x404a0001 in select () from /lib/libc.so.6
> #1  0x00000009 in ?? ()
> #2  0x404fec80 in ?? () from /lib/libc.so.6
> #3  0xbfffec00 in ?? ()
> #4  0x00000000 in ?? ()
> #5  0x08085c32 in bnet_thread_server (addrs=0x80c24a8, max_clients=-514,
> client_wq=0x80c0be0, handle_client_request=0xfffffdfe) at bnet_server.c:161
> #6  0x0804d2a4 in main (argc=0, argv=0x804dde0) at stored.c:263
> #0  0x080e0018 in ?? ()
> (gdb)
>
> is this useful?
>
> > For valgrind, I don't know what options to use; please read their
> > documentation.
> >
> > For improving the output from smartalloc, find all the sm_check() calls
> > in the SD and modify them to have true as the last argument.  You can
> > also add more sm_checks at various strategic places where you think the
> > code breaks to get closer to the problem.
>
> valgrind / smartalloc output will follow later...
>
> -Marc

