Re: [Bacula-users] Director crash- again with traceback
Does no one have any ideas on what would have caused this? Based on the trace dump, it looks like there is a problem with the scheduler. Any pointers as to what I can look at?

thanks,

---
Jerold Lowry
IT Manager / Software Engineer
Engineering Design Team (EDT), Inc. a HEICO company
1400 NW Compton Drive, Suite 315
Beaverton, Oregon 97006 (U.S.A.)
Phone: 503-690-1234 / 800-435-4320
Fax: 503-690-1243
Web: www.edt.com <http://www.edt.com/>

On 1/11/2011 9:12 AM, jerry lowry wrote:
> I really hate when I do that!!!
[Bacula-users] Director crash- again with traceback
I really hate when I do that!!!

[Thread debugging using libthread_db enabled]
[New Thread 0x7f8362bfd710 (LWP 9002)]
[New Thread 0x7f8363fff710 (LWP 3111)]
[New Thread 0x7f8368c49710 (LWP 3110)]
0x003377a0e91d in nanosleep () from /lib64/libpthread.so.0
$1 = '\000' <repeats 29 times>
$2 = 0x1fe2068 bacula-dir
$3 = 0x1fe20a8 /usr/bacula/bin/bacula-dir
$4 = 0x7f834c004328 MySQL
$5 = 0x7f836eadbd9e 5.0.1 (24 February 2010)
$6 = 0x7f836eadbdb7 x86_64-unknown-linux-gnu
$7 = 0x7f836eadbdd0 redhat
$8 = 0x7f836eadba7c
$9 = distress, '\000' <repeats 41 times>
#0  0x003377a0e91d in nanosleep () from /lib64/libpthread.so.0
#1  0x7f836eaae6f7 in bmicrosleep (sec=60, usec=0) at bsys.c:61
#2  0x0042e1d5 in wait_for_next_job (one_shot_job_to_run=<value optimized out>) at scheduler.c:131
#3  0x0040d93d in main (argc=<value optimized out>, argv=<value optimized out>) at dird.c:338

Thread 4 (Thread 0x7f8368c49710 (LWP 3110)):
#0  0x0033772d7393 in select () from /lib64/libc.so.6
#1  0x7f836eab0ad4 in bnet_thread_server (addrs=<value optimized out>, max_clients=<value optimized out>, client_wq=<value optimized out>, handle_client_request=<value optimized out>) at bnet_server.c:161
#2  0x004468fc in connect_thread (arg=0x1fe3ee8) at ua_server.c:82
#3  0x003377a06a3a in start_thread () from /lib64/libpthread.so.0
#4  0x0033772de62d in clone () from /lib64/libc.so.6
#5  0x in ?? ()

Thread 3 (Thread 0x7f8363fff710 (LWP 3111)):
#0  0x003377a0b3b9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x7f836ead402c in watchdog_thread (arg=<value optimized out>) at watchdog.c:308
#2  0x003377a06a3a in start_thread () from /lib64/libpthread.so.0
#3  0x0033772de62d in clone () from /lib64/libc.so.6
#4  0x in ?? ()

Thread 2 (Thread 0x7f8362bfd710 (LWP 9002)):
#0  0x003377a0ec8d in waitpid () from /lib64/libpthread.so.0
#1  0x7f836eacb7ad in signal_handler (sig=11) at signal.c:229
#2  <signal handler called>
#3  0x003377a0c280 in pthread_kill () from /lib64/libpthread.so.0
#4  0x00420eba in cancel_storage_daemon_job (jcr=0x7f834c01c2f8) at job.c:515
#5  0x00410b50 in wait_for_job_termination (jcr=0x7f834c01c2f8, timeout=<value optimized out>) at backup.c:538
#6  0x004116f0 in do_backup (jcr=0x7f834c01c2f8) at backup.c:456
#7  0x00421fd4 in job_thread (arg=0x7f834c01c2f8) at job.c:314
#8  0x00423624 in jobq_server (arg=0x673b40) at jobq.c:450
#9  0x003377a06a3a in start_thread () from /lib64/libpthread.so.0
#10 0x0033772de62d in clone () from /lib64/libc.so.6
#11 0x in ?? ()

Thread 1 (Thread 0x7f836ea7b7e0 (LWP 3106)):
#0  0x003377a0e91d in nanosleep () from /lib64/libpthread.so.0
#1  0x7f836eaae6f7 in bmicrosleep (sec=60, usec=0) at bsys.c:61
#2  0x0042e1d5 in wait_for_next_job (one_shot_job_to_run=<value optimized out>) at scheduler.c:131
#3  0x0040d93d in main (argc=<value optimized out>, argv=<value optimized out>) at dird.c:338

#0  0x003377a0e91d in nanosleep () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x7f836eaae6f7 in bmicrosleep (sec=60, usec=0) at bsys.c:61
61      stat = nanosleep(timeout, NULL);
        timeout = {tv_sec = 60, tv_nsec = 0}
        tv = {tv_sec = 90194313216, tv_usec = 140202474247679}
        tz = {tz_minuteswest = 372, tz_dsttime = 0}
        stat = <value optimized out>
#2  0x0042e1d5 in wait_for_next_job (one_shot_job_to_run=<value optimized out>) at scheduler.c:131
131     bmicrosleep(next_check_secs, 0); /* recheck once per minute */
        jcr = <value optimized out>
        job = <value optimized out>
        run = <value optimized out>
        now = <value optimized out>
        prev = <value optimized out>
        first = false
        next_job = <value optimized out>
#3  0x0040d93d in main (argc=<value optimized out>, argv=<value optimized out>) at dird.c:338
338     while ( (jcr = wait_for_next_job(runjob)) ) {
        jcr = <value optimized out>
        test_config = false
        ch = <value optimized out>
        no_signals = false
        uid = 0x0
        gid = 0x0
        mode = <value optimized out>
#0  0x in ?? ()
No symbol table info available.
#0  0x in ?? ()
No symbol table info available.
#0  0x in ?? ()
No symbol table info available.
#0  0x in ?? ()
No symbol table info available.

-------- Original Message --------
Subject: Director crash
Date: Tue, 11 Jan 2011 09:11:17 -0800
From: jerry lowry jlo...@edt.com
To: bacula-users@lists.sourceforge.net

Hi list,

I came in this morning and found that my director had died last night after doing two of the backups. The traceback follows at the end. This is the scenario:

I noticed yesterday that the only two jobs that were scheduled to be performed last night were a monthly backup and the catalog backup. Given that I did not have the time to research why the other 5 backups were not scheduled, I started BAT and selected