Re: [Bacula-users] Director crash- again with traceback

2011-01-14 Thread jerry lowry
Does no one have any ideas on what would have caused this?  Based on the trace 
dump, it looks like there is a problem with the scheduler.  Any pointers 
as to what I can look at?


thanks,

---
Jerold Lowry
IT Manager / Software Engineer
Engineering Design Team (EDT), Inc. a HEICO company
1400 NW Compton Drive, Suite 315
Beaverton, Oregon 97006 (U.S.A.)
Phone: 503-690-1234 / 800-435-4320
Fax: 503-690-1243
Web: http://www.edt.com/



On 1/11/2011 9:12 AM, jerry lowry wrote:

I really hate when I do that!!!

[Thread debugging using libthread_db enabled]
[New Thread 0x7f8362bfd710 (LWP 9002)]
[New Thread 0x7f8363fff710 (LWP 3111)]
[New Thread 0x7f8368c49710 (LWP 3110)]
0x003377a0e91d in nanosleep () from /lib64/libpthread.so.0
$1 = '\000' <repeats 29 times>
$2 = 0x1fe2068 bacula-dir
$3 = 0x1fe20a8 /usr/bacula/bin/bacula-dir
$4 = 0x7f834c004328 MySQL
$5 = 0x7f836eadbd9e 5.0.1 (24 February 2010)
$6 = 0x7f836eadbdb7 x86_64-unknown-linux-gnu
$7 = 0x7f836eadbdd0 redhat
$8 = 0x7f836eadba7c 
$9 = distress, '\000' <repeats 41 times>
#0  0x003377a0e91d in nanosleep () from /lib64/libpthread.so.0
#1  0x7f836eaae6f7 in bmicrosleep (sec=60, usec=0) at bsys.c:61
#2  0x0042e1d5 in wait_for_next_job (
 one_shot_job_to_run=<value optimized out>) at scheduler.c:131
#3  0x0040d93d in main (argc=<value optimized out>,
 argv=<value optimized out>) at dird.c:338

Thread 4 (Thread 0x7f8368c49710 (LWP 3110)):
#0  0x0033772d7393 in select () from /lib64/libc.so.6
#1  0x7f836eab0ad4 in bnet_thread_server (addrs=<value optimized out>,
 max_clients=<value optimized out>, client_wq=<value optimized out>,
 handle_client_request=<value optimized out>) at bnet_server.c:161
#2  0x004468fc in connect_thread (arg=0x1fe3ee8) at ua_server.c:82
#3  0x003377a06a3a in start_thread () from /lib64/libpthread.so.0
#4  0x0033772de62d in clone () from /lib64/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f8363fff710 (LWP 3111)):
#0  0x003377a0b3b9 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x7f836ead402c in watchdog_thread (arg=<value optimized out>)
 at watchdog.c:308
#2  0x003377a06a3a in start_thread () from /lib64/libpthread.so.0
#3  0x0033772de62d in clone () from /lib64/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f8362bfd710 (LWP 9002)):
#0  0x003377a0ec8d in waitpid () from /lib64/libpthread.so.0
#1  0x7f836eacb7ad in signal_handler (sig=11) at signal.c:229
#2  <signal handler called>
#3  0x003377a0c280 in pthread_kill () from /lib64/libpthread.so.0
#4  0x00420eba in cancel_storage_daemon_job (jcr=0x7f834c01c2f8)
 at job.c:515
#5  0x00410b50 in wait_for_job_termination (jcr=0x7f834c01c2f8,
 timeout=<value optimized out>) at backup.c:538
#6  0x004116f0 in do_backup (jcr=0x7f834c01c2f8) at backup.c:456
#7  0x00421fd4 in job_thread (arg=0x7f834c01c2f8) at job.c:314
#8  0x00423624 in jobq_server (arg=0x673b40) at jobq.c:450
#9  0x003377a06a3a in start_thread () from /lib64/libpthread.so.0
#10 0x0033772de62d in clone () from /lib64/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f836ea7b7e0 (LWP 3106)):
#0  0x003377a0e91d in nanosleep () from /lib64/libpthread.so.0
#1  0x7f836eaae6f7 in bmicrosleep (sec=60, usec=0) at bsys.c:61
#2  0x0042e1d5 in wait_for_next_job (
 one_shot_job_to_run=<value optimized out>) at scheduler.c:131
#3  0x0040d93d in main (argc=<value optimized out>,
 argv=<value optimized out>) at dird.c:338
#0  0x003377a0e91d in nanosleep () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x7f836eaae6f7 in bmicrosleep (sec=60, usec=0) at bsys.c:61
61 stat = nanosleep(timeout, NULL);
timeout = {tv_sec = 60, tv_nsec = 0}
tv = {tv_sec = 90194313216, tv_usec = 140202474247679}
tz = {tz_minuteswest = 372, tz_dsttime = 0}
stat = <value optimized out>
#2  0x0042e1d5 in wait_for_next_job (
 one_shot_job_to_run=<value optimized out>) at scheduler.c:131
131   bmicrosleep(next_check_secs, 0); /* recheck once per minute */
jcr = <value optimized out>
job = <value optimized out>
run = <value optimized out>
now = <value optimized out>
prev = <value optimized out>
first = false
next_job = <value optimized out>
#3  0x0040d93d in main (argc=<value optimized out>,
 argv=<value optimized out>) at dird.c:338
338   while ( (jcr = wait_for_next_job(runjob)) ) {
jcr = <value optimized out>
test_config = false
ch = <value optimized out>
no_signals = false
uid = 0x0
gid = 0x0
mode = <value optimized out>
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.
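
For anyone reading a dump like the one above, a small script can point out which thread actually received the signal. This is only a rough sketch in Python, assuming the "Thread N (Thread 0x... (LWP n)):" layout that gdb printed here. The function name find_signalled_threads, the regular expressions, and the SAMPLE excerpt are illustrative and are not part of Bacula or gdb.

```python
import re

# Header line that gdb prints before each thread's backtrace.
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread (0x[0-9a-f]+) \(LWP (\d+)\)\):")
# A frame line such as "#4  0x00420eba in cancel_storage_daemon_job (...)".
FRAME_RE = re.compile(r"^#(\d+)\s*(.*)")
# The function name that follows " in " on a frame line.
FUNC_RE = re.compile(r"\bin\s+(\S+)\s*\(")


def find_signalled_threads(text):
    """Map thread number -> functions that were running when a signal arrived.

    Only threads whose backtrace contains a "signal handler called" frame
    are returned. The list holds the frames below that marker, innermost first.
    """
    results = {}
    current = None        # thread number whose frames are being read
    last_frame = -1       # last frame number seen for that thread
    after_signal = False  # True once the signal-handler marker was seen
    for line in text.splitlines():
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            last_frame = -1
            after_signal = False
            continue
        frame = FRAME_RE.match(line)
        if current is None or not frame:
            continue  # continuation lines, locals, blank lines
        number = int(frame.group(1))
        if number <= last_frame:
            # Frame numbers restarted (e.g. a later "bt full" section),
            # so these frames no longer belong to the current thread.
            current = None
            continue
        last_frame = number
        body = frame.group(2)
        if "signal handler called" in body:
            results[current] = []
            after_signal = True
        elif after_signal:
            func = FUNC_RE.search(body)
            if func:
                results[current].append(func.group(1))
    return results


# Short excerpt in the same layout as the traceback in this message.
SAMPLE = """
Thread 3 (Thread 0x7f8363fff710 (LWP 3111)):
#0  0x003377a0b3b9 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x7f836ead402c in watchdog_thread (arg=<value optimized out>)
 at watchdog.c:308

Thread 2 (Thread 0x7f8362bfd710 (LWP 9002)):
#0  0x003377a0ec8d in waitpid () from /lib64/libpthread.so.0
#1  0x7f836eacb7ad in signal_handler (sig=11) at signal.c:229
#2  <signal handler called>
#3  0x003377a0c280 in pthread_kill () from /lib64/libpthread.so.0
#4  0x00420eba in cancel_storage_daemon_job (jcr=0x7f834c01c2f8)
 at job.c:515
"""

if __name__ == "__main__":
    print(find_signalled_threads(SAMPLE))
```

Run against this traceback, a check like this singles out Thread 2, where signal 11 arrived inside pthread_kill called from cancel_storage_daemon_job (job.c:515). Thread 1, the scheduler, is only sleeping in bmicrosleep.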


 Original Message 
Subject:

[Bacula-users] Director crash- again with traceback

2011-01-11 Thread jerry lowry




-------- Original Message --------
Subject: Director crash
Date: Tue, 11 Jan 2011 09:11:17 -0800
From: jerry lowry <jlo...@edt.com>
To: bacula-users@lists.sourceforge.net



Hi list,

I came in this morning and found that my director had died last night 
after doing two of the backups.  The traceback follows at the end.

This is the scenario:

I noticed yesterday that the only two jobs scheduled to run last 
night were a monthly backup and the catalog backup.  
Since I did not have time to research why the other 5 backups 
were not scheduled, I started BAT and selected