-----BEGIN PGP SIGNED MESSAGE-----
Hello everybody,
I am seeing a segmentation fault on one of my bacula-fd clients. It is
running 5.2.13 on SPARC Solaris 10 (Generic_147440-01). According to the
debug output, it is triggered by a failed assertion:
abudhabi-rad1-sys-fd: lockmgr.c:360-0 ASSERT failed at bnet_server.c:209:
current >= 0
Bacula interrupted by signal 11: Segmentation Fault
I can reproduce this by querying the client status from bconsole a SECOND
time after restarting bacula-fd. The first query works fine, but the second
one crashes the daemon. The same happens with backup jobs: the first one
succeeds and the second one crashes with the same assertion failure. It also
crashes if I query the client status while a backup is running, i.e. on the
second connection after a restart.
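To make the reproduction concrete, the two status queries could be scripted
roughly as follows. This is a minimal Python sketch, not part of the original
setup: the helper names are hypothetical, and it assumes bconsole is on the
PATH, can find its configuration, and that the Director's Client resource has
the name passed in.

```python
import subprocess


def status_command(client_name):
    """Build the bconsole command line that asks for a client's status."""
    return f"status client={client_name}\n"


def query_status(client_name, bconsole=("bconsole",)):
    """Pipe one status query into bconsole and return the completed process."""
    return subprocess.run(
        list(bconsole),
        input=status_command(client_name),
        text=True,
        capture_output=True,
    )


def reproduce(client_name, attempts=2, bconsole=("bconsole",)):
    """Send the status query `attempts` times and collect bconsole's output."""
    outputs = []
    for _ in range(attempts):
        outputs.append(query_status(client_name, bconsole).stdout)
    return outputs
```

Calling `reproduce("abudhabi-rad1-sys-fd")` right after restarting bacula-fd
would send the query twice. The client name here is an assumption; it has to
match whatever the Client resource is called in bacula-dir.conf.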
Here is the backtrace from the crash:
- ----- bacula.1203.traceback ------
[New process 1203]
Retry #1:
Retry #2:
Retry #3:
Retry #4:
[Thread debugging using libthread_db enabled]
[New LWP 3 ]
[New LWP 2 ]
[New Thread 1 ]
[New Thread 2 (LWP 2)]
[New Thread 3 ]
[Switching to Thread 1 ]
0xfebca710 in __lwp_park () from /lib/libc.so.1
$1 = '\000' <repeats 29 times>
$2 = 0x47530 "bacula-fd"
$3 = 0x0
$4 = 0x0
$5 = 0xff2985d0 "5.2.13 (19 February 2013)"
$6 = 0xff2985a8 "sparc-sun-solaris2.10"
$7 = 0xff2985a0 "solaris"
$8 = 0xff298598 "5.10"
$9 = "abudhabi-rad1-sys", '\000' <repeats 20 times>
$10 = 0xff2985c0 "solaris 5.10"
$11 = 0
Environment variable "TestName" not defined.
#0 0xfebca710 in __lwp_park () from /lib/libc.so.1
#1 0xfebc2a78 in mutex_lock_queue () from /lib/libc.so.1
#2 0xfebace94 in _flockget () from /lib/libc.so.1
#3 0xfebadbf8 in fclose () from /lib/libc.so.1
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Thread 6 (Thread 3 ):
#0 0xfebca710 in __lwp_park () from /lib/libc.so.1
#1 0xfebc459c in cond_sleep_queue () from /lib/libc.so.1
#2 0xfebc4760 in cond_wait_queue () from /lib/libc.so.1
#3 0xfebc4ba4 in cond_wait_common () from /lib/libc.so.1
#4 0xfebc4d38 in _cond_timedwait () from /lib/libc.so.1
#5 0xfebc4e2c in cond_timedwait () from /lib/libc.so.1
#6 0xfebc4e6c in pthread_cond_timedwait () from /lib/libc.so.1
#7 0xff2919ec in bthread_cond_timedwait_p (cond=0xff2ae3d0, m=0xff2ae3b8,
abstime=0xfe8fbf18, file=0xff29ab68 "watchdog.c", line=321) at lockmgr.c:824
#8 0xff28af50 in watchdog_thread (arg=<optimized out>) at watchdog.c:321
#9 0xff2912d4 in lmgr_thread_launcher (x=0x493f0) at lockmgr.c:939
#10 0xfebca678 in _lwp_start () from /lib/libc.so.1
#11 0xfebca678 in _lwp_start () from /lib/libc.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 5 (Thread 2 (LWP 2)):
#0 0xfebce658 in _waitid () from /lib/libc.so.1
#1 0xfeb6f81c in _waitpid () from /lib/libc.so.1
#2 0xfebbe000 in waitpid () from /lib/libc.so.1
#3 0xff281dc0 in signal_handler (sig=11) at signal.c:237
#4 <signal handler called>
#5 0xff2924cc in lmgr_thread_t::do_V (this=0x5dcd8, m=0xff2ae158, f=0xff293a20
"bnet_server.c", l=209) at lockmgr.c:360
#6 0xff2918b0 in bthread_mutex_unlock_p (m=0xff2ae158, file=0xff293a20
"bnet_server.c", line=209) at lockmgr.c:793
#7 0xff262bf4 in bnet_thread_server (addr_list=<optimized out>,
max_clients=20, client_wq=0x47180, handle_client_request=0x242dc
<handle_client_request(void*)>) at bnet_server.c:209
#8 0x0002e4ec in main (argc=<optimized out>, argv=<optimized out>) at
filed.c:278
Thread 4 (Thread 1 ):
#0 0xfebca710 in __lwp_park () from /lib/libc.so.1
#1 0xfebc2a78 in mutex_lock_queue () from /lib/libc.so.1
#2 0xfebace94 in _flockget () from /lib/libc.so.1
#3 0xfebadbf8 in fclose () from /lib/libc.so.1
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Thread 3 (LWP 2 ):
#0 0xfebce658 in _waitid () from /lib/libc.so.1
#1 0xfeb6f81c in _waitpid () from /lib/libc.so.1
#2 0xfebbe000 in waitpid () from /lib/libc.so.1
#3 0xff281dc0 in signal_handler (sig=11) at signal.c:237
#4 <signal handler called>
#5 0xff2924cc in lmgr_thread_t::do_V (this=0x5dcd8, m=0xff2ae158, f=0xff293a20
"bnet_server.c", l=209) at lockmgr.c:360
#6 0xff2918b0 in bthread_mutex_unlock_p (m=0xff2ae158, file=0xff293a20
"bnet_server.c", line=209) at lockmgr.c:793
#7 0xff262bf4 in bnet_thread_server (addr_list=<optimized out>,
max_clients=20, client_wq=0x47180, handle_client_request=0x242dc
<handle_client_request(void*)>) at bnet_server.c:209
#8 0x0002e4ec in main (argc=<optimized out>, argv=<optimized out>) at
filed.c:278
Thread 2 (LWP 3 ):
#0 0xfebca710 in __lwp_park () from /lib/libc.so.1
#1 0xfebc459c in cond_sleep_queue () from /lib/libc.so.1
#2 0xfebc4760 in cond_wait_queue () from /lib/libc.so.1
#3 0xfebc4ba4 in cond_wait_common () from /lib/libc.so.1
#4 0xfebc4d38 in _cond_timedwait () from /lib/libc.so.1
#5 0xfebc4e2c in cond_timedwait () from /lib/libc.so.1
#6 0xfebc4e6c in pthread_cond_timedwait () from /lib/libc.so.1
#7 0xff2919ec in bthread_cond_timedwait_p (cond=0xff2ae3d0, m=0xff2ae3b8,
abstime=0xfe8fbf18, file=0xff29ab68 "watchdog.c", line=321) at lockmgr.c:824
#8 0xff28af50 in watchdog_thread (arg=<optimized out>) at watchdog.c:321
#9 0xff2912d4 in lmgr_thread_launcher (x=0x493f0) at lockmgr.c:939
#10 0xfebca678 in _lwp_start () from /lib/libc.so.1
#11 0xfebca678 in _lwp_start () from /lib/libc.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 1 (LWP 1 ):
#0 0xfebca710 in __lwp_park () from /lib/libc.so.1
#1 0xfebc2a78 in mutex_lock_queue () from /lib/libc.so.1
#2 0xfebace94 in _flockget () from /lib/libc.so.1
#3 0xfebadbf8 in fclose () from /lib/libc.so.1
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
#0 0xfebca710 in __lwp_park () from /lib/libc.so.1
No symbol table info available.
#1 0xfebc2a78 in mutex_lock_queue () from /lib/libc.so.1
No symbol table info available.
#2 0xfebace94 in _flockget () from /lib/libc.so.1
No symbol table info available.
#3 0xfebadbf8 in fclose () from /lib/libc.so.1
No symbol table info available.
#0 0x00000000 in ?? ()
No symbol table info available.
#0 0x00000000 in ?? ()
No symbol table info available.
#0 0x00000000 in ?? ()
No symbol table info available.
#0 0x00000000 in ?? ()
No symbol table info available.
- ----- SNIP -----
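For context on frame #5: as far as I can tell, the lock manager keeps a
per-thread record of the mutexes a thread holds, and do_V (the unlock path)
asserts `current >= 0`, i.e. that the thread has at least one recorded lock
before it releases one. My understanding is also that Bacula's assert macros
force a signal 11 after logging so that the traceback handler runs, which
would make the segmentation fault a consequence of the assertion rather than
a separate memory error. The following is a toy Python model of that
bookkeeping, not Bacula's actual code. The class and method names are
hypothetical and only illustrate how an unlock with no recorded lock trips
the check.

```python
class ThreadLockRecord:
    """Toy per-thread lock bookkeeping (hypothetical; not Bacula's lockmgr.c)."""

    def __init__(self):
        self.held = []      # (mutex, file, line) for each lock this thread holds
        self.current = -1   # index of the most recent entry; -1 means none held

    def do_p(self, mutex, file, line):
        """Record that this thread acquired `mutex`."""
        self.held.append((mutex, file, line))
        self.current += 1

    def do_v(self, mutex, file, line):
        """Record a release; fail if no lock is recorded for this thread."""
        if self.current < 0:
            raise AssertionError(f"ASSERT failed at {file}:{line}: current >= 0")
        # Drop the most recent record of this mutex.
        for i in range(self.current, -1, -1):
            if self.held[i][0] == mutex:
                del self.held[i]
                self.current -= 1
                return
        raise AssertionError(f"{file}:{line}: mutex {mutex!r} is not recorded as held")


record = ThreadLockRecord()
record.do_p("mutex-A", "bnet_server.c", 0)    # line 0 is a placeholder for the lock call
record.do_v("mutex-A", "bnet_server.c", 209)  # balanced lock/unlock: fine

try:
    record.do_v("mutex-A", "bnet_server.c", 209)  # nothing recorded any more
except AssertionError as err:
    print(err)  # ASSERT failed at bnet_server.c:209: current >= 0
```

In this model the failure appears whenever an unlock is seen by a record that
never saw the matching lock. Whether that is what happens on the second
connection here is something I have not confirmed.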
And here is the debug output from bacula-fd -c ../etc/bacula-fd.conf -v -f -d799
- ------ SNIP -----
# ./bacula-fd -c ../etc/bacula-fd.conf -v -f -d799
bacula-fd: lex.c:185-0 Open config file: ../etc/bacula-fd.conf
bacula-fd: filed_conf.c:452-0 Inserting director res: bacula-mon
bacula-fd: lex.c:185-0 Open config file: ../etc/bacula-fd.conf
abudhabi-rad1-sys-fd: message.c:504-0 Close_msg jcr=0
abudhabi-rad1-sys-fd: message.c:347-0 Copy message resource 4a040 to 48300
abudhabi-rad1-sys-fd: bsys.c:556-0 Could not open state file. sfd=-1 size=192:
ERR=No such file or directory
abudhabi-rad1-sys-fd: fd_plugins.c:1100-0 plugin dir is NULL
abudhabi-rad1-sys-fd: filed.c:276-0 filed: listening on port 9102
abudhabi-rad1-sys-fd: bnet_server.c:112-0 Addresses host[ipv4:0.0.0.0:9102]
abudhabi-rad1-sys-fd: bnet.c:766-0 who=client host=128.122.128.60 port=9102
abudhabi-rad1-sys-fd: find.c:81-0 init_find_files ff=629d0
abudhabi-rad1-sys-fd: job.c:270-0 <dird: Hello Director bacula-dir calling
abudhabi-rad1-sys-fd: job.c:286-0 Executing Hello command.
abudhabi-rad1-sys-fd: job.c:436-0 Calling Authenticate
abudhabi-rad1-sys-fd: cram-md5.c:72-0 send: auth cram-md5
<146880531.1365609328@abudhabi-rad1-sys-fd> ssl=0
abudhabi-rad1-sys-fd: cram-md5.c:131-0 cram-get received: auth cram-md5
<1124129763.1365609328@bacula-dir> ssl=0
abudhabi-rad1-sys-fd: cram-md5.c:150-0 sending resp to challenge:
vSJPvl/W69/Ag6Zs83+6+C
abudhabi-rad1-sys-fd: job.c:440-0 OK Authenticate
abudhabi-rad1-sys-fd: job.c:270-0 <dird: JobId=0
Job=-Console-.2013-04-10_11.40.56_40 SDid=0 SDtime=0 Authorization=dummy
abudhabi-rad1-sys-fd: job.c:286-0 Executing JobId= command.
abudhabi-rad1-sys-fd: job.c:1737-0 set sd auth key
abudhabi-rad1-sys-fd: job.c:544-0 JobId=0 Auth=dummy
abudhabi-rad1-sys-fd: fd_plugins.c:1197-0 plugin list is NULL
abudhabi-rad1-sys-fd: job.c:270-0 <dird: statusabudhabi-rad1-sys-fd:
job.c:286-0 Executing status command.
abudhabi-rad1-sys-fd: runscript.c:108-0 runscript: running all RUNSCRIPT object
(ClientAfterJob) JobStatus=C
abudhabi-rad1-sys-fd: job.c:399-0 Calling term_find_files
abudhabi-rad1-sys-fd: job.c:404-0 Done with term_find_files
abudhabi-rad1-sys-fd: runscript.c:286-0 runscript: freeing all RUNSCRIPTS object
abudhabi-rad1-sys-fd: message.c:504-0 Close_msg jcr=62580
abudhabi-rad1-sys-fd: message.c:504-0 Close_msg jcr=0
abudhabi-rad1-sys-fd: job.c:406-0 Done with free_jcr
abudhabi-rad1-sys-fd: mem_pool.c:375-0 garbage collect memory pool
That was the first client status query. This is the debug output of the second
query:
abudhabi-rad1-sys-fd: lockmgr.c:360-0 ASSERT failed at bnet_server.c:209:
current >= 0
Bacula interrupted by signal 11: Segmentation Fault
Kaboom! bacula-fd, abudhabi-rad1-sys-fd got signal 11 - Segmentation Fault.
Attempting traceback.
Kaboom! exepath=/usr/local/bacula/sbin
abudhabi-rad1-sys-fd: signal.c:205-0 Working=/usr/local/bacula/var
abudhabi-rad1-sys-fd: signal.c:206-0 btpath=/usr/local/bacula/sbin/btraceback
abudhabi-rad1-sys-fd: signal.c:207-0 exepath=/usr/local/bacula/sbin/bacula-fd
abudhabi-rad1-sys-fd: signal.c:236-0 Doing waitpid
Calling: /usr/local/bacula/sbin/btraceback /usr/local/bacula/sbin/bacula-fd
1203 /usr/local/bacula/var
gcore: /usr/local/bacula/var/bacula-fd.1203 dumped
/usr/local/bacula/sbin/btraceback: /usr/local/bacula/sbin/bsmtp: not found
abudhabi-rad1-sys-fd: signalThe btraceback call returned 1
Dumping: /usr/local/bacula/var/abudhabi-rad1-sys-fd.1203.bactrace
cat: write error: Broken pipe
- ----- SNIP -----
I'll be happy to provide more information if needed.
Thanks!
- - Michael
- --
Michael Hocke New York University
Sr UNIX Systems Administrator Information Technology Services
C&CS COS
-----BEGIN PGP SIGNATURE-----
Version: PGP Desktop 10.0.3 (Build 1)
Charset: us-ascii
wsBVAwUBUWWw/5bfnpCg64TVAQGAWggAp+gq0qVwciCCYarrO/3fSshpl7svySeK
wtvxEcGx90c86Hb8KMb33F7XmB2uiwM/e2roMeHh7Q8qrD2RxmFVkUmrZvp5usq6
ttL2NC72nVWkqtg6axeOjcQkcFQc6m6bsObDJv11p3LIcD78aHXUYellhU8RNXSZ
Zjh/zE2iIJ5MRJk9gcoaOOmicfMIaGLXScQAw2EJsD3TF/QsxoiXUbc3pwu9b5eI
yZZ4C5P1Z1RdZROp/AU3i417znTPCXaObaulEnnt96uGqaKU79lNq/g5eb/58qpd
lFy8/goB1Fd94J4/KG0zfoWMSf9POmeaBosBkrza9UkAU5TZ00yPVA==
=PVlP
-----END PGP SIGNATURE-----
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel