Hmmm, ugly.  I wonder why all of a sudden we have a huge wave of users
running into problems with code that was implemented at least two
years ago.  Some strange synchrony at work here?

This looks like yet another manifestation of the MaxRunTime code timing
out the job and canceling it.  In this case, the cancel just happened to
occur while the Director was manipulating a mutex, which caused the lock
call to fail, and Bacula aborted.  I've now changed the mutex in question
to fail the job rather than abort Bacula, but there are lots of other
mutexes in the program that don't expect to get errors.
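
To illustrate the idea of failing the job instead of aborting the daemon,
here is a minimal sketch in C.  It is not Bacula's actual code: the job
struct, the lock_fn type, and lock_or_fail_job() are all hypothetical names
made up for this example, and the lock is simulated by two stand-in
functions so the failure path can be exercised without a real rwlock.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical job record for illustration; not Bacula's real JCR. */
enum job_status { JOB_RUNNING, JOB_FAILED };

struct job {
    enum job_status status;
    char errmsg[128];
};

/* A lock routine returns 0 on success or an errno value on failure. */
typedef int (*lock_fn)(void *lock);

/*
 * Try to take the lock.  On error, mark only this job as failed and
 * tell the caller, instead of calling abort() and taking down the
 * whole daemon.  Returns 0 on success, -1 on failure.
 */
static int lock_or_fail_job(struct job *job, lock_fn take_lock, void *lock)
{
    assert(job != NULL && take_lock != NULL);

    int err = take_lock(lock);
    if (err != 0) {
        job->status = JOB_FAILED;
        snprintf(job->errmsg, sizeof(job->errmsg),
                 "writelock failure. ERR=%s", strerror(err));
        return -1;
    }
    return 0;
}

/* Stand-ins for a real lock: one always succeeds, one always fails. */
static int lock_ok(void *lock)     { (void)lock; return 0; }
static int lock_einval(void *lock) { (void)lock; return EINVAL; }
```

A caller would check the return value and unwind the job's work when it is
-1, leaving the other running jobs untouched.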

This particular crash will be fixed in the next beta of 1.38.4, to be
released today or tomorrow.  A workaround for you would be to remove any
timers you have set on the job for MaxRunTime or MaxWaitTime, or to set
them to larger time values.  I have a feeling you didn't really want your
job canceled anyway ...
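
As a rough sketch of the workaround, the Job resource in bacula-dir.conf
might look something like the fragment below.  The job name and the time
values are placeholders I made up, and only the two timer directives are
shown; the rest of your Job resource stays as it is.  Deleting the two
lines entirely is the other option.

```
Job {
  Name = "ExampleBackup"      # hypothetical job name
  # other directives for this job are unchanged and omitted here
  Max Run Time = 12 hours     # placeholder value, larger than a full run needs
  Max Wait Time = 4 hours     # placeholder value
}
```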



On Thursday 12 January 2006 15:21, Benjamin Menking wrote:
> I was backing up a directory that was approximately 5GB.  The backup got
> about 2.5GB of the data and segfaulted.  Is this the right place to
> post this kind of message?  I've included the daemon message and the
> traceback:
>
> ---
>
> 11-Jan 23:52 ensim-dir: ABORTING due to ERROR in sql_create.c:662
> rwl_writelock failure. ERR=Invalid argument
> 11-Jan 23:52 ensim-dir: Fatal Error because: Bacula interrupted by signal
> 11: Segmentation violation
>
> ---
>
> Using host libthread_db library "/lib/libthread_db.so.1".
> [Thread debugging using libthread_db enabled]
> [New Thread 16384 (LWP 28215)]
> [New Thread 32769 (LWP 28216)]
> [New Thread 16386 (LWP 28217)]
> [New Thread 32771 (LWP 28218)]
> [New Thread 98309 (LWP 28252)]
> [New Thread 114694 (LWP 28254)]
> 0xb7553946 in nanosleep () from /lib/i686/libpthread.so.0
> $1 = "ensim-dir", '\0' <repeats 20 times>
> $2 = 0x80c4d30 "bacula-dir"
> $3 = 0x80c5b10 "/opt/bacula/sbin/bacula-dir"
> $4 = "MySQL"
> $5 = 0x80b06d4 "1.38.3 (22 December 2005)"
> $6 = 0x80a86f1 "i686-pc-linux-gnu"
> $7 = 0x80a86ea "redhat"
> $8 = 0x80a86d7 "Enterprise release"
> #0  0xb7553946 in nanosleep () from /lib/i686/libpthread.so.0
>
> Thread 6 (Thread 114694 (LWP 28254)):
> #0  0xb755403b in waitpid () from /lib/i686/libpthread.so.0
> #1  0xb56fbf50 in ?? ()
> #2  0x00000001 in ?? ()
> #3  0x080937d4 in signal_handler (sig=11) at signal.c:159
> #4  0xb7552d96 in __pthread_sighandler () from /lib/i686/libpthread.so.0
> #5  <signal handler called>
> #6  e_msg(char const*, int, int, int, char const*, ...) (
>     file=0x80aab8d "sql_create.c", line=-1250966612, type=1, level=0,
>     fmt=0x80aaa18 "rwl_writelock failure. ERR=%s\n") at message.c:971
> #7  0x08079799 in _db_lock(char const*, int, B_DB*) (
>     file=0x80aab8d "sql_create.c", line=662, mdb=0x80d2a00) at berrno.h:69
> #8  0x0807bbca in db_create_file_attributes_record(JCR*, B_DB*, ATTR_DBR*) (
>     jcr=0x80d54c0, mdb=0x80d2a00, ar=0x80d6330) at sql_create.c:662
> #9  0x080798fe in db_end_transaction(JCR*, B_DB*) (jcr=0x80d54c0,
>     mdb=0x80d2a00) at sql.c:325
> #10 0x0805d7ea in msg_thread_cleanup (arg=0x80d54c0) at msgchan.c:291
> #11 0xb754bf28 in __pthread_perform_cleanup () from /lib/i686/libpthread.so.0
> #12 0xb754c695 in __pthread_do_exit () from /lib/i686/libpthread.so.0
> #13 0xb754f74b in pthread_handle_sigcancel () from /lib/i686/libpthread.so.0
> #14 <signal handler called>
> #15 0xb755318b in read () from /lib/i686/libpthread.so.0
> #16 0x00000004 in ?? ()
>
> Thread 5 (Thread 98309 (LWP 28252)):
> #0  0xb7553946 in nanosleep () from /lib/i686/libpthread.so.0
>
> Thread 4 (Thread 32771 (LWP 28218)):
> #0  0xb7553946 in nanosleep () from /lib/i686/libpthread.so.0
> #1  0x00000001 in ?? ()
> #2  0xb754fbb8 in __pthread_timedsuspend_new () from /lib/i686/libpthread.so.0
> #3  0xb754c479 in pthread_cond_timedwait_relative ()
>    from /lib/i686/libpthread.so.0
> #4  0x08099c98 in watchdog_thread (arg=0x0) at watchdog.c:296
> #5  0xb754de51 in pthread_start_thread () from /lib/i686/libpthread.so.0
> #6  0xb72e406a in clone () from /lib/i686/libc.so.6
>
> Thread 3 (Thread 16386 (LWP 28217)):
> #0  0xb72dd621 in select () from /lib/i686/libc.so.6
> #1  0x00000006 in ?? ()
>
> Thread 2 (Thread 32769 (LWP 28216)):
> #0  0xb72db3aa in poll () from /lib/i686/libc.so.6
> #1  0xb754cd7e in __pthread_manager () from /lib/i686/libpthread.so.0
> #2  0xb72e406a in clone () from /lib/i686/libc.so.6
>
> Thread 1 (Thread 16384 (LWP 28215)):
> #0  0xb7553946 in nanosleep () from /lib/i686/libpthread.so.0
> #0  0xb7553946 in nanosleep () from /lib/i686/libpthread.so.0
> #0  0xb7553946 in nanosleep () from /lib/i686/libpthread.so.0
> No symbol table info available.
> #0  0x00000000 in ?? ()
> No symbol table info available.
> #0  0x00000000 in ?? ()
> No symbol table info available.
> #0  0x00000000 in ?? ()
> No symbol table info available.
> #0  0x00000000 in ?? ()
> No symbol table info available.
> #0  0x00000000 in ?? ()
> No symbol table info available.
> #0  0x00000000 in ?? ()
> No symbol table info available.
> #0  0x00000000 in ?? ()
> No symbol table info available.

-- 
Best regards,

Kern

  (">
  /\
  V_V


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
