On 18.03.2011 10:40, Christian Manal wrote:
> On 16.03.2011 09:14, Christian Manal wrote:
>> On 15.03.2011 19:12, Christian Manal wrote:
>>> On 15.03.2011 17:49, Kjetil Torgrim Homme wrote:
>>>> Christian Manal <moen...@informatik.uni-bremen.de> writes:
>>>>
>>>>> Also, after several accurate jobs running without restarting Bacula,
>>>>> the total memory usage of the director and fd didn't go up anymore, so
>>>>> I presume it comes down to the behavior of Solaris' free(), as
>>>>> described in the above quoted manpage.
>>>>
>>>> libumem may work better -- just set LD_PRELOAD, you don't have to
>>>> recompile.  I'd appreciate it if you report back if you try it.
>>>>
>>>
>>> Actually, I already did that. Modified the startup script for the
>>> affected fd (don't want the director crashing if things go wrong) and
>>> restarted. I will report the results tomorrow.
>>
>> Looks good. 
> 
> Maybe I spoke too soon. Last night, after switching to libumem, my
> director crashed with a segfault. The crash was preceded by an unusually
> long-running job (the accurate one) which, going by its size, looked
> like it was doing a full backup instead of an incremental for some reason.
> 
> I have some output from mdb and pstack attached.

And going by dbx, the dir went kaboom in Jmsg().

Regards,
Christian Manal
Reading bacula-dir
core file header read successfully
Reading ld.so.1
Reading libumem.so.1
Reading libintl.so.8.1.1
Reading libbacfind-5.0.3.so
Reading libbacsql-5.0.3.so
Reading libbacpy-5.0.3.so
Reading libbaccfg-5.0.3.so
Reading libbac-5.0.3.so
Reading libz.so.1.2.5
Reading libstdc++.so.6.0.10
Reading libpython2.4.so.1.0
Reading libpq.so.5.1
Reading libpthread.so.1
Reading libnsl.so.1
Reading libsocket.so.1
Reading libxnet.so.1
Reading libresolv.so.2
Reading librt.so.1
Reading libssl.so.0.9.8
Reading libcrypto.so.0.9.8
Reading libm.so.2
Reading libgcc_s.so.1
Reading libc.so.1
Reading libiconv.so.2.5.0
Reading libm.so.1
Reading libdl.so.1
Reading libssl.so.0.9.7
Reading libcrypto.so.0.9.7
Reading libgss.so.1
Reading libaio.so.1
Reading libmd.so.1
Reading libcrypto.so.0.9.8
Reading libcmd.so.1
Reading libcrypto_extra.so.0.9.7
t@489 (l@489) terminated by signal SEGV (no mapping at the fault address)
0xfee6a580: Jmsg+0x04c9:        movb     $0x00000000,(%eax)
(dbx) where                                                                  
current thread: t@489
=>[1] Jmsg(0xbefe5be0, 0x1, 0x0, 0x0, 0xfee8e25e, 0xf6caddb0), at 0xfee6a580 
  [2] j_msg(0x80c360e, 0x154, 0xbefe5be0, 0x1, 0x0, 0x0), at 0xfee6a7ad 
  [3] start_storage_daemon_message_thread(0xbefe5be0, 0x80bc7f5, 0xfdc7f960, 0x0, 0x80bc798, 0xfde8fe6c), at 0x80834bc
  [4] do_backup(0xbefe5be0, 0x4, 0x0, 0xfdf91200, 0xfeea26e4, 0xfdf91200), at 0x80658b0
  [5] _ZL10job_threadPv(0xbefe5be0, 0x1, 0xfe7c0dc7, 0xfe8422cc, 0xfe8422c0, 0xfdf91200), at 0x807a96e
  [6] jobq_server(0x80e5080), at 0x807d127 
  [7] _thr_setup(0xfdf91200), at 0xfe7c7e66 
  [8] _lwp_start(0xfee8e708, 0x0, 0x0, 0xfde8ea00, 0x7, 0x0), at 0xfe7c8150 
(dbx) threads
      t@1  a  l@1   ?()   sleep on 0xfeea33cc  in  __lwp_park() 
      t@2  a  l@2   connect_thread()   LWP suspended in  __pollsys() 
      t@3  a  l@3   watchdog_thread()   sleep on 0xfeea39f8  in  __lwp_park() 
      t@6  b  l@6   umem_update_thread()   sleep on (unknown) in  __lwp_park() 
    t@161  a l@161   jobq_server()   sleep on 0xbefea704  in  __lwp_park() 
    t@171  a l@171   jobq_server()   sleep on 0xfe165e44  in  __lwp_park() 
    t@203  a l@203   msg_thread()   LWP suspended in  __lwp_park() 
    t@222  a l@222   jobq_server()   LWP suspended in  _waitid() 
o>  t@489  a l@489   jobq_server()   signal SIGSEGV in  Jmsg() 
    t@490  a l@490   jobq_server()   LWP suspended in  _read() 
    t@491  a l@491   jobq_server()   sleep on 0xfe133828  in  __lwp_park() 
    t@492  a l@492   msg_thread()   sleep on 0xfe133828  in  __lwp_park() 
    t@494  a l@494   jobq_server()   LWP suspended in  _read() 
    t@495  a l@495   jobq_server()   LWP suspended in  _read() 
    t@496  a l@496   jobq_server()   LWP suspended in  _read() 
    t@497  a l@497   jobq_server()   LWP suspended in  _read() 
    t@589  a l@589   msg_thread()   LWP suspended in  _read() 
    t@590  a l@590   msg_thread()   LWP suspended in  _read() 