Hi all.

My storage daemon crashes every weekend, and sometimes more often.
The timing is random, but so far every crash has happened while running either 
VirtualFull jobs or Copy jobs.
It has not crashed during regular incremental backups.

I am running version 3.0.3 of the Bacula software.

First I tried adding '-d 200' to the arguments that start bacula-sd.
This produced a lot of messages, but nothing unusual that I can see prior to 
the crash.
The last few lines of the log are:

        vc-sd: mac.c:241-468 before write JobId=468 FI=363302 SessId=1 Strm=MD5 len=16
        vc-sd: mac.c:241-468 before write JobId=468 FI=363303 SessId=1 Strm=UATTR len=104
        vc-sd: mac.c:241-468 before write JobId=468 FI=363304 SessId=1 Strm=UATTR len=122
        vc-sd: mac.c:241-468 before write JobId=468 FI=363305 SessId=1 Strm=UATTR len=77
        vc-sd: mac.c:241-468 before write JobId=468 FI=363305 SessId=1 Strm=DATA len=4496
        vc-sd: mac.c:241-468 before write JobId=468 FI=363305 SessId=1 Strm=MD5 len=16

So next I have been trying to get the btraceback program running.

I am using self-built Debian packages, based on the 3.0.2 Debian sources.
These run the storage daemon as bacula:tape (user:group).
So I modified the btraceback program to run gdb through sudo, and configured 
sudo to let the bacula user do so without being prompted for a password.
I also modified the Debian sources so that the packages are built with 
debugging symbols.

If I become the bacula user and run a test like so:

        /usr/sbin/btraceback /usr/sbin/bacula-sd $PID

where $PID is the process ID of the running bacula-sd process,
then I get an email showing debugging information.
So as far as I can tell, btraceback itself works when run by hand.
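
The manual test above only shows that btraceback works when started from the bacula user's shell. It does not show that the daemon itself starts it when it dies. A minimal sketch of one way to check that is to send a fatal signal to the running daemon on purpose and see whether a traceback email arrives. The helper name send_fatal_signal is hypothetical, and whether bacula-sd reacts to the signal by launching btraceback is an assumption that this test is meant to confirm or refute:

```shell
#!/bin/sh
# Sketch: deliver a fatal signal to a running process so that its own crash
# handling (if any) is exercised, rather than running the helper by hand.
# send_fatal_signal is a hypothetical helper name, not part of Bacula.
send_fatal_signal() {
    pid="$1"
    sig="${2:-SEGV}"    # default to SIGSEGV if no signal name is given

    # kill -0 sends nothing; it only checks that the process exists and
    # that we are allowed to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "cannot signal process: $pid" >&2
        return 1
    fi

    kill -s "$sig" "$pid"
}

# Example (commented out because it takes the storage daemon down):
# send_fatal_signal "$PID" SEGV
```

This needs to be run as root or as the bacula user, since other users are not allowed to signal the daemon. If no email and no traceback file appear after the deliberate crash, the problem is more likely in how the daemon starts btraceback than in the real crashes themselves.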

I had another crash of the storage daemon after making the changes and no email 
was sent.
Nor was a bacula-sd.9103.traceback file produced.
So I can't send any useful information to try and track down why the storage 
daemon is so unstable.

It was also unstable with the 3.0.2 Debian package, so I don't think my 
rebuild is causing the issue.
Although 3.0.3 feels more stable than 3.0.2 was, I still can't get through a 
complete weekly cycle without a crash.

The /etc/init.d/bacula-sd script sets the PATH as follows:

        PATH=/sbin:/bin:/usr/sbin:/usr/bin

So /usr/sbin is in the PATH, and I'd expect the daemon to be able to find the 
traceback program.
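
The PATH assumption can be checked rather than guessed. The sketch below resolves a program name against an explicit PATH value in a subshell, so the calling shell's own PATH is left alone. The helper name resolve_in_path is hypothetical, and the check only applies if the daemon really looks btraceback up via PATH, which I have not verified:

```shell
#!/bin/sh
# Sketch: check whether a program name resolves under a given PATH value.
# resolve_in_path is a hypothetical helper name, not part of Bacula.
resolve_in_path() {
    name="$1"
    search_path="$2"
    # The parentheses start a subshell, so the PATH change does not leak
    # into the calling shell. command -v prints the resolved location and
    # returns non-zero when nothing is found.
    ( PATH="$search_path"; command -v "$name" )
}

# Example, using the PATH from the init script:
# resolve_in_path btraceback /sbin:/bin:/usr/sbin:/usr/bin
```

If this prints /usr/sbin/btraceback, then the PATH in the init script is not the reason the traceback is missing.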

Any ideas how I can get some useful information from the crash?

-- 
----------
Jim Barber
DDI Health
