Re: [Bacula-users] Crashing storage director. Need help getting trace.
> On Mon, 14 Dec 2009 16:47:00 +0800, Jim Barber said:
>
> Jim Barber wrote:
> >
> > Thanks Martin.
> >
> > I've compiled and installed version 3.1.6 from a git pull I did on 10th Dec.
> > I'm not sure if this new version will crash or not.
> > But I've manually attached a gdb session to it just in case it does.
> >
> > Thanks.
>
> I'm not having much luck with this.
> When I attached to the process with gdb it seems to interfere with it.
> It's like it stops running.
> It no longer responds to status commands etc.
>
> I'm not familiar enough with gdb to resolve it.
> I tried the 'c'ontinue command just in case attaching stops the process.
> But it doesn't make any difference.

Yes, you do need to use the continue command after attaching, but that should
work. Possibly your version of gdb is broken, which might also explain the
lack of email from btraceback.

Did gdb print anything after you did that? It may be worth posting the whole
gdb session.

__Martin

___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
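One way to check the symptom described above (the daemon appearing frozen after gdb attaches) is to look at the State field in /proc/<pid>/status on Linux. The following is a minimal sketch, assuming a Linux /proc filesystem; the function names are hypothetical and not part of Bacula or gdb. A state containing "tracing stop" suggests the process is still halted by the debugger and that the continue command has not taken effect.

```python
def process_state(status_text):
    """Extract the State field from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:\tS (sleeping)".
            return line.split(":", 1)[1].strip()
    return None


def is_stopped_by_tracer(status_text):
    """Return True if the State field reports a tracing stop.

    The letter code differs between kernel versions ("T" or "t"),
    so this matches on the description text instead.
    """
    state = process_state(status_text)
    return state is not None and "tracing stop" in state


if __name__ == "__main__":
    import sys

    # Usage: python check_state.py <pid>
    with open("/proc/%s/status" % sys.argv[1]) as fh:
        text = fh.read()
    print(process_state(text), is_stopped_by_tracer(text))
```

If the state shows a tracing stop even after continue was issued, that would be worth including when posting the gdb session.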
Re: [Bacula-users] Crashing storage director. Need help getting trace.
Jim Barber wrote:
>
> Thanks Martin.
>
> I've compiled and installed version 3.1.6 from a git pull I did on 10th Dec.
> I'm not sure if this new version will crash or not.
> But I've manually attached a gdb session to it just in case it does.
>
> Thanks.

I'm not having much luck with this.
When I attached to the process with gdb it seems to interfere with it.
It's like it stops running.
It no longer responds to status commands etc.

I'm not familiar enough with gdb to resolve it.
I tried the 'c'ontinue command just in case attaching stops the process.
But it doesn't make any difference.

Regards,

--
Jim Barber
DDI Health
Re: [Bacula-users] Crashing storage director. Need help getting trace.
Martin Simmons wrote:
>
> Try doing it interactively by attaching gdb to the bacula-sd process before it
> crashes (run gdb /path/to/bacula-sd and then use gdb's attach command). Then
> use the commands in btraceback.gdb when it crashes.
>
> __Martin

Thanks Martin.

I've compiled and installed version 3.1.6 from a git pull I did on 10th Dec.
I'm not sure if this new version will crash or not.
But I've manually attached a gdb session to it just in case it does.

Thanks.

--
Jim Barber
DDI Health
Re: [Bacula-users] Crashing storage director. Need help getting trace.
> On Mon, 07 Dec 2009 14:30:41 +0800, Jim Barber said:
>
> Hi all.
>
> I have a problem where every weekend (or more frequently) my storage daemon
> crashes.
> The crash is random, but is happening either while running VirtualFull jobs
> or Copy jobs.
> So far it hasn't crashed during regular incremental backups.
>
> I am running version 3.0.3 of the Bacula software.
>
> First of all I tried adding a '-d 200' to the arguments that start bacula-sd.
> This produced a lot of messages, but nothing unusual that I can see prior to
> the crash.
> The last few lines in this log look like so:
>
> vc-sd: mac.c:241-468 before write JobId=468 FI=363302 SessId=1 Strm=MD5 len=16
> vc-sd: mac.c:241-468 before write JobId=468 FI=363303 SessId=1 Strm=UATTR len=104
> vc-sd: mac.c:241-468 before write JobId=468 FI=363304 SessId=1 Strm=UATTR len=122
> vc-sd: mac.c:241-468 before write JobId=468 FI=363305 SessId=1 Strm=UATTR len=77
> vc-sd: mac.c:241-468 before write JobId=468 FI=363305 SessId=1 Strm=DATA len=4496
> vc-sd: mac.c:241-468 before write JobId=468 FI=363305 SessId=1 Strm=MD5 len=16
>
> So next I have been trying to get the btraceback program running.
>
> I am using Debian packages (self built based on the 3.0.2 Debian sources).
> These run the storage daemon under the bacula:tape user:group.
> So I modified the btraceback program to use sudo to run gdb.
> I also configured sudo to allow the bacula user to do so without being
> prompted for a password.
> I then modified the Debian sources so that packages with debugging symbols
> are produced.
>
> If I become the bacula user and run a test like so:
>
>     /usr/sbin/btraceback /usr/sbin/bacula-sd $PID
>
> Where: $PID = the process ID of the bacula-sd process,
> then I get an email showing debugging information.
> So as far as I can tell the btraceback program should be working.
>
> I had another crash of the storage daemon after making the changes and no
> email was sent.
> Nor was a bacula-sd.9103.traceback file produced.
> So I can't send any useful information to try and track down why the storage
> daemon is so unstable.
>
> It was also unstable when using the 3.0.2 Debian package as well so I don't
> think it is my rebuild that is causing the issue.
> Although I feel 3.0.3 is more stable than 3.0.2 was, I still can't get a
> complete week's cycle working without a crash.
>
> The /etc/init.d/bacula-sd script defines the PATH to be,
>
>     PATH=/sbin:/bin:/usr/sbin:/usr/bin
>
> So /usr/sbin is in the PATH and so I'd imagine the program should be able to
> find the traceback program.
>
> Any ideas how I can get some useful information from the crash?

Try doing it interactively by attaching gdb to the bacula-sd process before it
crashes (run gdb /path/to/bacula-sd and then use gdb's attach command). Then
use the commands in btraceback.gdb when it crashes.

__Martin
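The attach-then-wait approach suggested above could also be run unattended. The following is a minimal sketch in Python that only builds a gdb command line; it does not run anything unless called from the main block. The gdb options used (-batch, -p, -ex) are standard gdb options, but the function name, the placeholder PID, and the choice of "thread apply all bt" as the command to run after the crash are assumptions, not the actual contents of btraceback.gdb.

```python
import shlex


def build_gdb_command(pid):
    """Return an argv list that attaches gdb to a running process.

    After attaching, gdb resumes the process with `continue`. When the
    process next stops (for example on a fatal signal), gdb runs the
    following command and then exits because of -batch.
    """
    return [
        "gdb",
        "-batch",                      # exit once all commands have run
        "-p", str(pid),                # attach to this process ID
        "-ex", "continue",             # resume the process after attaching
        "-ex", "thread apply all bt",  # backtrace of every thread on stop
    ]


if __name__ == "__main__":
    # 12345 is a placeholder PID used only for illustration.
    print(shlex.join(build_gdb_command(12345)))
```

Note that gdb may also stop on signals that are not fatal, in which case the backtrace would be printed before any real crash; redirecting the output to a file and checking which signal gdb reported would help tell the two cases apart.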
[Bacula-users] Crashing storage director. Need help getting trace.
Hi all.

I have a problem where every weekend (or more frequently) my storage daemon
crashes.
The crash is random, but is happening either while running VirtualFull jobs
or Copy jobs.
So far it hasn't crashed during regular incremental backups.

I am running version 3.0.3 of the Bacula software.

First of all I tried adding a '-d 200' to the arguments that start bacula-sd.
This produced a lot of messages, but nothing unusual that I can see prior to
the crash.
The last few lines in this log look like so:

vc-sd: mac.c:241-468 before write JobId=468 FI=363302 SessId=1 Strm=MD5 len=16
vc-sd: mac.c:241-468 before write JobId=468 FI=363303 SessId=1 Strm=UATTR len=104
vc-sd: mac.c:241-468 before write JobId=468 FI=363304 SessId=1 Strm=UATTR len=122
vc-sd: mac.c:241-468 before write JobId=468 FI=363305 SessId=1 Strm=UATTR len=77
vc-sd: mac.c:241-468 before write JobId=468 FI=363305 SessId=1 Strm=DATA len=4496
vc-sd: mac.c:241-468 before write JobId=468 FI=363305 SessId=1 Strm=MD5 len=16

So next I have been trying to get the btraceback program running.

I am using Debian packages (self built based on the 3.0.2 Debian sources).
These run the storage daemon under the bacula:tape user:group.
So I modified the btraceback program to use sudo to run gdb.
I also configured sudo to allow the bacula user to do so without being
prompted for a password.
I then modified the Debian sources so that packages with debugging symbols
are produced.

If I become the bacula user and run a test like so:

    /usr/sbin/btraceback /usr/sbin/bacula-sd $PID

Where: $PID = the process ID of the bacula-sd process,
then I get an email showing debugging information.
So as far as I can tell the btraceback program should be working.

I had another crash of the storage daemon after making the changes and no
email was sent.
Nor was a bacula-sd.9103.traceback file produced.
So I can't send any useful information to try and track down why the storage
daemon is so unstable.

It was also unstable when using the 3.0.2 Debian package as well so I don't
think it is my rebuild that is causing the issue.
Although I feel 3.0.3 is more stable than 3.0.2 was, I still can't get a
complete week's cycle working without a crash.

The /etc/init.d/bacula-sd script defines the PATH to be,

    PATH=/sbin:/bin:/usr/sbin:/usr/bin

So /usr/sbin is in the PATH and so I'd imagine the program should be able to
find the traceback program.

Any ideas how I can get some useful information from the crash?

--
Jim Barber
DDI Health
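The PATH reasoning above can be checked directly rather than assumed. The following is a minimal sketch in Python using the standard library's shutil.which with an explicit search path; the function name is hypothetical, and the PATH value is the one quoted from the init script. It only confirms that an executable with that name is found on the given PATH. It says nothing about whether sudo or gdb behave the same way when started from the daemon's environment.

```python
import shutil


def find_in_path(program, path_value):
    """Return the full path of `program` if found on `path_value`, else None.

    `path_value` is a PATH-style string of directories, searched in order.
    """
    return shutil.which(program, path=path_value)


if __name__ == "__main__":
    # PATH value as defined in /etc/init.d/bacula-sd in the message above.
    init_path = "/sbin:/bin:/usr/sbin:/usr/bin"
    for name in ("btraceback", "gdb", "sudo"):
        print(name, "->", find_in_path(name, init_path))
```

Running this as the bacula user would show whether each program the modified btraceback relies on resolves under the init script's PATH.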