Hi,

On 4/9/2007 10:23 PM, Michael Proto wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Sorry I'm so late in replying back to this thread, I've had a number of
> long-running full backups running on my Bacula 2.0.3 server that I
> didn't want to touch until completed.

I know the feeling...

> Now that they are done I have some
> time to continue investigating this issue.
> 
> Arno Lehmann wrote:
>> Hi,
>>
>> On 4/3/2007 12:04 AM, Michael Proto wrote:
> ...
>> I built and packaged all the Bacula Linux clients myself (so they all
>> pull from the same set of config files for quick installation), and I
>> used the following compile-time flags when building them:
>>
>> --with-openssl --enable-client-only --enable-static-fd --enable-smartalloc
>>
>> I'm using the static-bacula-fd binary (instead of the bacula-fd binary)
>>
>>> Have you checked that the binary is really completely static? Last time I 
>>> tried I could not create a static binary of the FD, the one I could come 
>>> up with was created under a really old linux/gcc/glibc combination, and 
>>> lacked ACL support.
>>> If your binary pulls in shared objects, version differences there might 
>>> be related to the problem.
>>> Of course, trying the heartbeat interval setting will be the most useful 
>>> first step. I can't prove that, but I have the impression that Bacula 
>>> reacts more sensitively to network problems recently. Might be because 
>>> it's more efficient, and so leaves the TCP connections idling longer, or 
>>> something...
> 
> Regarding static client binaries, when building the client with the
> "--enable-client-only --enable-static-fd" flags, 2 binaries were
> produced, bacula-fd and static-bacula-fd. Running file and ldd on the
> static binary do seem to indicate that it is compiled statically:
> 
> [EMAIL PROTECTED] ~]# file /sbin/static-bacula-fd
> /sbin/static-bacula-fd: ELF 32-bit LSB executable, Intel 80386, version
> 1 (SYSV), for GNU/Linux 2.2.0, statically linked, stripped

This looks good. Better than what I experienced.

> [EMAIL PROTECTED] ~]# ldd /sbin/static-bacula-fd
>         not a dynamic executable
> 
> Even stranger, this error also occurs intermittently on the bacula-fd
> program running on my Bacula server, which was not compiled statically.

Ok, so I think we can rule out a version mismatch between OS/kernel, 
system libraries, and user libraries.

> 
> On the network/heartbeat issue:
> The strange thing is, when I initiate a backup job against an affected
> client, it fails immediately, before sending much of anything to the SD
> with the "Fatal error: Socket error on Storage command: ERR=No data
> available" error. Could that really be related to heartbeat?

That does not sound like a heartbeat problem.

> In any case, I've added the following to the bacula-sd.conf:
> 
>   Heartbeat Interval = 60
> 
> And I'll try adding the same interval to a few affected clients today
> and see if that helps.

Watch out when you are prompted for tape changes - I experienced some 
problems (not yet reported as a bug, but soon - hopefully).
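
For the clients, the same directive goes into the FileDaemon resource of 
bacula-fd.conf. A rough sketch only; the resource name is a placeholder and 
the other directives of your existing resource stay as they are:

```
FileDaemon {
  Name = example-fd          # placeholder, keep your real client name
  # ... your existing directives ...
  Heartbeat Interval = 60
}
```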

> I'll also see if I can get a valid tcpdump of the client and server
> communications to see exactly what sorts of packets are being sent when
> this failure occurs.

Yep, that's what I'd try, too, although this is tricky... you run tcpdump 
for days, produce tons of output, and no error occurs... and then, when the 
error occurs, it's on another machine, your tcpdump crashed because the 
file system for the log was full, or some i**** killed it because he 
thought it was a forgotten process :-)

If you get a usable dump, you've still got to read and understand it, 
which can be a new challenge...
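
To guard against the full-disk case, the capture can be limited to a ring of 
files. A minimal sketch, assuming a tcpdump that supports -C and -W, the 
default Bacula ports (9102 for the FD, 9103 for the SD), and placeholder 
values for the interface and the output path. It only prints the command, 
so you can check it before running it as root:

```shell
#!/bin/sh
# Placeholders (assumptions, adjust to your site): interface and capture path.
IFACE="${IFACE:-eth0}"
OUT="${OUT:-/var/tmp/bacula-trace.pcap}"
FILE_MB=100    # -C: start a new file after roughly this many million bytes
FILE_COUNT=10  # -W: keep at most this many files, then reuse them in turn

# Print the tcpdump command line instead of running it.
# -s 0 asks for whole packets rather than truncated ones.
capture_cmd() {
    printf 'tcpdump -i %s -s 0 -C %s -W %s -w %s tcp port 9102 or tcp port 9103\n' \
        "$IFACE" "$FILE_MB" "$FILE_COUNT" "$OUT"
}

capture_cmd
```

With these numbers the capture is capped at about 1 GB on disk, whatever 
happens, and the port filter keeps unrelated traffic out of the dump.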

> Somewhat difficult, as the failure is intermittent
> at best but it's happening on enough hosts that I might be able to get
> some valid data.

Good luck!

>> for maximum portability. They were built on a Debian Sarge host and then
>> packaged into appropriate distribution packages.
>>
>> On one of the often-affected hosts I now have the client started with
>> the following flags (out of /etc/inittab):
>>
>> /sbin/static-bacula-fd -fvc -d100 /etc/bacula/bacula-fd.conf
>>>>> /tmp/bacula-fd.out
>> When the client fails, I see the modification timestamp update on the
>> resultant /tmp/bacula-fd.out file, but it's currently empty. Do I need to
>> redirect stderr to this file instead of stdout?
>>
>>> It wouldn't do any harm redirecting *both* :-)
>> Anyone have any ideas what might be causing these errors or how I can go
>> about debugging this unusual (and while not critical, still very
>> annoying) problem?
>>
>>> Check your system logs... I found some cases where Bacula's components 
>>> were killed because they used more memory than was healthy... the DIR 
>>> mainly, but I have seen cases where I suspect the SD... no details 
>>> available, unfortunately.
>>> Arno
> 
> I haven't seen anything in the standard system logs on the client or
> server when the failure occurs, but I'll try the stderr redirect on a
> few clients and see if anything is showing-up there.

Anything useful, yet?
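
In case it helps, this is how redirecting both streams looks in a 
Bourne-style shell. A minimal sketch using a stand-in function instead of 
the real daemon (emit_both and the temporary file are made up for 
illustration). The order matters: 2>&1 has to come after the file 
redirection.

```shell
#!/bin/sh
# Stand-in for a daemon that writes to both stdout and stderr (hypothetical).
emit_both() {
    echo "message on stdout"
    echo "message on stderr" >&2
}

LOG=$(mktemp)

# "> file" alone captures stdout only; adding "2>&1" afterwards
# points stderr at the same file.
emit_both > "$LOG" 2>&1

cat "$LOG"
```

For the inittab entry, that would mean appending 2>&1 to the end of the 
existing command line.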

> Thanks for the tips guys!

I hope you finally get to the bottom of this... this sort of problem is 
the most challenging kind, and I've had my share of them recently :-)

Arno

> 
> - -Proto
> - --
> Michael Proto            | SecureWorks
> Unix Administrator       |
> PGP ID: 5D575BBE         | [EMAIL PROTECTED]
> *******************************************************
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.7 (FreeBSD)
> 
> iD8DBQFGGqCtOLq/wl1XW74RAhXXAJ9PfwwKxWY6UiQEm3yccOE5CgLSpQCeMedy
> WrikI6eqOZt6gD9PDLvVgN0=
> =GIqr
> -----END PGP SIGNATURE-----

-- 
IT-Service Lehmann                    [EMAIL PROTECTED]
Arno Lehmann                  http://www.its-lehmann.de

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
