Chris,
Look in the amandad debug files on the client and the dumper debug file
on the server.
Jean-Louis
On 05/09/2013 03:06 PM, Chris Hoogendyk wrote:
On one of my older setups that is running Amanda 2.5.1p3, I'm getting
patterns of failures that I can't make sense of. Some days everything
works just fine. Other days I get everything from a couple of DLEs
failing to a half dozen. They are typically the same three servers,
which are in another building. However, there is a server that has the
largest DLEs in that same building that does not exhibit failures.
There are also several servers in the same building as the Amanda
backup server that don't show any failures. I even made a spreadsheet
that shows the backup level (0, 1 or 2) for each success and a red x
for each failure. I can't see any pattern.
I've looked at a bunch of things and pored over the log files to no
avail.
The errors show up in the Amanda reports as:
  anise.nsm.umass.edu /home lev 1 FAILED [cannot read header: got 0 instead of 32768]
  metzi1.physics.umass.edu /data lev 1 FAILED [cannot read header: got 0 instead of 32768]
  anise.nsm.umass.edu /home lev 1 FAILED [cannot read header: got 0 instead of 32768]
  anise.nsm.umass.edu /home lev 1 FAILED [too many dumper retry: "[request failed: timeout waiting for ACK]"]
  metzi1.physics.umass.edu /data lev 1 FAILED [too many dumper retry: "[request failed: timeout waiting for ACK]"]
  metzi1.physics.umass.edu /data lev 1 FAILED [cannot read header: got 0 instead of 32768]
The interesting thing is that if I go to metzi1, into
/tmp/amanda/client/daily/ and do an `ls *0508*` I can see the debug
logs from last night. If I do a `grep '/data' *0508*` I can see any
entries that mention the DLE /data. I see no instances of sendbackup.
I see only those runtar debug files that correspond to the size
estimates for level 0, 1 and 2 backups. There is no runtar debug file that would
correspond to an actual dump.
If I do the same thing with `grep '/home' *0508*` (the DLE /home was
successfully backed up), then I see all the runtar debug files for the
estimates as well as a runtar debug file for the actual backup. I also
see several lines in the sendbackup debug file for /home.
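The checks above can be wrapped in a small shell function so the same
comparison is easy to repeat for other DLEs and other nights. This is
only a sketch: the function name dle_debug_summary is made up, and it
assumes the debug file names start with the program name (amandad,
runtar, sendbackup) and contain the date fragment, as the `*0508*` glob
above suggests.

```shell
# Hypothetical helper: count the Amanda client debug files from one run
# that mention a given DLE, broken down by program.
# Arguments: debug directory, date fragment in the file names, DLE path.
dle_debug_summary() {
    dir=$1
    stamp=$2
    dle=$3
    for prog in amandad runtar sendbackup; do
        # grep -l lists matching files; -F treats the DLE as a literal
        # string (note that "/data" would also match "/data2").
        # Errors from a glob that matches nothing are discarded.
        n=$(grep -lF -- "$dle" "$dir"/"$prog"*"$stamp"* 2>/dev/null | wc -l)
        # $((n)) strips any padding that wc adds on some systems.
        echo "$prog: $((n)) file(s) mention $dle"
    done
}
```

It would be called as
`dle_debug_summary /tmp/amanda/client/daily 0508 /data`. A DLE that
shows runtar files for the estimates but no sendbackup mentions matches
the failure pattern described above.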
I've also looked through /var/log/syslog, /var/log/auth.log, etc. on
the client (which is Ubuntu 12.04 LTS), and I've looked through Amanda
debug logs, /var/adm/messages, /var/adm/authlog, etc. on the server
(which is Solaris 10). I don't see any logged errors for dropped
connections or failures of any sort. The Amanda logs just don't
mention /data on metzi1. A couple of the other servers that are being
backed up are Ubuntu 12.04 and several are Ubuntu 10.04. None of them
have had their sshd_config tweaked. All have TCPKeepAlive turned on.
I tried bumping up the timeouts in amanda.conf (by a factor of 5).
That seems a bit much, and it didn't seem to make any difference.
What should I be looking for? Where would Amanda log what is going on?
(Or, why would it not be logging it?)
Thank you,