There is something strange about the way selfcheck timeouts are generated.
I've set ctimeout to 120 in amanda.conf on the server, rebooted all the
hosts, and run amcheck multiple times. Selfcheck requests still time out
sporadically, and clearly before ctimeout expires. Lower-level timeouts
are logged by amandad on both the server and the client. Do I need to
tune something in the lower-level protocols?
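
For reference, the setting described above would look like this in the
server's amanda.conf for the Daily configuration. Only this one line is
shown; the rest of the file is omitted. The amanda.conf(5) manual page
describes ctimeout as the maximum amount of time, in seconds, that amcheck
waits for each client host, with a documented default of 30 seconds.

```
# amanda.conf on the server (Daily configuration), excerpt only
ctimeout 120    # seconds amcheck waits for each client host
```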

Here is an amandad.*.debug (edited down) from the server when a client
timed out:

amandad: debug 1 pid 2294 ruid 33224 euid 33224 start time Wed Jun  5 10:42:31 2002
amandad: version 2.4.2p2
[...snip...]
amandad: dgram_recv: timeout after 30 seconds
amandad: error receiving message: timeout
error receiving message: timeout
amandad: pid 2294 finish time Wed Jun  5 10:43:01 2002
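
As a quick check on "clearly earlier than ctimeout", the start and finish
stamps in the server log above can simply be subtracted. A minimal Python
sketch; the helper name session_seconds is made up for illustration, and
CTIMEOUT is just the value set in amanda.conf:

```python
from datetime import datetime

# Timestamp format of the amandad debug start/finish lines, e.g.
# "Wed Jun  5 10:42:31 2002". strptime treats the double space before
# the day as ordinary whitespace, so the padded day still parses.
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"

CTIMEOUT = 120  # seconds, as set in amanda.conf on the server


def session_seconds(start: str, finish: str) -> float:
    """Return the wall-clock length of one amandad session in seconds."""
    t0 = datetime.strptime(start, STAMP_FORMAT)
    t1 = datetime.strptime(finish, STAMP_FORMAT)
    return (t1 - t0).total_seconds()


server = session_seconds("Wed Jun  5 10:42:31 2002", "Wed Jun  5 10:43:01 2002")
print(f"server amandad session: {server:.0f} s (ctimeout is {CTIMEOUT} s)")
```

The session lasts 30 seconds, the same figure as the "dgram_recv: timeout
after 30 seconds" line it logs, not the 120 seconds configured as ctimeout.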


Problems with dgram_recv are logged on the client too. Sometimes they are
overcome, but sometimes they lead to "giving up", even when the client is
the server machine itself. Here is a (slightly condensed) amandad.*.debug
from a timed-out client:

amandad: debug 1 pid 1143 ruid 33224 euid 33224 start time Wed Jun  5 10:41:56 2002
[...snip...]
amandad: version 2.4.2p2
got packet:
--------
Amanda 2.4 REQ HANDLE 006-10068D10 SEQ 1023288140
SECURITY USER amanda
SERVICE selfcheck
OPTIONS ;
GNUTAR /smith 0 OPTIONS |;bsd-auth;compress-fast;index;exclude-list=/usr/local/lib/amanda/exclude.gtar;
[...snip out various other options...]
GNUTAR /dev/hda2 0 OPTIONS |;bsd-auth;compress-fast;index;exclude-list=/usr/local/lib/amanda/exclude.gtar;
--------

sending ack:
----
Amanda 2.4 ACK HANDLE 006-10068D10 SEQ 1023288140
----

bsd security: remote host raleigh.cpt.afip.org user amanda local user amanda
amandahosts security check passed
amandad: running service "/usr/local/amanda_client/libexec/selfcheck"
amandad: sending REP packet:
----
Amanda 2.4 REP HANDLE 006-10068D10 SEQ 1023288140
OPTIONS ;
OK /smith
[...snip out lot of other OK's...]
OK /etc has more than 64 KB available.
----

amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: got packet:
----
Amanda 2.4 REQ HANDLE 006-10068CA0 SEQ 1023288156
SECURITY USER amanda
SERVICE selfcheck
OPTIONS ;
GNUTAR /smith 0 OPTIONS |;bsd-auth;compress-fast;index;exclude-list=/usr/local/lib/amanda/exclude.gtar;
[...snip out options...]
GNUTAR /dev/hda2 0 OPTIONS |;bsd-auth;compress-fast;index;exclude-list=/usr/local/lib/amanda/exclude.gtar;
----

amandad: weird, it's not a proper ack
  addr: peer 10.20.30.55 dup 10.20.30.55, port: peer 559 dup 582
amandad: dgram_recv: recvfrom() failed: Connection refused
amandad: waiting for ack: Connection refused, retrying
amandad: dgram_recv: recvfrom() failed: Connection refused
amandad: waiting for ack: Connection refused, retrying
amandad: dgram_recv: recvfrom() failed: Connection refused
amandad: waiting for ack: Connection refused, retrying
amandad: dgram_recv: recvfrom() failed: Connection refused
amandad: waiting for ack: Connection refused, giving up!
amandad: pid 1143 finish time Wed Jun  5 10:42:27 2002
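
The client log shows amandad sending its REP packet, then waiting for an
ACK, retrying after each failure until it gives up. That general
send-and-wait-for-ack pattern can be sketched in Python as below. This is
not amandad's code: the function name, the per-try timeout, the retry
count, resending the reply on every try, and the bare b"ACK" payload check
are all assumptions made for illustration.

```python
import socket


def send_and_wait_for_ack(sock, packet, peer, timeout=10.0, retries=3):
    """Send `packet` to `peer` over UDP and wait for an ACK datagram.

    Hypothetical sketch: timeout, retry count, resend-on-retry and the
    b"ACK" prefix check are assumptions, not Amanda's wire protocol.
    Returns True once an ACK from `peer` arrives, False after the last try.
    """
    sock.settimeout(timeout)
    for _attempt in range(retries + 1):
        sock.sendto(packet, peer)             # (re)send the reply
        try:
            data, addr = sock.recvfrom(65535)
        except socket.timeout:
            continue                          # like "waiting for ack: timeout, retrying"
        except ConnectionRefusedError:
            continue                          # like "recvfrom() failed: Connection refused"
        if addr == peer and data.startswith(b"ACK"):
            return True
        # Any other datagram is "not a proper ack": count it as a failed try.
    return False                              # like "giving up!"
```

Called with a UDP socket bound on the client and the server's (host, port)
as peer, it returns True as soon as a matching ACK arrives and False once
every try has failed, which corresponds to the "giving up!" line above.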




Somewhat condensed results from multiple runs of amcheck are given below,
but I doubt that these are relevant -- it seems the problem is at a lower
level than amcheck reports. Clearly one client (optimas3) times out most
often, though it succeeded on the first five checks. Two other clients
"come and go" with timeouts. One of these (raleigh) is the server itself.
Other clients are occasionally affected (in other checks).
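
To see which clients fail most often without reading every run by eye, a
small Python sketch can tally the WARNING lines and the per-run durations
from saved amcheck -c output. The regular expressions are written against
the lines quoted in this post only; other amcheck messages are ignored.
The sample text is an excerpt of the runs shown below.

```python
import re
from collections import Counter

# "WARNING: optimas3.cpt.afip.org: selfcheck request timed out.  Host down?"
TIMEOUT_LINE = re.compile(r"^WARNING: (\S+): selfcheck request timed out\.")
# "Client check: 7 hosts checked in 29.644 seconds, 1 problem found"
SUMMARY_LINE = re.compile(r"^Client check: \d+ hosts checked in ([\d.]+) seconds")


def summarize(amcheck_output: str):
    """Count timed-out hosts and collect run durations from amcheck -c output."""
    timeouts = Counter()
    durations = []
    for line in amcheck_output.splitlines():
        if m := TIMEOUT_LINE.match(line):
            timeouts[m.group(1)] += 1
        elif m := SUMMARY_LINE.match(line):
            durations.append(float(m.group(1)))
    return timeouts, durations


sample = """\
WARNING: charlotte.cpt.afip.org: selfcheck request timed out.  Host down?
WARNING: optimas3.cpt.afip.org: selfcheck request timed out.  Host down?
Client check: 7 hosts checked in 30.170 seconds, 2 problems found
WARNING: optimas3.cpt.afip.org: selfcheck request timed out.  Host down?
Client check: 7 hosts checked in 29.743 seconds, 1 problem found
WARNING: raleigh.cpt.afip.org: selfcheck request timed out.  Host down?
WARNING: optimas3.cpt.afip.org: selfcheck request timed out.  Host down?
Client check: 7 hosts checked in 30.097 seconds, 2 problems found
"""

timeouts, durations = summarize(sample)
print(timeouts.most_common())
print(f"longest run: {max(durations)} s")
```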

I'm still wondering why clients come and go on the failure list, and
especially why the timeouts come so fast. I don't see anything in the
selfcheck.*.debug logs on the affected hosts (it looks like no such file
is created for the amcheck instances that time out). Thanks.

Robert L. Becker, Jr.
Col, USAF, MC
Department of Cellular Pathology
Armed Forces Institute of Pathology
Washington, DC 20306-6000
301-319-0300


raleigh 23% amcheck -c Daily ; amcheck -c Daily ; amcheck -c Daily ; amcheck -c Daily ; amcheck -c Daily ; amcheck -c Daily

Amanda Backup Client Hosts Check
--------------------------------
Client check: 7 hosts checked in 10.034 seconds, 0 problems found

(brought to you by Amanda 2.4.2p2)

[snip... 4 more sets of output, all in <= 31 sec, all with 0 problems
found...]

Amanda Backup Client Hosts Check
--------------------------------
WARNING: optimas3.cpt.afip.org: selfcheck request timed out.  Host down?
Client check: 7 hosts checked in 29.644 seconds, 1 problem found

(brought to you by Amanda 2.4.2p2)

[...ok, run it again...]

raleigh 24% !!
amcheck -c Daily ; amcheck -c Daily ; amcheck -c Daily ; amcheck -c Daily ; amcheck -c Daily ; amcheck -c Daily

Amanda Backup Client Hosts Check
--------------------------------
WARNING: charlotte.cpt.afip.org: selfcheck request timed out.  Host down?
WARNING: optimas3.cpt.afip.org: selfcheck request timed out.  Host down?
Client check: 7 hosts checked in 30.170 seconds, 2 problems found

(brought to you by Amanda 2.4.2p2)

Amanda Backup Client Hosts Check
--------------------------------
WARNING: optimas3.cpt.afip.org: selfcheck request timed out.  Host down?
Client check: 7 hosts checked in 29.743 seconds, 1 problem found

(brought to you by Amanda 2.4.2p2)

Amanda Backup Client Hosts Check
--------------------------------
WARNING: raleigh.cpt.afip.org: selfcheck request timed out.  Host down?
WARNING: optimas3.cpt.afip.org: selfcheck request timed out.  Host down?
Client check: 7 hosts checked in 30.097 seconds, 2 problems found

(brought to you by Amanda 2.4.2p2)

Amanda Backup Client Hosts Check
--------------------------------
NFS version 3 mount failed, trying NFS version 2
WARNING: optimas3.cpt.afip.org: selfcheck request timed out.  Host down?
Client check: 7 hosts checked in 29.652 seconds, 1 problem found

(brought to you by Amanda 2.4.2p2)

[...snip out two more sets of output, both flagging optimas3 timeout;
logout, login again and test once more...]

raleigh 112% exit
raleigh 113% logout
Connection closed.
wesser 12% rsh raleigh
Password:
You have mail.
raleigh 1% amcheck -c Daily ; amcheck -c Daily ; amcheck -c Daily ; amcheck -c Daily ; amcheck -c Daily ; amcheck -c Daily

Amanda Backup Client Hosts Check
--------------------------------
WARNING: raleigh.cpt.afip.org: selfcheck request timed out.  Host down?
WARNING: optimas3.cpt.afip.org: selfcheck request timed out.  Host down?
Client check: 7 hosts checked in 30.215 seconds, 2 problems found

(brought to you by Amanda 2.4.2p2)

Amanda Backup Client Hosts Check
--------------------------------
WARNING: optimas3.cpt.afip.org: selfcheck request timed out.  Host down?
Client check: 7 hosts checked in 30.037 seconds, 1 problem found

(brought to you by Amanda 2.4.2p2)

[...snip out two more sets of output flagging optimas3 only...]

Amanda Backup Client Hosts Check
--------------------------------
WARNING: charlotte.cpt.afip.org: selfcheck request timed out.  Host down?
WARNING: optimas3.cpt.afip.org: selfcheck request timed out.  Host down?
Client check: 7 hosts checked in 30.016 seconds, 2 problems found

(brought to you by Amanda 2.4.2p2)

Amanda Backup Client Hosts Check
--------------------------------
WARNING: optimas3.cpt.afip.org: selfcheck request timed out.  Host down?
Client check: 7 hosts checked in 31.370 seconds, 1 problem found

(brought to you by Amanda 2.4.2p2)
raleigh 2%