Re: Dumps Fail - Estimate Timeout errors

2005-05-18 Thread Paul Bijnens
Tom Brown wrote:
Actually, digging around in /tmp/amanda, I have come across files named
amandad.DATE.debug, and a few of them appear to end like this:

amandad: time 111.846: dgram_recv: timeout after 10 seconds
amandad: time 111.846: waiting for ack: timeout, retrying
amandad: time 121.846: dgram_recv: timeout after 10 seconds
amandad: time 121.847: waiting for ack: timeout, retrying
amandad: time 131.847: dgram_recv: timeout after 10 seconds
amandad: time 131.847: waiting for ack: timeout, retrying
amandad: time 141.847: dgram_recv: timeout after 10 seconds
amandad: time 141.848: waiting for ack: timeout, retrying
amandad: time 151.848: dgram_recv: timeout after 10 seconds
amandad: time 151.848: waiting for ack: timeout, giving up!
amandad: time 151.848: pid 17383 finish time Wed May 18 00:56:58 2005
What would cause that? I presume this could be the cause of the failure.
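Maybe this is a problem in some firewall setting that forbids reverse
traffic, or that expires the UDP reply state after less than 101 seconds
(judging by the first timeout at 111.846 s, amandad only sent its reply
about 101 seconds after it started).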
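One way to test the firewall theory directly is to send a UDP request across the new network and hold back the reply for longer than the estimates take. The sketch below is a hypothetical standalone probe, not part of Amanda: `responder` runs on the client and answers one datagram after `delay` seconds, and `probe` runs on the Amanda server and reports whether the late reply made it back. The port number and the delays are assumptions; pick a spare UDP port that your firewall rules treat the same way as Amanda's traffic.

```python
import socket
import threading
import time

# Hypothetical probe, not part of Amanda: does a UDP reply still get through
# when it is sent long after the request, as amandad's reply was?


def responder(sock: socket.socket, delay: float) -> None:
    """Run on the client: answer one datagram, but only after `delay` seconds."""
    data, peer = sock.recvfrom(1024)
    time.sleep(delay)                  # stand-in for a slow estimate
    sock.sendto(b"REP " + data, peer)  # reply to the sender's address and port


def probe(host: str, port: int, delay: float, margin: float = 5.0) -> bool:
    """Run on the server: send one request and wait for the delayed reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(delay + margin)
        s.sendto(b"REQ", (host, port))
        try:
            data, _ = s.recvfrom(1024)
        except socket.timeout:
            return False               # reply lost, e.g. firewall UDP state expired
        return data == b"REP REQ"


def self_test(delay: float = 0.2) -> bool:
    """Loopback run of both halves, to check the script itself works."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as srv:
        srv.bind(("127.0.0.1", 0))     # let the OS pick a free port
        port = srv.getsockname()[1]
        worker = threading.Thread(target=responder, args=(srv, delay))
        worker.start()
        ok = probe("127.0.0.1", port, delay, margin=2.0)
        worker.join()
    return ok
```

For a real run, bind a UDP socket on the client to the chosen port and call `responder(sock, 110)`, then call `probe("clienthost", port, 110)` on the server ("clienthost" is a placeholder). If a short delay such as 5 seconds gets through but 110 seconds does not, UDP state is probably expiring somewhere between the two networks.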
--
Paul Bijnens, Xplanation                          Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...*
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out  *
***


Re: Dumps Fail - Estimate Timeout errors

2005-05-18 Thread Tom Brown
Paul Bijnens wrote:
Have a look at that client in /tmp/amanda, look for the files
sendsize.DATETIME.debug and see how long the estimate took.
The first line of the file is the start time and the last line is the
finish time.  How long did it really take?  You may have to
change the "etimeout" parameter in amanda.conf.
If there is no "finish time" line, then the estimate crashed, and
probably there is an error message in that file too.
Actually, digging around in /tmp/amanda, I have come across files named
amandad.DATE.debug, and a few of them appear to end like this:

amandad: time 111.846: dgram_recv: timeout after 10 seconds
amandad: time 111.846: waiting for ack: timeout, retrying
amandad: time 121.846: dgram_recv: timeout after 10 seconds
amandad: time 121.847: waiting for ack: timeout, retrying
amandad: time 131.847: dgram_recv: timeout after 10 seconds
amandad: time 131.847: waiting for ack: timeout, retrying
amandad: time 141.847: dgram_recv: timeout after 10 seconds
amandad: time 141.848: waiting for ack: timeout, retrying
amandad: time 151.848: dgram_recv: timeout after 10 seconds
amandad: time 151.848: waiting for ack: timeout, giving up!
amandad: time 151.848: pid 17383 finish time Wed May 18 00:56:58 2005
What would cause that? I presume this could be the cause of the failure.
thanks 
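The tail of the amandad log can be read as simple arithmetic: each `dgram_recv` timeout fires 10 seconds after the previous send, so the first one at 111.846 s suggests the reply went out at about 101.8 s, and amandad gave up after five unanswered waits. A small parser for that, assuming the text passed in holds only the final ack-wait loop (an earlier, unrelated timeout in the same file would skew `reply_sent_at`):

```python
import re

# Matches amandad lines such as:
#   amandad: time 111.846: dgram_recv: timeout after 10 seconds
TIMEOUT_RE = re.compile(r"time (\d+\.\d+): dgram_recv: timeout after (\d+) seconds")


def ack_wait_summary(log_text: str) -> dict:
    """Summarise the final ack-wait loop of an amandad debug log.

    Assumes `log_text` holds only that loop (e.g. the tail of the file).
    """
    timeouts = [(float(t), int(n)) for t, n in TIMEOUT_RE.findall(log_text)]
    if not timeouts:
        return {"timeouts": 0, "reply_sent_at": None, "gave_up": False}
    first_time, wait = timeouts[0]
    return {
        "timeouts": len(timeouts),
        # presumably the first timeout fires `wait` seconds after the reply
        "reply_sent_at": round(first_time - wait, 3),
        "gave_up": "giving up!" in log_text,
    }
```

On the log above this reports five timeouts, a reply sent at roughly 101.8 s, and `gave_up` set.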




Re: Dumps Fail - Estimate Timeout errors

2005-05-18 Thread Tom Brown
Paul Bijnens wrote:
Have a look at that client in /tmp/amanda, look for the files
sendsize.DATETIME.debug and see how long the estimate took.
The first line of the file is the start time and the last line is the
finish time.  How long did it really take?  You may have to
change the "etimeout" parameter in amanda.conf.
If there is no "finish time" line, then the estimate crashed, and
probably there is an error message in that file too.
Hi
Please see pasted below the three entries that apply to last night; everything
appears OK in there, it seems. If sendsize is not crashing, what else
could cause this?

thanks
sendsize: debug 1 pid 16820 ruid 11 euid 11: start at Wed May 18 00:29:27 2005
sendsize: version 2.4.4p1

sendsize[16820]: time 118.545: child 16874 terminated normally
sendsize: time 118.545: pid 16820 finish time Wed May 18 00:31:25 2005

sendsize: debug 1 pid 17384 ruid 11 euid 11: start at Wed May 18 00:54:26 2005
sendsize: version 2.4.4p1

sendsize[17384]: time 101.841: child 17418 terminated normally
sendsize: time 101.841: pid 17384 finish time Wed May 18 00:56:08 2005

sendsize: debug 1 pid 17795 ruid 11 euid 11: start at Wed May 18 01:19:26 2005
sendsize: version 2.4.4p1

sendsize[17795]: time 106.417: child 17842 terminated normally
sendsize: time 106.417: pid 17795 finish time Wed May 18 01:21:12 2005
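Putting these three entries next to the amandad log from the other message shows where the time went. The sketch below only does date arithmetic on timestamps copied from this thread; pairing amandad pid 17383 with sendsize pid 17384 by start time is an inference, not something Amanda reports.

```python
from datetime import datetime, timedelta

FMT = "%a %b %d %H:%M:%S %Y"  # e.g. "Wed May 18 00:54:26 2005"

# (start, finish) stamps copied from the three sendsize entries above
RUNS = [
    ("Wed May 18 00:29:27 2005", "Wed May 18 00:31:25 2005"),
    ("Wed May 18 00:54:26 2005", "Wed May 18 00:56:08 2005"),
    ("Wed May 18 01:19:26 2005", "Wed May 18 01:21:12 2005"),
]

# Last line of amandad pid 17383's log: "time 151.848 ... finish time 00:56:58"
AMANDAD_FINISH = "Wed May 18 00:56:58 2005"
AMANDAD_ELAPSED = timedelta(seconds=151.848)


def parse(stamp: str) -> datetime:
    return datetime.strptime(stamp, FMT)


def estimate_durations() -> list:
    """Wall-clock seconds per estimate (stamps have 1-second resolution)."""
    return [(parse(end) - parse(start)).total_seconds() for start, end in RUNS]


def amandad_start() -> datetime:
    """When amandad pid 17383 started: its finish time minus its elapsed time."""
    return parse(AMANDAD_FINISH) - AMANDAD_ELAPSED


def ack_wait_after_estimate() -> float:
    """Seconds amandad kept running after the 00:54 estimate finished."""
    return (parse(AMANDAD_FINISH) - parse(RUNS[1][1])).total_seconds()
```

This gives estimates of 118, 102 and 106 seconds; amandad started in the same second as the 00:54 run and lingered about 50 seconds (its five 10-second ack waits) after that estimate finished. Assuming the documented default `etimeout` of 300 seconds per disk, the estimates themselves fit comfortably; what never arrived was the ack for the reply.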




Re: Dumps Fail - Estimate Timeout errors

2005-05-18 Thread Paul Bijnens
Tom Brown wrote:
Failure errors are:
hostname /dev/rd/c0d0p3 lev 0 FAILED [Estimate timeout from hostname]
...
Does anyone know how to debug these timeout-type issues? I have been
using Amanda for about 3 years now and have not encountered this before.
Have a look at that client in /tmp/amanda, look for the files
sendsize.DATETIME.debug and see how long the estimate took.
The first line of the file is the start time and the last line is the
finish time.  How long did it really take?  You may have to
change the "etimeout" parameter in amanda.conf.
If there is no "finish time" line, then the estimate crashed, and
probably there is an error message in that file too.
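This check can be scripted across all the debug files at once. It is a sketch, assuming the sendsize debug files carry "start at ..." and "finish time ..." stamps like the ones quoted in this thread and live under /tmp/amanda. `budget` is a hypothetical threshold: per the amanda.conf man page, `etimeout` (default 300 seconds) applies per disk, so a reasonable budget is `etimeout` times the number of disks on that client.

```python
import glob
import os
import re
from datetime import datetime

FMT = "%a %b %d %H:%M:%S %Y"
STAMP = r"(\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4})"
START_RE = re.compile(r"start at " + STAMP)
FINISH_RE = re.compile(r"finish time " + STAMP)


def estimate_seconds(text: str):
    """Duration of one sendsize run, or None if no finish time was logged."""
    start = START_RE.search(text)
    finish = FINISH_RE.search(text)
    if not start or not finish:
        return None  # no "finish time" line: the estimate probably crashed
    t0 = datetime.strptime(start.group(1), FMT)
    t1 = datetime.strptime(finish.group(1), FMT)
    return (t1 - t0).total_seconds()


def report(debug_dir: str = "/tmp/amanda", budget: float = 300.0) -> None:
    """Print each sendsize run's duration, flagging crashes and slow runs."""
    pattern = os.path.join(debug_dir, "sendsize.*.debug")
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as fh:
            secs = estimate_seconds(fh.read())
        if secs is None:
            print(f"{path}: no finish time (crashed?) - look for an error message")
        else:
            flag = "  <-- over budget" if secs > budget else ""
            print(f"{path}: {secs:.0f} s{flag}")
```

Run `report()` on the client right after a failed night; a file with no finish time points at a crash, while runs that finish well inside the budget point away from `etimeout`.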



Dumps Fail - Estimate Timeout errors

2005-05-18 Thread Tom Brown
Hi
Clients are all RH 7.3 or WhiteBox respin 2.
The server is 2.4.4p4, running on WhiteBox respin 2.
My amchecks run fine and without issue; however, I have come in on 2
mornings now to find that some of the clients failed. The failures
have occurred on different clients, i.e. some that failed 2 nights ago
worked last night without changes, and I can't figure out why. All
clients were working fine at a different site; we moved IDC over the
weekend, so we have a new network architecture.

Failure errors are:
hostname /dev/rd/c0d0p3 lev 0 FAILED [Estimate timeout from hostname]
hostname /dev/rd/c0d0p1 lev 0 FAILED [Estimate timeout from hostname]
anotherhostname /dev/rd/c0d0p2 lev 0 FAILED [Estimate timeout from anotherhostname]
anotherhostname /dev/rd/c0d0p5 lev 0 FAILED [Estimate timeout from anotherhostname]

There is nothing in the firewall log to indicate a dropped packet. As a
test, I have actually allowed all ports between these networks, but
it has not helped.

Does anyone know how to debug these timeout-type issues? I have been
using Amanda for about 3 years now and have not encountered this before.

thanks