Re: Problem backing up just a few machines

2004-05-27 Thread Paul Bijnens
Henson, George Mr JMLFDC wrote:
> logfun02   / lev 0 FAILED [Request to logfun02 timed out.]
> Then the next day I had two hosts with the same error. In trying to fix the
> issue I found that if I removed the curinfo directories for these two hosts,
> the backups would run the next time amdump was called. It is always the same
> two hosts which fail like this.
>
> After reviewing the log files, I see the client does not report that it
> received the sendsize service request.
Increase etimeout.  (probably)

> Why does deleting the curinfo directories correct the problem?
In that case amanda thinks that DLE is completely new, and schedules
a level 0.  It does not bother to estimate how much data an incremental
level 1 would be.
The next day(s) the planner needs estimates for level 0, the last
incremental level, and lastincr+1. Doing 2 or 3 estimates for each
DLE on that host takes longer than doing only level 0...
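Paul's explanation can be made concrete with a small model. The sketch below is a simplified, hypothetical illustration in Python, not Amanda's planner code: it assumes a DLE with no curinfo history gets only a level 0 estimate, and a known DLE gets level 0, its last level, and last level + 1. The real planner also applies its bump rules, so it may skip the deeper level; the amdump log later in this thread, for example, requests only levels 0 and 1 for logfun02:/.

```python
from typing import List, Optional


def estimate_levels(last_level: Optional[int]) -> List[int]:
    """Hypothetical model: dump levels the planner would estimate for one DLE.

    last_level is None when curinfo has no history for the DLE
    (for example, after its curinfo directory was deleted).
    """
    if last_level is None:
        # Brand-new DLE: only a full (level 0) estimate is needed.
        return [0]
    # Known DLE: level 0, the last level, and one level deeper (deduplicated).
    return sorted({0, last_level, last_level + 1})


def total_estimates(last_levels: List[Optional[int]]) -> int:
    """Count the estimate runs one client would perform for all its DLEs."""
    return sum(len(estimate_levels(level)) for level in last_levels)


if __name__ == "__main__":
    print(estimate_levels(None))
    print(estimate_levels(1))
    print(total_estimates([None, None]), total_estimates([1, 1]))
```

Under this model, two DLEs on one client need 2 estimate runs right after their curinfo directories are deleted, versus up to 6 once history exists, which fits Paul's guess that the first run after deletion finishes sooner.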

Just speculating of course.  Seeing logfiles (sendsize.XXX.debug on the 
client, and amdump.X on the server) and config files would help.

--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...*
* ...  Are you sure?  ...   YES   ...   Phew ...   I'm out  *
***



RE: Problem backing up just a few machines

2004-05-27 Thread Henson, George Mr JMLFDC

 Henson, George Mr JMLFDC wrote:
 
  logfun02 / lev 0 FAILED [Request to logfun02 timed out.]
  
  Then the next day I had two hosts with the same error. In 
  trying to fix the issue I found if I removed the curinfo directories
  for these two hosts, the backups would run the next time amdump was
  called. It is always the same two hosts which fail like this.
  
  After reviewing the log files, I see the client does not 
  report it received the sendsize service request.
 
 Increase etimeout. (probably)


Currently etimeout is 300. Should this be increased?


  Why does deleting the curinfo directories correct the problem?
 
 In that case amanda thinks that DLE is completely new, and schedules
 a level 0. It does not bother to estimate how much data incremental
 level 1 would be.
 The next day(s) the planner needs estimates for the level 0, the last 
 incremental level, and the lastincr+1. Doing 2 or 3 estimates for each
 DLE on that host takes longer than only level 0...


One part of the mystery solved. 


 Just speculating of course. Seeing logfiles (sendsize.XXX.debug on the
 client, and amdump.X on the server) and config files would help.


There is no sendsize.XXX.debug log. This is one of the things making me think
the server cannot or does not send a sendsize request. :(


amdump log:
snip


planner: time 0.414: setting up estimates for logfun02:/
logfun02:/ overdue 13 days for level 0
setup_estimate: logfun02:/: command 0, options:
 last_level 1 next_level0 -13 level_days 2
 getting estimates 0 (28420) 1 (583) -1 (-1)
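To check which levels the planner actually requested, the "getting estimates" line can be split into level/value pairs. This Python sketch assumes the line format shown in this log and assumes that a level of -1 marks an unused slot; it deliberately does not interpret the number in parentheses.

```python
import re
from typing import List, Tuple

# Matches "<level> (<number>)" pairs such as "0 (28420)" or "-1 (-1)".
PAIR_RE = re.compile(r"(-?\d+) \((-?\d+)\)")


def parse_estimate_request(line: str) -> List[Tuple[int, int]]:
    """Return (level, value) pairs from a planner 'getting estimates' line.

    Assumption: level -1 marks an unused slot and is dropped.
    """
    _, sep, rest = line.partition("getting estimates")
    if not sep:
        raise ValueError("not a 'getting estimates' line")
    pairs = [(int(level), int(value)) for level, value in PAIR_RE.findall(rest)]
    return [(level, value) for level, value in pairs if level >= 0]


if __name__ == "__main__":
    print(parse_estimate_request(" getting estimates 0 (28420) 1 (583) -1 (-1)"))
```

For the line above it returns [(0, 28420), (1, 583)], i.e. two estimates (levels 0 and 1) for logfun02:/.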


snip


planner: time 30.790: error result for host logfun02 disk /: Request to logfun02 timed out.


snip


If I am reading the above messages correctly, the timeout is happening well
within the etimeout window, and the 30-second time frame sounds almost like a
network timeout threshold.
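A quick way to spot this pattern across a whole amdump log is to flag timeout errors whose planner timestamp is below etimeout. This is a hedged Python sketch: it assumes the planner's "time N:" field counts seconds since the planner started, which is how it reads in the excerpt above, and it only handles the error-line format shown there.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches e.g. "planner: time 30.790: error result for host logfun02 disk /: ..."
ERROR_RE = re.compile(
    r"^planner: time (\d+(?:\.\d+)?): error result for host (\S+) disk (\S+):"
)


def early_timeouts(lines: Iterable[str], etimeout: float) -> Iterator[Tuple[str, str, float]]:
    """Yield (host, disk, seconds) for 'timed out' errors logged before etimeout.

    Assumes the 'time N' field is seconds since the planner started.
    """
    for line in lines:
        m = ERROR_RE.match(line.strip())
        if m and "timed out" in line:
            seconds = float(m.group(1))
            if seconds < etimeout:
                yield m.group(2), m.group(3), seconds
```

Fed the single error line above with etimeout 300, it yields ("logfun02", "/", 30.79), since 30.79 s is well under 300 s.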





Re: Problem backing up just a few machines

2004-05-27 Thread Paul Bijnens
Henson, George Mr JMLFDC wrote:
> > Increase etimeout.  (probably)
> Currently etimeout is 300. Should this be increased?
Could be, but to find out how much, the sendsize.XXX.debug log helps a 
lot. But see below.
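To judge how large etimeout would have to be, it helps to know the upper bound it implies per client. The sketch below assumes the etimeout semantics described in the amanda.conf man page of the 2.4 series (a positive value applies per DLE, a negative value is a total per client); check the man page of the installed version before relying on it.

```python
def host_estimate_wait(etimeout: int, n_dles: int) -> int:
    """Upper bound in seconds the planner waits for one client's estimates.

    Assumed semantics (amanda.conf, 2.4 era): positive etimeout is per DLE,
    so it is multiplied by the number of DLEs; negative is a per-client total.
    """
    if n_dles < 0:
        raise ValueError("n_dles must be non-negative")
    if etimeout < 0:
        return -etimeout
    return etimeout * n_dles


if __name__ == "__main__":
    print(host_estimate_wait(300, 1), host_estimate_wait(300, 4))
```

With etimeout 300, even a client with a single DLE gets a 300-second bound, roughly ten times the 30.790 s at which logfun02 failed, so raising etimeout alone would not obviously change that failure.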


> planner: time 30.790: error result for host logfun02 disk /: Request to
> logfun02 timed out.
>
> snip
> If I am reading the above messages correctly the timeout is happening
> well within the etimeout window. And the 30 second time frame almost
> sounds like a network timeout threshold
That's strange indeed.
If there isn't a sendsize.XXX.debug file, are there other files?
Is there at least one amandad..debug file on the client
in directory /tmp/amanda?



RE: Problem backing up just a few machines

2004-05-27 Thread Henson, George Mr JMLFDC

  planner: time 30.790: error result for host logfun02 disk 
  /: Request to logfun02 timed out.
  
  snip
  
  If I am reading the above messages correctly the timeout is 
  happening well within the etimeout window. And the 30 second time 
  frame almost sounds like a network timeout threshold
 
 That's strange indeed.
 If there isn't a sendsize.XXX.debug file, are there other files.
 Is there at least one or more amandad..debug on the client
 in directory /tmp/amanda?


I have 4 files: three amandad and one selfcheck (I have cron fire
amcheck and amdump).


Below are the files in client:/tmp/amanda


amandad.20040527020003.debug amandad.20040527020502.debug
amandad.20040527020003000.debug selfcheck.20040527020004.debug
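A small helper can group the debug files in a directory like client:/tmp/amanda by program name, which makes a missing sendsize debug file obvious. This is a Python sketch; it assumes the <program>.<timestamp>.debug naming seen in the listing above, and the /tmp/amanda path is taken from this thread.

```python
import os
from collections import defaultdict
from typing import Dict, List


def debug_files_by_program(directory: str) -> Dict[str, List[str]]:
    """Group *.debug files by program name (the text before the first dot)."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        if name.endswith(".debug"):
            program = name.split(".", 1)[0]
            groups[program].append(name)
    return dict(groups)


if __name__ == "__main__":
    debug_dir = "/tmp/amanda"  # client debug directory mentioned in this thread
    if os.path.isdir(debug_dir):
        groups = debug_files_by_program(debug_dir)
        for program, files in sorted(groups.items()):
            print(f"{program}: {len(files)} file(s)")
        if "sendsize" not in groups:
            print("no sendsize debug file found")
```

For the four files listed above, it would report three amandad files, one selfcheck file, and no sendsize file.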