Re: Problem backing up just a few machines
Henson, George Mr JMLFDC wrote:

> logfun02 / lev 0 FAILED [Request to logfun02 timed out.]
>
> Then the next day I had two hosts with the same error. In trying to fix the issue I found if I removed the curinfo directories for these two hosts, the backups would run the next time amdump was called. It is always the same two hosts which fail like this. After reviewing the log files, I see the client does not report it received the sendsize service request.

Increase etimeout. (probably)

> Why does deleting the curinfo directories correct the problem?

In that case amanda thinks that DLE is completely new, and schedules a level 0. It does not bother to estimate how much data incremental level 1 would be.

The next day(s) the planner needs estimates for the level 0, the last incremental level, and the lastincr+1. Doing 2 or 3 estimates for each DLE on that host takes longer than only level 0...

Just speculating of course. Seeing logfiles (sendsize.XXX.debug on the client, and amdump.X on the server) and config files would help.

--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6,   *
* quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye,       *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt, abort, hangup,     *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e, kill -1 $$, shutdown,     *
* kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ...           *
* ... Are you sure? ... YES ... Phew ... I'm out                        *
***
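The reasoning in the reply above can be pictured with a small sketch. It only restates the explanation given in the message (which its author calls speculation) and is not Amanda's planner code; the function name `levels_to_estimate` and its argument are hypothetical, and the real planner may request fewer estimates than this.

```python
def levels_to_estimate(last_level=None):
    """Dump levels that need a size estimate, per the explanation above.

    last_level is None when there is no curinfo history for the DLE.
    """
    if last_level is None:
        # No history: the DLE looks new, so only a level 0 is considered.
        return [0]
    # Known DLE: level 0, the last incremental level, and the next level up.
    levels = [0, last_level, last_level + 1]
    # Drop duplicates (for example last_level == 0) while keeping order.
    return list(dict.fromkeys(levels))


print(levels_to_estimate())   # DLE whose curinfo directory was removed
print(levels_to_estimate(1))  # DLE whose last dump was a level 1
```

With more levels to estimate per DLE, the estimate phase on that client takes longer, which is why a larger etimeout was suggested.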
RE: Problem backing up just a few machines
> Henson, George Mr JMLFDC wrote:
>
>> logfun02 / lev 0 FAILED [Request to logfun02 timed out.]
>>
>> Then the next day I had two hosts with the same error. In trying to fix the issue I found if I removed the curinfo directories for these two hosts, the backups would run the next time amdump was called. It is always the same two hosts which fail like this. After reviewing the log files, I see the client does not report it received the sendsize service request.
>
> Increase etimeout. (probably)

Currently etimeout is 300. Should this be increased?

>> Why does deleting the curinfo directories correct the problem?
>
> In that case amanda thinks that DLE is completely new, and schedules a level 0. It does not bother to estimate how much data incremental level 1 would be.
>
> The next day(s) the planner needs estimates for the level 0, the last incremental level, and the lastincr+1. Doing 2 or 3 estimates for each DLE on that host takes longer than only level 0...

One part of the mystery solved.

> Just speculating of course. Seeing logfiles (sendsize.XXX.debug on the client, and amdump.X on the server) and config files would help.

There is no sendsize.XXX.debug log. This is one of the things making me think the server cannot or does not send a sendsize request. :(

amdump log:

    snip
    planner: time 0.414: setting up estimates for logfun02:/
    logfun02:/ overdue 13 days for level 0
    setup_estimate: logfun02:/: command 0, options:
        last_level 1 next_level0 -13 level_days 2
        getting estimates 0 (28420) 1 (583) -1 (-1)
    snip
    planner: time 30.790: error result for host logfun02 disk /: Request to logfun02 timed out.
    snip

If I am reading the above messages correctly, the timeout is happening well within the etimeout window. And the 30 second time frame almost sounds like a network timeout threshold.
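The timing argument in this message can be checked with a little arithmetic. The following is a minimal sketch that uses only the two planner lines quoted above and the etimeout of 300 seconds reported in the thread; the regular expression and the helper name `planner_time` are illustrative and not part of Amanda.

```python
import re

# The two planner lines quoted in the message above.
LOG_LINES = [
    "planner: time 0.414: setting up estimates for logfun02:/",
    "planner: time 30.790: error result for host logfun02 disk /: "
    "Request to logfun02 timed out.",
]

ETIMEOUT = 300  # seconds, the value reported in this thread


def planner_time(line):
    """Return the elapsed-seconds stamp from a 'planner: time N:' line."""
    match = re.match(r"planner: time (\d+\.\d+):", line)
    if match is None:
        raise ValueError("not a planner time line: " + line)
    return float(match.group(1))


start = planner_time(LOG_LINES[0])
failed = planner_time(LOG_LINES[1])
elapsed = failed - start

print("request failed after %.3f s" % elapsed)
print("below etimeout:", elapsed < ETIMEOUT)
```

The failure arrives roughly 30 seconds after the estimates were set up, far short of 300 seconds, which supports the suspicion that some other timeout is involved.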
Re: Problem backing up just a few machines
Henson, George Mr JMLFDC wrote:

>> Increase etimeout. (probably)
>
> Currently etimeout is 300. Should this be increased?

Could be, but to find out how much, the sendsize.XXX.debug log helps a lot. But see below.

> planner: time 30.790: error result for host logfun02 disk /: Request to logfun02 timed out.
> snip
>
> If I am reading the above messages correctly the timeout is happening well within the etimeout window. And the 30 second time frame almost sounds like a network timeout threshold

That's strange indeed. If there isn't a sendsize.XXX.debug file, are there other files? Is there at least one or more amandad..debug on the client in directory /tmp/amanda?

--
Paul Bijnens, Xplanation
RE: Problem backing up just a few machines
>> planner: time 30.790: error result for host logfun02 disk /: Request to logfun02 timed out.
>> snip
>>
>> If I am reading the above messages correctly the timeout is happening well within the etimeout window. And the 30 second time frame almost sounds like a network timeout threshold
>
> That's strange indeed. If there isn't a sendsize.XXX.debug file, are there other files? Is there at least one or more amandad..debug on the client in directory /tmp/amanda?

I have 4 files: three amandad and one selfcheck (I have cron fire amcheck and amdump). Below are the files in client:/tmp/amanda:

    amandad.20040527020003.debug
    amandad.20040527020502.debug
    amandad.20040527020003000.debug
    selfcheck.20040527020004.debug
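A quick tally of the file names listed above shows which client programs left a debug file. This is only an illustrative sketch over the four names given in this message; it assumes that the part of each name before the first dot identifies the program that wrote it.

```python
from collections import Counter

# File names as listed in the message above (client:/tmp/amanda).
DEBUG_FILES = [
    "amandad.20040527020003.debug",
    "amandad.20040527020502.debug",
    "amandad.20040527020003000.debug",
    "selfcheck.20040527020004.debug",
]

# Assumption: the text before the first dot names the program.
counts = Counter(name.split(".", 1)[0] for name in DEBUG_FILES)

# A Counter returns 0 for keys it has never seen, such as sendsize here.
for program in ("amandad", "selfcheck", "sendsize"):
    print(program, counts[program])
```

The tally has no sendsize entry at all, which matches the earlier observation that no sendsize debug log exists on the client.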