Estimate timeout
Amanda 2.6.1 on Solaris 10/Sparc; Amanda 2.6.1 on Solaris 10/x86. The server has 21 clients with a total of 109 DLEs. One of the client systems has 51 DLEs: 1 ufs and 50 zfs partitions. The zfs partitions/DLEs are all part of the same ZFS pool, which I believe (from another discussion earlier this week) are checked sequentially. We seem to be exceeding a timeout limit. etimeout is for size estimates, so I don't think it applies. We have switched to server estimates for zfs-dump. Is there a per-client amcheck timeout that is not based on the number of DLEs on the client?

Amanda Backup Client Hosts Check
WARNING: finsen: selfcheck request failed: timeout waiting for REP
Client check: 21 hosts checked in 91.125 seconds. 1 problem found.

thank you, Brian
---
Brian R Cuttler, Computer Systems Support, Wadsworth Center, NYS Department of Health
Re: AW: Estimate timeout
Thanks for the reply. Actually, I was able to resolve the problem by changing the disklist file to something like this:

hostname volumename { root-tar estimate calcsize }

It worked for quite a few test backups. But while adding some more DLEs to the disklist, I started getting errors such as the ones below when I run the command amcheck config_name:

/etc/amanda/fullback/disklist, line 3: dump type parameter expected
/etc/amanda/fullback/disklist, line 3: end of line expected

My version of Amanda is 2.4.4. I thought that the estimate parameter I have used in the disklist is not supported in amanda-2.4.4, but then I wonder why it worked for some time. It seems that I will have to upgrade Amanda. Kindly let me know if there is any other way of resolving this issue.

Thanks
Yogesh

--- Dipl.Ing.Trompler Wilhelm wrote:
try different values for etimeout.
Regards
W.Trompler

-----Original Message-----
From: Yogesh Hasabnis
Sent: Wednesday, 7 February 2007 06:48
To: amanda-users@amanda.org
Subject: Estimate timeout

Hi, Yesterday, I created one configuration for always full backup with the name FULLBACK. [...]
Estimate timeout
Hi,

Yesterday, I created a configuration named FULLBACK that always does full backups. The 2 volumes to be backed up had a size of roughly 150GB. But the backup performed yesterday seems to have failed. The Amanda logs for this run read as follows (I have edited the hostname and the volume names):

START driver date 20070206
DISK planner host_name volume_name_1
DISK planner host_name volume_name_2
START planner date 20070206
INFO planner Adding new disk host_name:volume_name_1.
INFO planner Adding new disk host_name:volume_name_2.
START taper datestamp 20070206 label FULLBACK1 tape 0
FAIL planner hostname volume_name_2 20070206 0 [Estimate timeout from host_name]
FAIL planner host_name volume_name_1 20070206 0 [Estimate timeout from host_name]
FINISH planner date 20070206
WARNING driver WARNING: got empty schedule from planner
STATS driver startup time 1810.114
INFO taper tape FULLBACK1 kb 0 fm 0 [OK]
FINISH driver date 20070206 time 1815.145

The backup server and client are the same machine in my case. Kindly give me suggestions about what may have gone wrong.

Thanks
Yogesh
estimate timeout and dump failure
Hi,

I am using version 2.5.0p2 on my dump host. One of my clients (same version) has a filesystem that consistently fails the estimate and dump phases. The same host has two other (smaller) filesystems that complete without problems. The amandad and selfcheck debug files from this host show no indication of problems. The sendsize debug file does show a warning, but I can't find enough information to correct the problem, or to be sure that it is the problem.

Error/warning message from the backup report:

FAILURE AND STRANGE DUMP SUMMARY:
  fa1 amrd0s1f lev 0 FAILED [disk amrd0s1f, all estimate timed out]
  planner: ERROR Request to fa1 failed: timeout waiting for REP

Error/warning message from sendsize.debug:

sendsize[67866]: time 5.776: getting size via dump for amrd0s1f level 0
sendsize[67866]: time 5.777: calculating for device '/dev/amrd0s1f' with 'ufs'
sendsize[67866]: time 5.777: running /sbin/dump 0Shsf 0 1048576 - /dev/amrd0s1f
sendsize[67866]: time 5.778: running /usr/local/libexec/amanda/killpgrp
sendsize[67866]: time 5.781: DUMP: WARNING: should use -L when dumping live read-write filesystems!
sendsize[67866]: time 5.782: DUMP: Date of this level 0 dump: Thu Oct 5 19:36:58 2006
sendsize[67866]: time 5.783: DUMP: Date of last level 0 dump: the epoch
sendsize[67866]: time 5.784: DUMP: Dumping /dev/amrd0s1f (/usr) to standard output
sendsize[67866]: time 5.857: DUMP: mapping (Pass I) [regular files]
sendsize[67866]: time 17.022: DUMP: mapping (Pass II) [directories]
sendsize[67866]: time 17.022: DUMP: estimated 5457824 tape blocks.
sendsize[67866]: time 17.027: .
sendsize[67866]: estimate time for amrd0s1f level 0: 11.249
sendsize[67866]: estimate size for amrd0s1f level 0: 5457824 KB
sendsize[67866]: time 17.027: asking killpgrp to terminate
sendsize[67866]: time 18.035: done with amname 'amrd0s1f', dirname '/usr', spindle -1
sendsize[67854]: time 18.036: child 67866 terminated normally

I compiled the client with amanda_snapshot, and I can see a .snap directory in the filesystem noted above. One question I have is: how and where do you specify dump -L? I get this same warning on one of the other filesystems on this host, but there the estimate and dump finish with no problems. Backups on this host completed when both host and server were running Amanda 2.4.5.

I appreciate any help you can provide in solving this.

Thanks
-Mike
--
Michael Galvez
http://www.people.virginia.edu/~mrg8n
Information Technology Specialist
University of Virginia
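The two summary lines in the sendsize excerpt ("estimate time for ..." and "estimate size for ...") show how long the client actually spent on each estimate. Below is a minimal Python sketch that pulls those lines out of a debug file. The regular expressions are inferred only from the lines quoted in this thread, and the function name is hypothetical. Applied to Mike's log, it reports 11.249 seconds and 5457824 KB for amrd0s1f. That suggests the slow part was not the dump estimate itself.

```python
import re

# Patterns inferred from sendsize debug lines quoted in this thread, e.g.
#   sendsize[67866]: estimate time for amrd0s1f level 0: 11.249
#   sendsize[67866]: estimate size for amrd0s1f level 0: 5457824 KB
TIME_RE = re.compile(r"estimate time for (\S+) level (\d+): ([\d.]+)")
SIZE_RE = re.compile(r"estimate size for (\S+) level (\d+): (-?\d+) KB")


def parse_estimates(lines):
    """Return {(disk, level): {"time": seconds, "size_kb": kb}} from debug lines."""
    results = {}
    for line in lines:
        match = TIME_RE.search(line)
        if match:
            key = (match.group(1), int(match.group(2)))
            results.setdefault(key, {})["time"] = float(match.group(3))
            continue
        match = SIZE_RE.search(line)
        if match:
            key = (match.group(1), int(match.group(2)))
            results.setdefault(key, {})["size_kb"] = int(match.group(3))
    return results


log = [
    "sendsize[67866]: time 17.027: .",
    "sendsize[67866]: estimate time for amrd0s1f level 0: 11.249",
    "sendsize[67866]: estimate size for amrd0s1f level 0: 5457824 KB",
]
print(parse_estimates(log))
```

In practice the list would come from reading the debug file line by line, for example `parse_estimates(open(path))`.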
Re: estimate timeout and dump failure
On Friday 06 October 2006 10:32, Mike Galvez wrote:

Mike, I think this is a known bug; you need to update to one of the 2.5.1p1 versions. It bit lots of us. See
http://www.iro.umontreal.ca/~martinea/amanda/
I ran the 20061004 snapshot last night, and it worked just fine.

I am using version 2.5.0p2 on my dump host. One of my clients (same version) has a filesystem that consistently fails the estimate and dump phases. The same host has two other (smaller) filesystems that complete without problems. The amandad and selfcheck debug files from this host show no indication of problems. [...]

If the client is a slow one, you might have to enlarge the 'etimeout' and 'dtimeout' settings too, but I believe that won't fix the bug I referred to. Really big estimates and dumps will exceed the defaults, and you didn't say how big yours might be. My largest DLE is about 9GiB, and ISTR I've had those settings doubled for years. My one client is a little slow; it's only a 500 MHz K6 with 320 MB of RAM. I have things divided up into DLEs of usually not more than 2GiB, some considerably smaller, so Amanda can have a ball balancing things. About 55GiB total.

--
Cheers, Gene
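Gene's advice about enlarging etimeout can be made concrete. The amanda.conf manual page of this era describes etimeout as follows:

- it is the time per DLE on a given client that the planner waits for size estimates;
- the default is 300 seconds;
- a negative value means a total for the whole client instead.

The Python sketch below encodes that reading. The function names are hypothetical, and the semantics should be checked against the manual page of the installed version.

```python
def planner_wait_seconds(etimeout: int, n_dles: int) -> int:
    """Upper bound on how long the planner waits for one client's estimates.

    Assumption: a positive etimeout is seconds per DLE on that client;
    a negative etimeout is the total number of seconds for the whole client.
    """
    if etimeout < 0:
        return -etimeout
    return etimeout * n_dles


def estimates_fit(etimeout: int, n_dles: int, observed_seconds: float) -> bool:
    """True if an observed estimate duration for the client fits the budget."""
    return observed_seconds <= planner_wait_seconds(etimeout, n_dles)


# Default etimeout with four DLEs on one client: 1200 seconds (20 minutes).
print(planner_wait_seconds(300, 4))
# A single DLE whose estimate takes about 49 minutes exceeds the default...
print(estimates_fit(300, 1, 2938.2))
# ...but fits once etimeout is raised to 4000.
print(estimates_fit(4000, 1, 2938.2))
```

The 2938-second figure matches the /pst estimate time that appears in the "Estimate timeout from server" thread of this archive.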
Re: Estimate timeout from localhost / driver: WARNING: got empty schedule from planner
Joshua Baker-LePain wrote:
On Tue, 25 Apr 2006 at 9:22am, Thomas Grieder wrote:
For a few days I have been getting these two error messages:
Estimate timeout from localhost
driver: WARNING: got empty schedule from planner

1) Using localhost in the disklist is generally frowned upon. It's not a unique name, and will likely come back to bite you someday.
2) Look in /tmp/amanda at the *debug files; most likely sendsize*debug and/or amandad*debug will have more info as to what's going wrong.

Thanks, the backup is working well now.
Thomas
Estimate timeout from localhost / driver: WARNING: got empty schedule from planner
Hi,

For a few days I have been getting these two error messages:

Estimate timeout from localhost
driver: WARNING: got empty schedule from planner

I could not find any hints by searching Google. Any ideas? Below are the amcheck and amdump mails.

amcheck:

Amanda Tape Server Host Check
Holding disk /srv/amanda/: 155902968 KB disk space available, using 145417208 KB
amcheck-server: slot 3: not an amanda tape
amcheck-server: slot 3: not an amanda tape
amcheck-server: slot 4: date 20060419 label 000341 (active tape)
amcheck-server: slot 5: date 20060420 label 000123 (active tape)
amcheck-server: slot 6: date 20060421 label 000214 (active tape)
amcheck-server: slot 7: date 20060409 label 000348 (first labelstr match)
amcheck-server: slot 8: date 20060410 label 000282 (active tape)
amcheck-server: slot 9: date 20060410 label 000279 (active tape)
amcheck-server: slot 10: date 20060410 label 000280 (active tape)
amcheck-server: slot 11: date 20060410 label 000347 (active tape)
amcheck-server: slot 12: date 20060411 label 000349 (active tape)
amcheck-server: slot 13: date 20060412 label 000281 (active tape)
amcheck-server: slot 14: date 20060411 label 74 (active tape)
amcheck-server: slot 15: date 20060414 label 67 (active tape)
amcheck-server: slot 16: date 20060417 label 63 (active tape)
amcheck-server: slot 17: date 20060418 label 03 (active tape)
amcheck-server: slot 18: date 20060413 label 14 (active tape)
amcheck-server: slot 19: date X label 25 (labelstr match)
amcheck-server: slot 20: date X label 29 (labelstr match)
amcheck-server: slot 21: date X label 24 (labelstr match)
amcheck-server: slot 22: date 20060330 label 000106 (labelstr match)
amcheck-server: slot 1: date 20060330 label 000350 (labelstr match)
amcheck-server: slot 2: date 20060405 label 000345 (labelstr match)
NOTE: skipping tape-writable test
Tape 000348 label ok
Server check took 2060.760 seconds

Amanda Backup Client Hosts Check
Client check: 1 host checked in 0.338 seconds, 0 problems found

(brought to you by Amanda 2.4.4p3)

amdump:

These dumps were to tape 000348.
The next tape Amanda expects to use is: 000344.
The next new tape already labelled is: 25.

FAILURE AND STRANGE DUMP SUMMARY:
  localhost /srv/backup/moonsmile.ch lev 0 FAILED [Estimate timeout from localhost]
  localhost /srv/backup/Tapes lev 0 FAILED [Estimate timeout from localhost]
  localhost /srv/svn lev 0 FAILED [Estimate timeout from localhost]
  localhost /var lev 0 FAILED [Estimate timeout from localhost]
  localhost /home lev 0 FAILED [Estimate timeout from localhost]
  localhost /usr lev 0 FAILED [Estimate timeout from localhost]
  localhost /etc lev 0 FAILED [Estimate timeout from localhost]

STATISTICS:
                          Total       Full      Daily
Estimate Time (hrs:min)    1:15
Run Time (hrs:min)         1:15
Dump Time (hrs:min)        0:00       0:00       0:00
Output Size (meg)           0.0        0.0        0.0
Original Size (meg)         0.0        0.0        0.0
Avg Compressed Size (%)      --         --         --
Filesystems Dumped            0          0          0
Avg Dump Rate (k/s)          --         --         --
Tape Time (hrs:min)        0:00       0:00       0:00
Tape Size (meg)             0.0        0.0        0.0
Tape Used (%)               0.0        0.0        0.0
Filesystems Taped             0          0          0
Avg Tp Write Rate (k/s)      --         --         --

USAGE BY TAPE:
  Label     Time      Size      %    Nb
  000348    0:00       0.0    0.0     0

NOTES:
  driver: WARNING: got empty schedule from planner
  taper: tape 000348 kb 0 fm 0 [OK]

DUMP SUMMARY:
                                   DUMPER STATS                TAPER STATS
HOSTNAME     DISK     L  ORIG-KB  OUT-KB  COMP%  MMM:SS   KB/s  MMM:SS   KB/s
----------------------- ---------------------------------------- -------------
localhost/etc        0  FAILED ---------------------------------------------
localhost/home       0  FAILED ---------------------------------------------
localhost-ckup/Tapes 0  FAILED ---------------------------------------------
localhost-onsmile.ch 0  FAILED ---------------------------------------------
localhost/srv/svn    0  FAILED ---------------------------------------------
localhost/usr        0  FAILED ---------------------------------------------
localhost/var        0  FAILED ---------------------------------------------

(brought to you by Amanda version 2.4.4p3)

Thomas
Re: Estimate timeout from localhost / driver: WARNING: got empty schedule from planner
On Tue, 25 Apr 2006 at 9:22am, Thomas Grieder wrote:
For a few days I have been getting these two error messages:
Estimate timeout from localhost
driver: WARNING: got empty schedule from planner

1) Using localhost in the disklist is generally frowned upon. It's not a unique name, and will likely come back to bite you someday.
2) Look in /tmp/amanda at the *debug files; most likely sendsize*debug and/or amandad*debug will have more info as to what's going wrong.

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
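Joshua's second suggestion is easy to script. The Python sketch below lists the sendsize and amandad debug files in /tmp/amanda, newest first, so the most recent run is easy to find. The directory and file-name patterns come from his message. The function name is hypothetical, and the debug directory may differ if Amanda was built with another location.

```python
import glob
import os


def recent_debug_files(debug_dir="/tmp/amanda",
                       patterns=("sendsize*debug", "amandad*debug")):
    """Return matching Amanda debug files in debug_dir, newest first."""
    found = []
    for pattern in patterns:
        found.extend(glob.glob(os.path.join(debug_dir, pattern)))
    # Sort by modification time so the latest run comes first.
    return sorted(found, key=os.path.getmtime, reverse=True)


if __name__ == "__main__":
    for path in recent_debug_files():
        print(path)
```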
BUG (was: Re: Handitarded....odd (partial) estimate timeout errors.)
Michael Loftis wrote:
Paul asked for the logs, it seems like there's an amanda bug. The units [...]

Yes, indeed, there is a bug in Amanda!

You have 236 DLEs for that host, and from my reading of the code the REQuest UDP packet is limited to 32K instead of 64K (see planner.c lines 1377-1383). (Need to update the documentation!)

It seems that the planner splits up the REQuest packet into separate UDP packets when it exceeds MAX_DGRAM/2, i.e. 32K. Your first request was 32580 bytes. Adding the next string to that request would have exceeded the 32768 limit. The reason for the division by 2 seems to be to reserve space for error replies on each of those.

However, the amandad client expects one and only one REQuest packet. Any other REQuest packet coming from the same connection (5-tuple: protocol, remotehost, remoteport, localhost, localport) and having type REQ is considered a duplicate. It should actually test that the handle and sequence are identical too. It does not.

It's not fixed quickly either: when receiving the first REQ packet, the amandad client forks and execs the request program (sendsize in this case) and reads the results from a pipe. By the time the second, non-identical request comes in (with a different handle and sequence, which are currently not checked), sendsize has already started and cannot be given additional DLEs to estimate.

As a temporary workaround, you could shorten the exclude-list string for that host by creating a symlink:

ln -s /etc/amanda/exclude.gtar /.excl

and use that as the exclude-list. This shortens each line by 20 bytes, which would shrink the packet to fit again (236 DLEs * 20 = 4720 bytes less in a REQuest UDP packet for that host!).

Anyway, I'm getting a headache thinking about it :) all my other DLEs seem ok for that host, and the ones that it misses are not always exactly the same, but all seem to be non-calcsize estimated.

Just bad luck for those entries that happen to go at the end of the queue.
On the other hand, when really unlucky, you could have up to three estimates for each DLE, overflowing even the 4K we saved by shrinking the exclude string...

--
Paul Bijnens, Xplanation
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM
http://www.xplanation.com/
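Paul's explanation can be modelled in a few lines of Python. This sketch greedily packs per-DLE request lines into packets of at most 32768 bytes, which is MAX_DGRAM/2 as Paul reads planner.c. It only illustrates the splitting behaviour he describes and is not Amanda's code; the line lengths are hypothetical.

The last lines check the workaround arithmetic. The two exclude-list paths differ by 18 bytes, so if the path string is the only thing that changes, 236 DLEs save about 4.2 KB. That is close to Paul's rounder estimate of 20 bytes per line and to the "4K" he mentions.

```python
MAX_REQ_BYTES = 32768  # MAX_DGRAM/2, per Paul's reading of planner.c (assumption)


def pack_request_lines(lines, limit=MAX_REQ_BYTES):
    """Greedily pack request lines into packets of at most `limit` bytes.

    Illustrates the splitting described in the thread; it is not Amanda's
    planner code. A single line longer than `limit` gets a packet of its own.
    """
    packets, current = [], ""
    for line in lines:
        if current and len(current) + len(line) > limit:
            packets.append(current)
            current = ""
        current += line
    if current:
        packets.append(current)
    return packets


# Hypothetical request: 236 DLE lines of 151 bytes each (150 chars + newline).
long_lines = ["x" * 150 + "\n"] * 236
print([len(p) for p in pack_request_lines(long_lines)])  # [32767, 2869]

# Per-line saving from the shorter exclude-list path, assuming the path
# string is the only part of each request line that changes.
saving = len("/etc/amanda/exclude.gtar") - len("/.excl")
print(saving, 236 * saving)  # 18 4248

short_lines = ["x" * (150 - saving) + "\n"] * 236
print(len(pack_request_lines(short_lines)))  # 1
```

With the longer lines, the last 19 DLEs end up in a second packet. Under the bug Paul describes, amandad would treat that packet as a duplicate.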
Re: BUG (was: Re: Handitarded....odd (partial) estimate timeout errors.)
--On January 5, 2006 4:49:53 PM +0100 Paul Bijnens wrote:
Michael Loftis wrote:
Paul asked for the logs, it seems like there's an amanda bug. The units [...]
Yes, indeed, there is a bug in Amanda! You have 236 DLEs for that host, and from my reading of the code the REQuest UDP packet is limited to 32K instead of 64K (see planner.c lines 1377-1383). (Need to update the documentation!)

Woot, I'm NOT crazy! :D ...did I just say woot? My apologies.

It seems that the planner splits up the REQuest packet into separate UDP packets when it exceeds MAX_DGRAM/2, i.e. 32K. Your first request was 32580 bytes. Adding the next string to that request would have exceeded the 32768 limit. The reason for the division by 2 seems to be to reserve space for error replies on each of those.

I knew it was size related, but also that my packets were significantly smaller than MAX_DGRAM. This definitely explains it.

However, the amandad client expects one and only one REQuest packet. Any other REQuest packet coming from the same connection (5-tuple: protocol, remotehost, remoteport, localhost, localport) and having type REQ is considered a duplicate. It should actually test that the handle and sequence are identical too. It does not. It's not fixed quickly either: when receiving the first REQ packet, the amandad client forks and execs the request program (sendsize in this case) and reads the results from a pipe. By the time the second, non-identical request comes in (with a different handle and sequence, which are currently not checked), sendsize has already started and cannot be given additional DLEs to estimate. As a temporary workaround, you could shorten the exclude-list string for that host by creating a symlink: ln -s /etc/amanda/exclude.gtar /.excl

Yeah... this will help for a time. Hopefully long enough for a patch to fix amandad. I'll have to create a separate dumptype for this server, since we've got well over a hundred now and they all share that main backup type.
I figured shortening the UDP packets somehow would help. I just knew it was odd that it wasn't quite right, and that I seemed to be running into the problem way too early :)

and use that as exclude-list: this shortens each line by 20 bytes, which would shrink the packet to fit again. (236 DLEs * 20 = 4720 bytes less in a REQuest UDP packet for that host!)

Anyway, I'm getting a headache thinking about it :) all my other DLEs seem ok for that host, and the ones that it misses are not always exactly the same, but all seem to be non-calcsize estimated.

Just bad luck for those entries that happen to go at the end of the queue. On the other hand, when really unlucky, you could have up to three estimates for each DLE, overflowing even the 4K we saved by shrinking the exclude string...

Like I said, hopefully by then either the hackers (or I) will have put together a patch.

...

I see three ways to fix this. For one of them I'm not sure it would work: what about changing wait=yes to wait=no in my xinetd.conf? Not sure what that would break. The other two involve code: multiple sendsizes, *or* a protocol change to wait for a 'final start' packet, or an amandad change to wait a few extra seconds before starting the actual sendsize, coalescing the results. And you're right, the other ways aren't easy... one possibly involves breaking the protocol too.
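The duplicate check Paul describes can be contrasted with the stricter check he says it should perform. The short Python sketch below does that. The Packet fields, host names, port numbers and function name are hypothetical stand-ins, not amandad's data structures. As Paul notes, the stricter check alone would not be enough, because sendsize has already been started by the time a second, different REQ arrives.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    # The first five fields form the "connection" 5-tuple from the thread.
    proto: str
    remote_host: str
    remote_port: int
    local_host: str
    local_port: int
    handle: str
    sequence: int
    kind: str  # packet type, e.g. "REQ" or "ACK"


def is_duplicate_req(previous, pkt, check_handle_and_sequence):
    """Decide whether `pkt` repeats a REQ already seen in `previous`.

    With check_handle_and_sequence=False this mirrors the behaviour Paul
    describes (same 5-tuple and type REQ means duplicate); with True it
    also requires the same handle and sequence, as he says it should.
    """
    if pkt.kind != "REQ":
        return False
    for old in previous:
        if old.kind != "REQ" or old[:5] != pkt[:5]:
            continue
        if not check_handle_and_sequence:
            return True
        if (old.handle, old.sequence) == (pkt.handle, pkt.sequence):
            return True
    return False


first = Packet("udp", "backup-server", 700, "nfs0", 10080, "h-01", 1, "REQ")
second = first._replace(handle="h-02", sequence=2)  # the follow-up REQ

print(is_duplicate_req([first], second, check_handle_and_sequence=False))  # True
print(is_duplicate_req([first], second, check_handle_and_sequence=True))   # False
```

A genuine retransmission of `first` is still treated as a duplicate under both rules. Only the second, different request is handled differently.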
Handitarded....odd (partial) estimate timeout errors.
I added about half a dozen or so DLEs (splitting an existing one), and since then I get estimate timeout errors for some other DLEs on this host (daily run snippet attached). I suspect I'm hitting a UDP packet limit, maybe, but... I'm really drawing a blank. I've turned up etimeout quite a bit, to no effect. Maybe someone can jog my memory: are the estimates returned in a single UDP packet and therefore subject to the MTU? If so, how do I get around it? Or maybe I'm missing something more obvious.

Amanda 2.4.5 server and client; the client is Debian woody, the server Debian sarge. The client DLEs all use the 'calcsize' estimate setting except for the affected DLEs, but not all non-calcsize DLEs are affected. If you need anything else, let me know.

planner: ERROR Request to nfs0.msomt timed out.
nfs0.msomt /var/spool/cron lev 0 FAILED [missing result for /var/spool/cron in nfs0.msomt response]
nfs0.msomt /usr/local lev 0 FAILED [missing result for /usr/local in nfs0.msomt response]
nfs0.msomt /root lev 0 FAILED [missing result for /root in nfs0.msomt response]
Re: Handitarded....odd (partial) estimate timeout errors.
On Wed, Jan 04, 2006 at 02:01:36PM -0700, Michael Loftis wrote:
I added about half a dozen or so DLEs (splitting an existing one) and since that time I get estimate timeout errors for some other DLEs on this host (daily run snippet attached) [...]

You can find comments on the problem here: http://tinyurl.com/ca7pv

--
Jon H. LaBadie
JG Computing
Princeton, NJ
Re: Handitarded....odd (partial) estimate timeout errors.
--On January 4, 2006 4:30:53 PM -0500 Jon LaBadie wrote:
You can find comments on the problem here: http://tinyurl.com/ca7pv

OK, hmm, something REALLY odd is happening. For the DLEs that failed there are multiple sendsize requests: one in the main/first REQ, which it acks, and then another request (a second or two later) just for the DLEs that never make it. amandad claims this is a dup P_REQ packet and acks it anyway, but apparently doesn't do any estimates for it.

I'm wary of sending the entire debug file to the list, but if anyone is interested I'll send it directly to the developer(s). I'm thinking maybe something funny is going on?
Re: Handitarded....odd (partial) estimate timeout errors.
On Wed, Jan 04, 2006 at 03:38:56PM -0700, Michael Loftis wrote:
[...] For the DLEs that failed there are multiple sendsize requests: one in the main/first REQ, which it acks, and then another request (a second or two later) just for the DLEs that never make it. amandad claims this is a dup P_REQ packet and acks it anyway, but apparently doesn't do any estimates for it. [...]

Asking questions about things I know nothing about... :)
Are you using iptables? If so, have you installed and configured the ??conntrack?? module?

--
Jon H. LaBadie
Re: Handitarded....odd (partial) estimate timeout errors.
--On January 4, 2006 7:20:50 PM -0500 Jon LaBadie wrote:
Asking questions about things I know nothing about... :)
Are you using iptables? If so, have you installed and configured the ??conntrack?? module?

Paul asked for the logs; it seems like there's an amanda bug. The units in question are attached to the same broadcast domain/VLAN and are in the same subnet, so they are talking directly to each other. It's not an obvious network or switch problem. I thought maybe there was an MTU limit of 1500 bytes, but apparently amanda is set to fragment UDP packets up to 64k, so that should be fine, and other drives are making it. Thanks anyway, Jon :) I think we've hit some sort of bug in amandad, or in planner (I think it sends the SERVICE sendsize packets), or both.

Network-wise, BTW, the backup server is connected to a switch here in the office. That switch is trunked to a switch upstairs, then to another switch in the blade chassis, then to the untrunked connection to the NFS server (the Amanda backup client) that is having the issues. It seemed like maybe some sort of odd packet size limit, or some other 'max number of' limit in planner, since planner is sort of sending duplicate requests for the affected DLEs. Anyway, I'm getting a headache thinking about it :) all my other DLEs seem ok for that host, and the ones that it misses are not always exactly the same, but all seem to be non-calcsize estimated.
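Michael's point that a 1500-byte MTU is not a hard limit comes from IP fragmentation. A UDP datagram larger than the MTU is carried as several IPv4 fragments and reassembled at the receiver. The Python sketch below counts the fragments, assuming a 20-byte IPv4 header with no options. For the 32580-byte request Paul reports, it gives 23 fragments at MTU 1500. The function name is hypothetical.

```python
import math

IPV4_HEADER_BYTES = 20  # assumption: no IP options
UDP_HEADER_BYTES = 8


def ipv4_fragment_count(udp_payload_bytes: int, mtu: int = 1500) -> int:
    """Number of IPv4 fragments needed to carry one UDP datagram."""
    # Each fragment carries at most (mtu - IP header) bytes of data,
    # rounded down to a multiple of 8 as fragment offsets require.
    per_fragment = (mtu - IPV4_HEADER_BYTES) // 8 * 8
    return math.ceil((udp_payload_bytes + UDP_HEADER_BYTES) / per_fragment)


print(ipv4_fragment_count(32580))  # 23
print(ipv4_fragment_count(1472))   # 1: largest payload that fits unfragmented
```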
Re: AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server
Sebastian Kösters wrote:
That's all in sendsize:

sendsize: debug 1 pid 5132 ruid 33 euid 33: start at Fri Nov 4 02:30:02 2005
sendsize: version 2.4.3
sendsize[5132]: time 0.007: waiting for any estimate child
sendsize[5134]: time 0.007: calculating for amname '/pst', dirname '/pst', spindle -1
sendsize[5134]: time 0.007: getting size via gnutar for /pst level 0
sendsize[5134]: time 0.034: spawning /usr/lib/amanda/runtar in pipeline
sendsize[5134]: argument list: /bin/tar --create --file /dev/null --directory /pst --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/pst_pst_0.new --sparse --ignore-failed-read --totals --exclude-from /tmp/amanda/sendsize._pst.20051104023002.exclude .
sendsize[5134]: time 2938.236: Total bytes written: 86986362880 (81GB, 28MB/s)
sendsize[5134]: time 2938.237: .
sendsize[5134]: estimate time for /pst level 0: 2938.203
sendsize[5134]: estimate size for /pst level 0: 84947620 KB
sendsize[5134]: time 2938.237: waiting for /bin/tar /pst child
sendsize[5134]: time 2938.237: after /bin/tar /pst wait
sendsize[5134]: time 2938.238: done with amname '/pst', dirname '/pst', spindle -1
sendsize[5132]: time 2938.238: child 5134 terminated normally
sendsize: time 2938.238: pid 5132 finish time Fri Nov 4 03:19:00 2005
[...]
Seems to work, but why not directly with amanda?!

Remind me, did you change etimeout? Your sendsize debug file seems normal to me, and in fact we see that the estimate size is correctly identified. Could you show the corresponding amandad debug file, please?

Alex
--
Alexander Jolk * BUF Compagnie
AW: AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server
The last run, the one whose output I posted, was started by hand. Now I am trying again with Amanda (etimeout 4000).

-----Original Message-----
From: Alexander Jolk
Sent: Friday, 4 November 2005 09:54
To: Sebastian Kösters
Cc: amanda-users@amanda.org
Subject: Re: AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server

Sebastian Kösters wrote:
That's all in sendsize: [...]
Seems to work, but why not directly with amanda?!

Remind me, did you change etimeout? Your sendsize debug file seems normal to me, and in fact we see that the estimate size is correctly identified. Could you show the corresponding amandad debug file, please?

Alex
AW: AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server
Like I said, I started amdump with etimeout = 4000, but it has been running since 8:36 and now it is 10:53. It is still running, and I don't understand why. On the client there are the following Amanda processes:

8243 ?  S   0:00 /usr/lib/amanda/sendbackup
8245 ?  S  52:15 /usr/bin/gzip --fast
8246 ?  S   1:19 /usr/lib/amanda/sendbackup
8247 ?  S   0:00 sh -c /bin/tar -tf - 2>/dev/null | sed -e 's/^\.//'
8248 ?  S   0:49 /bin/tar -tf -
8249 ?  S   0:00 sed -e s/^\.//
8250 ?  R   2:30 gtar --create --file - --directory /pst --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/pst_pst_0.new --sparse

-----Original Message-----
From: Alexander Jolk
Sent: Friday, 4 November 2005 09:54
To: Sebastian Kösters
Cc: amanda-users@amanda.org
Subject: Re: AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server

Sebastian Kösters wrote:
That's all in sendsize: [...]
Seems to work, but why not directly with amanda?!

Remind me, did you change etimeout? Your sendsize debug file seems normal to me, and in fact we see that the estimate size is correctly identified. Could you show the corresponding amandad debug file, please?

Alex
Re: AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server
Sebastian Kösters wrote: Hi! That's all in sendsize: [...] sendsize[5134]: time 2938.236: Total bytes written: 86986362880 (81GB, 28MB/s) [...]

This sendsize is completely normal. The previous one you sent did have the error about not finding the size line, but that size line is there in this run! Notice the line "Total bytes written: ...".

[EMAIL PROTECTED] gnutar-lists]# /bin/tar --create --file /dev/null --directory /pst --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/pst_pst_0.new --sparse --ignore-failed-read --totals .
After +1 hour:
Gesamtzahl geschriebener Bytes: 86991011840 (81GB, 27MB/s) (English: total number of bytes written)

Seems you have a mixed English-German environment. Amanda specifically looks for the English string and does not recognize the German words. Could it be that some amdump runs somehow use the German environment?

Seems to work, but why not directly with Amanda?!

I cannot conclude that from the evidence you give here: the sendsize is perfect.

-- Paul Bijnens, Xplanation Tel +32 16 397.511 Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM Fax +32 16 397.512 http://www.xplanation.com/ email: [EMAIL PROTECTED] *** * I think I've got the hang of it now: exit, ^D, ^C, ^\, ^Z, ^Q, F6, * * quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, * * stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt, abort, hangup, * * PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e, kill -1 $$, shutdown, * * kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ...* * ... Are you sure? ... YES ... Phew ... I'm out * ***
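To make Paul's point concrete, here is a minimal Python sketch of the kind of check sendsize performs on GNU tar's --totals output. It is not Amanda's source: the regular expression and the function name `estimate_kb` are assumptions, chosen so that the English line from the debug file yields the same 84947620 KB that sendsize logged, while the German line yields -1, i.e. "no size line match".

```python
import re

# Hypothetical stand-in for sendsize's size-line check (not Amanda's code).
# It only knows the English wording that GNU tar prints in the C locale.
SIZE_LINE = re.compile(r"^Total bytes written: (\d+)")


def estimate_kb(tar_output: str) -> int:
    """Return the estimate in KB (1 KB = 1024 bytes), or -1 if no size line matches."""
    for line in tar_output.splitlines():
        match = SIZE_LINE.match(line.strip())
        if match:
            return int(match.group(1)) // 1024
    return -1  # what the debug file reports as "estimate size ... -1 KB"


if __name__ == "__main__":
    english = "Total bytes written: 86986362880 (81GB, 28MB/s)"
    german = "Gesamtzahl geschriebener Bytes: 86991011840 (81GB, 27MB/s)"
    print(estimate_kb(english))  # 84947620, matching the sendsize log
    print(estimate_kb(german))   # -1: the localized line is not recognized
```

Forcing the C locale (for example LC_ALL=C) in the environment Amanda's tar runs in would make tar print the English line; whether a German locale really leaks into some amdump runs here is the open question in the thread.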
AW: AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server
Hi! We have several other servers with (for example) German tar, and there are no problems with their backups. Amanda is installed in English.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On behalf of Paul Bijnens Sent: Friday, 4 November 2005 11:26 To: Sebastian Kösters Cc: amanda-users@amanda.org Subject: Re: AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server [...]
Re: AW: AW: AW: AW: Estimate timeout from server
Sebastian Kösters wrote: It does not work.

Which means? Do you get an error message? Please cite it. What do the relevant log files say? You showed us yesterday that the estimate phase took almost an hour on this machine; have you waited that long?

On the client amanda starts the following command: /bin/tar --create --file /dev/null --directory /pst --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/pst_pst_0.new

What happens if you run that command by hand? (And wait until completion, obviously.)

Alex
Re: Estimate Timeout Issue - Dump runs fine
OK, thanks. I have increased etimeout to 2400 seconds and also changed the UDP timeout in Check Point to 2400 seconds, so I'll see how tonight's run goes.

Everything was fine today: no estimate timeout. Thanks for the pointer.
AW: AW: AW: AW: AW: Estimate timeout from server
When I run it by hand I get this error (translated from German to English): /bin/tar: No empty Archive created. Everything is set up the same as on the other machines.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On behalf of Alexander Jolk Sent: Thursday, 3 November 2005 09:58 To: Sebastian Kösters Cc: amanda-users@amanda.org Subject: Re: AW: AW: AW: AW: Estimate timeout from server [...]
Re: AW: AW: AW: AW: AW: Estimate timeout from server
Sebastian Kösters wrote: On the client amanda starts the following command: /bin/tar --create --file /dev/null --directory /pst --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/pst_pst_0.new When I run it by hand I get this error (translated from German to English): /bin/tar: No empty Archive created.

You forgot the last dot `.' on the command line as given in the debug file.

Alex
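When re-running an estimate by hand, it helps to build the command exactly as the debug file's argument list shows it, trailing `.` included. The Python sketch below does that; the helper names are hypothetical, and forcing `LC_ALL=C` is an added assumption so that GNU tar prints its totals in English (the wording sendsize expects) rather than German. Keep in mind that the command also writes the listed-incremental file it is given.

```python
import os
import subprocess


def estimate_command(directory, listed_incremental, exclude_from=None):
    """Rebuild the estimate tar command from the sendsize 'argument list' line."""
    cmd = [
        "/bin/tar", "--create", "--file", "/dev/null",
        "--directory", directory, "--one-file-system",
        "--listed-incremental", listed_incremental,
        "--sparse", "--ignore-failed-read", "--totals",
    ]
    if exclude_from is not None:
        cmd += ["--exclude-from", exclude_from]
    # The trailing "." names what to archive; without it GNU tar refuses to
    # create an empty archive, which is the error seen when running by hand.
    cmd.append(".")
    return cmd


def c_locale_env():
    """Copy of the current environment with tar's messages forced to English."""
    env = dict(os.environ)
    env["LC_ALL"] = "C"
    return env


def run_estimate(cmd):
    """Run the command (this takes as long as the real estimate) and capture output."""
    # Capture both streams, since the totals line may go to stderr rather than stdout.
    return subprocess.run(cmd, env=c_locale_env(),
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)


if __name__ == "__main__":
    cmd = estimate_command("/pst", "/var/lib/amanda/gnutar-lists/pst_pst_0.new")
    print(" ".join(cmd))
```

Only `estimate_command` runs in the demo; `run_estimate` is left for an explicit call because a full /pst estimate took 25 to 60 minutes in this thread.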
AW: AW: AW: AW: AW: AW: Estimate timeout from server
You are right. When I try it by hand nothing happens: it runs and runs and runs, and the file stays at 0 KB. I found no error messages. I also changed the permissions of the Amanda files/directories to 777.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On behalf of Alexander Jolk Sent: Thursday, 3 November 2005 14:21 To: Sebastian Kösters Cc: amanda-users@amanda.org Subject: Re: AW: AW: AW: AW: AW: Estimate timeout from server [...]
AW: AW: AW: AW: AW: AW: Estimate timeout from server
I found this in /tmp/amanda on the client:
sendsize: debug 1 pid 25485 ruid 33 euid 33: start at Thu Nov 3 14:00:25 2005
sendsize: version 2.4.3
sendsize[25487]: time 0.002: calculating for amname '/pst', dirname '/pst', spindle -1
sendsize[25487]: time 0.002: getting size via gnutar for /pst level 0
sendsize[25487]: time 0.003: spawning /usr/lib/amanda/runtar in pipeline
sendsize[25487]: argument list: /bin/tar --create --file /dev/null --directory /pst --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/pst_pst_0.new --sparse --ignore-failed-read --totals .
sendsize[25485]: time 0.023: waiting for any estimate child
sendsize[25487]: time 1519.255: .
sendsize[25487]: estimate time for /pst level 0: 1519.252
sendsize[25487]: no size line match in /bin/tar output for /pst
sendsize[25487]: .
sendsize[25487]: estimate size for /pst level 0: -1 KB
sendsize[25487]: time 1519.255: waiting for /bin/tar /pst child
sendsize[25487]: time 1519.256: after /bin/tar /pst wait
sendsize[25485]: time 1519.256: child 25487 terminated with signal 13
sendsize: time 1519.257: pid 25485 finish time Thu Nov 3 14:25:44 2005

What does this mean: "no size line match in /bin/tar output for /pst"? And this: "size for /pst level 0: -1 KB"??

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On behalf of Sebastian Kösters Sent: Thursday, 3 November 2005 14:33 To: 'Alexander Jolk' Cc: amanda-users@amanda.org Subject: AW: AW: AW: AW: AW: AW: Estimate timeout from server [...]
Re: AW: AW: AW: AW: AW: Estimate timeout from server
On Thu, Nov 03, 2005 at 02:38:04PM +0100, Sebastian Kösters enlightened us: I found this in /tmp/amanda on the client [...] whats that: no size line match in /bin/tar output for /pst and that: size for /pst level 0: -1 KB ??

What version of tar is this?

-- Matt Hyclak Department of Mathematics Department of Social Work Ohio University (740) 593-1263
AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server
I tried it by hand with /opt and it worked.
-rw-r--r-- 1 root root 256 3. Nov 14:52 pst_opt_0.new
tar (GNU tar) 1.13.25
I don't know why it has a problem with /pst?! OK, it's big (80 GB), but not too big, I think.

-Original Message- From: Paul Bijnens [mailto:[EMAIL PROTECTED] Sent: Thursday, 3 November 2005 14:46 To: Sebastian Kösters Subject: Re: AW: AW: AW: AW: AW: AW: Estimate timeout from server

Sebastian Kösters wrote: I found this in /tmp/amanda on the client [...] What's that: no size line match in /bin/tar output for /pst

Amanda looks in the tar output for a line like:
Total bytes written: 33955840 (32MB, 938MB/s)
But it does not find one.

and that: size for /pst level 0: -1 KB ??

The -1 means it failed. Does the tar command work for another directory instead of /pst, e.g. /var/log?
Re: AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server
Sebastian Kösters wrote: I tried it by hand with /opt and it worked. -rw-r--r-- 1 root root 256 3. Nov 14:52 pst_opt_0.new tar (GNU tar) 1.13.25 I don't know why it has a problem with /pst?! OK, it's big (80 GB), but not too big, I think.

You have failed to understand that we don't care about that listed-incremental file something.new; what we are looking for is what tar gives on its stdout at the end of its run. Could you report the exact output of the tar command, both when running on /pst and on /opt? We know from your earlier debug files that it takes almost half an hour on /pst.

Alex
Re: AW: AW: AW: AW: AW: AW: Estimate timeout from server
Sebastian Kösters wrote: I found this in /tmp/amanda on the client [...] sendsize[25487]: time 1519.255: .

Did you cut something here? That is exactly the part that would have been interesting.

sendsize[25487]: estimate time for /pst level 0: 1519.252

Alex
AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server
Hi! That's all in sendsize:
sendsize: debug 1 pid 5132 ruid 33 euid 33: start at Fri Nov 4 02:30:02 2005
sendsize: version 2.4.3
sendsize[5132]: time 0.007: waiting for any estimate child
sendsize[5134]: time 0.007: calculating for amname '/pst', dirname '/pst', spindle -1
sendsize[5134]: time 0.007: getting size via gnutar for /pst level 0
sendsize[5134]: time 0.034: spawning /usr/lib/amanda/runtar in pipeline
sendsize[5134]: argument list: /bin/tar --create --file /dev/null --directory /pst --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/pst_pst_0.new --sparse --ignore-failed-read --totals --exclude-from /tmp/amanda/sendsize._pst.20051104023002.exclude .
sendsize[5134]: time 2938.236: Total bytes written: 86986362880 (81GB, 28MB/s)
sendsize[5134]: time 2938.237: .
sendsize[5134]: estimate time for /pst level 0: 2938.203
sendsize[5134]: estimate size for /pst level 0: 84947620 KB
sendsize[5134]: time 2938.237: waiting for /bin/tar /pst child
sendsize[5134]: time 2938.237: after /bin/tar /pst wait
sendsize[5134]: time 2938.238: done with amname '/pst', dirname '/pst', spindle -1
sendsize[5132]: time 2938.238: child 5134 terminated normally
sendsize: time 2938.238: pid 5132 finish time Fri Nov 4 03:19:00 2005

Outputs:

/opt
[EMAIL PROTECTED] amanda]# /bin/tar --create --file /dev/null --directory /opt --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/pst_opt_0.new --sparse --ignore-failed-read --totals .
Gesamtzahl geschriebener Bytes: 798720 (780kB, ?B/s) (English: total number of bytes written)

/pst
[EMAIL PROTECTED] gnutar-lists]# /bin/tar --create --file /dev/null --directory /pst --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/pst_pst_0.new --sparse --ignore-failed-read --totals .
After +1 hour:
Gesamtzahl geschriebener Bytes: 86991011840 (81GB, 27MB/s) (English: total number of bytes written)

Seems to work, but why not directly with Amanda?!
-Original Message- From: Alexander Jolk [mailto:[EMAIL PROTECTED] Sent: Thursday, 3 November 2005 15:20 To: Sebastian Kösters Cc: amanda-users@amanda.org Subject: Re: AW: AW: AW: AW: AW: AW: Estimate timeout from server [...]
Estimate Timeout Issue - Dump runs fine
Hi, the server is 2.4.5 and the client is now 2.4.5p1, both on CentOS. I have used Amanda for years with no issues setting it up - I can pretty much set it up with my eyes closed now!! Amanda rocks... But I'm getting a slightly strange error with a large partition. The partition in question is around 900 GB in size, although only a few hundred MB are currently used. When the estimate runs, the report shows:
FAILURE AND STRANGE DUMP SUMMARY: planner: ERROR Estimate timeout from servername
The thing is, the actual dump of this filesystem runs fine. I have increased my etimeout to 20 minutes, but this still occurs. Any ideas on this one? Thanks
AW: Estimate timeout from server
Can no one help?

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On behalf of Sebastian Kösters Sent: Wednesday, 2 November 2005 07:50 To: amanda-users@amanda.org Subject: Estimate timeout from server

Hi, I get this error (Estimate timeout from server) on a newly installed system (Red Hat 9). I installed Amanda the same way as on every other machine, but it does not work (there was no reboot or anything like that during the backup). Which log files from which machine do you need to help me find the error? Thank you very much! Regards Sebastian
Re: AW: Estimate timeout from server
Sebastian Kösters wrote: Which log files from which machine do you need to help me find the error?

The output of `amcheck your-conf' on the Amanda server. If one particular client fails, the files from that client's /tmp/amanda/ corresponding to the above amcheck run, or a double-checked confirmation that there are no such files. The version number of Amanda, and the relevant platforms.

Alex
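The client-side part of Alex's checklist can be partly scripted. The sketch below picks the newest debug file per Amanda program from a client's debug directory. /tmp/amanda is the location used in this thread; the `<program>.<timestamp>.debug` naming is inferred from file names quoted here (e.g. sendsize.20051102003001.debug), and the helper names are hypothetical.

```python
import glob
import os


def newest_path(paths, mtime=os.path.getmtime):
    """Return the most recently modified path, or None for an empty list."""
    return max(paths, key=mtime) if paths else None


def latest_debug_files(debug_dir="/tmp/amanda",
                       programs=("amandad", "sendsize", "selfcheck")):
    """Map each program name to its newest <program>.*.debug file (or None)."""
    return {
        program: newest_path(glob.glob(os.path.join(debug_dir, program + ".*.debug")))
        for program in programs
    }


if __name__ == "__main__":
    for program, path in latest_debug_files().items():
        print(program, path or "(no debug file found)")
```

A `None` entry corresponds to the "confirmation that there are no files" case, which is itself useful information.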
AW: AW: Estimate timeout from server
[29469]: time 0.005: getting size via gnutar for /pst level 0
sendsize[29469]: time 0.006: spawning /usr/lib/amanda/runtar in pipeline
sendsize[29469]: argument list: /bin/tar --create --file /dev/null --directory /pst --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/pst_pst_0.new --sparse --ignore-failed-read --totals .
sendsize[29467]: time 0.006: waiting for any estimate child
sendsize[29469]: time 3428.548: Total bytes written: 86429921280 (80GB, 24MB/s)
sendsize[29469]: time 3428.562: .
sendsize[29469]: estimate time for /pst level 0: 3428.556
sendsize[29469]: estimate size for /pst level 0: 84404220 KB
sendsize[29469]: time 3428.562: waiting for /bin/tar /pst child
sendsize[29469]: time 3428.562: after /bin/tar /pst wait
sendsize[29469]: time 3428.562: done with amname '/pst', dirname '/pst', spindle -1
sendsize[29467]: time 3428.563: child 29469 terminated normally
sendsize: time 3428.563: pid 29467 finish time Wed Nov 2 08:25:20 2005

Server is Fedora 3 and client is RH9. Thanks for the help!

-Original Message- From: Alexander Jolk [mailto:[EMAIL PROTECTED] Sent: Wednesday, 2 November 2005 13:23 To: Sebastian Kösters Cc: amanda-users@amanda.org Subject: Re: AW: Estimate timeout from server [...]
Re: AW: AW: Estimate timeout from server
Sebastian Kösters wrote: It fails only when doing amdump config

Would you happen to have a relevant error message to show from your amdump report?

Amandad on client [...]
amandad: time 0.000: got packet: Amanda 2.4 REQ HANDLE 000-E0426409 SEQ 1130912894 SECURITY USER amanda SERVICE sendsize OPTIONS features=feff9ffe0f;maxdumps=1;hostname=pst; GNUTAR /pst 0 1970:1:1:0:0:0 -1 OPTIONS |;bsd-auth;compress-fast;index;
[...]
amandad: time 3428.573: sending REP packet: Amanda 2.4 REP HANDLE 000-E0426409 SEQ 1130912894 OPTIONS features=feff9f00; /pst 0 SIZE 84404220
amandad: time 3438.570: dgram_recv: timeout after 10 seconds
amandad: time 3438.570: waiting for ack: timeout, retrying

Sounds like a simple estimate timeout to me; 3500 s for one partition is quite long. Bump up your etimeout in amanda.conf to something like 5000 s and see whether that works. Or try to find out why the estimate on pst:/pst takes so long, and do something about that; you might, for instance, split it up into smaller chunks. Or switch to server-side estimates, which are instant. You should have got a clear message in amdump's report, though, saying `estimate timeout', which is English for `estimate timeout'.

Alex
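Alex's suggestion boils down to a budget comparison: the planner must still be waiting when the client's REP packet arrives. Below is a minimal sketch, assuming the amanda.conf(5) description of etimeout: a positive value is per DLE and is multiplied by the number of DLEs on the client, and a negative value is a total per client. The function names are hypothetical, and the observed time is the 3428.573 s at which amandad sent its REP above.

```python
def planner_wait_seconds(etimeout, dle_count):
    """How long the planner waits for one client's estimates (assumed semantics)."""
    if etimeout < 0:
        return -etimeout           # negative: total per client
    return etimeout * dle_count    # positive: per DLE on that client


def estimate_fits(etimeout, dle_count, reply_sent_at):
    """True if the REP packet is sent before the planner stops waiting."""
    return reply_sent_at < planner_wait_seconds(etimeout, dle_count)


if __name__ == "__main__":
    rep_time = 3428.573  # amandad "sending REP packet" for pst:/pst (one DLE)
    for etimeout in (300, 5000):
        print(etimeout, estimate_fits(etimeout, 1, rep_time))
```

Under these assumed semantics, a negative etimeout keeps the limit from growing with the number of DLEs, which matters mostly for clients with many DLEs.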
AW: AW: AW: Estimate timeout from server
My report looks like this:

These dumps were to tape DailySet150.
The next tape Amanda expects to use is: DailySet150.

FAILURE AND STRANGE DUMP SUMMARY:
  pst        /pst lev 0 FAILED [Estimate timeout from pst]

STATISTICS:
                          Total       Full      Daily
Estimate Time (hrs:min)    0:15
Run Time (hrs:min)         0:15
Dump Time (hrs:min)        0:00       0:00       0:00
Output Size (meg)           0.0        0.0        0.0
Original Size (meg)         0.0        0.0        0.0
Avg Compressed Size (%)      --         --         --
Filesystems Dumped            0          0          0
Avg Dump Rate (k/s)          --         --         --
Tape Time (hrs:min)        0:00       0:00       0:00
Tape Size (meg)             0.0        0.0        0.0
Tape Used (%)               0.0        0.0        0.0
Filesystems Taped             0          0          0
Avg Tp Write Rate (k/s)      --         --         --

USAGE BY TAPE:
  Label          Time      Size      %    Nb
  DailySet150    0:00       0.0    0.0     0

NOTES:
  planner: tapecycle (1) <= runspercycle (10)
  planner: Adding new disk pst:/pst.
  driver: WARNING: got empty schedule from planner
  taper: tape DailySet150 kb 0 fm 0 [OK]

DUMP SUMMARY:
                                      DUMPER STATS                 TAPER STATS
HOSTNAME     DISK      L  ORIG-KB  OUT-KB  COMP%  MMM:SS   KB/s  MMM:SS   KB/s
pst          /pst      0  FAILED ---------------------------------------------

(brought to you by Amanda version 2.4.4p4)

That's all. I tested with a separate config.

-Original Message- From: Alexander Jolk [mailto:[EMAIL PROTECTED] Sent: Wednesday, 2 November 2005 13:36 To: Sebastian Kösters Cc: amanda-users@amanda.org Subject: Re: AW: AW: Estimate timeout from server [...]
Re: Estimate Timeout Issue - Dump runs fine
On Wed, 2 Nov 2005 at 11:32am, Tom Brown wrote: But i'm getting a slightly strange error with a large partition. [...] Thing is though the actual dump of this filesystem runs fine - I have increased my eTimeout to 20mins but this still occurs - Any ideas on this one?

Look in /tmp/amanda/sendsize*debug and/or amandad*debug to see how long the estimate is actually taking. Also, what do your iptables rules look like on the server?

-- Joshua Baker-LePain Department of Biomedical Engineering Duke University
Re: Estimate Timeout Issue - Dump runs fine
Look in /tmp/amanda/sendsize*debug and/or amandad*debug to see how long the estimate is actually taking. Also, what do your iptables rules look like on the server?

Thanks - iptables are not being used; the local firewall is off. The sendsize debug is below and looks OK:
# more /tmp/amanda/sendsize.20051102003001.debug
sendsize: debug 1 pid 12320 ruid 11 euid 11: start at Wed Nov 2 00:30:01 2005
sendsize: version 2.4.5p1
sendsize[12322]: time 0.002: calculating for amname '/dev/sda2', dirname '/', spindle -1
sendsize[12322]: time 0.002: getting size via dump for /dev/sda2 level 0
sendsize[12322]: time 0.002: calculating for device '/dev/sda2' with 'ext3'
sendsize[12322]: time 0.002: running /sbin/dump 0Ssf 1048576 - /dev/sda2
sendsize[12322]: time 0.003: running /opt/amanda-2.4.5p1/libexec/killpgrp
sendsize[12320]: time 0.003: waiting for any estimate child: 1 running
sendsize[12322]: time 21.884: 1447269376
sendsize[12322]: time 21.885: .
sendsize[12322]: estimate time for /dev/sda2 level 0: 21.882
sendsize[12322]: estimate size for /dev/sda2 level 0: 1413349 KB
sendsize[12322]: time 21.885: asking killpgrp to terminate
sendsize[12322]: time 22.886: getting size via dump for /dev/sda2 level 1
sendsize[12322]: time 22.887: calculating for device '/dev/sda2' with 'ext3'
sendsize[12322]: time 22.887: running /sbin/dump 1Ssf 1048576 - /dev/sda2
sendsize[12322]: time 22.888: running /opt/amanda-2.4.5p1/libexec/killpgrp
sendsize[12322]: time 195.606: 4647936
sendsize[12322]: time 195.606: .
sendsize[12322]: estimate time for /dev/sda2 level 1: 172.718
sendsize[12322]: estimate size for /dev/sda2 level 1: 4539 KB
sendsize[12322]: time 195.606: asking killpgrp to terminate
sendsize[12322]: time 196.608: done with amname '/dev/sda2', dirname '/', spindle -1
sendsize[12320]: time 196.608: child 12322 terminated normally
sendsize[12334]: time 196.609: calculating for amname '/dev/sda1', dirname '/boot', spindle -1
sendsize[12334]: time 196.609: getting size via dump for /dev/sda1 level 0
sendsize[12334]: time 196.609: calculating for device '/dev/sda1' with 'ext3'
sendsize[12334]: time 196.609: running /sbin/dump 0Ssf 1048576 - /dev/sda1
sendsize[12320]: time 196.609: waiting for any estimate child: 1 running
sendsize[12334]: time 196.610: running /opt/amanda-2.4.5p1/libexec/killpgrp
sendsize[12334]: time 197.239: 5737472
sendsize[12334]: time 197.239: .
sendsize[12334]: estimate time for /dev/sda1 level 0: 0.630
sendsize[12334]: estimate size for /dev/sda1 level 0: 5603 KB
sendsize[12334]: time 197.239: asking killpgrp to terminate
sendsize[12334]: time 198.242: getting size via dump for /dev/sda1 level 1
sendsize[12334]: time 198.243: calculating for device '/dev/sda1' with 'ext3'
sendsize[12334]: time 198.243: running /sbin/dump 1Ssf 1048576 - /dev/sda1
sendsize[12334]: time 198.243: running /opt/amanda-2.4.5p1/libexec/killpgrp
sendsize[12334]: time 198.684: 27648
sendsize[12334]: time 198.684: .
sendsize[12334]: estimate time for /dev/sda1 level 1: 0.441
sendsize[12334]: estimate size for /dev/sda1 level 1: 27 KB
sendsize[12334]: time 198.684: asking killpgrp to terminate
sendsize[12334]: time 199.687: done with amname '/dev/sda1', dirname '/boot', spindle -1
sendsize[12320]: time 199.687: child 12334 terminated normally
sendsize[12339]: time 199.687: calculating for amname '/dev/sda5', dirname '/export/disk1', spindle -1
sendsize[12339]: time 199.688: getting size via dump for /dev/sda5 level 0
sendsize[12320]: time 199.688: waiting for any estimate child: 1 running
sendsize[12339]: time 199.688: calculating for device '/dev/sda5' with 'ext3'
sendsize[12339]: time 199.688: running /sbin/dump 0Ssf 1048576 - /dev/sda5
sendsize[12339]: time 199.689: running /opt/amanda-2.4.5p1/libexec/killpgrp
sendsize[12339]: time 545.606: 88973312
sendsize[12339]: time 545.617: .
sendsize[12339]: estimate time for /dev/sda5 level 0: 345.928
sendsize[12339]: estimate size for /dev/sda5 level 0: 86888 KB
sendsize[12339]: time 545.617: asking killpgrp to terminate
sendsize[12339]: time 546.619: getting size via dump for /dev/sda5 level 1
sendsize[12339]: time 546.646: calculating for device '/dev/sda5' with 'ext3'
sendsize[12339]: time 546.646: running /sbin/dump 1Ssf 1048576 - /dev/sda5
sendsize[12339]: time 546.647: running /opt/amanda-2.4.5p1/libexec/killpgrp
sendsize[12339]: time 2182.684: 25811968
sendsize[12339]: time 2182.696: .
sendsize[12339]: estimate time for /dev/sda5 level 1: 1636.054 sendsize[12339]: estimate size for /dev/sda5 level 1: 25207 KB sendsize[12339]: time 2182.701: asking killpgrp to terminate sendsize[12339]: time 2183.703: done with amname '/dev/sda5', dirname '/export/disk1', spindle -1 sendsize[12320]: time 2183.704: child 12339 terminated normally sendsize: time 2183.704: pid 12320 finish time Wed Nov 2 01:06:24 2005 one of my amanda.debugs does have this at the bottom of it amandad: time 2193.716: dgram_recv: timeout after 10 seconds amandad: time
Re: Estimate Timeout Issue - Dump runs fine
On Wed, 2 Nov 2005 at 2:31pm, Tom Brown wrote: Look in /tmp/amanda/sendsize*debug and/or amandad*debug to see how long the estimate is actually taking. Also, what do your iptables rules look like on the server? Thanks. iptables is not being used, and the local firewall is off. One of my amanda.debugs does have this at the bottom of it: amandad: time 2193.716: dgram_recv: timeout after 10 seconds amandad: time 2193.716: waiting for ack: timeout, retrying amandad: time 2203.716: dgram_recv: timeout after 10 seconds amandad: time 2203.716: waiting for ack: timeout, retrying amandad: time 2213.717: dgram_recv: timeout after 10 seconds amandad: time 2213.717: waiting for ack: timeout, retrying amandad: time 2223.717: dgram_recv: timeout after 10 seconds amandad: time 2223.717: waiting for ack: timeout, retrying amandad: time 2233.718: dgram_recv: timeout after 10 seconds amandad: time 2233.718: waiting for ack: timeout, giving up! amandad: time 2233.718: pid 12319 finish time Wed Nov 2 01:07:14 2005 Is that time figure in seconds? Yep. So you can just increase etimeout and/or figure out why /sbin/dump 1Ssf 1048576 - /dev/sda5 is taking so long. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University
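Following the advice to look in sendsize*debug, here is a minimal Python sketch that pulls the per-level "estimate time for ... level N: T" lines out of a sendsize debug file and picks the slowest one. The regular expression is an assumption based only on the log lines quoted in this thread; other Amanda versions may word them differently.

```python
import re

# Matches lines such as:
#   sendsize[12339]: estimate time for /dev/sda5 level 1: 1636.054
# (wording taken from the debug output quoted in this thread; other versions may differ)
ESTIMATE_TIME_RE = re.compile(r"estimate time for (\S+) level (\d+): (\d+(?:\.\d+)?)")


def estimate_times(debug_text):
    """Return a list of (disk, level, seconds) tuples in the order they appear."""
    return [
        (m.group(1), int(m.group(2)), float(m.group(3)))
        for m in ESTIMATE_TIME_RE.finditer(debug_text)
    ]


def slowest(times):
    """Return the (disk, level, seconds) entry with the largest time, or None if empty."""
    return max(times, key=lambda t: t[2], default=None)
```

Fed the sendsize output above, this singles out the level-1 estimate of /dev/sda5 (1636.054 s) as the one worth investigating.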
Re: Estimate Timeout Issue - Dump runs fine
OK, thanks. I have increased etimeout to 2400 seconds and also changed the UDP timeout in Check Point to 2400 seconds, so I'll see how tonight's run goes. Thanks
Re: AW: AW: AW: Estimate timeout from server
Sebastian Kösters wrote: My report looks like this: NOTES: planner: tapecycle (1) <= runspercycle (10) Please get that sorted out. I assume that this is not what you want, although it has nothing to do with your timeouts. That's all. I tested with a separate config. Have you increased etimeout already, as Alexander suggested? Stefan.
AW: AW: AW: AW: Estimate timeout from server
I am testing right now with a timeout of 5000 s. The partition I want to back up is 80 GB.
AW: AW: AW: AW: Estimate timeout from server
It does not work. On the client, Amanda starts the following command: /bin/tar --create --file /dev/null --directory /pst --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/pst_pst_0.new But pst_pst_0.new is always 0 KB, and yet amcheck always tells me that everything is OK.
Estimate timeout from server
Hi, I get this error (Estimate timeout from server) on a newly installed system (Red Hat 9). I installed Amanda the same way as on every other machine, but it does not work (there was no reboot or anything similar during the backup). Which log files from which machine do you need to help me find the error? Thank you very much! Regards, Sebastian
estimate timeout
Hi all, I have searched the archives, but none of the emails with similar subjects helped me. I have an FC2 Amanda 2.4.4 server with two Linux clients. The server uses vtapes for daily backups. It all ran very nicely for many months until we ran out of disk space on the server. After a few days of bad backups due to the full disk, we installed an additional disk, moved some of the virtual tapes to it using symlinks, flushed the old backups, etc., and sat back to enjoy Amanda at work. However, while one client is backed up perfectly well, the other keeps getting estimate timeouts. On this client everything seems OK, except that it shows two amandad processes during estimates, one of them defunct. I attach the two amandad debug reports. On the server I have set an etimeout of 300, which should be enough, but even bumping this to 7200 did not help. I have no firewall on the client or the server. The tar version on the client is tar (GNU tar) 1.13.25. This is really frustrating, since this setup used to work!
Thanks in advance Shai amandad: debug 1 pid 25071 ruid 33 euid 33: start at Mon Oct 10 08:38:58 2005 amandad: version 2.4.4p2 amandad: build: VERSION=Amanda-2.4.4p2 amandad:BUILT_DATE=Mon Mar 22 12:27:54 EST 2004 amandad:BUILT_MACH=Linux bugs.devel.redhat.com 2.4.21-9.ELsmp #1 SMP Thu Jan 8 17:08:56 EST 2004 i686 i686 i386 GNU/Linux amandad:CC=i386-redhat-linux-gcc amandad:CONFIGURE_COMMAND='./configure' '--host=i386-redhat-linux' '--build=i386-redhat-linux' '--target=i386-redhat-linux-gnu' '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib' '--libexecdir=/usr/lib/amanda' '--localstatedir=/var/lib' '--sharedstatedir=/usr/com' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--enable-shared' '--with-index-server=localhost' '--with-gnutar-listdir=/var/lib/amanda/gnutar-lists' '--with-smbclient=/usr/bin/smbclient' '--with-amandahosts' '--with-user=amanda' '--with-group=disk' '--with-tmpdir=/var/log/amanda' '--with-gnutar=/bin/tar' amandad: paths: bindir=/usr/bin sbindir=/usr/sbin amandad:libexecdir=/usr/lib/amanda mandir=/usr/share/man amandad:AMANDA_TMPDIR=/var/log/amanda amandad:AMANDA_DBGDIR=/var/log/amanda CONFIG_DIR=/etc/amanda amandad:DEV_PREFIX=/dev/ RDEV_PREFIX=/dev/r amandad:DUMP=/sbin/dump RESTORE=/sbin/restore VDUMP=UNDEF amandad:VRESTORE=UNDEF XFSDUMP=UNDEF XFSRESTORE=UNDEF VXDUMP=UNDEF amandad:VXRESTORE=UNDEF SAMBA_CLIENT=/usr/bin/smbclient amandad:GNUTAR=/bin/tar COMPRESS_PATH=/usr/bin/gzip amandad:UNCOMPRESS_PATH=/usr/bin/gzip LPRCMD=/usr/bin/lpr amandad:MAILER=/usr/bin/Mail amandad:listed_incr_dir=/var/lib/amanda/gnutar-lists amandad: defs: DEFAULT_SERVER=localhost DEFAULT_CONFIG=DailySet1 amandad:DEFAULT_TAPE_SERVER=localhost amandad:DEFAULT_TAPE_DEVICE=/dev/null HAVE_MMAP HAVE_SYSVSHM amandad:LOCKING=POSIX_FCNTL SETPGRP_VOID DEBUG_CODE amandad:AMANDA_DEBUG_DAYS=4 BSD_SECURITY USE_AMANDAHOSTS 
amandad:CLIENT_LOGIN=amanda FORCE_USERID HAVE_GZIP amandad:COMPRESS_SUFFIX=.gz COMPRESS_FAST_OPT=--fast amandad:COMPRESS_BEST_OPT=--best UNCOMPRESS_OPT=-dc amandad: time 0.000: got packet: Amanda 2.4 REQ HANDLE 000-00443709 SEQ 1128926338 SECURITY USER amanda SERVICE noop OPTIONS features=feff9ffe0f; amandad: time 0.000: sending ack: Amanda 2.4 ACK HANDLE 000-00443709 SEQ 1128926338 amandad: time 0.000: bsd security: remote host betacentauri.bioc user amanda local user amanda amandad: time 0.015: amandahosts security check passed amandad: time 0.015: running service noop amandad: time 0.015: sending REP packet: Amanda 2.4 REP HANDLE 000-00443709 SEQ 1128926338 OPTIONS features=feff9ffe0f; amandad: time 0.015: got packet: Amanda 2.4 ACK HANDLE 000-00443709 SEQ 1128926338 amandad: time 0.016: pid 25071 finish time Mon Oct 10 08:38:58 2005 amandad: debug 1 pid 25072 ruid 33 euid 33: start at Mon Oct 10 08:38:58 2005 amandad: version 2.4.4p2 amandad: build: VERSION=Amanda-2.4.4p2 amandad:BUILT_DATE=Mon Mar 22 12:27:54 EST 2004 amandad:BUILT_MACH=Linux bugs.devel.redhat.com 2.4.21-9.ELsmp #1 SMP Thu Jan 8 17:08:56 EST 2004 i686 i686 i386 GNU/Linux amandad:CC=i386-redhat-linux-gcc amandad:CONFIGURE_COMMAND='./configure' '--host=i386-redhat-linux' '--build=i386-redhat-linux' '--target=i386-redhat-linux-gnu' '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib' '--libexecdir=/usr/lib/amanda' '--localstatedir=/var/lib' '--sharedstatedir=/usr/com' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--enable-shared'
Re: estimate timeout
On Mon, 10 Oct 2005 at 9:20am, Shai Ayal wrote: On the server I have set an etimeout of 300 which should be enough, but even bumping this to 7200 did not help. I have no firewall on client and server. Are you sure about that? Is /etc/sysconfig/iptables empty and/or does 'chkconfig --list iptables' say off for all runlevels? That would be a very non-standard setup. I've seen behavior like this: amandad: time 0.025: running service /usr/lib/amanda/sendsize amandad: time 349.398: sending REP packet: *snip* amandad: time 359.415: dgram_recv: timeout after 10 seconds amandad: time 359.415: waiting for ack: timeout, retrying amandad: time 369.413: dgram_recv: timeout after 10 seconds amandad: time 369.413: waiting for ack: timeout, retrying amandad: time 379.412: dgram_recv: timeout after 10 seconds amandad: time 379.412: waiting for ack: timeout, retrying amandad: time 389.410: dgram_recv: timeout after 10 seconds amandad: time 389.410: waiting for ack: timeout, retrying amandad: time 399.409: dgram_recv: timeout after 10 seconds amandad: time 399.409: waiting for ack: timeout, giving up! on my systems where iptables allows established connections, but after 300 seconds the exchange was no longer considered established, so the late reply was dropped. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University
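The pattern Joshua describes (REP sent, then repeated ack timeouts ending in "giving up!") can be spotted mechanically. A minimal Python sketch, assuming the amandad debug wording quoted in this thread; it reports when the first REP packet went out and whether amandad gave up waiting for the server's ack.

```python
import re

REP_SENT_RE = re.compile(r"time (\d+(?:\.\d+)?): sending REP packet")
GAVE_UP_RE = re.compile(r"time (\d+(?:\.\d+)?): waiting for ack: timeout, giving up!")


def rep_ack_status(debug_text):
    """Return (rep_sent_at, gave_up_at) in seconds since amandad started.

    Either value is None when the corresponding line is absent. A REP time
    followed by a give-up time means the server never acknowledged the reply,
    e.g. because planner had already timed out or a firewall dropped the packet.
    """
    rep = REP_SENT_RE.search(debug_text)
    gave_up = GAVE_UP_RE.search(debug_text)
    return (
        float(rep.group(1)) if rep else None,
        float(gave_up.group(1)) if gave_up else None,
    )
```

Comparing the REP time with the server's etimeout (and any firewall state timeout, such as the 300 seconds mentioned above) shows which limit the reply ran into.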
Re: estimate timeout
On Monday 10 October 2005 03:20, Shai Ayal wrote: While one client is being backed up perfectly well, the other keeps getting estimate timeouts. On this client, everything seems OK except for showing 2 amandad processes during estimates, one of them defunct. I attach the 2 amandad debug reports. It is possible that the defunct amandad has open locks on files, thereby blocking the estimate. Two things might help: first, I'd reboot the machine the failure is on to clear them, and then I think I'd install a newer Amanda; 2.4.4 is getting a bit long in the tooth these days. I can't recall the exact version I was running when that happened on my firewall box, mainly because I wasn't doing virtual tapes yet and was having so many other tape-related issues back then that a stuck amandad just wasn't an event to record at length in my wetram. If you still have the same scripts you used to build 2.4.4 on each box, then 2.4.5-20051006 should install and run exactly the same. However, I just checked the /home/amanda directory on my single Linux client, and it's equally elderly, at 2.4.4-20030529, and it's working fine other than a 10-second delay in checking clients when amcheck is run, about 80% of the time. But this is as good a time to bring it up to date as any, so it's building on that box now, using the same script I built the older version with. Oops, forgot to run ldconfig after the install; done now.
Hmm, I note that (and this has been random in the past, true about 80% of the time) there is no longer a 10-second delay in checking the clients; it's more like 0.35 seconds, at least for the several iterations I've done. Maybe that's fixed now? On the server I have set an etimeout of 300 which should be enough, but even bumping this to 7200 did not help. I have no firewall on client and server. I do, but it's not between the client and server; it's between the client and the rest of the planet. That box is the gateway. tar version is tar (GNU tar) 1.13.25 on the client. That's a good one, although I'm running 1.15-1 on the server. But the client box is RH 7.3, and its glibc version won't let me build or install 1.15.1. -- Cheers, Gene. Additions to the above message by Gene Heskett are: Copyright 2005 by Maurice Eugene Heskett, all rights reserved.
Re: Estimate timeout
In amandad.debug on the client I get: ... amandad: time 2951.811: sending REP packet: Amanda 2.4 REP HANDLE 00F-80930508 SEQ 1126652514 OPTIONS features=feff9ffe7f; ar0s1a 0 SIZE 30567356 ar0s1a 1 SIZE 17959269 amandad: time 2961.819: dgram_recv: timeout after 10 seconds amandad: time 2961.819: waiting for ack: timeout, retrying ... In sendsize.debug: ... sendsize[20121]: estimate time for ar0s1a level 0: 320.045 ... sendsize[20121]: estimate time for ar0s1a level 1: 2629.405 sendsize[20113]: time 2951.660: child 20121 terminated normally It took ~2950 seconds to do the two estimates, based on the various log messages. When amandad tried to send back the response/reply (REP) packet, it never got an acknowledgement (ack) that amdump/planner had received it. The default etimeout is 300 seconds. Amanda multiplies that by the number of estimates it asks the client to do, so, at best, planner on the server side gave up after 600 seconds, which is why there wasn't anyone around to receive the reply and answer it. If you look at the amdump.NN file that matches the above you'll probably see planner getting done well before 2950 seconds. I'm not sure why this disk is now taking so much longer to do estimates, but the simplest solution is to just crank up etimeout in your amanda.conf (or disklist) to compensate. At least backups will start working again, and then you can look into possible hardware or file system performance problems. Tommy Eriksen - Chief Technical Officer JJ
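The arithmetic in this reply can be written down as a small Python sketch. It assumes, as the post states, that the planner's total wait is etimeout multiplied by the number of estimates requested from the client; `margin` is a hypothetical safety factor of this sketch, not an Amanda parameter.

```python
import math


def planner_wait(etimeout, n_estimates):
    """Seconds the planner waits for a client, per the description above."""
    return etimeout * n_estimates


def min_etimeout(observed_secs, n_estimates, margin=1.0):
    """Smallest whole-second etimeout whose total wait covers observed_secs.

    margin > 1.0 adds headroom for estimates that grow from night to night.
    """
    return math.ceil(observed_secs * margin / n_estimates)


# The case above: default etimeout 300, two estimates, ~2952 s of work on the client.
print(planner_wait(300, 2))        # 600, far short of what the client needed
print(min_etimeout(2951.811, 2))   # 1476
```

In practice a margin above 1.0 is sensible, since the same disk took about 320 s for level 0 but about 2630 s for level 1, and such times can vary between runs.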
Estimate timeout
Hey, I have a rather strange problem. I had to restore a complete server from backup recently, no problem, everything went smoothly. However, after this, I can't seem to get a fresh backup of it. I've tried everything from reinstalling amanda to changing the machine's hostname (both the machines real hostname and the one in amanda), but still I get this in my daily report: dc104 ar0s1a lev 0 FAILED [Estimate timeout from dc104] I've got 115 entries in my disklist (on some 60 hosts) and this is the only one I can't get to work. There doesn't seem to be any network problems between the amanda server and the client either. This does look like a networking problem, but the machines can communicate freely and without any problems. In amandad.debug on the client I get: -bash-2.05b# cat amandad.20050914010102000.debug amandad: debug 1 pid 20112 ruid 2 euid 2: start at Wed Sep 14 01:01:02 2005 amandad: version 2.4.5 amandad: build: VERSION=Amanda-2.4.5 amandad:BUILT_DATE=Thu Aug 25 17:46:51 CEST 2005 amandad:BUILT_MACH=FreeBSD tlnordic.moduleweb.net 4.9-STABLE FreeBSD 4.9-STABLE #0: Mon Jan 5 23:35:10 CET 2004 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC i386 amandad:CC=cc amandad:CONFIGURE_COMMAND='./configure' '--libexecdir=/usr/local/libexec/amanda' '--with-amandahosts' '--with-fqdn' '--with-dump-honor-nodump' '--with-buffered-dump' '--without-server' '--disable-libtool' '--prefix=/usr/local' '--with-user=operator' '--with-group=operator' '--with-gnutar-listdir=/usr/local/var/amanda/gnutar-lists' '--with-index-server=eclipse.rackhosting.com' '--with-tape-server=eclipse.rackhosting.com' '--with-config=ModuleWeb' '--prefix=/usr/local' '--build=i386-portbld-freebsd4.9' amandad: paths: bindir=/usr/local/bin sbindir=/usr/local/sbin amandad:libexecdir=/usr/local/libexec/amanda amandad:mandir=/usr/local/man AMANDA_TMPDIR=/tmp/amanda amandad:AMANDA_DBGDIR=/tmp/amanda amandad:CONFIG_DIR=/usr/local/etc/amanda DEV_PREFIX=/dev/ amandad:RDEV_PREFIX=/dev/r DUMP=/sbin/dump 
amandad:RESTORE=/sbin/restore VDUMP=UNDEF VRESTORE=UNDEF amandad:XFSDUMP=UNDEF XFSRESTORE=UNDEF VXDUMP=UNDEF VXRESTORE=UNDEF amandad:SAMBA_CLIENT=UNDEF GNUTAR=/usr/bin/tar amandad:COMPRESS_PATH=/usr/bin/gzip amandad:UNCOMPRESS_PATH=/usr/bin/gzip LPRCMD=/usr/bin/lpr amandad:MAILER=/usr/bin/Mail amandad:listed_incr_dir=/usr/local/var/amanda/gnutar-lists amandad: defs: DEFAULT_SERVER=eclipse.rackhosting.com amandad:DEFAULT_CONFIG=ModuleWeb amandad:DEFAULT_TAPE_SERVER=eclipse.rackhosting.com amandad:DEFAULT_TAPE_DEVICE=/dev/null HAVE_MMAP HAVE_SYSVSHM amandad:LOCKING=POSIX_FCNTL DEBUG_CODE AMANDA_DEBUG_DAYS=4 amandad:BSD_SECURITY USE_AMANDAHOSTS CLIENT_LOGIN=operator amandad:FORCE_USERID HAVE_GZIP COMPRESS_SUFFIX=.gz amandad:COMPRESS_FAST_OPT=--fast COMPRESS_BEST_OPT=--best amandad:UNCOMPRESS_OPT=-dc amandad: time 0.000: got packet: Amanda 2.4 REQ HANDLE 00F-80930508 SEQ 1126652514 SECURITY USER operator SERVICE sendsize OPTIONS features=feff9ffe0f;maxdumps=1;hostname=dc104; DUMP ar0s1a 0 1970:1:1:0:0:0 -1 OPTIONS |;auth=bsd;compress-fast;index; DUMP ar0s1a 1 2005:8:10:12:30:11 -1 OPTIONS |;auth=bsd;compress-fast;index; amandad: time 0.000: sending ack: Amanda 2.4 ACK HANDLE 00F-80930508 SEQ 1126652514 amandad: time 0.001: bsd security: remote host eclipse.rackhosting.com user operator local user operator amandad: time 0.001: amandahosts security check passed amandad: time 0.001: running service /usr/local/libexec/amanda/sendsize amandad: time 599.378: got packet: Amanda 2.4 REQ HANDLE 00F-80930508 SEQ 1126652514 SECURITY USER operator SERVICE sendsize OPTIONS features=feff9ffe0f;maxdumps=1;hostname=dc104; DUMP ar0s1a 0 1970:1:1:0:0:0 -1 OPTIONS |;auth=bsd;compress-fast;index; DUMP ar0s1a 1 2005:8:10:12:30:11 -1 OPTIONS |;auth=bsd;compress-fast;index; amandad: time 599.378: received dup P_REQ packet, ACKing it amandad: time 599.378: sending ack: Amanda 2.4 ACK HANDLE 00F-80930508 SEQ 1126652514 amandad: time 1199.936: got packet: Amanda 2.4 REQ HANDLE 00F-80930508 SEQ 
1126652514 SECURITY USER operator SERVICE sendsize OPTIONS features=feff9ffe0f;maxdumps=1;hostname=dc104; DUMP ar0s1a 0 1970:1:1:0:0:0 -1 OPTIONS |;auth=bsd;compress-fast;index; DUMP ar0s1a 1 2005:8:10:12:30:11 -1 OPTIONS |;auth=bsd;compress-fast;index; amandad: time 1199.936: received dup P_REQ packet, ACKing it amandad: time 1199.936: sending ack: Amanda 2.4 ACK HANDLE 00F-80930508 SEQ 1126652514 amandad: time 2951.811: sending REP packet: Amanda 2.4 REP HANDLE 00F-80930508 SEQ 1126652514 OPTIONS features=feff9ffe7f; ar0s1a 0 SIZE 30567356 ar0s1a 1 SIZE 17959269 amandad: time 2961.819: dgram_recv: timeout after 10 seconds amandad: time 2961.819: waiting for ack: timeout
RE: Estimate timeout
Well, the tar command by itself is still running, but the backup with the new version of tar is complete, so my estimate timeout problem is fixed with an updated tar executable. Thank you all.
Estimate timeout
setup_estimate: napier:/common: command 0, options: last_level -1 next_level0 -13025 level_days 0 getting estimates 0 (0) -1 (-1) -1 (-1) planner: time 0.209: setting up estimates took 0.153 secs GETTING ESTIMATES... driver: started dumper0 pid 7556 driver: started dumper1 pid 7557 driver: started dumper2 pid 7558 driver: started dumper3 pid 7559 dumper: dgram_bind: socket bound to 0.0.0.0.742 dumper: pid 7557 executable dumper1 version 2.4.4p2, using port 742 dumper: dgram_bind: socket bound to 0.0.0.0.741 dumper: pid 7556 executable dumper0 version 2.4.4p2, using port 741 dumper: dgram_bind: socket bound to 0.0.0.0.743 dumper: pid 7558 executable dumper2 version 2.4.4p2, using port 743 dumper: dgram_bind: socket bound to 0.0.0.0.744 dumper: pid 7559 executable dumper3 version 2.4.4p2, using port 744 changer: got exit: 0 str: 6 15 1 1 changer: opening pipe to: /opt/amanda/libexec/chg-scsi -slot current changer: got exit: 0 str: 6 /dev/nst0 taper: slot 6: date Xlabel DailySet1-A06 (new tape) taper: read label `DailySet1-A06' date `X' taper: wrote label `DailySet1-A06' date `20050829' planner: time 10735.451: got result for host napier disk /common: 0 - 294080K, -1 - -1K, -1 - -1K planner: time 10735.481: got result for host napier disk /home: 0 - 81027540K, 1 - 3635110K, -1 - -1K planner: time 10735.481: got result for host napier disk /: 0 - 3193020K, 3 - 281800K, -1 - -1K planner: time 29598.524: error result for host coneng disk /dev/vx/dsk/homedg/homevol: Estimate timeout from coneng planner: time 29598.552: getting estimates took 29598.342 secs FAILED QUEUE: 0: coneng /dev/vx/dsk/homedg/homevol DONE QUEUE: 0: napier /common 1: napier /home 2: napier / GENERATING SCHEDULE: ENDFLUSH DUMP napier feff9ffe0f /common 20050829 13027 0 1970:1:1:0:0:0 147040 4901 DUMP napier feff9ffe0f / 20050829 13 0 1970:1:1:0:0:0 952854 11210 3 2005:8:16:5:23:44 94810 162 DUMP napier feff9ffe0f /home 20050829 2 1 2005:8:26:1:10:29 1817555 60585 driver: adding holding disk 0 dir 
/backup/amanda/rdumps/dump3 size 83886080 driver: adding holding disk 1 dir /backup/amanda/rdumps/dump2 size 83886080 driver: adding holding disk 2 dir /backup/amanda/ldump size 14698880 reserving 182471040 out of 182471040 for degraded-mode dumps driver: flush size 0 driver: start time 29598.652 inparallel 4 bandwidth 2200 diskspace 182471040 dir OBSOLETE datestamp 20050829 driver: drain-ends tapeq FIRST big-dumpers ttt driver: result time 29598.652 from taper: TAPER-OK driver: send-cmd time 29598.668 to dumper0: FILE-DUMP 00-1 /backup/amanda/rdumps/dump3/20050829/napier._common.0 napier feff9ffe0f /common NODEVICE 0 1970:1:1:0:0:0 256000 GNUTAR 147104 |;bsd-auth;srvcomp-best;index;exclude-list=/opt/amanda/etc/exclude.gtar; driver: state time 29598.669 free kps: 2170 space: 182323936 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 2 roomq: 0 wakeup: 15 driver-idle: start-wait driver: interface-state time 29598.669 if : free 570 if ETH0: free 600 if LOCAL: free 1000 driver: hdisk-state time 29598.669 hdisk 0: free 83738976 dumpers 1 hdisk 1: free 83886080 dumpers 0 hdisk 2: free 14698880 dumpers 0 dumper: stream_client: connected to 198.151.154.11.33238 dumper: stream_client: our side is 0.0.0.0.33241 dumper: stream_client: connected to 198.151.154.11.33239 dumper: stream_client: our side is 0.0.0.0.33242 dumper: stream_client: connected to 198.151.154.11.33240 dumper: stream_client: our side is 0.0.0.0.33243 driver: state time 29613.666 free kps: 2170 space: 182323936 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 2 roomq: 0 wakeup: 86400 driver-idle: client-constrained driver: interface-state time 29613.666 if : free 570 if ETH0: free 600 if LOCAL: free 1000 driver: hdisk-state time 29613.666 hdisk 0: free 83738976 dumpers 1 hdisk 1: free 83886080 dumpers 0 hdisk 2: free 14698880 dumpers 0 driver: result time 29782.170 from dumper0: DONE 00-1 294080 80960 183 [sec 182.929 kb 80960 kps 442.6 orig-kb 294080] driver: finished-cmd time 30379.942 dumper0 dumped 
napier:/common driver: send-cmd time 30379.942 to taper: FILE-WRITE 00-2 /backup/amanda/rdumps/dump3/20050829/napier._common.0 napier feff9ffe0f /common 0 20050829 driver: startaflush: FIRST napier /common 80993 52433920 driver: send-cmd time 30379.942 to dumper0: FILE-DUMP 01-3 /backup/amanda/rdumps/dump2/20050829/napier._.0 napier feff9ffe0f / NODEVICE 0 1970:1:1:0:0:0 256000 GNUTAR 953024 |;bsd-auth;srvcomp-best;index;exclude-list=/opt/amanda/etc/exclude.gtar; driver: state time 30379.942 free kps: 2115 space: 181437023 taper: writing idle-dumpers: 3 qlen tapeq: 0 runq: 1 roomq: 0 wakeup: 15 driver-idle: start-wait driver: interface-state time 30379.942 if : free 515 if ETH0: free 600 if LOCAL: free 1000 driver: hdisk-state time 30379.942 hdisk 0: free 83805087 dumpers 0 hdisk 1: free 82933056 dumpers 1 hdisk 2: free 14698880 dumpers 0 dumper: stream_client: connected to 198.151.154.11.33248 dumper
Re: Estimate timeout
On Tue, 30 Aug 2005 at 11:01am, LaValley, Brian E wrote: sendsize: debug 1 pid 12359 ruid 548 euid 548: start at Mon Aug 29 18:00:02 2005 sendsize: version 2.4.4p2 sendsize[12359]: time 0.034: waiting for any estimate child: 1 running sendsize[12361]: time 0.035: calculating for amname '/dev/vx/dsk/homedg/homevol', dirname '/home', spindle -1 sendsize[12361]: time 0.035: getting size via gnutar for /dev/vx/dsk/homedg/homevol level 0 sendsize[12361]: time 0.092: spawning /home/backup/amanda_sun/libexec/runtar in pipeline sendsize[12361]: argument list: /opt/sfw/bin/gtar --create --file /dev/null --directory /home --one-file-system --listed-incremental /home/backup/amanda_sun/var/amanda/gnutar-lists/coneng_dev_vx_dsk_homedg_homevol_0.new --sparse --ignore-failed-read --totals --exclude-from /tmp/amanda/sendsize._dev_vx_dsk_homedg_homevol.20050829180002.exclude . Run this command yourself on the command line (as root) and see how long it takes to complete. Also, what version of tar are you running? -- Joshua Baker-LePain Department of Biomedical Engineering Duke University
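Joshua's suggestion is to time the estimate command by hand. A minimal Python sketch of that, using only the standard library; run it as root on the client. The example in the comments reuses the argument list from the sendsize log above, except that --listed-incremental points at a hypothetical scratch file (/tmp/estimate-test_0.new) so Amanda's own gnutar-lists state is left alone, and --exclude-from is dropped because that temporary file may no longer exist.

```python
import subprocess
import time


def time_command(argv):
    """Run argv to completion and return (exit_status, elapsed_seconds).

    stdout/stderr are discarded; only the wall-clock duration matters here.
    A non-zero exit status is returned rather than raised, because tar can
    exit non-zero on harmless warnings and the timing is still useful.
    """
    start = time.monotonic()
    proc = subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return proc.returncode, time.monotonic() - start


# Example (on the Solaris client, as root):
# status, secs = time_command([
#     "/opt/sfw/bin/gtar", "--create", "--file", "/dev/null",
#     "--directory", "/home", "--one-file-system",
#     "--listed-incremental", "/tmp/estimate-test_0.new",  # hypothetical scratch file
#     "--sparse", "--ignore-failed-read", "--totals", ".",
# ])
# print(f"exit {status}, {secs:.0f} s")
```

If the hand-run command alone takes hours, raising etimeout only hides the problem; the tar version question that follows is the next thing to check.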
RE: Estimate timeout
I'll have to get back to you on running the command by itself. My tar version is: tar (GNU tar) 1.13
RE: Estimate timeout
On Tue, 30 Aug 2005 at 11:38am, LaValley, Brian E wrote I'll have to get back to you on running the command by itself. My tar version is: tar (GNU tar) 1.13 Bad, bad, bad. http://www.amanda.org/docs/faq.html#id2554919 -- Joshua Baker-LePain Department of Biomedical Engineering Duke University
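Joshua flags GNU tar 1.13 as a bad release for Amanda (see the FAQ link), while 1.13.25 is described elsewhere in this archive as a good one. A minimal Python sketch that parses the first line of `tar --version` and checks it against a set of releases to avoid; the set below contains only the release named here, so treat it as an assumption and consult the FAQ before relying on it.

```python
import re
import subprocess

# Only the release called out in this thread; extend from the Amanda FAQ as needed.
KNOWN_BAD_GNU_TAR = {(1, 13)}


def parse_gnu_tar_version(version_text):
    """Extract a version tuple from text like 'tar (GNU tar) 1.13.25'.

    Returns None if the text does not look like GNU tar output.
    """
    m = re.search(r"\(GNU tar\)\s+(\d+(?:\.\d+)*)", version_text)
    if m is None:
        return None
    return tuple(int(part) for part in m.group(1).split("."))


def is_known_bad(version):
    """True if the parsed version is in the known-bad set above."""
    return version in KNOWN_BAD_GNU_TAR


def installed_gnu_tar_version(tar_cmd="tar"):
    """Run `tar_cmd --version` and parse its output (None if not GNU tar)."""
    out = subprocess.run([tar_cmd, "--version"], capture_output=True, text=True).stdout
    return parse_gnu_tar_version(out)
```

On the Solaris client the path to check would be the gtar binary from the sendsize log, e.g. `installed_gnu_tar_version("/opt/sfw/bin/gtar")`.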
RE: Estimate timeout
Ok, I'll try a new version of tar after my test of the tar command on its own.
Re: Estimate timeout
On Tue, Aug 30, 2005 at 11:01:52AM -0400, LaValley, Brian E wrote: Can someone please help me get to the bottom of this issue? I have Amanda 2.4.4p2 on a Fedora Core 3 machine which is the tape server. It has no trouble backing up itself and other Linux machines. The trouble comes with a Sun Solaris 8 client which never completes its estimate. I have kept increasing the etimeout value, but I am at 29600 and am wondering how far I should go. Is there some other part I should be looking at? Thank you. Does gnutar follow and back up symbolic links? I wonder if some of these monster estimates might be due to circular references. -- Jon H. LaBadie [EMAIL PROTECTED] JG Computing 4455 Province Line Road (609) 252-0159 Princeton, NJ 08540-4322 (609) 683-7220 (fax)
Re: Estimate timeout
On Tue, 30 Aug 2005 at 1:19pm, Jon LaBadie wrote: Does gnutar follow and back up symbolic links? I wonder if some of these monster estimates might be due to circular references. I'm fairly certain it backs them up as links, not as the targets themselves. It'd be easy to test, though... -- Joshua Baker-LePain Department of Biomedical Engineering Duke University
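A minimal Python sketch of such a test: it builds a throwaway directory containing a symlink, archives it with the system `tar` using the same --create/--file/--directory options seen in the sendsize log, and uses Python's tarfile module to check whether the member was stored as a symlink or as a copy of its target. The default `tar_cmd` is an assumption; on the Solaris client you would pass the gtar path instead (e.g. /opt/sfw/bin/gtar from the log above).

```python
import os
import subprocess
import tarfile
import tempfile


def tar_stores_symlink_as_link(tar_cmd="tar"):
    """Archive a directory containing a symlink and report how tar stored it.

    Returns True if the symlink was archived as a link, False if tar followed
    it and stored the target's contents instead.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.mkdir(src)
        with open(os.path.join(src, "target.txt"), "w") as fh:
            fh.write("hello\n")
        # Relative symlink inside the tree: link.txt -> target.txt
        os.symlink("target.txt", os.path.join(src, "link.txt"))

        # Write the archive outside src so tar does not try to archive itself.
        archive = os.path.join(tmp, "test.tar")
        subprocess.run(
            [tar_cmd, "--create", "--file", archive, "--directory", src, "."],
            check=True,
        )

        with tarfile.open(archive) as tf:
            for member in tf.getmembers():
                # Member names look like "./link.txt"; normalise before comparing.
                if os.path.basename(os.path.normpath(member.name)) == "link.txt":
                    return member.issym()
    raise LookupError("link.txt not found in archive")
```

A True result supports Joshua's recollection that symlinks are stored as links, which would rule out circular symlinks as the cause of the long estimates.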
Estimate timeout issue
I have an AMANDA client machine with Solaris 8 and logical volumes on a disk. The AMANDA server's config has etimeout=29600, so for the two DLEs on this client it waits roughly 2 × 29600 seconds (59202 seconds in the log below) and then fails. planner: time 59202.106: error result for host coneng disk /dev/vx/dsk/opt: Estimate timeout from coneng planner: time 59202.108: error result for host coneng disk /dev/vx/dsk/homedg/homevol: Estimate timeout from coneng planner: time 59202.108: getting estimates took 59192.001 secs FAILED QUEUE: 0: coneng /dev/vx/dsk/opt 1: coneng /dev/vx/dsk/homedg/homevol Any ideas how I can fix this? What other information do you need?
Estimate timeout
My dumps aren't completing. One fishy thing I am seeing in the logs is the same partition listed twice, /home 1 and /home 0. What does this mean?
Amanda 2.4 REQ HANDLE 000-B0F0E609 SEQ 1122119536
SECURITY USER backup
SERVICE sendsize
OPTIONS features=feff9ffe0f;maxdumps=1;hostname=blavalley-l;
GNUTAR /home 0 1970:1:1:0:0:0 -1 OPTIONS |;bsd-auth;srvcomp-best;index;exclude-list=/opt/amanda/etc/amanda/daily/exclude.gtar;
GNUTAR /home 1 2005:7:20:18:21:45 -1 OPTIONS |;bsd-auth;srvcomp-best;index;exclude-list=/opt/amanda/etc/amanda/daily/exclude.gtar;
GNUTAR /common 0 1970:1:1:0:0:0 -1 OPTIONS |;bsd-auth;srvcomp-best;index;exclude-list=/opt/amanda/etc/amanda/daily/exclude.gtar;
GNUTAR /native 0 1970:1:1:0:0:0 -1 OPTIONS |;bsd-auth;srvcomp-best;index;exclude-list=/opt/amanda/etc/amanda/daily/exclude.gtar;
GNUTAR /opt 0 1970:1:1:0:0:0 -1 OPTIONS |;bsd-auth;srvcomp-best;index;exclude-list=/opt/amanda/etc/amanda/daily/exclude.gtar;
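The two /home entries are most likely two estimate levels for the same DLE rather than a duplicate. The third field of each GNUTAR line is the level, and a later post in this archive notes that a request carries 1 to 3 lines per DLE (level 0, current level, current plus 1). The Python sketch below groups a request's lines by disk to make that visible. The field layout is inferred from the packet quoted above, and both the helper name and the set of program names it accepts are assumptions.

```python
from collections import defaultdict


def levels_per_dle(request_lines):
    """Map each disk in a sendsize request to the levels being estimated.

    Estimate lines are assumed to look like
    'PROGRAM DISK LEVEL DATESTAMP SPINDLE OPTIONS ...'; header lines such as
    'SERVICE sendsize' or 'OPTIONS features=...' are skipped.
    """
    levels = defaultdict(list)
    for line in request_lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] not in ("GNUTAR", "DUMP"):
            continue
        levels[fields[1]].append(int(fields[2]))
    return dict(levels)


request = [
    "SERVICE sendsize",
    "GNUTAR /home 0 1970:1:1:0:0:0 -1 OPTIONS |;bsd-auth;index;",
    "GNUTAR /home 1 2005:7:20:18:21:45 -1 OPTIONS |;bsd-auth;index;",
    "GNUTAR /common 0 1970:1:1:0:0:0 -1 OPTIONS |;bsd-auth;index;",
]
print(levels_per_dle(request))  # {'/home': [0, 1], '/common': [0]}
```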
Dumps Fail - Estimate Timeout errors
Hi, clients are all RH 7.3 or WhiteBox respin 2; the server is 2.4.4p4 running on WhiteBox respin 2. My amchecks run fine and without issue, however I have come in two mornings now to find that some of the clients failed. The failures have occurred on different clients, i.e. some that failed two nights ago worked last night without changes, and I can't figure out why. All clients were working fine at a different site; we moved IDC over the weekend, so we have a new network architecture. The failure errors are:
hostname /dev/rd/c0d0p3 lev 0 FAILED [Estimate timeout from hostname]
hostname /dev/rd/c0d0p1 lev 0 FAILED [Estimate timeout from hostname]
anotherhostname /dev/rd/c0d0p2 lev 0 FAILED [Estimate timeout from anotherhostname]
anotherhostname /dev/rd/c0d0p5 lev 0 FAILED [Estimate timeout from anotherhostname]
There is nothing in the firewall log to indicate a dropped packet. As a test, I have actually allowed any ports between these networks, but it has not helped. Does anyone know how to debug these timeout-type issues? I have been using amanda for about 3 years now and have not encountered this before. Thanks
Re: Dumps Fail - Estimate Timeout errors
Tom Brown wrote: Failure errors are hostname /dev/rd/c0d0p3 lev 0 FAILED [Estimate timeout from hostname] ... Does anyone know how to debug these timeout type issues as i have been using amanda for about 3 years now and have not encountered this before. Have a look on that client in /tmp/amanda for the files sendsize.DATETIME.debug and see how long the estimate took. The first line of the file is the start time and the last line is the finish time. How long did it really take? You may have to change the etimeout parameter in amanda.conf. If there is no finish time line, then the estimate crashed, and there is probably an error message in that file too. -- Paul Bijnens, Xplanation Tel +32 16 397.511 Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM Fax +32 16 397.512 http://www.xplanation.com/ email: [EMAIL PROTECTED] *** * I think I've got the hang of it now: exit, ^D, ^C, ^\, ^Z, ^Q, F6, * * quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, * * stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt, abort, hangup, * * PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e, kill -1 $$, shutdown, * * kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ...* * ... Are you sure? ... YES ... Phew ... I'm out * ***
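Paul's check can be scripted. The sketch below is a hypothetical Python helper, not an Amanda tool. It pulls the "start at" and "finish time" timestamps out of a sendsize.DATETIME.debug file and returns the elapsed seconds, or None when the finish line is missing (the crash case Paul describes). It assumes the timestamp format seen in the debug excerpts in this thread and English (C locale) day and month names.

```python
import re
from datetime import datetime

# Timestamp as it appears in the debug files, e.g. 'Wed May 18 00:29:27 2005'.
STAMP = r"(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})"
FORMAT = "%a %b %d %H:%M:%S %Y"  # English day/month names assumed


def estimate_duration(debug_text):
    """Seconds between the 'start at' and 'finish time' lines of a sendsize debug file.

    Returns None when there is no finish line, which suggests the estimate crashed.
    """
    start = re.search(r"start at " + STAMP, debug_text)
    if start is None:
        raise ValueError("no 'start at' line found")
    finish = re.search(r"finish time " + STAMP, debug_text)
    if finish is None:
        return None
    t0 = datetime.strptime(start.group(1), FORMAT)
    t1 = datetime.strptime(finish.group(1), FORMAT)
    return (t1 - t0).total_seconds()
```

Run it as `estimate_duration(open(path).read())` on each debug file and compare the result with etimeout multiplied by the number of DLEs on that client. A duration comfortably inside that budget points away from the estimate itself.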
Re: Dumps Fail - Estimate Timeout errors
Paul Bijnens wrote: Have a look on that client in /tmp/amanda, look for the files sendsize.DATETIME.debug and see how long the estimate did take. [...] If there is no finish time line, then the estimate crashed, and probably there is an error message in that file too. Hi, please see pasted below the 3 entries that apply to last night. Everything in there appears OK. If sendsize is not crashing, what else could cause this? Thanks
sendsize: debug 1 pid 16820 ruid 11 euid 11: start at Wed May 18 00:29:27 2005
sendsize: version 2.4.4p1
[snip]
sendsize[16820]: time 118.545: child 16874 terminated normally
sendsize: time 118.545: pid 16820 finish time Wed May 18 00:31:25 2005
sendsize: debug 1 pid 17384 ruid 11 euid 11: start at Wed May 18 00:54:26 2005
sendsize: version 2.4.4p1
[snip]
sendsize[17384]: time 101.841: child 17418 terminated normally
sendsize: time 101.841: pid 17384 finish time Wed May 18 00:56:08 2005
sendsize: debug 1 pid 17795 ruid 11 euid 11: start at Wed May 18 01:19:26 2005
sendsize: version 2.4.4p1
[snip]
sendsize[17795]: time 106.417: child 17842 terminated normally
sendsize: time 106.417: pid 17795 finish time Wed May 18 01:21:12 2005
Re: Dumps Fail - Estimate Timeout errors
Tom Brown wrote: Actually, digging around in /tmp/amanda I have come across files named amandad.DATE.debug, and a few of them appear to end like this:
amandad: time 111.846: dgram_recv: timeout after 10 seconds
amandad: time 111.846: waiting for ack: timeout, retrying
amandad: time 121.846: dgram_recv: timeout after 10 seconds
amandad: time 121.847: waiting for ack: timeout, retrying
amandad: time 131.847: dgram_recv: timeout after 10 seconds
amandad: time 131.847: waiting for ack: timeout, retrying
amandad: time 141.847: dgram_recv: timeout after 10 seconds
amandad: time 141.848: waiting for ack: timeout, retrying
amandad: time 151.848: dgram_recv: timeout after 10 seconds
amandad: time 151.848: waiting for ack: timeout, giving up!
amandad: time 151.848: pid 17383 finish time Wed May 18 00:56:58 2005
What would cause that? I presume this could be the cause of the failure.
Maybe this is a problem in some firewall settings that forbid reverse traffic, or expire the UDP reply after less than 101 seconds. -- Paul Bijnens, Xplanation
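A run of "waiting for ack: timeout" lines ending in "giving up!" means the client finished its estimate and sent the reply, but never saw the server's acknowledgement. That points at the network path rather than at sendsize. The hypothetical Python helper below extracts those facts from an amandad debug file. The matched strings are copied from the excerpts in this thread and may differ in other Amanda versions.

```python
import re


def summarize_amandad(debug_text):
    """Summarize the reply/ack exchange recorded in an amandad.DATETIME.debug file."""
    rep = re.search(r"time ([\d.]+): sending REP packet", debug_text)
    return {
        # When the client sent its estimate reply (None if the excerpt lacks the line).
        "rep_sent_at": float(rep.group(1)) if rep else None,
        # In these logs each retry follows a 10-second wait for the server's ACK.
        "ack_retries": debug_text.count("waiting for ack: timeout, retrying"),
        "gave_up": "waiting for ack: timeout, giving up!" in debug_text,
    }
```

A result with `gave_up` set to True and a `rep_sent_at` value well below the etimeout budget is the signature of a lost or blocked reply, which is what the firewall and packet-capture suggestions in this archive go after.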
Re: estimate timeout
McDonagh, Joe wrote: I have an estimate timeout of three hours, is there any way to skip the estimate or what? It's getting estimate timeout, the fs is fine, it can be read from and everything, it just has loads of small files. You might want to consider upgrading the server and client involved to the current stable release. It has provisions for doing server estimates. That should help. The server estimates seem to be conservative, to be on the safe side. I have some data to look through for level 0, level 1 and level 2 estimates, and I should be able to post it by the end of this week. Hope this helps. -- Jim Summers School of Computer Science-University of Oklahoma
estimate timeout
I have an estimate timeout of three hours; is there any way to skip the estimate, or what? It's getting an estimate timeout. The fs is fine, it can be read from and everything, it just has loads of small files.
Re: estimate timeout
McDonagh, Joe wrote: I have an estimate timeout of three hours, is there any way to skip the estimate or what? [...] it just has loads of small files. If you have amanda 2.4.5, you can use alternate methods for the estimate. From the NEWS file:
* new 'estimate' dumptype option to select estimate type:
CLIENT: estimate by the dumping program.
CALCSIZE: estimate by the calcsize program, a lot faster but less accurate.
SERVER: estimate based on statistics from the previous run; takes seconds but can be wrong on the estimated size.
I've not yet tried it myself. -- Paul Bijnens, Xplanation
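With loads of small files, most of the estimate cost comes from visiting every inode. The Python sketch below illustrates the kind of metadata-only pass a faster estimate relies on. It sums file sizes from stat() data without reading any file contents. It is not Amanda's calcsize program and does not reproduce its accounting (for example, it ignores block rounding and counts hard links once per name), and the function name is hypothetical.

```python
import os
import stat


def metadata_size_estimate(root, since=None):
    """Bytes in regular files under root, optionally only those modified after `since`.

    Only inode metadata is read (no file contents), and symlinks are neither
    followed nor counted. since=None approximates a level 0; an epoch timestamp
    approximates an incremental relative to that time.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except FileNotFoundError:
                continue  # file removed while we were walking
            if stat.S_ISREG(st.st_mode) and (since is None or st.st_mtime > since):
                total += st.st_size
    return total
```

Even this pass still touches every file, which is why the SERVER option, which skips the client-side walk entirely, is the one the NEWS entry describes as taking only seconds.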
Estimate Timeout through iptables firewall
This is mostly just for the archives. I had problems with some clients timing out on estimates when running through a Linux firewall (2.6.9 patched and 2.6.10). The problem was that ip_conntrack_amanda was closing the return path before the clients could finish, so the estimate results were getting dropped on the floor. There are three solutions:
1. Open a hole in the firewall allowing clients to send from port 10080 to your amanda server.
2. Change the UDP stream timeout, which defaults to 180 seconds, to something larger. WARNING! This will change it for ALL UDP connections:
sysctl -w net.ipv4.netfilter.ip_conntrack_udp_timeout_stream=1800
3. Extend the amount of time that ip_conntrack_amanda allows the connection to remain open. According to the source it is currently 300 seconds. You can change this by loading the module with the master_timeout option set to something bigger. This can be done in /etc/modprobe.conf:
options ip_conntrack_amanda master_timeout=1800
Obviously I prefer 3. Hope this helps someone down the road... Matt -- Matt Hyclak Department of Mathematics Department of Social Work Ohio University (740) 593-1263
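Whether these timeouts can bite comes down to comparing them with how long the client takes to answer. The Python sketch below does that comparison for the two defaults Matt quotes (180 seconds for the UDP stream timeout and 300 seconds for master_timeout). The helper and the dictionary keys are hypothetical labels. Whether a given window actually governs the reply depends on how the firewall tracks the flow, so treat a hit as a lead to investigate, not a diagnosis.

```python
def expired_windows(reply_after, windows):
    """Return the names of timeout windows already closed when the reply was sent.

    reply_after: seconds between the server's request and the client's reply
                 (e.g. the 'sending REP packet' time in amandad's debug file).
    windows:     mapping from a descriptive name to a timeout in seconds.
    """
    return sorted(name for name, limit in windows.items() if reply_after > limit)


# Defaults as quoted in the post above; check your own kernel before relying on them.
DEFAULT_WINDOWS = {
    "ip_conntrack_udp_timeout_stream": 180,
    "ip_conntrack_amanda master_timeout": 300,
}

print(expired_windows(182.436, DEFAULT_WINDOWS))  # ['ip_conntrack_udp_timeout_stream']
```

With the 182.436-second reply time from the amandad log elsewhere in this archive, only the 180-second window is exceeded. After raising both values to 1800 seconds, as suggested, neither would be.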
Re: Estimate timeout error
Paul Bijnens wrote: But the reply packet never got acknowledged by the server. Somehow it got lost or corrupted. [...] Before guessing how to fix it, we must know where the problem is. Is the packet lost? Or is it broken? Quick recap: Server grolsch tries to back up client dominion. It works for the partitions /, /usr and /var. As soon as I tell grolsch to back up dominion's /u00 partition (a 45G partition, but presently only 177M full with approx. 2000 files), it fails. I have since removed /u00 from backups to at least keep things working in the meantime, but I would like that data backed up :-) I have moved the amanda server to public IP space. It is still behind a PIX firewall; I just got rid of the private-IP-to-public-IP mappings. This didn't fix it :-) Not that I thought it would, I just got annoyed at some of the routing. I ran tcpdump on client and server; the dumps are on the following page, lined up as best I could to show the flow. It seems that when doing the partition that makes it fail, a bunch of packets do not get from the client to the server. Since I am no expert in tcpdump or interpreting its results, I hope this helps figure out the problem. tcpdump results on http://www.hackermonkey.com/amanda-error.html -Nick
Re: lev 0 FAILED [Estimate timeout from ******]
I don't think it is a firewall problem, because if I put only one filesystem in the disklist, it works. ND On Thu, 2004-12-02 at 18:21 +0100, Christoph Scheeder wrote: Hi, completely sure? Many modern Linux distros (AFAIK at least SuSE and Red Hat) come with default firewall installations blocking many things if you do not explicitly disable these firewalls. So there might be a firewall on the Linux box even if you didn't configure it. Christoph [...] -- Nuno Dias [EMAIL PROTECTED] LIP
Re: lev 0 FAILED [Estimate timeout from ******]
Nuno Dias wrote: I think it is not a firewall problem, because if I put only one filesystem in the disklist, it works. Not sure. Is there a firewall involved? If the connection tracking times out after 5 minutes, then maybe 1 filesystem can reply within that timeframe, but more filesystems exceed the time. What is the UDP timer value? Is it any better if client and server disable firewall rules completely? Can you see the reply packet leave the client and arrive at the server? (Capture all network traffic for UDP port 10080 at client and server to verify.) Is there any other message in the client file /tmp/amanda/sendsize.DATETIME.debug? [...] -- Paul Bijnens, Xplanation
Re: Estimate timeout error
There is a PIX between the two, but I'm backing up a bunch (10?) of Linux and Solaris servers in the same areas of the network to this same amanda server without any issues, so I don't believe it to be a firewall issue. There are no iptables running on either host (both Linux in this case). In the amandad.XXX.debug log I have the following lines, which I'm assuming are the error report of the problem. Now, the question is how to fix it :-) -Nick
amandad: time 0.010: amandahosts security check passed
amandad: time 0.010: running service /usr/lib/amanda/sendsize
amandad: time 182.436: sending REP packet:
Amanda 2.4 REP HANDLE 005-40813308 SEQ 1102082216
OPTIONS features=feff9ffe0f;
/ 0 SIZE 301197
/ 1 SIZE 100
/u00 0 SIZE 143930
/u00 1 SIZE 41411
/usr 0 SIZE 880958
/usr 1 SIZE 79
/usr/local 0 SIZE 174
/usr/local 1 SIZE 47
/var 0 SIZE 299300
/var 1 SIZE 2857
amandad: time 192.437: dgram_recv: timeout after 10 seconds
amandad: time 192.437: waiting for ack: timeout, retrying
amandad: time 202.439: dgram_recv: timeout after 10 seconds
amandad: time 202.439: waiting for ack: timeout, retrying
amandad: time 212.441: dgram_recv: timeout after 10 seconds
amandad: time 212.442: waiting for ack: timeout, retrying
amandad: time 222.444: dgram_recv: timeout after 10 seconds
amandad: time 222.444: waiting for ack: timeout, retrying
amandad: time 232.446: dgram_recv: timeout after 10 seconds
amandad: time 232.446: waiting for ack: timeout, giving up!
amandad: time 232.446: pid 21896 finish time Fri Dec 3 09:01:32 2004
Paul Bijnens wrote: Nick Danger wrote: Nope - still a problem. [...] The estimate really takes only 173 seconds. That means that etimeout is plenty (better lower it again to normal values). The problem seems to be in the reply packet. [...] Maybe try a network traffic dump (with tcpdump or similar program) on client *and* host?
Re: Estimate timeout error
Nick Danger wrote: There is a PIX between the two, [...] Now, the question is, how to fix it :-) -Nick
amandad: time 0.010: amandahosts security check passed
amandad: time 0.010: running service /usr/lib/amanda/sendsize
amandad: time 182.436: sending REP packet:
The above shows that about 3 minutes are needed for sendsize, and it indeed finished without errors, because it has all the info below. It could still be that 179 seconds works and 181 seconds is too late...
Amanda 2.4 REP HANDLE 005-40813308 SEQ 1102082216
OPTIONS features=feff9ffe0f;
/ 0 SIZE 301197
/ 1 SIZE 100
/u00 0 SIZE 143930
/u00 1 SIZE 41411
/usr 0 SIZE 880958
/usr 1 SIZE 79
/usr/local 0 SIZE 174
/usr/local 1 SIZE 47
/var 0 SIZE 299300
/var 1 SIZE 2857
The above lines are the reply packet, less than 300 bytes, so I guess it's not a UDP packet overflow.
amandad: time 192.437: dgram_recv: timeout after 10 seconds
amandad: time 192.437: waiting for ack: timeout, retrying
amandad: time 202.439: dgram_recv: timeout after 10 seconds
amandad: time 202.439: waiting for ack: timeout, retrying
amandad: time 212.441: dgram_recv: timeout after 10 seconds
amandad: time 212.442: waiting for ack: timeout, retrying
amandad: time 222.444: dgram_recv: timeout after 10 seconds
amandad: time 222.444: waiting for ack: timeout, retrying
amandad: time 232.446: dgram_recv: timeout after 10 seconds
amandad: time 232.446: waiting for ack: timeout, giving up!
amandad: time 232.446: pid 21896 finish time Fri Dec 3 09:01:32 2004
But the reply packet never got acknowledged by the server. Somehow it got lost or corrupted. Default route for the reverse path not correct? Wrong subnet mask? Try to get a network trace at the client and server, and in between (I don't know how to accomplish that on a PIX firewall):
Solaris: snoop -x42 host x.x.x.x proto udp port 10080
Open source (Linux and others): tcpdump -X host x.x.x.x and udp and port 10080
Or other programs that have the same capabilities (ethereal etc.). Before guessing how to fix it, we must know where the problem is. Is the packet lost? Or is it broken? -- Paul Bijnens, Xplanation
lev 0 FAILED [Estimate timeout from ******]
Hi, I have a Digital Unix machine that gives me some strange results when I try to use amanda. If I configure the disklist with 2 or more disks of the Digital Unix machine, the amanda report tells me this:
xxx/ lev 0 FAILED [Estimate timeout from xx]
xxx/usr lev 0 FAILED [Estimate timeout from xx]
xxx/ lev 0 FAILED [Estimate timeout from xx]
The amandad.20041202142753000.debug file on the Digital machine has this error:
amandad: time 200.266: dgram_recv: timeout after 10 seconds
amandad: time 200.266: waiting for ack: timeout, retrying
amandad: time 210.267: dgram_recv: timeout after 10 seconds
amandad: time 210.267: waiting for ack: timeout, retrying
amandad: time 220.267: dgram_recv: timeout after 10 seconds
amandad: time 220.267: waiting for ack: timeout, retrying
amandad: time 230.267: dgram_recv: timeout after 10 seconds
amandad: time 230.267: waiting for ack: timeout, retrying
amandad: time 240.267: dgram_recv: timeout after 10 seconds
amandad: time 240.267: waiting for ack: timeout, giving up!
amandad: time 240.267: pid 22594 finish time Thu Dec 2 14:31:54 2004
The strange thing is, if I configure only one disk in the disklist, the backup runs OK and no problem is reported in the amanda report. I increased etimeout/ctimeout to a big number ... and it did not work. I have a Linux machine that is the master, and the Digital Unix machine is the client; the version of amanda is 2.4.4p4. Thanks for any help. ND -- Nuno Dias [EMAIL PROTECTED] LIP
Re: lev 0 FAILED [Estimate timeout from ******]
Nuno Dias wrote: Hi, I have a Digital Unix machine that gives me some strange results when I try to use amanda. [...] I have a Linux machine that is the master and the Digital Unix machine is the client, the version of amanda is 2.4.4p4. [...] Hi, could this be a firewall timeout on the Linux machine? Christoph
Re: lev 0 FAILED [Estimate timeout from ******]
No, the two machines are on the same network, no firewall. ND On Thu, 2004-12-02 at 17:55 +0100, Christoph Scheeder wrote: Nuno Dias wrote: [...] Hi, could this be a firewall timeout on the Linux machine? Christoph -- Nuno Dias [EMAIL PROTECTED] LIP
Re: lev 0 FAILED [Estimate timeout from ******]
Hi, completely sure? Many modern Linux distros (AFAIK at least SuSE and Red Hat) come with default firewall installations blocking many things if you do not explicitly disable these firewalls. So there might be a firewall on the Linux box even if you didn't configure it. Christoph Nuno Dias wrote: No, the two machines are on the same network, no firewall. ND [...]
Re: Estimate timeout error
Matt Hyclak wrote: The sendsize.DATETIME.debug log file on dominion should tell you how long the estimates are taking. A simple calculation should tell you how big etimeout should be. (NUM_PARTITIONS * ETIMEOUT) = total time amanda waits for estimates. Matt
Nope - still a problem. The error is still as below:
FAILURE AND STRANGE DUMP SUMMARY:
dominion.h /var lev 0 FAILED [Estimate timeout from dominion.xxx]
dominion.h /usr/local lev 0 FAILED [Estimate timeout from dominion.xxx]
dominion.h /usr lev 0 FAILED [Estimate timeout from dominion.xxx]
dominion.h /u00 lev 0 FAILED [Estimate timeout from dominion.xxx]
dominion.h / lev 0 FAILED [Estimate timeout from dominion.xxx]
I have the timeout in amanda.conf set to an ungodly high number:
etimeout -12000 # total number of seconds for estimates.
The lines from the sendsize debug file in /var/log are listed below. All the file systems are short, except for /u00, which lists Is the 77 in MINUTES? Or seconds? Either way, 12000 in amanda.conf should be plenty, shouldn't it? I could just be doing my math wrong, which is always a possibility. There are no units listed other than in the config file, so I'm guessing at some parts here. Thanks all -Nick
---
sendsize: debug 1 pid 26242 ruid 33 euid 33: start at Thu Dec 2 11:25:07 2004
sendsize: version 2.4.4p1
sendsize[26244]: time 0.007: calculating for amname '/', dirname '/', spindle -1
sendsize[26244]: time 0.007: getting size via dump for / level 0
sendsize[26242]: time 0.008: waiting for any estimate child
sendsize[26244]: time 0.008: calculating for device '/dev/sda1' with 'ext3'
sendsize[26244]: time 0.008: running /sbin/dump 0Ssf 1048576 - /dev/sda1
sendsize[26244]: time 0.011: running /usr/lib/amanda/killpgrp
sendsize[26244]: time 0.071: DUMP: Excluding inode 8 (journal inode) from dump
sendsize[26244]: time 0.072: DUMP: Excluding inode 7 (resize inode) from dump
sendsize[26244]: time 0.410: 308423680
sendsize[26244]: time 0.411: .
sendsize[26244]: estimate time for / level 0: 0.402
sendsize[26244]: estimate size for / level 0: 301195 KB
sendsize[26244]: time 0.411: asking killpgrp to terminate
sendsize[26244]: time 1.415: getting size via dump for / level 1
sendsize[26244]: time 1.416: calculating for device '/dev/sda1' with 'ext3'
sendsize[26244]: time 1.416: running /sbin/dump 1Ssf 1048576 - /dev/sda1
sendsize[26244]: time 1.419: running /usr/lib/amanda/killpgrp
sendsize[26244]: time 1.449: DUMP: Excluding inode 8 (journal inode) from dump
sendsize[26244]: time 1.451: DUMP: Excluding inode 7 (resize inode) from dump
sendsize[26244]: time 1.887: 1104896
sendsize[26244]: time 1.889: .
sendsize[26244]: estimate time for / level 1: 0.472
sendsize[26244]: estimate size for / level 1: 1079 KB
sendsize[26244]: time 1.889: asking killpgrp to terminate
sendsize[26244]: time 2.895: done with amname '/', dirname '/', spindle -1
sendsize[26242]: time 2.895: child 26244 terminated normally
sendsize[26249]: time 2.896: calculating for amname '/u00', dirname '/u00', spindle -1
sendsize[26249]: time 2.896: getting size via dump for /u00 level 0
sendsize[26249]: time 2.897: calculating for device '/dev/sda9' with 'ext3'
sendsize[26249]: time 2.897: running /sbin/dump 0Ssf 1048576 - /dev/sda9
sendsize[26249]: time 2.900: running /usr/lib/amanda/killpgrp
sendsize[26242]: time 2.905: waiting for any estimate child
sendsize[26249]: time 2.942: DUMP: Excluding inode 8 (journal inode) from dump
sendsize[26249]: time 2.943: DUMP: Excluding inode 7 (resize inode) from dump
sendsize[26249]: time 80.109: 147388416
sendsize[26249]: time 80.111: .
sendsize[26249]: estimate time for /u00 level 0: 77.213 sendsize[26249]: estimate size for /u00 level 0: 143934 KB sendsize[26249]: time 80.111: asking killpgrp to terminate sendsize[26249]: time 81.112: getting size via dump for /u00 level 1 sendsize[26249]: time 81.113: calculating for device '/dev/sda9' with 'ext3' sendsize[26249]: time 81.113: running /sbin/dump 1Ssf 1048576 - /dev/sda9 sendsize[26249]: time 81.116: running /usr/lib/amanda/killpgrp sendsize[26249]: time 81.401: DUMP: Excluding inode 8 (journal inode) from dump sendsize[26249]: time 81.403: DUMP: Excluding inode 7 (resize inode) from dump sendsize[26249]: time 159.069: 42408960 sendsize[26249]: time 159.070: . sendsize[26249]: estimate time for /u00 level 1: 77.957 sendsize[26249]: estimate size for /u00 level 1: 41415 KB sendsize[26249]: time 159.071: asking killpgrp to terminate sendsize[26249]: time 160.080: done with amname '/u00', dirname '/u00', spindle -1 sendsize[26242]: time 160.080: child 26249 terminated normally sendsize[26484]: time 160.081: calculating for amname '/usr', dirname '/usr', spindle -1 sendsize[26484]: time 160.081: getting size via dump for /usr level 0 sendsize[26484]: time 160.082: calculating for device '/dev/sda2' with 'ext3' sendsize[26484]: time 160.082: running /sbin/dump 0Ssf 1048576 - /dev/sda2 sendsize[26484]: time
Re: Estimate timeout error
Nick Danger wrote: Nope, still a problem. The error is still as below: FAILURE AND STRANGE DUMP SUMMARY: dominion.h /var lev 0 FAILED [Estimate timeout from dominion.xxx] dominion.h /usr/local lev 0 FAILED [Estimate timeout from dominion.xxx] dominion.h /usr lev 0 FAILED [Estimate timeout from dominion.xxx] dominion.h /u00 lev 0 FAILED [Estimate timeout from dominion.xxx] dominion.h / lev 0 FAILED [Estimate timeout from dominion.xxx] I have the timeout in amanda.conf set to an ungodly high number of etimeout -12000 # total number of seconds for estimates. [...] sendsize: debug 1 pid 26242 ruid 33 euid 33: start at Thu Dec 2 11:25:07 2004 sendsize: version 2.4.4p1 [...] sendsize: time 172.473: pid 26242 finish time Thu Dec 2 11:27:59 2004 The estimate really takes only 173 seconds. That means that etimeout is plenty (better to lower it again to a normal value). The problem seems to be in the reply packet. I've seen problems with UDP packet overflow before, but that's unlikely here. That problem happened with older versions, where the UDP packet size was limited to about 8 KB. Currently it is 64 KB, but of course it could also be limited by the OS. The reply packet is usually larger than the request packet, because it contains 1 to 3 lines for each DLE (level 0, current level, current level plus 1). In amandad.DATETIME.debug, you can find both the request packet and the reply packet. Is there any odd limitation on UDP packet size on one of the hosts (or on intermediate routers/firewalls)? Another problem could be in the amanda iptables modules, into which a bug has already been introduced twice. I don't know the current status of that bug. If you don't need them, do not use the amanda iptables modules; check with lsmod | grep amanda (also on intermediate firewalls!). Maybe try a network traffic dump (with tcpdump or a similar program) on both the client *and* the tape server?
-- Paul Bijnens, XplanationTel +32 16 397.511 Technologielaan 21 bus 2, B-3001 Leuven, BELGIUMFax +32 16 397.512 http://www.xplanation.com/ email: [EMAIL PROTECTED] *** * I think I've got the hang of it now: exit, ^D, ^C, ^\, ^Z, ^Q, F6, * * quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, * * stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt, abort, hangup, * * PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e, kill -1 $$, shutdown, * * kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ...* * ... Are you sure? ... YES ... Phew ... I'm out * ***
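To get a feel for how the reply packet grows with the number of DLEs, here is a rough sketch that adds up the per-DLE result lines. It assumes the `<dle> <level> SIZE <kbytes>` line layout seen in the REP packet quoted from marge further down, ignores the packet header and OPTIONS line, and takes the 64 KB figure Paul mentions as the default limit; the function names are made up for illustration.

```python
# Rough estimate of the body size of an amanda REP packet, assuming one
# "<dle> <level> SIZE <kbytes>" line per estimated level (as in the REP
# quoted from marge). Header and OPTIONS lines are not counted.

UDP_LIMIT_BYTES = 64 * 1024  # figure quoted in the thread; the OS may impose less


def rep_body_bytes(results: list[tuple[str, int, int]]) -> int:
    """Bytes used by the result lines, each terminated by a newline."""
    return sum(len(f"{dle} {level} SIZE {kb}\n") for dle, level, kb in results)


def fits_in_udp(results: list[tuple[str, int, int]], limit: int = UDP_LIMIT_BYTES) -> bool:
    """True if the result lines alone stay within the given packet limit."""
    return rep_body_bytes(results) <= limit


if __name__ == "__main__":
    # marge's three DLEs at levels 0 and 1, sizes taken from its REP packet
    marge = [("/var", 0, 11520), ("/var", 1, 1580), ("/usr", 0, 1166599),
             ("/usr", 1, 18710), ("/", 0, 39571), ("/", 1, 381)]
    print(rep_body_bytes(marge), fits_in_udp(marge))
```

With three lines per DLE the body grows linearly, so a client with many DLEs and long DLE names is the case to check against an 8 KB limit on older versions.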
Estimate timeout error
Is there any way to properly calculate what your timeout estimate value should be, other than trial and error? I have a partition on a machine that gives this error. dominion.h /u00 lev 0 FAILED [Estimate timeout from dominion] If I remove that partition from the disklist, all other partitions on that server back up just fine. It's a 45G partition on a SCSI RAID set. It's hardly 1% full, holding maybe 1000 files. I have upped the timeout to 1200, and it still failed. Suggestions? -Nick
Re: Estimate timeout error
On Mon, Nov 29, 2004 at 09:00:44AM -0500, Nick Danger enlightened us: Is there any way to properly calculate what your timeout estimate value should be, other than trial and error? I have a partition on a machine that gives this error. dominion.h /u00 lev 0 FAILED [Estimate timeout from dominion] If I remove that partition from the disklist, all other partitions on that server back up just fine. It's a 45G partition on a SCSI RAID set. It's hardly 1% full, holding maybe 1000 files. I have upped the timeout to 1200, and it still failed. The sendsize.DATETIME.debug log file on dominion should tell you how long the estimates are taking. A simple calculation should tell you how big etimeout should be. (NUM_PARTITIONS * ETIMEOUT) = total time amanda waits for estimates. Matt -- Matt Hyclak Department of Mathematics Department of Social Work Ohio University (740) 593-1263
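Matt's suggestion can be sketched as a small script: pull the `estimate time for <dle> level <n>: <seconds>` lines out of a sendsize debug log (the format in Nick's log above), add up the levels per DLE, and size etimeout from the slowest DLE. The safety margin of 2 is an arbitrary assumption, and summing per DLE assumes the estimates for one DLE run back to back, as they do in that log.

```python
import math
import re
from collections import defaultdict

# Matches sendsize debug lines such as
#   sendsize[26249]: estimate time for /u00 level 0: 77.213
EST_RE = re.compile(r"estimate time for (\S+) level (\d+): ([\d.]+)")


def estimate_seconds_per_dle(log_text: str) -> dict[str, float]:
    """Sum the per-level estimate times of each DLE found in the log."""
    totals: dict[str, float] = defaultdict(float)
    for dle, _level, secs in EST_RE.findall(log_text):
        totals[dle] += float(secs)
    return dict(totals)


def suggest_etimeout(per_dle: dict[str, float], margin: float = 2.0) -> int:
    """Per-DLE etimeout: the slowest DLE's total estimate time times a margin."""
    return math.ceil(max(per_dle.values()) * margin)


def total_wait(etimeout: int, num_dles: int) -> int:
    """Matt's rule of thumb: NUM_PARTITIONS * ETIMEOUT."""
    return etimeout * num_dles
```

For the lines in Nick's log this gives about 155 s for /u00 and under a second for /, so a per-DLE etimeout of a few hundred seconds already covers the estimates themselves.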
Re: Estimate timeout error
On Monday 29 November 2004 09:00, Nick Danger wrote: Is there any way to properly calculate what your timeout estimate value should be, other than trial and error? I have a partition on a machine that gives this error. dominion.h /u00 lev 0 FAILED [Estimate timeout from dominion] If I remove that partition from the disklist, all other partitions on that server back up just fine. It's a 45G partition on a SCSI RAID set. It's hardly 1% full, holding maybe 1000 files. I have upped the timeout to 1200, and it still failed. Suggestions? No, other than that your etimeout value should be more than sufficient. Do you have other dirs on that client that work OK and that amanda backs up OK? -- Cheers, Gene There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order. -Ed Howdershelt (Author) 99.29% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
Re: Estimate timeout error
On Monday 29 November 2004 09:00, Nick Danger wrote: Is there any way to properly calculate what your timeout estimate value should be, other than trial and error? I have a partition on a machine that gives this error. dominion.h /u00 lev 0 FAILED [Estimate timeout from dominion] Have a look on the client at the file /tmp/amanda/sendsize.DATETIMESTAMP.debug. The first and the last lines contain a date. Even when the server times out, the client still continues. If it doesn't finish, there is probably an error message in that file. -- Paul Bijnens
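Paul's check (compare the dates on the first and last lines of the debug file) can be sketched like this. The timestamp layout is taken from the `start at ...` and `finish time ...` lines in the sendsize and amandad logs quoted in these threads; English day and month names are assumed.

```python
import re
from datetime import datetime

# First and last lines of sendsize/amandad debug logs carry stamps like
#   "... start at Thu Dec 2 11:25:07 2004"
#   "... finish time Thu Dec 2 11:27:59 2004"
STAMP_RE = re.compile(
    r"(?:start at|finish time) (\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})")
STAMP_FMT = "%a %b %d %H:%M:%S %Y"


def run_duration_seconds(first_line: str, last_line: str) -> float:
    """Wall-clock seconds between the start and finish stamps of a debug log."""
    start = STAMP_RE.search(first_line)
    finish = STAMP_RE.search(last_line)
    if start is None or finish is None:
        # No finish stamp usually means the run died: look for an error message.
        raise ValueError("no start/finish timestamp found")
    t0 = datetime.strptime(start.group(1), STAMP_FMT)
    t1 = datetime.strptime(finish.group(1), STAMP_FMT)
    return (t1 - t0).total_seconds()
```

For Nick's sendsize log this yields 172 seconds, matching the 172.473 s that Paul reads off the last line.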
Re: Estimate timeout
Joshua Baker-LePain wrote: admin prohibited is definitely a result of iptables filtering. Have a close look at homer: execute iptables -L. Maybe the solution is loading the amanda iptables module, if that is available on the machine. I'd be interested to see if that fixes it. The following line was added to /etc/sysconfig/iptables:
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp -s XX.XX.XX.0/24 --sport 10080 -j ACCEPT
...where XX.XX.XX is the network prefix of our local 'external' network, on which both homer and marge are located. The problem has been solved. -- Steve
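Steve's rule works because marge's delayed REP comes from UDP source port 10080 (shown as `marge.amanda` in the tcpdump trace) on the local network, and the rule explicitly accepts such packets even when connection tracking classifies them as NEW. The sketch below models only the source-network and source-port match; `192.0.2.0/24` is a hypothetical stand-in for the elided `XX.XX.XX.0/24`, and the `--state NEW` match and chain ordering are not modelled.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical stand-in for the elided "XX.XX.XX.0/24" network in Steve's rule.
ALLOWED_NET = ipaddress.ip_network("192.0.2.0/24")
AMANDA_UDP_PORT = 10080  # the --sport value in the rule ("amanda" in tcpdump)


@dataclass(frozen=True)
class UdpPacket:
    src: str
    sport: int
    dport: int


def rule_accepts(pkt: UdpPacket) -> bool:
    """Mimic the -s and --sport matches of the rule; state is not modelled."""
    return (ipaddress.ip_address(pkt.src) in ALLOWED_NET
            and pkt.sport == AMANDA_UDP_PORT)


if __name__ == "__main__":
    # marge.amanda -> homer.858, as in the tcpdump trace (address hypothetical)
    print(rule_accepts(UdpPacket("192.0.2.20", 10080, 858)))
```

Restricting the source network keeps the hole narrower than accepting all UDP from port 10080.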
Re: Estimate timeout
Steven Schoch wrote: on Wed, 09 Jun 2004 Paul Bijnens wrote: Try to find out where the UDP packet got dropped, using tcpdump or Ethereal or another network analyzer on homer and marge. Now we're getting somewhere. The tcpdump shows this: 15:01:56.739818 homer > marge: icmp: host homer unreachable - admin prohibited [tos 0xc0] My guess is that the ICMP message has something to do with a firewall. admin prohibited is definitely a result of iptables filtering. Have a close look at homer: execute iptables -L. Maybe the solution is loading the amanda iptables module, if that is available on the machine. -- Paul Bijnens
Re: Estimate timeout
On Thu, 10 Jun 2004 at 9:31am, Paul Bijnens wrote Steven Schoch wrote: Now we're getting somewhere. The tcpdump shows this: 15:01:56.739818 homer > marge: icmp: host homer unreachable - admin prohibited [tos 0xc0] My guess is that the ICMP message has something to do with a firewall. admin prohibited is definitely a result of iptables filtering. Have a close look at homer: execute iptables -L. Maybe the solution is loading the amanda iptables module, if that is available on the machine. I'd be interested to see if that fixes it. My amanda server, which runs the nightlies of the (small) home partitions, has been at RH9 for a while, and has this as the only rule it needed to get amdump working:
# If we've an established session, well, okay
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
I recently moved my other amanda server (which backs up my 4.5TB of RAID space) to RH9. The first few nights, most of the clients were failing with estimate timeouts. But when I tested during the day (with small partitions), everything worked. I finally decided that the estimates on the big partitions were taking long enough that the above rule was timing out. I couldn't afford another night of the backups failing, so I didn't try loading the amanda module -- I just added rules to allow incoming UDP traffic on privileged ports from the clients. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University
Re: Estimate timeout
Joshua Baker-LePain wrote: On Thu, 10 Jun 2004 at 9:31am, Paul Bijnens wrote Steven Schoch wrote: Now we're getting somewhere. The tcpdump shows this: 15:01:56.739818 homer > marge: icmp: host homer unreachable - admin prohibited [tos 0xc0] My guess is that the ICMP message has something to do with a firewall. admin prohibited is definitely a result of iptables filtering. Have a close look at homer: execute iptables -L. Maybe the solution is loading the amanda iptables module, if that is available on the machine. I'd be interested to see if that fixes it. My amanda server, which runs the nightlies of the (small) home partitions, has been at RH9 for a while, and has this as the only rule it needed to get amdump working:
# If we've an established session, well, okay
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
I recently moved my other amanda server (which backs up my 4.5TB of RAID space) to RH9. The first few nights, most of the clients were failing with estimate timeouts. But when I tested during the day (with small partitions), everything worked. I finally decided that the estimates on the big partitions were taking long enough that the above rule was timing out. I couldn't afford another night of the backups failing, so I didn't try loading the amanda module -- I just added rules to allow incoming UDP traffic on privileged ports from the clients. I have been thinking about this problem and, without any real testing to back up my hypothesis, I believe the problem lies in the default timeout in iptables for UDP traffic, as you concluded too. For TCP traffic, once a packet has been replied to, the timeout becomes very large (5 days or so, I believe). But for UDP, which is a connectionless protocol, the timeout is 180 seconds (I believe). After this timeout, connection tracking drops the entry. In my config, the estimates of the clients in the DMZ all take less than 2 minutes, and this works fine.
That means that the real solution is to compile amanda with a dedicated UDP port range, and add that range to the firewall's iptables rules. -- Paul Bijnens
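Paul's hypothesis can be checked against the homer/marge tcpdump trace with a little arithmetic: if a stateful firewall forgets an idle UDP flow after a fixed timeout, the reply is only recognised if it arrives before that timeout runs out. The 180 s default below is Paul's "I believe" figure, not a verified kernel value, and the sketch measures idle time from the last packet seen in either direction.

```python
# Sketch of the hypothesis in this thread: a stateful firewall forgets an
# idle UDP flow after udp_timeout seconds, so a reply sent later no longer
# matches an ESTABLISHED entry and is treated as a NEW packet.


def hms_to_seconds(stamp: str) -> float:
    """Convert a tcpdump time of day such as '14:54:29.525614' to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def idle_seconds(last_packet: str, reply: str) -> float:
    """Gap between the last packet of the flow and the reply (same day only)."""
    return hms_to_seconds(reply) - hms_to_seconds(last_packet)


def reply_survives(last_packet: str, reply: str, udp_timeout: float = 180.0) -> bool:
    """True if the reply arrives before the tracking entry would expire."""
    return idle_seconds(last_packet, reply) <= udp_timeout


if __name__ == "__main__":
    # Last exchange and delayed REP from the homer/marge tcpdump trace.
    last, rep = "14:54:29.525614", "15:01:56.739172"
    print(f"{idle_seconds(last, rep):.1f} s idle, survives: {reply_survives(last, rep)}")
```

The REP follows roughly 447 s of silence, well past 180 s, while a DMZ client that answers within 2 minutes stays inside the window, which fits what Paul observed.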
Re: Estimate timeout
On Thu, 10 Jun 2004 at 1:40pm, Paul Bijnens wrote I have been thinking about this problem and, without any real testing to back up my hypothesis, I believe the problem lies in the default timeout in iptables for UDP traffic, as you concluded too. For TCP traffic, once a packet has been replied to, the timeout becomes very large (5 days or so, I believe). But for UDP, which is a connectionless protocol, the timeout is 180 seconds (I believe). After this timeout, connection tracking drops the entry. Is this true even with ip_conntrack_amanda loaded? -- Joshua Baker-LePain
Re: Estimate timeout
Joshua Baker-LePain wrote: On Thu, 10 Jun 2004 at 1:40pm, Paul Bijnens wrote I have been thinking about this problem and, without any real testing to back up my hypothesis, I believe the problem lies in the default timeout in iptables for UDP traffic, as you concluded too. For TCP traffic, once a packet has been replied to, the timeout becomes very large (5 days or so, I believe). But for UDP, which is a connectionless protocol, the timeout is 180 seconds (I believe). After this timeout, connection tracking drops the entry. Is this true even with ip_conntrack_amanda loaded? I would have to look at the source code, or find a detailed doc that explains it, to find out. Anyway, that module would somehow need to know the etimeout parameter of amanda.conf, which of course it does not, or else allow a really, really large timeout, like a few hours. Or it should be tuneable somehow (in the amanda tradition, that could be hardcoded at compile time). -- Paul Bijnens
Re: Estimate timeout
On Thu, 10 Jun 2004 at 2:11pm, Paul Bijnens wrote Is this true even with ip_conntrack_amanda loaded? I would have to look at the source code, or find a detailed doc that explains it, to find out. Anyway, that module would somehow need to know the etimeout parameter of amanda.conf, which of course it does not, or else allow a really, really large timeout, like a few hours. Or it should be tuneable somehow (in the amanda tradition, that could be hardcoded at compile time). It seems to be tuneable. From the header of the source code:
 * Module load syntax:
 * insmod ip_conntrack_amanda.o [master_timeout=n]
 *
 * Where master_timeout is the timeout (in seconds) of the master
 * connection (port 10080). This defaults to 5 minutes but if
 * your clients take longer than 5 minutes to do their work
 * before getting back to the Amanda server, you can increase
 * this value.
I should test it one of these nights... -- Joshua Baker-LePain
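If that module parameter behaves as the header comment says (Joshua had not tested it), picking a value is simple arithmetic: cover the slowest client's estimate with some headroom and never go below the 5-minute default. The sketch below only builds the `insmod` line from the syntax quoted above; the 1.5 margin is an arbitrary assumption, and 447.906 s is the REP time from marge's amandad log.

```python
import math

DEFAULT_MASTER_TIMEOUT = 300  # "This defaults to 5 minutes" per the module header


def insmod_command(longest_estimate_s: float, margin: float = 1.5) -> str:
    """Build an insmod line whose master_timeout covers the slowest estimate."""
    timeout = max(DEFAULT_MASTER_TIMEOUT, math.ceil(longest_estimate_s * margin))
    return f"insmod ip_conntrack_amanda.o master_timeout={timeout}"


if __name__ == "__main__":
    print(insmod_command(447.906))  # marge took about 448 s before sending its REP
```

As Paul notes, the module cannot know amanda.conf's etimeout, so this value has to be kept in step with the estimate times by hand.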
Re: Estimate timeout
Joshua Baker-LePain wrote: It seems to be tuneable. From the header of the source code: * Module load syntax: * insmod ip_conntrack_amanda.o [master_timeout=n] * * Where master_timeout is the timeout (in seconds) of the master * connection (port 10080). This defaults to 5 minutes but if * your clients take longer than 5 minutes to do their work * before getting back to the Amanda server, you can increase * this value. I should test it one of these nights... Wow! Learning something new every day! -- Paul Bijnens
Re: Estimate timeout
On Thursday 10 June 2004 07:59, Joshua Baker-LePain wrote: On Thu, 10 Jun 2004 at 1:40pm, Paul Bijnens wrote I have been thinking about this problem and, without any real testing to back up my hypothesis, I believe the problem lies in the default timeout in iptables for UDP traffic, as you concluded too. For TCP traffic, once a packet has been replied to, the timeout becomes very large (5 days or so, I believe). But for UDP, which is a connectionless protocol, the timeout is 180 seconds (I believe). After this timeout, connection tracking drops the entry. Is this true even with ip_conntrack_amanda loaded? I wasn't even aware of such a module, and was surprised by the output of a locate! It has been part of the kernel's netfilter options since 2.4.22 or earlier, so if he doesn't have the module, he may have to rebuild his kernel to get it. I hadn't worried about it here, since everything I back up with amanda is inside the firewall or on the firewall itself, but iptables sits between the 2 NICs in the firewall that separate inside from outside. -- Cheers, Gene
Estimate timeout
It was working for several days, then all of a sudden it stopped and hasn't worked since. Amcheck works fine, but amdump doesn't. Amdump is run on homer, the system with the tape drive. Homer is a Red Hat Enterprise Linux system with amanda version 2.4.4p1. The system that fails to dump is marge, a FreeBSD system with amanda version 2.4.4p2. The important lines from amanda.conf:
etimeout 1800   # number of seconds per filesystem for estimates.
#etimeout -600  # total number of seconds for estimates.
# a positive number will be multiplied by the number of filesystems on
# each host; a negative number will be taken as an absolute total time-out.
# The default is 5 minutes per filesystem.
From disklist:
marge /var comp-user
marge /usr comp-root
marge / comp-root
From crontab:
45 0 * * 2-6 /usr/sbin/amdump OurDump
In /tmp/amanda on marge, these lines appear in amandad.20040609004501000.debug: amandad: debug 1 pid 22611 ruid 1001 euid 1001: start at Wed Jun 9 00:45:01 2004 amandad: version 2.4.4p2 amandad: build: VERSION=Amanda-2.4.4p2 ... amandad: time 0.003: got packet: Amanda 2.4 REQ HANDLE 001-389B0608 SEQ 1086767104 SECURITY USER amanda SERVICE sendsize ... amandad: time 0.004: sending ack: Amanda 2.4 ACK HANDLE 001-389B0608 SEQ 1086767104 ... 
amandad: time 0.009: amandahosts security check passed amandad: time 0.009: running service /usr/local/libexec/sendsize amandad: time 447.906: sending REP packet: Amanda 2.4 REP HANDLE 001-389B0608 SEQ 1086767104 OPTIONS features=feff9ffe0f; /var 0 SIZE 11520 /var 1 SIZE 1580 /usr 0 SIZE 1166599 /usr 1 SIZE 18710 / 0 SIZE 39571 / 1 SIZE 381 amandad: time 457.910: dgram_recv: timeout after 10 seconds amandad: time 457.910: waiting for ack: timeout, retrying amandad: time 467.920: dgram_recv: timeout after 10 seconds amandad: time 467.920: waiting for ack: timeout, retrying amandad: time 477.930: dgram_recv: timeout after 10 seconds amandad: time 477.930: waiting for ack: timeout, retrying amandad: time 487.940: dgram_recv: timeout after 10 seconds amandad: time 487.941: waiting for ack: timeout, retrying amandad: time 497.950: dgram_recv: timeout after 10 seconds amandad: time 497.951: waiting for ack: timeout, giving up! amandad: time 497.951: pid 22611 finish time Wed Jun 9 00:53:19 2004 On homer, in amdump.1 these lines: amdump: start at Wed Jun 9 00:45:01 PDT 2004 amdump: datestamp 20040609 planner: pid 9813 executable /usr/lib/amanda/planner version 2.4.4p1 planner: build: VERSION=Amanda-2.4.4p1 ... setup_estimate: marge:/var: command 0, options: last_level 0 next_level0 21 level_days 0 getting estimates 0 (11503) 1 (0) -1 (-1) planner: time 0.125: setting up estimates for marge:/usr setup_estimate: marge:/usr: command 0, options: last_level 0 next_level0 21 level_days 0 getting estimates 0 (1163201) 1 (0) -1 (-1) planner: time 0.135: setting up estimates for marge:/ setup_estimate: marge:/: command 0, options: last_level 0 next_level0 21 level_days 0 getting estimates 0 (39486) 1 (0) -1 (-1) ... 
planner: time 223.483: got result for host homer disk /home: 0 - 4642543K, 4 - 899568K, -1 - -1K planner: time 10801.886: error result for host marge disk /: Estimate timeout from marge planner: time 10801.886: error result for host marge disk /usr: Estimate timeout from marge planner: time 10801.886: error result for host marge disk /var: Estimate timeout from marge planner: time 10801.886: getting estimates took 10801.690 secs It looks like homer was waiting a sufficient time for marge to reply, but the reply was dropped. Marge and homer are on the same switch. -- Steve
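The amanda.conf comments above describe how etimeout turns into a waiting time. A minimal sketch of that documented rule follows; note that the planner log shows it giving up after about 10800 s for three DLEs, twice what this rule gives for etimeout 1800 (Paul's reply accounts for it with a factor of 2 levels), so treat the function as the documented rule only, not as the planner's exact deadline.

```python
# Total time the server waits for a host's estimates, following the rule in
# the amanda.conf comments: a positive etimeout is per filesystem (DLE),
# a negative one is an absolute total for the host.

DEFAULT_ETIMEOUT = 300  # "The default is 5 minutes per filesystem."


def estimate_deadline(num_dles: int, etimeout: int = DEFAULT_ETIMEOUT) -> int:
    """Seconds allowed for all estimates of one host under the documented rule."""
    if etimeout < 0:
        return -etimeout          # absolute total time-out
    return etimeout * num_dles    # per-filesystem value times DLE count


if __name__ == "__main__":
    print(estimate_deadline(3, 1800))   # marge: three DLEs, etimeout 1800
    print(estimate_deadline(3, -600))   # the commented-out alternative
    print(estimate_deadline(51))        # 51 DLEs at the default
```

Either way, the deadline was far longer than the roughly 448 s marge needed, which is why the thread turns to the network path instead.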
Re: Estimate timeout
Steven Schoch wrote: It was working for several days, then all of a sudden it stopped and hasn't worked since. The first thing to ask is: what changed since then? Installed something? Reconfigured something? Rebooted the system? amandad: time 447.906: sending REP packet: It took less than 550 seconds to estimate all of it. planner: time 10801.886: error result for host marge disk /: Estimate [...] The server timed out after 3 DLEs * 2 levels * 1800 sec = 10800 seconds. It looks like homer was waiting a sufficient time for marge to reply, but the reply was dropped. Yes, indeed. Marge and homer are on the same switch. Are there other clients besides marge? Is there a local firewall activated on homer? Try to find out where the UDP packet got dropped, using tcpdump or Ethereal or another network analyzer on homer and marge. -- Paul Bijnens
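Paul reads two facts off marge's amandad log: when the REP was sent, and that no ACK ever came back. A small sketch that extracts both, matching the message texts exactly as they appear in that log (the function name is made up). A REP that was sent, followed only by ACK retries and a give-up, points at the network path rather than at a slow estimate.

```python
import re

# Message texts as they appear in marge's amandad debug log.
REP_RE = re.compile(r"time ([\d.]+): sending REP packet")
RETRY_MSG = "waiting for ack: timeout, retrying"
GIVE_UP_MSG = "waiting for ack: timeout, giving up"


def summarize_amandad(log_text: str) -> dict:
    """Report when the REP went out and how the wait for the server's ACK ended."""
    rep = REP_RE.search(log_text)
    return {
        "rep_sent_at": float(rep.group(1)) if rep else None,
        "ack_retries": log_text.count(RETRY_MSG),
        "gave_up": GIVE_UP_MSG in log_text,
    }
```

On the excerpt from marge this reports the REP at 447.906 s, four retries, and a final give-up.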
Re: Estimate timeout
on Wed, 09 Jun 2004 Paul Bijnens wrote: Try to find out where the UDP packet got dropped, using tcpdump or Ethereal or another network analyzer on homer and marge. Now we're getting somewhere. The tcpdump shows this:
14:54:28.697197 homer.858 > marge.amanda: udp 117 (DF)
14:54:29.176236 marge.amanda > homer.858: udp 50
14:54:29.444159 marge.amanda > homer.858: udp 83
14:54:29.444563 homer.858 > marge.amanda: udp 50 (DF)
14:54:29.445650 homer.858 > marge.amanda: udp 531 (DF)
14:54:29.525614 marge.amanda > homer.858: udp 50
15:01:56.739172 marge.amanda > homer.858: udp 184
15:01:56.739818 homer > marge: icmp: host homer unreachable - admin prohibited [tos 0xc0]
15:02:06.743312 marge.amanda > homer.858: udp 184
15:02:06.743992 homer > marge: icmp: host homer unreachable - admin prohibited [tos 0xc0]
My guess is that the ICMP message has something to do with a firewall. -- Steve
Re: Estimate timeout on Mac OS X
On Monday 05 April 2004 01:25, David Chin wrote: Hi, I've almost got amanda to run on a PowerBook G4 with Mac OS X 10.3.3. Right now, I have it set up with virtual tapes on a separate disk. Everything goes well, including the amcheck, but when I run amdump, the backup doesn't go. The mailed log of the run is below. Can someone point me in the right direction? Thanks in advance, --Dave These dumps were to tape daily11. The next tape Amanda expects to use is: a new tape. The next new tape already labelled is: daily01. FAILURE AND STRANGE DUMP SUMMARY: localhost /Users/drauh lev 0 FAILED [Estimate timeout from localhost] [...] DUMP SUMMARY: [...] localhost -sers/drauh 0 FAILED Here is the first potential problem. Even with all the warnings plastered all over the FAQ and docs, folks still insist on using a universal name (like localhost) instead of the FQDN. Second, did you build amanda as the user amanda, then become root to do the make install? I'm thinking, just a hunch because you haven't posted enough info, that there is either a permissions problem or an .amandahosts problem, but in the latter case it will usually tell you about it quite plainly. --- (brought to you by Amanda version 2.4.4p2) While it should run OK, 2.4.4p2 is beginning to get a bit long in the tooth.
We're up to 2.4.5beta something or other, and I've not found anything beta about it. It just works. -- Cheers, Gene
Re: Estimate timeout on Mac OS X
On 5 Apr 2004, at 03:02, Gene Heskett wrote: Here is the first potential problem. Even with all the warnings plastered all over the FAQ and docs, folks still insist on using a universal name instead of the FQDN. Yes, I knew about the problems with a universal name. I just wanted to get something up quickly as a test. My machine sits NATted at home and doesn't have a real FQDN. But anyway, I changed it: 1. make my wireless AP give my machine a fixed address 2. add an entry to /etc/hosts -- 192.168.0.111 myhostname Second, did you build amanda as the user amanda, then become root to do the make install? I decided to avoid all permission stuff by running everything as root. Yes, I know the dangers, and I am willing to live with the risk for now. While it should run OK, 2.4.4p2 is beginning to get a bit long in the tooth. We're up to 2.4.5beta something or other, and I've not found anything beta about it. It just works. Are you running it on OS X? I had to edit some of the source to get it to compile. I have 2.4.4p2 running in my lab (RH7.3 server, mix of RH, Fedora, and HP-UX clients), so I figured I'd stick with something I knew was already working. No dice, still. I'll try setting up a separate amanda user first, and then go on and try the beta code. I'll dig around in the code as a last resort, since Google didn't find me any interesting links for a search on 'estimate timeout amanda'. --Dave These dumps were to tape daily12. The next tape Amanda expects to use is: a new tape. The next new tape already labelled is: daily01. 
FAILURE AND STRANGE DUMP SUMMARY:
  Ginger     /Users/drauh lev 0 FAILED [Estimate timeout from Ginger]

STATISTICS:
                          Total       Full      Daily
Estimate Time (hrs:min)    0:15
Run Time (hrs:min)         0:15
Dump Time (hrs:min)        0:00       0:00       0:00
Output Size (meg)           0.0        0.0        0.0
Original Size (meg)         0.0        0.0        0.0
Avg Compressed Size (%)      --         --         --
Filesystems Dumped            0          0          0
Avg Dump Rate (k/s)          --         --         --
Tape Time (hrs:min)        0:00       0:00       0:00
Tape Size (meg)             0.0        0.0        0.0
Tape Used (%)               0.0        0.0        0.0
Filesystems Taped             0          0          0
Avg Tp Write Rate (k/s)      --         --         --

USAGE BY TAPE:
  Label      Time    Size      %    Nb
  daily12    0:00     0.0    0.0     0

NOTES:
  planner: Adding new disk Ginger:/Users/drauh.
  driver: WARNING: got empty schedule from planner
  taper: tape daily12 kb 0 fm 0 [OK]

DUMP SUMMARY:
                                      DUMPER STATS               TAPER STATS
HOSTNAME     DISK        L  ORIG-KB  OUT-KB  COMP%  MMM:SS  KB/s  MMM:SS  KB/s
---------------------------------------------------------------------------
Ginger       -sers/drauh 0  FAILED

--- (brought to you by Amanda version 2.4.4p2)
Re: Estimate timeout on Mac OS X
On Monday 05 April 2004 04:07, David Chin wrote:
> On 5 Apr 2004, at 03:02, Gene Heskett wrote:
>> Here is the first potential problem. Even with all the warnings plastered all over the FAQ and Docs, folks still insist on using a universal name instead of the FQDN.
>
> Yes, I knew about the problems with a universal name. I just wanted to get something up quickly as a test. My machine sits NATted at home and doesn't have a real FQDN. But anyway, I changed it:
> 1. make my wireless AP give my machine a fixed address
> 2. add an entry to /etc/hosts -- 192.168.0.111 myhostname
>
>> Second, did you build amanda as the user amanda, then become root to do the make install?
>
> I decided to avoid all permission stuff by running everything as root. Yes, I know the dangers, and I am willing to live with the risk for now.

Amanda checks to see who she is, and amdump will not run as root. Tear it all back out and reinstall according to the instructions. This is a security-related requirement, and really isn't open for discussion. Where amanda needs root permissions, she will do an suid root to gain the permissions she needs.

Make a normal user 'amanda' and make this user a member of the group 'disk' or 'backup'. As root, do a chown -R amanda:disk amanda-2.4.5b1-20040326 (if that's the name of the src tree) before starting the build. I maintain these src trees in /home/amanda here. You'll also need to change the permissions on the tarball itself, because lately the tarballs are not owned by amanda if root does the download. Minor detail.

I also use a script to do the configuration and initial make, because it's consistent and repeatable from snapshot to snapshot without relying on my aged, occasionally fading memory. I copy this script into the new src tree when a new snapshot comes out, and run it from the top-level directory of the src. The script:

-----gh.cf-----
#!/bin/sh
# since I'm always forgetting to su amanda...
if [ `whoami` != 'amanda' ]; then
    echo
    echo Warning
    echo Amanda needs to be configured and built by the user amanda,
    echo but must be installed by the user root.
    echo
    exit 1
fi
make clean
rm -f config.status config.cache
./configure --with-user=amanda \
    --with-group=disk \
    --with-owner=amanda \
    --with-tape-device=/dev/nst0 \
    --with-changer-device=/dev/sg1 \
    --with-gnu-ld --prefix=/usr/local \
    --with-debugging=/tmp/amanda-dbg/ \
    --with-tape-server=FQDN.of.the.server \
    --with-amandahosts \
    --with-configdir=/usr/local/etc/amanda
make
-----end of script-----

Remove the changer device line if you don't have a robotic changer. The Fully Qualified Domain Name (FQDN) of the tape server (or its IP address) must be used. Adjust the tape device name to be whatever the NON-rewinding (on file close) device is on your system. Set the x bit (chmod +x script.name), become amanda and execute it with ./script.name, then become root and do a make install.

I doubt you'll need to do it, but the estimate timeout value ('etimeout' in your amanda.conf), which is defaulted to 10 minutes (600 seconds) per disklist entry, might have to be increased. I did that early on when it was running on a much slower machine, but now on this box a 44-member disklist typically takes 22 minutes to estimate. The backup will in any event commence once all estimates have been obtained or have timed out, and a timeout is unlikely on today's hardware such as your G5.

[...]

--
Cheers, Gene
There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
99.22% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
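Taking Gene's description at face value (etimeout applies per disklist entry on a client), the longest the planner may wait on one client is roughly etimeout multiplied by that client's number of DLEs. For example, if all 44 of Gene's entries were on one client at 600 seconds each, the budget would be 26,400 seconds (7 h 20 min), far above the 22 minutes he actually sees. The shell sketch below makes that per-host arithmetic explicit. It is only an illustration: the function name estimate_budget is hypothetical (not an Amanda command), and it assumes a simple disklist with one "host disk dumptype" entry per line and '#' comments. Multi-line "{ ... }" dumptype blocks would be miscounted.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amanda): print, per client host,
# the number of disklist entries (DLEs) and the worst-case estimate
# wait, assuming the planner allows etimeout seconds per DLE.
# Assumes one "host disk dumptype" entry per line and '#' comments;
# multi-line "{ ... }" dumptype blocks are NOT handled.
estimate_budget() {
    disklist=$1   # path to the disklist file
    etimeout=$2   # etimeout from amanda.conf, in seconds
    awk -v et="$etimeout" '
        /^[ \t]*#/ || NF == 0 { next }   # skip comment and blank lines
        { count[$1]++ }                  # first field is the client host
        END {
            # output: host  number-of-DLEs  worst-case-seconds
            for (h in count)
                printf "%s %d %d\n", h, count[h], count[h] * et
        }
    ' "$disklist"
}
```

Usage, with a placeholder path: `estimate_budget /path/to/disklist 600 | sort` prints one line per client, such as `hostA 2 1200`. The awk pass keeps the sketch dependency-free. A real disklist that uses inline dumptype blocks would need a proper parser.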
Estimate timeout on Mac OS X
Hi,

I've almost got amanda to run on a PowerBook G4 with Mac OS X 10.3.3. Right now, I have it set up with virtual tapes on a separate disk. Everything goes well, including amcheck, but when I run amdump, the backup doesn't go. The mailed log of the run is below. Can someone point me in the right direction?

Thanks in advance,
--Dave

These dumps were to tape daily11.
The next tape Amanda expects to use is: a new tape.
The next new tape already labelled is: daily01.

FAILURE AND STRANGE DUMP SUMMARY:
  localhost  /Users/drauh lev 0 FAILED [Estimate timeout from localhost]

STATISTICS:
                              Total      Full     Daily
Estimate Time (hrs:min)        0:15
Run Time (hrs:min)             0:15
Dump Time (hrs:min)            0:00      0:00      0:00
Output Size (meg)               0.0       0.0       0.0
Original Size (meg)             0.0       0.0       0.0
Avg Compressed Size (%)          --        --        --
Filesystems Dumped                0         0         0
Avg Dump Rate (k/s)              --        --        --

Tape Time (hrs:min)            0:00      0:00      0:00
Tape Size (meg)                 0.0       0.0       0.0
Tape Used (%)                   0.0       0.0       0.0
Filesystems Taped                 0         0         0
Avg Tp Write Rate (k/s)          --        --        --

USAGE BY TAPE:
  Label          Time      Size      %    Nb
  daily11        0:00       0.0    0.0     0

NOTES:
  planner: Adding new disk localhost:/Users/drauh.
  driver: WARNING: got empty schedule from planner
  taper: tape daily11 kb 0 fm 0 [OK]

DUMP SUMMARY:
                                       DUMPER STATS                 TAPER STATS
HOSTNAME     DISK        L   ORIG-KB   OUT-KB  COMP%  MMM:SS   KB/s  MMM:SS   KB/s
-------------------------- ----------------------------------------- -------------
localhost    -sers/drauh 0    FAILED ---------------------------------------------

(brought to you by Amanda version 2.4.4p2)
estimate timeout
Hi all,

'planner' just told me "estimate timeout ..." on the new archive-conf I'm testing. The normal backup works OK, even for level 0 dumps from the same host. Any ideas what I can do about it?

//Mats