Re: FAILED backups on different hosts each night
Jon LaBadie [EMAIL PROTECTED] 08/29/06 7:50 PM

If I understand the configuration, srv2 has 4 separate installations of the amanda client. To amanda it appears as 4 distinct remote hosts. As you indicate different logical hosts fail nightly, it sounds like all have also had successful backups, thus the basic config is ok.

Do the 4 logical hosts also have their own separate disks and network controllers? Or is a single network interface serving multiple IP addresses and the hosts have separate partitions on a shared disk(s)?

I ask from the view that amanda considers them distinct and may be asking for dumps simultaneously from all 4, possibly overloading the shared resources on the single physical client, srv2. This could trigger some timeout mechanism that daily hits different logical hosts.

Even if you are only running a single dumper so multiple, simultaneous dumps do not occur on srv2, perhaps the interval between estimates and dumps is so long that a network timeout is triggered.

These are total guesses, just seeing if they might fly.

-- Jon H. LaBadie [EMAIL PROTECTED] JG Computing 4455 Province Line Road (609) 252-0159 Princeton, NJ 08540-4322 (609) 683-7220 (fax)

Thanks for the reply Jon,

Yes, you are right in assuming my setup. All 4 servers (3 XEN guests + host) use the same SATA disks and a single NIC. All servers are very low load systems, just running different web servers that aren't hit very regularly.

I think it could be a timing issue also, but am a bit unsure of where to look. I see that I get all the estimates, and I always get at least 2 dumps in a run (1 from my physical backup server and 1 from one of the XEN host/guest servers).

What files should I be looking at to see any timeout errors? All I seem to find is FAILED messages for the dumps but no explanation of why -- maybe I need to turn up debugging from the default. I've had a look at the logs on both client and server, but there are so many that I'm not clear which I should concentrate on.
Cheers, Stephen Carter Retrac Networking Limited www: http://www.retnet.co.uk Ph: +44 (0)7870 218 693 Fax: +44 (0)870 7060 056 CNA, CNE 6, CNS, CCNA, MCSE 2003
Re: FAILED backups on different hosts each night
As no one has responded, I guess no one else has a clue either. :(( Of course, not having a clue seldom stops me from posting ;)

On Sun, Aug 27, 2006 at 04:56:03PM +0100, Stephen Carter wrote:

I have 2 physical boxes I'm backing up, one called srv1 and the other called srv2. srv1 is always backed up correctly, which also has the tape device and runs the amanda backups. srv2 is a SLES 10 server running 3 virtual SLES 10 XEN guests within it, but I'm treating them as separate physical boxes for the purposes of amanda. On different nights, different XEN guests fail (including the host, srv2) with a could not connect error in the amanda report. amstatus says 'wait for dumping driver: (aborted:could not connect to data port: Connection timed out)'

If I understand the configuration, srv2 has 4 separate installations of the amanda client. To amanda it appears as 4 distinct remote hosts. As you indicate different logical hosts fail nightly, it sounds like all have also had successful backups, thus the basic config is ok.

Do the 4 logical hosts also have their own separate disks and network controllers? Or is a single network interface serving multiple IP addresses and the hosts have separate partitions on a shared disk(s)?

I ask from the view that amanda considers them distinct and may be asking for dumps simultaneously from all 4, possibly overloading the shared resources on the single physical client, srv2. This could trigger some timeout mechanism that daily hits different logical hosts.

Even if you are only running a single dumper so multiple, simultaneous dumps do not occur on srv2, perhaps the interval between estimates and dumps is so long that a network timeout is triggered.

These are total guesses, just seeing if they might fly.

-- Jon H. LaBadie [EMAIL PROTECTED] JG Computing 4455 Province Line Road (609) 252-0159 Princeton, NJ 08540-4322 (609) 683-7220 (fax)
FAILED backups on different hosts each night
I have 2 physical boxes I'm backing up, one called srv1 and the other called srv2. srv1, which also has the tape device and runs the amanda backups, is always backed up correctly. srv2 is a SLES 10 server running 3 virtual SLES 10 XEN guests within it, but I'm treating them as separate physical boxes for the purposes of amanda.

On different nights, different XEN guests fail (including the host, srv2) with a could not connect error in the amanda report. amstatus says 'wait for dumping driver: (aborted:could not connect to data port: Connection timed out)'.

amdump.1 reports that all estimates worked, with FAILED QUEUE: empty, and the DONE QUEUE: includes all DLEs listed in the disklist. amdump.1 then reports the dumper runs: 2 work, and my other 4 DLEs fail with:

dumper: stream_client: connect to 192.168.0.9:12359 failed: Connection timed out

I allow all traffic between srv1 (my backup server) and all clients. Thinking it may have been a throughput problem, I reduced parallel dumps to 1, which hasn't helped.

A copy of the latest amstatus output and a section from my amdump.1 file are below. Any help would be greatly appreciated.
AMSTATUS OUTPUT:

srv1:/var/lib/amanda/DailySet1 # amstatus DailySet1
Using /var/lib/amanda/DailySet1/amdump.1 from Fri Aug 25 01:00:02 BST 2006

srv1.retnet.co.uk:md0           3   352152k finished (1:17:18)
mailscan.retnet.co.uk:hda2      0  1062300k wait for dumping driver: (aborted:could not connect to data port: Connection timed out)
srv2.retnet.co.uk:/srv/install  0 21497250k wait for dumping driver: (aborted:could not connect to data port: Connection timed out)
srv2.retnet.co.uk:md0           0  4242910k wait for dumping driver: (aborted:could not connect to data port: Connection timed out)
web-1.retnet.co.uk:hda2         0   699770k finished (1:33:02)
web-2.retnet.co.uk:hda2         0   906355k wait for dumping driver: (aborted:could not connect to data port: Connection timed out)

SUMMARY          part      real  estimated
                           size       size
partition       :   6
estimated       :   6            28769687k
flush           :   0        0k
failed          :   0                   0k (  0.00%)
wait for dumping:   4            27708815k ( 96.31%)
dumping to tape :   0                   0k (  0.00%)
dumping         :   0        0k         0k (  0.00%) (  0.00%)
dumped          :   2  1051922k   1060872k ( 99.16%) (  3.66%)
wait for writing:   0        0k         0k (  0.00%) (  0.00%)
wait to flush   :   0        0k         0k (100.00%) (  0.00%)
writing to tape :   0        0k         0k (  0.00%) (  0.00%)
failed to tape  :   0        0k         0k (  0.00%) (  0.00%)
taped           :   2  1051922k   1060872k ( 99.16%) (  3.66%)
  tape 1:           2  1051922k   1060872k (  2.94%) DailySet1-5
1 dumper idle   : not-idle
taper idle
network free kps: 2600
holding space   : 33792000k (100.00%)
 dumper0 busy   : 0:40:08 ( 95.25%)
   taper busy   : 0:06:47 ( 16.10%)
 0 dumpers busy : 0:00:00 (  0.00%)
 1 dumper busy  : 0:42:08 (100.00%)
       not-idle : 0:28:40 ( 68.07%)
     no-dumpers : 0:13:27 ( 31.93%)
srv1:/var/lib/amanda/DailySet1 #

AMDUMP.1 PARTIAL OUTPUT:

driver: adding holding disk 0 dir /mnt/dumps size 33792000
reserving 33792000 out of 33792000 for degraded-mode dumps
driver: flush size 0
driver: start time 812.693 inparallel 1 bandwidth 2600 diskspace 33792000 dir OBSOLETE datestamp 20060825
driver: drain-ends tapeq FIRST big-dumpers ttt
driver: result time 812.693 from taper: TAPER-OK
driver: send-cmd time 812.703 to dumper0: FILE-DUMP 00-1 /mnt/dumps/20060825/srv1.retnet.co.uk.md0.3 srv1.retnet.co.uk feff9ffe0f md0 NODEVICE 3 2006:8:22:0:36:52 1073741824 GNUTAR 356544 |;bsd-auth;compress-best;index;exclude-list=/usr/lib/amanda/exclude.gtar;
driver: state time 812.703 free kps: -2090 space: 33435456 taper: idle idle-dumpers: 0 qlen tapeq: 0 runq: 5 roomq: 0 wakeup: 86400 driver-idle: not-idle
driver: interface-state time 812.703 if : free -3890 if ETH0: free 800 if LOCAL: free 1000
driver: hdisk-state time 812.703 hdisk 0: free 33435456 dumpers 1
dumper: stream_client: connected to 192.168.0.1.51236
dumper: stream_client: our side is 0.0.0.0.51239
dumper: stream_client: connected to 192.168.0.1.51237
dumper: stream_client: our side is 0.0.0.0.51240
dumper: stream_client: connected to 192.168.0.1.51238
dumper: stream_client: our side is 0.0.0.0.51241
driver: result time 901.369 from dumper0: DONE 00-1 441620 352152 89 [sec 88.636 kb 352152 kps 3973.0 orig-kb 441620]
driver: finished-cmd time 901.387 dumper0 dumped srv1.retnet.co.uk:md0
driver: send-cmd time 901.387 to taper: FILE-WRITE 00-2 /mnt/dumps/20060825/srv1.retnet.co.uk.md0.3 srv1.retnet.co.uk feff9ffe0f md0 3 20060825
driver: startaflush: FIRST srv1.retnet.co.uk md0 352185 3584
driver: send-cmd time 901.387 to dumper0: FILE-DUMP
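When several DLEs fail the same way, it can help to pull the failing host:disk pairs out of the amstatus text so that failures can be compared from night to night. The following is only a rough sketch: it assumes each DLE line has the shape "host:disk level sizek state" seen in the output above, and the names `DLE_LINE` and `summarize` are made up for this illustration rather than being part of Amanda.

```python
import re
from collections import Counter

# One DLE line, e.g.
#   "srv2.retnet.co.uk:md0 0 4242910k wait for dumping driver: (aborted:...)"
# The line shape is an assumption taken from the amstatus output in this thread.
DLE_LINE = re.compile(
    r"^(?P<host>[\w.-]+):(?P<disk>\S+)\s+(?P<level>\d+)\s+(?P<size>\d+)k\s+(?P<state>.+)$"
)


def summarize(amstatus_text):
    """Return (Counter of coarse states, list of (host, disk) that timed out)."""
    states = Counter()
    timed_out = []
    for line in amstatus_text.splitlines():
        m = DLE_LINE.match(line.strip())
        if not m:
            continue  # header, summary, or blank line
        state = m.group("state")
        if "could not connect to data port" in state:
            states["connect timeout"] += 1
            timed_out.append((m.group("host"), m.group("disk")))
        elif state.startswith("finished"):
            states["finished"] += 1
        else:
            states["other"] += 1
    return states, timed_out


# A few lines copied from the amstatus output quoted in this thread.
SAMPLE = """\
srv1.retnet.co.uk:md0 3 352152k finished (1:17:18)
mailscan.retnet.co.uk:hda2 0 1062300k wait for dumping driver: (aborted:could not connect to data port: Connection timed out)
srv2.retnet.co.uk:/srv/install 0 21497250k wait for dumping driver: (aborted:could not connect to data port: Connection timed out)
web-1.retnet.co.uk:hda2 0 699770k finished (1:33:02)
"""

if __name__ == "__main__":
    states, timed_out = summarize(SAMPLE)
    print(dict(states))
    for host, disk in timed_out:
        print(f"{host}:{disk}")
```

Running this over each night's saved amstatus output would show whether the same guests keep timing out or whether the failures really do move around.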
Re: Old failed backups
Lucio wrote:

Two problems (maybe related): 1 - I've got two failed backups in the holding disk. I do not want to flush them to tape for a number of reasons, one being that they aren't useful anymore.

I may be wrong, but what I do in that case is:

$ rm -fr /somewhere/holdingDisk/Dailyset1/*
$ amcleanup Dailyset1

-- Nicolas Ecarnot
Re: Old failed backups
Lucio wrote:

1 - I've got two failed backups in the holding disk. I do not want to flush them to tape for a number of reasons, one being that they aren't useful anymore.

$ rm -fr /somewhere/holdingDisk/Dailyset1/*
$ amcleanup Dailyset1

Does amcleanup fix the index as well? Will amflush stop telling me there are two backups on the holding disk? I don't want to rm -fr if the index gets corrupted, because rm -fr is not reversible.
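Since rm -fr cannot be undone, one cautious first step is to list what is actually sitting in the holding disk before deleting anything. The sketch below only reads the directory and never removes files. It assumes the holding disk contains one directory per run named by an 8-digit datestamp, as in the /mnt/dumps/20060825/ path seen elsewhere in this archive; `list_holding_dumps` is a hypothetical helper name, not an Amanda tool, and it says nothing about what amcleanup does to the index.

```python
import re
import tempfile
from pathlib import Path

# Assumed layout: <holding_dir>/<YYYYMMDD>/<host.disk.level>
DATESTAMP = re.compile(r"^\d{8}$")


def list_holding_dumps(holding_dir):
    """Return (datestamp, file name, size in bytes) for every dump file found.

    Read-only: nothing is deleted or modified.
    """
    found = []
    for run_dir in sorted(Path(holding_dir).iterdir()):
        if not (run_dir.is_dir() and DATESTAMP.match(run_dir.name)):
            continue  # ignore anything that does not look like a run directory
        for dump_file in sorted(run_dir.iterdir()):
            if dump_file.is_file():
                found.append((run_dir.name, dump_file.name, dump_file.stat().st_size))
    return found


if __name__ == "__main__":
    # Demonstration on a throwaway directory rather than a real holding disk.
    with tempfile.TemporaryDirectory() as demo:
        run = Path(demo) / "20060825"
        run.mkdir()
        (run / "srv1.retnet.co.uk.md0.3").write_bytes(b"\0" * 1024)
        for datestamp, name, size in list_holding_dumps(demo):
            print(datestamp, name, size)
```

Reviewing that list makes it easier to confirm that only the two stale dumps would be removed.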
Old failed backups
I've got two very old failed backups in the holding disk. I do not want to flush them on tape for a number of reasons, one being because they aren't useful anymore. How do I force amanda to empty the holding disk and to update its index files without writing to a tape? Lucio.
Re: Failed Backups
Steve,

On Wed, Jun 04, 2003 at 02:29:20PM -, smw_purdue wrote:

Chris, I'm having the same problem using a similar configuration of backups to disk without any holding disks. Every time Amanda drops into degraded mode it's because an error occurred with one of the clients (usually a timeout, indicating that a client system was unavailable). I would suspect that there's a bug in the code that puts Amanda into degraded mode on more errors than just a tape error. Notice in your log that you have an unknown response from gilgamesh. This error was probably what kicked Amanda into degraded mode.

That is exactly what appears to be happening. I configured a holding disk in an attempt to eliminate that as a possible cause. In my case, the problem is intermittent, with everything working fine for some time and then a failure. The failure may be some file systems on a given host or most/all of the backup run.

Today, I had two file systems fail again on gilgamesh and I began checking the various logs for issues. What I found in sendbackup.lotsofnumbers.debug is:

---[ begin ]---
sendbackup: time 0.002: stream_server: waiting for connection: 0.0.0.0.1496
sendbackup: time 0.002: stream_server: waiting for connection: 0.0.0.0.1497
sendbackup: time 0.002: stream_server: waiting for connection: 0.0.0.0.1498
sendbackup: time 0.003: waiting for connect on 1496, then 1497, then 1498
sendbackup: time 29.996: stream_accept: timeout after 30 seconds
sendbackup: time 29.996: timeout on data port 1496
sendbackup: time 59.996: stream_accept: timeout after 30 seconds
sendbackup: time 59.996: timeout on mesg port 1497
sendbackup: time 89.996: stream_accept: timeout after 30 seconds
sendbackup: time 89.996: timeout on index port 1498
sendbackup: time 89.996: pid 5263 finish time Fri Jun 6 00:47:44 2003
---[ end ]---

Anybody out there have time to debug the source? I may take a look at it but time is at a premium right now... (when isn't it???).

Anyone have any ideas? This only happens occasionally and I haven't yet been able to draw a correlation.

Thanks, Chris
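To look for a correlation across many sendbackup debug files, the timeout lines can be extracted mechanically. This is a minimal sketch that assumes the "timeout on <stream> port <n>" wording shown in the log above; `TIMEOUT_LINE` and `find_stream_timeouts` are names invented for this example.

```python
import re

# Matches lines such as "sendbackup: time 29.996: timeout on data port 1496".
# The wording is taken from the debug log quoted in this thread.
TIMEOUT_LINE = re.compile(
    r"sendbackup: time (?P<t>\d+\.\d+): timeout on (?P<stream>\w+) port (?P<port>\d+)"
)


def find_stream_timeouts(debug_text):
    """Return a list of (stream name, port, seconds since start) for each timeout."""
    return [
        (m.group("stream"), int(m.group("port")), float(m.group("t")))
        for m in TIMEOUT_LINE.finditer(debug_text)
    ]


# Lines copied from the sendbackup debug output quoted above.
SAMPLE = """\
sendbackup: time 0.003: waiting for connect on 1496, then 1497, then 1498
sendbackup: time 29.996: stream_accept: timeout after 30 seconds
sendbackup: time 29.996: timeout on data port 1496
sendbackup: time 59.996: stream_accept: timeout after 30 seconds
sendbackup: time 59.996: timeout on mesg port 1497
sendbackup: time 89.996: stream_accept: timeout after 30 seconds
sendbackup: time 89.996: timeout on index port 1498
"""

if __name__ == "__main__":
    for stream, port, seconds in find_stream_timeouts(SAMPLE):
        print(f"{stream} stream on port {port} timed out at {seconds:.3f}s")
```

Collecting these tuples per debug file, together with each file's finish time, would give a simple table to compare against other events on the network.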
Re: Failed Backups
Chris,

I looked around a little in the Amanda source code and convinced myself that there was a bug there. I sent a note to the amanda-hackers mailing list and received a prompt reply from Jean-Louis Martineau with a patch that fixed the problem for me. I'll attach his message and patch.

Hope that helps!

Steve

Chris Gordon wrote:

Steve,

On Wed, Jun 04, 2003 at 02:29:20PM -, smw_purdue wrote:

Chris, I'm having the same problem using a similar configuration of backups to disk without any holding disks. Every time Amanda drops into degraded mode it's because an error occurred with one of the clients (usually a timeout, indicating that a client system was unavailable). I would suspect that there's a bug in the code that puts Amanda into degraded mode on more errors than just a tape error. Notice in your log that you have an unknown response from gilgamesh. This error was probably what kicked Amanda into degraded mode.

That is exactly what appears to be happening. I configured a holding disk in an attempt to eliminate that as a possible cause. In my case, the problem is intermittent, with everything working fine for some time and then a failure. The failure may be some file systems on a given host or most/all of the backup run.

Today, I had two file systems fail again on gilgamesh and I began checking the various logs for issues. What I found in sendbackup.lotsofnumbers.debug is:

---[ begin ]---
sendbackup: time 0.002: stream_server: waiting for connection: 0.0.0.0.1496
sendbackup: time 0.002: stream_server: waiting for connection: 0.0.0.0.1497
sendbackup: time 0.002: stream_server: waiting for connection: 0.0.0.0.1498
sendbackup: time 0.003: waiting for connect on 1496, then 1497, then 1498
sendbackup: time 29.996: stream_accept: timeout after 30 seconds
sendbackup: time 29.996: timeout on data port 1496
sendbackup: time 59.996: stream_accept: timeout after 30 seconds
sendbackup: time 59.996: timeout on mesg port 1497
sendbackup: time 89.996: stream_accept: timeout after 30 seconds
sendbackup: time 89.996: timeout on index port 1498
sendbackup: time 89.996: pid 5263 finish time Fri Jun 6 00:47:44 2003
---[ end ]---

Anybody out there have time to debug the source? I may take a look at it but time is at a premium right now... (when isn't it???).

Anyone have any ideas? This only happens occasionally and I haven't yet been able to draw a correlation.

Thanks, Chris

--
Steven M.
Wilson, Systems and Network Manager Markey Center for Structural Biology Purdue University [EMAIL PROTECTED] 765.496.1946

--- server-src/driver.c.orig 2003-01-01 18:28:54.0 -0500
+++ server-src/driver.c 2003-06-04 15:54:44.0 -0400
@@ -2242,10 +,10 @@
         error(error [dump to tape DONE result_argc != 5: %d], result_argc);
     }
 
-    free_serial(result_argv[2]);
-
     if(failed == 1) goto tryagain; /* dump didn't work */
-    else if(failed == 2) goto fatal;
+    else if(failed == 2) goto failed_dumper;
+
+    free_serial(result_argv[2]);
 
     /* every thing went fine */
     update_info_dumper(dp, origsize, dumpsize, dumptime);
@@ -2259,9 +2239,10 @@
     case TRYAGAIN: /* TRY-AGAIN handle err mess */
 tryagain:
+    headqueue_disk(runq, dp);
+failed_dumper:
     update_failed_dump_to_tape(dp);
     free_serial(result_argv[2]);
-    headqueue_disk(runq, dp);
     tape_left = tape_length;
     break;
 
@@ -2269,7 +2250,6 @@
     case TAPE_ERROR: /* TAPE-ERROR handle err mess */
     case BOGUS:
     default:
-fatal:
     update_failed_dump_to_tape(dp);
     free_serial(result_argv[2]);
     failed = 2; /* fatal problem */

---BeginMessage---

Hi Steven,

Could you try this patch? It should apply to the latest 2.4.4 snapshot from http://www.iro.umontreal.ca/~martinea/amanda

Jean-Louis

On Wed, Jun 04, 2003 at 02:16:14PM -0500, Steven M. Wilson wrote:

I have a question for the Amanda development experts. I'm using version 2.4.4 and backing up to hard disk directly (no tapes, no holding disks). On several occasions, I've had a client error cause Amanda to go into degraded mode. It appears that the dump_to_tape function (server-src/driver.c) takes any FATAL dumper error and forces Amanda into degraded mode. Shouldn't the code be more discerning as to what caused the error? I would think that Amanda should go into degraded mode only if an error were related to the output device. In my case the error was on the client and unrelated to writing the backup to disk.

Here are some of the related amdump messages:

driver: result time 6754.491 from dumper0: FAILED 01-00368 [data timeout]
taper: reader-side: got label slot024 filenum 184
driver: result time 6754.492 from taper: DONE 00-00367 slot024 184 [sec 2174.408 kb 2061376 kps 948.0 {wr: writers 64419 rdwait 2166.220 wrwait 7.959 filemark 0.021}]
driver: error time 6754.503 serial gen mismatch dump of
Re: Failed Backups
Jon,

Thanks for looking at this for me.

On Sat, May 31, 2003 at 03:37:18AM -0400, Jon LaBadie wrote:

-- AMANDA MAIL REPORT --
These dumps were to tape standard14.
The next 7 tapes Amanda expects to used are: standard16, standard17, standard18,
+standard19, standard20, standard21, standard22.

Interesting that standard15 is not mentioned. It may have bearing on my guesses.

That tape has been used before -- I have been running amanda long enough for it to cycle through all of my tapes and to have used standard15 before. I have rechecked everything to make sure it is set up like all of my other tapes (I used a script to initially create them all to minimize the chance of errors).

FAILURE AND STRANGE DUMP SUMMARY:
gilgamesh. / lev 1 FAILED [unknown response: 0;]
gilgamesh. / lev 1 FAILED [dump to tape failed]
goblin.the /var lev 1 FAILED [can't dump no-hold disk in degraded mode]

A scan of the source shows that message only coming in one place. At that time the backup has entered degraded mode. Further, the message is only printed if the backup is not using a holding disk. So first I presume you are not using a holding disk.

I have added a holding disk to see if that helps.

So I'm guessing you have a size limit on your disk tapes and standard14 reached that limit.

Yes, I have them set to 5 GB. I can't find any reference in the man pages, but would setting the length to 0 let the tape be infinitely long?

When the changer script went to switch to standard15, an error occurred. That put you into degraded mode, and without a holding disk, backups of all subsequent DLE's failed. A place to start looking at least.

I've read over all of the man pages and the limited data I've found on the net. From your comments, it seems that reading the source is the only really good source of detailed documentation and troubleshooting. Is that true and, if so, is there a specific place I should start reading to get details of error messages, etc.?

Thanks, Chris
Failed Backups
I've been running amanda for several months with backups to disk (amanda version 2.4.3). Recently I've had backups failing and can't figure out what the problem may be. Some details:

- Clients and backup server are Linux (RedHat 8)
- backup disk has plenty of free space (80 GB drive with only 35% in use)

Below is an example report from one of the failed dumps. Nothing has recently changed that should affect backups. I haven't found anything to help point me in the right direction and would appreciate any pointers.

Thanks, Chris

-- AMANDA MAIL REPORT --

These dumps were to tape standard14.
The next 7 tapes Amanda expects to used are: standard16, standard17, standard18,
+standard19, standard20, standard21, standard22.

FAILURE AND STRANGE DUMP SUMMARY:
  gilgamesh. /     lev 1 FAILED [unknown response: 0;]
  gilgamesh. /     lev 1 FAILED [dump to tape failed]
  goblin.the /var  lev 1 FAILED [can't dump no-hold disk in degraded mode]
  gilgamesh. /home lev 1 FAILED [can't dump no-hold disk in degraded mode]
  gilgamesh. /usr  lev 1 FAILED [can't dump no-hold disk in degraded mode]
  hades.theo /var  lev 1 FAILED [can't dump no-hold disk in degraded mode]
  psyche.the /usr  lev 1 FAILED [can't dump no-hold disk in degraded mode]
  psyche.the /var  lev 3 FAILED [can't dump no-hold disk in degraded mode]
  hades.theo /     lev 1 FAILED [can't dump no-hold disk in degraded mode]
  goblin.the /usr  lev 2 FAILED [can't dump no-hold disk in degraded mode]

STATISTICS:
                           Total    Full   Daily
Estimate Time (hrs:min)     0:02
Run Time (hrs:min)          0:02
Dump Time (hrs:min)         0:00    0:00    0:00
Output Size (meg)            3.0     0.0     3.0
Original Size (meg)          9.4     0.0     9.4
Avg Compressed Size (%)     31.5      --    31.5   (level:#disks ...)
Filesystems Dumped             9       0       9   (1:9)
Avg Dump Rate (k/s)        298.7      --   298.7

Tape Time (hrs:min)         0:00    0:00    0:00
Tape Size (meg)              3.5     0.0     3.5
Tape Used (%)                0.3     0.0     0.3   (level:#disks ...)
Filesystems Taped             10       0      10   (1:10)
Avg Tp Write Rate (k/s)    300.0      --   300.0

FAILED AND STRANGE DUMP DETAILS:

/-- gilgamesh. / lev 1 FAILED [unknown response: 0;]
\

NOTES:
  planner: Incremental of psyche.theory14.net:/var bumped to level 3.
  planner: Full dump of psyche.theory14.net:/var promoted from 2 days ahead.
  taper: tape standard14 kb 3552 fm 10 [OK]

DUMP SUMMARY:
                                     DUMPER STATS            TAPER STATS
HOSTNAME     DISK   L ORIG-KB OUT-KB COMP% MMM:SS   KB/s  MMM:SS   KB/s
------------ ------ - ------- ------ ----- ------ ------  ------ ------
gilgamesh.th /      1  FAILED ---
gilgamesh.th /boot  1      10     64 640.0   0:00    0.0    0:00  218.8
gilgamesh.th /home  1  FAILED ---
gilgamesh.th /usr   1  FAILED ---
gilgamesh.th /var   1    2370    448  18.9   0:02  180.5    0:03  131.1
goblin.theor /      1    1320    160  12.1   0:01   76.9    0:01  121.3
goblin.theor /boot  1      10     64 640.0   0:00    0.0    0:00 1008.4
goblin.theor /home  1    4710   2496  53.0   0:03  963.0    0:03  974.2
goblin.theor /usr   2  FAILED ---
goblin.theor /var   1  FAILED ---
hades.theory /      1  FAILED ---
hades.theory /boot  1      10     64 640.0   0:00    0.0    0:00  171.2
hades.theory /var   1  FAILED ---
psyche.theor /      1    1080    128  11.9   0:03   26.7    0:03   37.6
psyche.theor /boot  1      10     64 640.0   0:00    0.0    0:00  603.0
psyche.theor /home  1      90     64  71.1   0:00   15.5    0:00  243.9
psyche.theor /usr   1  FAILED ---
psyche.theor /var   3  FAILED ---

(brought to you by Amanda version 2.4.3)
Re: Failed Backups
On Fri, May 30, 2003 at 11:32:54PM -0400, Chris Gordon wrote:

I've been running amanda for several months with backups to disk (amanda version 2.4.3). Recently I've had backups failing and can't figure out what the problem may be. Some details:

- Clients and backup server are Linux (RedHat 8)
- backup disk has plenty of free space (80 GB drive with only 35% in use)

Below is an example report from one of the failed dumps. Nothing has recently changed that should affect backups. I haven't found anything to help point me in the right direction and would appreciate any pointers.

-- AMANDA MAIL REPORT --
These dumps were to tape standard14.
The next 7 tapes Amanda expects to used are: standard16, standard17, standard18,
+standard19, standard20, standard21, standard22.

Interesting that standard15 is not mentioned. It may have bearing on my guesses.

FAILURE AND STRANGE DUMP SUMMARY:
gilgamesh. / lev 1 FAILED [unknown response: 0;]
gilgamesh. / lev 1 FAILED [dump to tape failed]
goblin.the /var lev 1 FAILED [can't dump no-hold disk in degraded mode]

A scan of the source shows that message only coming in one place. At that time the backup has entered degraded mode. Further, the message is only printed if the backup is not using a holding disk. So first I presume you are not using a holding disk.

Second, looking at the source (quickly, so I might have missed something), degraded mode is only entered after a dump starts if a tape error occurs. So I'm guessing you have a size limit on your disk tapes and standard14 reached that limit. When the changer script went to switch to standard15, an error occurred. That put you into degraded mode, and without a holding disk, backups of all subsequent DLE's failed.

A place to start looking at least.

-- Jon H. LaBadie [EMAIL PROTECTED] JG Computing 4455 Province Line Road (609) 252-0159 Princeton, NJ 08540-4322 (609) 683-7220 (fax)
Problem: Failed backups overwrite good backups
I am using Amanda 2.4.3b3 on a Linux RH 7.2 box to dump several Windows clients to disk. I discovered a problem yesterday with my process. I run all of the backup jobs from a script. Each backup is a full backup. When one job completes, the next job runs. This all works correctly if the backup server is able to access the machine. If it is not able to connect to the machine, perhaps because the machine is off, the existing backup files are overwritten. Does anyone know of a way to prevent this from happening? If it fails, I want it to leave the existing backup files.

The dumpcycles are set to 0 and the number of tapes is 1. I am just getting the system going, and I did not have a good feel for how much drive space was going to be consumed by the backups.

If anyone cares, this is how the system works. I created a couple of web pages to allow users to add their machines to the backup list. The web pages are restricted by IP address. The user is informed that this is experimental and that they should also back up their data to zip disk or CD. The user is also instructed to contact the support person to make changes to the computers to allow the backups to happen. We only back up My Documents and Eudora.

The user enters some basic information into a form, and some information such as the IP address is collected behind the scenes. The data from the form is added to a MySQL database. Initially, my plan was to have the users submit a date and time to run the backup. The users did not like this idea; I guess it was too much work for them. That was a good thing, I guess, as I read some time thereafter that I could not run concurrent amdump jobs.

I wrote a C program to construct everything else. Before doing this, I had to construct some template files for amanda.conf, changer.conf, and the disklist. I use sed to create the usable files. Anyway, the program checks for any new additions to the list. I have 2 servers to back up 80 machines in 5 different buildings. Each server runs the program and only looks for certain subnets. If a new machine has been added, it creates all of the directories, files, and tapes. Next the program looks through the list for its machines, creates a shell script to perform the backups, and uses at to schedule the script.

As I said, this all works fine. The only problems I have run into have been with ZoneAlarm and the users' PCs not being set up correctly.

Thanks, Jonathan Swaby
Re: Problem: Failed backups overwrite good backups
On Tue, Mar 11, 2003 at 09:15:10AM -0500, Jonathan Swaby wrote:

I am using Amanda 2.4.3b3 on a Linux RH 7.2 box to dump several Windows clients to disk. I discovered a problem yesterday with my process. I run all of the backup jobs from a script. Each backup is a full backup. When one job completes, the next job runs. This all works correctly if the backup server is able to access the machine. If it is not able to connect to the machine, perhaps because the machine is off, the existing backup files are overwritten. Does anyone know of a way to prevent this from happening? If it fails, I want it to leave the existing backup files.

Which files are overwritten? Is it the files in the holding disk? That's normal if you run more than one amdump per day for the same disk.

Essentially it is overwriting the tape. In my case the tape is a directory on disk. I assumed it would only do this if it had data to write, but that does not appear to be the case.

Jean-Louis -- Jean-Louis Martineau email: [EMAIL PROTECTED] Departement IRO, Universite de Montreal C.P. 6128, Succ. CENTRE-VILLE Tel: (514) 343-6111 ext. 3529 Montreal, Canada, H3C 3J7 Fax: (514) 343-5834
Re: Problem: Failed backups overwrite good backups
On Tue, Mar 11, 2003 at 01:35:48PM -0500, Jonathan Swaby wrote:

On Tue, Mar 11, 2003 at 09:15:10AM -0500, Jonathan Swaby wrote:

I am using Amanda 2.4.3b3 on a Linux RH 7.2 box to dump several Windows clients to disk. I discovered a problem yesterday with my process. I run all of the backup jobs from a script. Each backup is a full backup. When one job completes, the next job runs. This all works correctly if the backup server is able to access the machine. If it is not able to connect to the machine, perhaps because the machine is off, the existing backup files are overwritten. Does anyone know of a way to prevent this from happening? If it fails, I want it to leave the existing backup files.

Which files are overwritten? Is it the files in the holding disk? That's normal if you run more than one amdump per day for the same disk.

Essentially it is overwriting the tape. In my case the tape is a directory on disk. I assumed it would only do this if it had data to write, but that does not appear to be the case.

It's a tape. It is overwritten at every run; that's the way it works, and that's the way it should work (like a tape).

Jean-Louis -- Jean-Louis Martineau email: [EMAIL PROTECTED] Departement IRO, Universite de Montreal C.P. 6128, Succ. CENTRE-VILLE Tel: (514) 343-6111 ext. 3529 Montreal, Canada, H3C 3J7 Fax: (514) 343-5834
Re: Problem: Failed backups overwrite good backups
On Tue, Mar 11, 2003 at 01:35:48PM -0500, Jonathan Swaby wrote:

On Tue, Mar 11, 2003 at 09:15:10AM -0500, Jonathan Swaby wrote:

I am using Amanda 2.4.3b3 on a Linux RH 7.2 box to dump several Windows clients to disk. I discovered a problem yesterday with my process. I run all of the backup jobs from a script. Each backup is a full backup. When one job completes, the next job runs. This all works correctly if the backup server is able to access the machine. If it is not able to connect to the machine, perhaps because the machine is off, the existing backup files are overwritten. Does anyone know of a way to prevent this from happening? If it fails, I want it to leave the existing backup files.

Which files are overwritten? Is it the files in the holding disk? That's normal if you run more than one amdump per day for the same disk.

Essentially it is overwriting the tape. In my case the tape is a directory on disk. I assumed it would only do this if it had data to write, but that does not appear to be the case.

It's a tape. It is overwritten at every run; that's the way it works, and that's the way it should work (like a tape).

I thought that it would erase the file only if it had something to write. It seems that it erases first, then checks to see if there is something to write.

In any event, my problem is solved, I think. I wrote a small C program that takes its input from amcheck. If it sees the word ERROR or WARNING, it returns a value of 10. If amcheck works, it returns a value of 0. So, my script looks like this:

su -c amcheck amachine1 | backup_test operator
su -c amdump machine1 operator

If backup_test returns a 0, it will do the dump.

Thanks, Jonathan Swaby

Jean-Louis -- Jean-Louis Martineau email: [EMAIL PROTECTED] Departement IRO, Universite de Montreal C.P. 6128, Succ. CENTRE-VILLE Tel: (514) 343-6111 ext. 3529 Montreal, Canada, H3C 3J7 Fax: (514) 343-5834
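The check described above was written in C and its source was not posted. As an illustration only, the same rule (return 10 if the amcheck text contains ERROR or WARNING, otherwise 0) might look like this in Python. The function name `amcheck_exit_code` and the two sample strings are invented for this sketch and are not real amcheck output.

```python
import sys


def amcheck_exit_code(amcheck_output):
    """Return 10 if any line mentions ERROR or WARNING, otherwise 0.

    This mirrors the rule described in the message above; it is not the
    original backup_test program.
    """
    for line in amcheck_output.splitlines():
        if "ERROR" in line or "WARNING" in line:
            return 10
    return 0


# Made-up sample texts, only to exercise the function.
CLEAN_SAMPLE = "made-up sample: everything looks fine\n"
FAILING_SAMPLE = "made-up sample: WARNING: client did not answer\n"

if __name__ == "__main__":
    print(amcheck_exit_code(CLEAN_SAMPLE))
    print(amcheck_exit_code(FAILING_SAMPLE))
```

To use it as a filter in the way backup_test is used, the script would instead end with `sys.exit(amcheck_exit_code(sys.stdin.read()))`, and the calling shell script would run amdump only when that exit status is 0.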
Failed backups
Hi,

Since setting amanda up, I have constantly run into failed backups with timeouts reported. If I run amdump on the configuration by hand, it works fine, but if I let the cron job call it during the night, it fails.

We have a local network with a vpn to our remote servers. The tape server is only tasked with backing up some files on its hard drive and a share from an NT box. Both machines are local. The total size of the backups being requested is less than 5 Gig, and the tape capacity is 40 Gig. The NT box is also the dns server of first resort.

Apparently, if/when the vpn goes down, sendsize gets lost and times out in reporting to amandad (if I understand correctly). If the vpn is up, sendsize has no difficulty whatsoever. I found this by seeing amandad and sendsize still running on the tape server at 7:30 AM when the cron job had started at 1:00 AM. When I discovered that the vpn was down and restarted it, amandad and sendsize happily finished, reporting the timeout error. Unfortunately, the vpn goes down most nights, although not by design.

Any ideas why sendsize would (mis)behave in this manner? Any ideas what I can do to work around this?

Thank you. Lee
RE: Failed backups
Sounds a lot like what I am going through, but I know what my problem is; I just haven't fixed it yet. Basically the client tries to open a UDP connection to the server on a random port in the 1-1024 range. For security reasons, it uses a 'trusted' port range. You can set the port range when you compile Amanda, but that isn't the issue. The issue seems to be that the client MUST be able to contact the server's address on that range in order to work. This means that if the server is sitting behind a NAT device, the client must be able to reach the 'reverse NAT' address. Hope this makes sense, or helps a little. Sorry if it doesn't! -James -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Lee Fellows Sent: Thursday, May 16, 2002 6:04 AM To: [EMAIL PROTECTED] Subject: Failed backups [snip]
RE: Failed backups
James, Yes, it does make sense. Fortunately, both of these machines reside on the same end of the VPN, and neither uses a NAT'd address. My suspicion is that sendsize could not resolve its hostname due to network problems caused by the downed VPN. What puzzles me is why the VPN's being up or down would cause such problems. I have put the server's and NT's info in the hosts file on the tape server. Will see tonight if that corrects this problem. Thank you for your response! On Wed, 2002-05-15 at 12:34, James Kelty wrote: [snip]
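One way to test the name-resolution suspicion above is to time hostname lookups on the tape server while the VPN is down. The following is only a minimal sketch, not anything from Amanda itself: `timed_lookup` is a hypothetical helper, and the `resolver` parameter exists only so the function can be exercised without a real network. It runs the standard `socket.gethostbyname` call in a worker thread and gives up waiting after `timeout` seconds.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as LookupTimeout


def timed_lookup(hostname, timeout=5.0, resolver=socket.gethostbyname):
    """Resolve hostname and report how long it took.

    Returns (address or None, seconds elapsed, status string).
    Hypothetical helper for illustration; not part of Amanda.
    """
    start = time.monotonic()
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(resolver, hostname)
    try:
        # Wait at most `timeout` seconds for the lookup to come back.
        address = future.result(timeout=timeout)
        status = "ok"
    except LookupTimeout:
        # The lookup is still stuck in the worker thread; stop waiting.
        address, status = None, "timeout"
    except OSError as exc:
        # socket.gaierror (unknown host, resolver failure) lands here.
        address, status = None, "error: %s" % exc
    finally:
        # Do not block here; a stalled lookup keeps running in the background.
        pool.shutdown(wait=False)
    return address, time.monotonic() - start, status
```

Calling `timed_lookup` with the tape server's own name and the NT box's name, once with the VPN up and once with it down, would show whether lookups stall or fail only in the second case. If they do, the hosts-file entries described above should make the same calls return promptly.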