Re: FAILED backups on different hosts each night
>>> Jon LaBadie <[EMAIL PROTECTED]> 08/29/06 7:50 PM >>>
>If I understand the configuration, srv2 has 4 separate installations
>of the amanda client. To amanda it appears as 4 distinct remote hosts.
>As you indicate different logical hosts fail nightly, it sounds like
>all have also had successful backups, thus the basic config is ok.
>
>Do the 4 logical hosts also have their own separate disks and network
>controllers? Or is a single network interface serving multiple IP
>addresses and the hosts have separate partitions on a shared disk(s)?
>
>I ask from the view that amanda considers them distinct and may be
>asking for dumps simultaneously from all 4, possibly overloading
>the shared resources on the single physical client, srv2. This
>could trigger some timeout mechanism that daily hits different
>logical hosts.
>
>Even if you are only running a single dumper so multiple, simultaneous
>dumps do not occur on srv2, perhaps the interval between estimates and
>dumps is so long that a network timeout is triggered.
>
>These are total guesses, just seeing if they might fly.
>
>--
>Jon H. LaBadie                  [EMAIL PROTECTED]
> JG Computing
> 4455 Province Line Road        (609) 252-0159
> Princeton, NJ  08540-4322      (609) 683-7220 (fax)

Thanks for the reply Jon,

Yes, you are right in assuming my setup. All 4 servers (3 XEN guests +
host) are using the same SATA disks and a single NIC. All servers are
very low load systems, just running different web servers that aren't
hit very regularly.

I think it could be a timing issue also, but am a bit unsure of where to
look. I see that I get all the estimates, and I always get at least 2
dumps in a run (1 from my physical backup server and 1 from one of the
XEN host/guest servers).

What files should I be looking at to see any timeout errors? All I seem
to find is FAILED messages for the dumps but no explanation of why --
maybe I need to turn up debugging from the default. I've had a look at
the logs on both client and server, but there are so many that I'm not
clear which ones I should concentrate on.

Cheers,

Stephen Carter
Retrac Networking Limited
www: http://www.retnet.co.uk
Ph: +44 (0)7870 218 693
Fax: +44 (0)870 7060 056
CNA, CNE 6, CNS, CCNA, MCSE 2003
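On the question of which files to look at: the dumper connect failures reported in this thread show up in the server-side amdump.N file, so one low-effort starting point is to tally those lines per client address across several nights. The Python below is only a rough sketch. The function names are made up for illustration, and the pattern is based solely on the single failure line quoted in this thread, so other amanda versions may word it differently.

```python
import re
from collections import Counter

# Matches lines shaped like the one quoted in this thread:
#   dumper: stream_client: connect to 192.168.0.9:12359 failed: Connection timed out
# (Assumption: the failure line always has this "host:port failed: reason" shape.)
FAIL_RE = re.compile(
    r"stream_client: connect to (?P<host>[\d.]+):(?P<port>\d+) failed: (?P<reason>.+)"
)


def connect_failures(log_text):
    """Return (host, port, reason) for every failed data-port connect in log_text."""
    failures = []
    for line in log_text.splitlines():
        match = FAIL_RE.search(line)
        if match:
            failures.append(
                (match.group("host"), int(match.group("port")), match.group("reason").strip())
            )
    return failures


def failures_per_host(log_text):
    """Count failed connects per client address."""
    return Counter(host for host, _port, _reason in connect_failures(log_text))


# Small demonstration using the line from the original post.
sample_log = (
    "driver: flush size 0\n"
    "dumper: stream_client: connect to 192.168.0.9:12359 failed: Connection timed out\n"
)
for host, count in failures_per_host(sample_log).items():
    print(host, count)
```

To use it on a real run, read the contents of an amdump.N file into a string and pass that to failures_per_host(). If the same address fails on most nights the problem is probably specific to that guest; if failures move around, shared resources or timing look more likely.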
Re: FAILED backups on different hosts each night
As no one has responded, I guess no one else has a clue either. :((
Of course, not having a clue seldom stops me from posting ;)

On Sun, Aug 27, 2006 at 04:56:03PM +0100, Stephen Carter wrote:
> I have 2 physical boxes I'm backing up, one called srv1 and the other called
> srv2.
>
> srv1 is always backed up correctly, which also has the tape device and runs
> the amanda backups.
>
> srv2 is a SLES 10 server running 3 virtual SLES 10 XEN guests within it, but
> I'm treating them as separate physical boxes for the purposes of amanda.
>
> On different nights, different XEN guests fail (including the host, srv2)
> with a "could not connect" error in the amanda report.
>
> amstatus says 'wait for dumping driver: (aborted:could not connect to data
> port: Connection timed out)

If I understand the configuration, srv2 has 4 separate installations
of the amanda client. To amanda it appears as 4 distinct remote hosts.
As you indicate different logical hosts fail nightly, it sounds like
all have also had successful backups, thus the basic config is ok.

Do the 4 logical hosts also have their own separate disks and network
controllers? Or is a single network interface serving multiple IP
addresses and the hosts have separate partitions on a shared disk(s)?

I ask from the view that amanda considers them distinct and may be
asking for dumps simultaneously from all 4, possibly overloading
the shared resources on the single physical client, srv2. This
could trigger some timeout mechanism that daily hits different
logical hosts.

Even if you are only running a single dumper so multiple, simultaneous
dumps do not occur on srv2, perhaps the interval between estimates and
dumps is so long that a network timeout is triggered.

These are total guesses, just seeing if they might fly.

--
Jon H. LaBadie                  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road        (609) 252-0159
 Princeton, NJ  08540-4322      (609) 683-7220 (fax)
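One way to test the timeout guess is to check, from the backup server, whether a connect attempt to a client is silently dropped (it times out) or actively rejected (it is refused). The Python probe below is a minimal sketch, not anything amanda provides. The function name and the default timeout are illustrative assumptions, and the data ports amanda uses may only be open while a dump is being set up, so a "refused" result against a port taken from an old log would not be surprising.

```python
import socket


def probe(host, port, timeout=5.0):
    """Attempt one TCP connect and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except (socket.timeout, TimeoutError):
        # No answer at all within the timeout: packets are probably being
        # dropped somewhere (firewall, overloaded host, routing).
        return "timed out"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # Anything else (no route to host, name lookup failure, ...).
        return f"error: {exc}"


# Example call (address and port are taken from the log line in the
# original post and are only placeholders here):
#   print(probe("192.168.0.9", 12359))
```

Running the probe against each of the four logical hosts around the time the dumps start would show whether "timed out" follows a particular guest or moves around from night to night.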
FAILED backups on different hosts each night
I have 2 physical boxes I'm backing up, one called srv1 and the other
called srv2.

srv1, which also has the tape device and runs the amanda backups, is
always backed up correctly.

srv2 is a SLES 10 server running 3 virtual SLES 10 XEN guests within it,
but I'm treating them as separate physical boxes for the purposes of
amanda.

On different nights, different XEN guests fail (including the host,
srv2) with a "could not connect" error in the amanda report.

amstatus says 'wait for dumping driver: (aborted:could not connect to
data port: Connection timed out)'

amdump.1 reports all estimates worked, with a "FAILED QUEUE: empty", and
the DONE QUEUE: includes all DLEs listed in the disklist.

amdump.1 then reports the dumper process: 2 of the dumps work, and my
other 4 DLEs fail with:

dumper: stream_client: connect to 192.168.0.9:12359 failed: Connection timed out

I allow all traffic between srv1 (my backup server) and all clients.
Thinking it may have been a throughput problem, I reduced parallel dumps
to 1, which hasn't helped.

A copy of the latest amstatus & a section from my amdump.1 files are
below.

Any help would be greatly appreciated.
AMSTATUS OUTPUT:

srv1:/var/lib/amanda/DailySet1 # amstatus DailySet1
Using /var/lib/amanda/DailySet1/amdump.1 from Fri Aug 25 01:00:02 BST 2006

srv1.retnet.co.uk:md0             3    352152k finished (1:17:18)
mailscan.retnet.co.uk:hda2        0   1062300k wait for dumping driver: (aborted:could not connect to data port: Connection timed out)
srv2.retnet.co.uk:/srv/install    0  21497250k wait for dumping driver: (aborted:could not connect to data port: Connection timed out)
srv2.retnet.co.uk:md0             0   4242910k wait for dumping driver: (aborted:could not connect to data port: Connection timed out)
web-1.retnet.co.uk:hda2           0    699770k finished (1:33:02)
web-2.retnet.co.uk:hda2           0    906355k wait for dumping driver: (aborted:could not connect to data port: Connection timed out)

SUMMARY          part      real  estimated
                           size       size
partition       :   6
estimated       :   6             28769687k
flush           :   0         0k
failed          :   0                    0k           (  0.00%)
wait for dumping:   4             27708815k           ( 96.31%)
dumping to tape :   0                    0k           (  0.00%)
dumping         :   0         0k         0k (  0.00%) (  0.00%)
dumped          :   2   1051922k   1060872k ( 99.16%) (  3.66%)
wait for writing:   0         0k         0k (  0.00%) (  0.00%)
wait to flush   :   0         0k         0k (100.00%) (  0.00%)
writing to tape :   0         0k         0k (  0.00%) (  0.00%)
failed to tape  :   0         0k         0k (  0.00%) (  0.00%)
taped           :   2   1051922k   1060872k ( 99.16%) (  3.66%)
  tape 1        :   2   1051922k   1060872k (  2.94%) DailySet1-5
1 dumper idle   : not-idle
taper idle
network free kps:      2600
holding space   :  33792000k (100.00%)
 dumper0 busy   :  0:40:08  ( 95.25%)
   taper busy   :  0:06:47  ( 16.10%)
 0 dumpers busy :  0:00:00  (  0.00%)
 1 dumper busy  :  0:42:08  (100.00%)    not-idle:  0:28:40  ( 68.07%)
                                       no-dumpers:  0:13:27  ( 31.93%)
srv1:/var/lib/amanda/DailySet1 #

AMDUMP.1 PARTIAL OUTPUT:

driver: adding holding disk 0 dir /mnt/dumps size 33792000
reserving 33792000 out of 33792000 for degraded-mode dumps
driver: flush size 0
driver: start time 812.693 inparallel 1 bandwidth 2600 diskspace 33792000 dir OBSOLETE datestamp 20060825 driver: drain-ends tapeq FIRST big-dumpers ttt
driver: result time 812.693 from taper: TAPER-OK
driver: send-cmd time 812.703 to dumper0: FILE-DUMP 00-1 /mnt/dumps/20060825/srv1.retnet.co.uk.md0.3 srv1.retnet.co.uk feff9ffe0f md0 NODEVICE 3 2006:8:22:0:36:52 1073741824 GNUTAR 356544 |;bsd-auth;compress-best;index;exclude-list=/usr/lib/amanda/exclude.gtar;
driver: state time 812.703 free kps: -2090 space: 33435456 taper: idle idle-dumpers: 0 qlen tapeq: 0 runq: 5 roomq: 0 wakeup: 86400 driver-idle: not-idle
driver: interface-state time 812.703 if : free -3890 if ETH0: free 800 if LOCAL: free 1000
driver: hdisk-state time 812.703 hdisk 0: free 33435456 dumpers 1
dumper: stream_client: connected to 192.168.0.1.51236
dumper: stream_client: our side is 0.0.0.0.51239
dumper: stream_client: connected to 192.168.0.1.51237
dumper: stream_client: our side is 0.0.0.0.51240
dumper: stream_client: connected to 192.168.0.1.51238
dumper: stream_client: our side is 0.0.0.0.51241
driver: result time 901.369 from dumper0: DONE 00-1 441620 352152 89 [sec 88.636 kb 352152 kps 3973.0 orig-kb 441620]
driver: finished-cmd time 901.387 dumper0 dumped srv1.retnet.co.uk:md0
driver: send-cmd time 901.387 to taper: FILE-WRITE 00-2 /mnt/dumps/20060825/srv1.retnet.co.uk.md0.3 srv1.retnet.co.uk feff9ffe0f md0 3 20060825
driver: startaflush: FIRST srv1.retnet.co.uk md0 352185 3584
driver: send-cmd time 901.387 to dumper0: FILE-DU