RE: Solaris 8 Server hangs during backup
John,

Thanks for the reply. Although we back up the firewalls, we do not pass any Amanda traffic through from one segment to another. The systems are all up-to-date with patches (mid-August).

We have done a lot more investigating. ufsdump runs fine. We also tried the Arkeia backup software and it has similar problems to Amanda. The systems just seem to run out of resources (i.e. CPU cycles). It happens more quickly on a single-processor system, but also happens on some of the dual-processor systems.

Everything points to a change in settings (probably network or system) made when we ran Titan for servers on the systems. The backups were fine before then and began to fail intermittently afterwards. If you (or anyone else) have any info on this we would appreciate it.

Thanks,
Eva Freer

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of John R. Jackson
Sent: Friday, August 31, 2001 9:56 PM
To: Eva Freer
Cc: [EMAIL PROTECTED]
Subject: Re: Solaris 8 Server hangs during backup

>I didn't get much response from amanda-users so I am trying this list.

I responded to your first letter but your mail server refused to accept the letter and it eventually bounced. I've appended my original response in case this one gets through.

>Further investigation indicates that the problem occurs when sendbackup is
>running. We have tried /usr/bin/sed, /usr/xpg4/bin/sed, and GNU sed since
>sendbackup appears to be doing ufsdump | sed ... | ufsrestore. ...

Just for testing, you might try setting "index no" in amanda.conf for that dumptype. That's what's inserting the sed and ufsrestore stuff in the pipeline.

However I'm betting you have a hardware problem and the I/O ufsdump does is causing the system to hang. I'd start by doing some ufsdump's just like Amanda does (see the /tmp/amanda/sendbackup*debug files), **but without the 'u' option**, to /dev/null.

>Eva Freer

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]

>We have a highly subnetted configuration of Solaris 8 and 2.6 boxes, mostly
>E220R's. The subnets are connected via firewalls. Each subnet has its own
>Amanda server with an Exabyte Mammoth tape drive. ...

Do the servers reach across the firewalls to back up clients "on the other side"? Or is that the point of having a tape drive in each subnet, so backups stay inside a given firewall?

>We use hardware compression only. The Amanda is 2.4.2p1 on most nodes.
>...
>Originally, we seemed to have a problem with only one subnet, with a Solaris
>2.6 server, 2 Solaris clients, and 1 Solaris 8 client. The server would hang
>during the backup and required a poweroff reboot. ...

Please believe me that I'm not just trying to pass the buck :-), but Amanda cannot be the root of this problem. Put another way, anything you do to Amanda that gets this going is, at best, a workaround and the real problem will still be there, waiting to bite you at the worst possible time.

Amanda is pure application level code. Any program that generates the same set of circumstances (e.g. high network load, particular data patterns, etc) would trigger the same problem. If you have systems crashing or hanging, something else (hardware or OS) is wrong.

>... Messages in the logs (not from amanda) indicate
>that the system is very busy (e.g. sendmail won't run the queue because the
>load average is too high.) ...

How high is the load average getting? Amanda is I/O bound, especially on the server. It should not be generating significant load (w.r.t. "load average"). Are you certain nothing else was going on at the time?

Do you have "top" to see what the heavy hitters are when it starts to go wrong? Or there are other tools (even just a "ps") that do roughly the same thing.

What kind of netstat numbers are you seeing during the bad times? Any high error/collision counts or excessive packets?

Are all your systems up to reasonably recent Solaris patch levels?

Have you tried doing several ftp's of roughly dump image size from the client to the server (they can go to /dev/null on the server as an initial test)?

What is maxdumps set to? That would control how many backups were running at one time on the client, which, in turn, would control how many data streams were coming into the server. How about inparallel? That will also throttle how many dumpers are active.

Is anything special about the two subnets with the problem? Any particular type of network card, connection, media or topology?

>Eva Freer

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]
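[Editor's sketch, not part of the original message.] John's question "How high is the load average getting?" is hard to answer once the machine has hung and been power-cycled. One possible way to capture the numbers is a small sampler that appends the load average to a file on disk and syncs after every line, so the last readings survive a reboot. This is only a sketch: the function names, the log path in the usage note, and the sampling interval are assumptions, and it presumes a Python interpreter is available on the affected host.

```python
import os
import time


def format_sample(timestamp, load):
    """Return one log line: local timestamp plus the 1/5/15-minute load averages."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    return "%s load1=%.2f load5=%.2f load15=%.2f" % ((stamp,) + tuple(load))


def log_load(path, interval=10, samples=6):
    """Append `samples` load-average readings to `path`, `interval` seconds apart."""
    with open(path, "a") as log:
        for i in range(samples):
            log.write(format_sample(time.time(), os.getloadavg()) + "\n")
            log.flush()
            os.fsync(log.fileno())  # force each line to disk before sleeping
            if i < samples - 1:
                time.sleep(interval)
```

A call such as `log_load("/var/tmp/loadavg.log", interval=10, samples=360)` started shortly before the backup window would cover about an hour. The path is hypothetical; the point is to write to a disk-backed filesystem rather than a tmpfs-mounted /tmp, which does not survive the reboot.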
Re: Solaris 8 Server hangs during backup
* Eva Freer <[EMAIL PROTECTED]> (Mon, Aug 13, 2001 at 11:52:48AM -0400)
> Amanda Users:
>
> We have a highly subnetted configuration of Solaris 8 and 2.6 boxes, mostly
> E220R's. The subnets are connected via firewalls. Each subnet has its own
> Amanda server with an Exabyte Mammoth tape drive. We use hardware
> compression only. The Amanda is 2.4.2p1 on most nodes.

[snip]

The usual concerns apply:

1) Did you check the SCSI chains on all machines to ensure proper termination, proper cables &c &c?
2) What is in the syslog on the hanging machine?
3) You say the machine gets slower and slower (which means it's progressive). Did you try running top (or similar) on the machine to see what was happening (e.g. was the machine running out of memory? out of swap space?) Was there anything special in the /tmp/amanda logfiles? [You will have to make sure /tmp/amanda is not mounted on tmpfs.]
4) Are you using software compression?

> but the problem persists. Messages in the logs (not from amanda) indicate
> that the system is very busy (e.g. sendmail won't run the queue because the
> load average is too high.) Amanda is the only thing really happening other
> than the usual OS stuff.

Could you run top/ps -ef or whatever to see what exactly is running, and what is hogging the CPU?

Are you using ufsdump or tar dump?

Kind regards,
--
Gerhard den Hollander                      Phone +31-10.280.1515
Global Technical Support                   Fax +31-10.280.1511
Jason Geosystems BV                        (When calling please note: we are in GMT+1)
[EMAIL PROTECTED]                          POBox 1573
visit us at http://www.jasongeo.com        3000 BN Rotterdam
JASON...#1 in Reservoir Characterization   The Netherlands
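[Editor's sketch, not part of the original message.] Gerhard's request to "run top/ps -ef or whatever" to see what is hogging the CPU could be scripted roughly as below, for hosts where top is not installed. It assumes a `ps` that accepts the POSIX `-o pcpu,pid,args` format list; the function names are made up for illustration.

```python
import subprocess


def parse_ps(text):
    """Parse `ps -eo pcpu,pid,args` output into (cpu_percent, pid, command) tuples."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        try:
            rows.append((float(parts[0]), int(parts[1]), parts[2]))
        except ValueError:
            continue  # ignore lines that do not look like process rows
    return rows


def top_cpu(text, n=5):
    """Return the n rows with the highest CPU percentage, busiest first."""
    return sorted(parse_ps(text), key=lambda row: row[0], reverse=True)[:n]


def current_top(n=5):
    """Run ps once and return the n busiest processes at this moment."""
    out = subprocess.run(["ps", "-eo", "pcpu,pid,args"],
                         capture_output=True, text=True, check=True).stdout
    return top_cpu(out, n)
```

Calling `current_top()` every few seconds while the backup starts, and writing the result to a file on disk, would show whether sed, ufsdump, ufsrestore or something unrelated is the process pinning the CPU.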
RE: Solaris 8 Server hangs during backup
Bill,

Thanks for your reply. We have done some more investigation and have determined that the problem is with sendbackup. It does ufsdump | sed | ufsrestore. When this starts it takes the CPU to 100% and stays there. The performance monitoring soon quits updating. Log messages indicate that sendmail sees the load average too high and quits processing the queue. The only recovery is to turn the machine off and back on.

The data on the largest partition was slightly greater than 1 GB. We had 2 holding partitions, each slightly less than 1 GB. We tried combining the 2 partitions with DiskSuite to get a larger volume, but this did not fix the problem.

The only patch on the web site for 2.4.2p2 seems to be for IRIS and TRU64, not Solaris.

Eva Freer

-----Original Message-----
From: Bill Carlson [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 15, 2001 10:16 AM
To: Eva Freer
Cc: [EMAIL PROTECTED]
Subject: Re: Solaris 8 Server hangs during backup

On Tue, 14 Aug 2001, Eva Freer wrote:

> We have a highly subnetted configuration of Solaris 8 and 2.6 boxes, mostly
> E220R's. The subnets are connected via firewalls. Each subnet has its own
> Amanda server with an Exabyte Mammoth tape drive. We use hardware
> compression only. The Amanda is 2.4.2p1 on most nodes.
>
> Originally, we seemed to have a problem with only one subnet, with a Solaris
> 2.6 server, 2 Solaris clients, and 1 Solaris 8 client. The server would hang
> during the backup and required a poweroff reboot. Part of the backup would

!?! I've never seen anything with amanda that actually killed the machine. A heavily overloaded machine will seem dead, but should eventually respond.

> The problem now affects at least 2 of the subnets. In both cases, the Amanda
> server is Solaris 8 with 1 Solaris 8 client and 2 Solaris 2.6 clients. One
> server hangs every night while the other is intermittent. Both are
> configured to use 2 ~1 GB holding partitions. Eliminating the holding
> partitions did not prevent the hangup. The largest disk backed up contains
> slightly more than the capacity of 1 of the holding partitions. The server

How full is the largest partition? For holding disk purposes, the important part is how much actual data you have, not the size of the filesystem.

> than the usual OS stuff. The 2.6 clients are dual processor Sun E220R
> webservers with no activity during the backup period. The 8 client and
> server are single processor E220R LDAP servers with no activity during the
> backup period. Perfmeter analysis indicates that the CPU usage goes to 100%
> shortly after the backup starts and stays there.

Do you have debug turned on for all clients and servers? The first thing I'd want to see is the debug output and then the actual logs. When the CPU starts spinning at 100%, what process is the culprit?

We need more info here. Are you using ufsdump or tar? Any patches to amanda?

Bill Carlson
--
Systems Programmer    [EMAIL PROTECTED]    | Anything is possible,
Virtual Hospital      http://www.vh.org/   | given time and money.
University of Iowa Hospitals and Clinics   | Opinions are mine, not my employer's.
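[Editor's sketch, not part of the original message.] Bill's point that what matters for the holding disk is the amount of actual data, not the size of the filesystem, can be checked with a few lines. The sketch below reads the used space of a mounted filesystem and compares a data size against a list of holding-area sizes. It only compares numbers and does not model how Amanda itself allocates holding space. The function names are assumptions, and used space is only a rough proxy for the size of a full dump.

```python
import shutil


def used_bytes(mount_point):
    """Bytes actually in use on the filesystem mounted at mount_point."""
    return shutil.disk_usage(mount_point).used


def compare(data_bytes, holding_sizes):
    """Compare the data to be dumped with each holding area and with their sum."""
    return {
        "fits_in_one": any(data_bytes <= size for size in holding_sizes),
        "fits_in_total": data_bytes <= sum(holding_sizes),
    }
```

With illustrative figures matching Eva's description (about 1.1 GB of data, two holding partitions of about 0.9 GB each), `compare` reports that the data does not fit in either partition alone but does fit in their combined size, which is the situation the DiskSuite concatenation was meant to address.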
Re: Solaris 8 Server hangs during backup
On Tue, 14 Aug 2001, Eva Freer wrote:

> We have a highly subnetted configuration of Solaris 8 and 2.6 boxes, mostly
> E220R's. The subnets are connected via firewalls. Each subnet has its own
> Amanda server with an Exabyte Mammoth tape drive. We use hardware
> compression only. The Amanda is 2.4.2p1 on most nodes.
...

We very occasionally (two times in months of running Amanda) see something which _may_ be related to your problem. We're running a mixture of Solaris 7 and 8 (Amanda server is 7) [as well as some RedHat Linux and MacOS X clients].

Twice one of the Solaris 7 Amanda clients (same one both times) has locked up during the estimate phase of the backup run (this is using ufsdump). When this happens access to one or more filesystems blocks and the system clogs up with jammed processes. This is a mail server and sendmail stops accepting new mail once the load gets too high, so I've managed to recover both times by killing off the amanda processes.

Next time this happens I plan to be less flustered :-> and hopefully will have better data about what's causing the blockage.

Paul
--
Paul Haldane
Computing Service
University of Newcastle