John,
Thanks for the reply. Although we back up the firewalls, we do not pass any
Amanda traffic through from one segment to another. The systems are all
up to date with patches (as of mid-August). We have done a lot more investigating.
ufsdump runs fine. We also tried the Arkeia backup software and it has
similar problems to Amanda. The systems just seem to run out of resources
(i.e. CPU cycles). It happens more quickly on a single processor system, but
also happens on some of the dual-processor systems. Everything points to a
change in settings (probably network or system) made when we ran Titan for
servers on the systems. The backups were fine before then and began to
fail intermittently afterwards. If you (or anyone else) have any info on
this, we would appreciate it.
Thanks,
Eva Freer
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of John R. Jackson
Sent: Friday, August 31, 2001 9:56 PM
To: Eva Freer
Cc: [EMAIL PROTECTED]
Subject: Re: Solaris 8 Server hangs during backup
>I didn't get much response from amanda-users so I am trying this list.
I responded to your first letter but your mail server refused to accept
the letter and it eventually bounced. I've appended my original response
in case this one gets through.
>Further investigation indicates that the problem occurs when sendbackup is
>running. We have tried /usr/bin/sed, /usr/xpg4/bin/sed, and GNU sed since
>sendbackup appears to be doing ufsdump | sed ... | ufsrestore. ...
Just for testing, you might try setting "index no" in amanda.conf for
that dumptype. That's what's inserting the sed and ufsrestore stuff
in the pipeline.
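As a rough sketch of that test, assuming a dumptype named comp-user (a
hypothetical name; substitute whichever dumptype the affected disklist
entries actually use), the change in amanda.conf might look like this:

```
define dumptype comp-user {    # hypothetical dumptype name
    # ... existing options left unchanged ...
    index no                   # no index generation, so no sed/ufsrestore in the pipeline
}
```

Setting it back to "index yes" after the test restores indexing.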
However I'm betting you have a hardware problem and the I/O ufsdump
does is causing the system to hang. I'd start by doing some ufsdump's
just like Amanda does (see the /tmp/amanda/sendbackup*debug files),
**but without the 'u' option**, to /dev/null.
>Eva Freer
John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]
>We have a highly subnetted configuration of Solaris 8 and 2.6 boxes, mostly
>E220R's. The subnets are connected via firewalls. Each subnet has its own
>Amanda server with an Exabyte Mammoth tape drive. ...
Do the servers reach across the firewalls to back up clients "on the
other side"? Or is that the point of having a tape drive in each subnet,
so backups stay inside a given firewall?
>We use hardware compression only. The Amanda is 2.4.2p1 on most nodes.
>...
>Originally, we seemed to have a problem with only one subnet, with a Solaris
>2.6 server, 2 Solaris clients, and 1 Solaris 8 client. The server would hang
>during the backup and required a poweroff reboot. ...
Please believe me that I'm not just trying to pass the buck :-), but
Amanda cannot be the root of this problem. Put another way, anything
you do to Amanda that gets this going is, at best, a workaround and
the real problem will still be there, waiting to bite you at the worst
possible time.
Amanda is pure application level code. Any program that generates the
same set of circumstances (e.g. high network load, particular data
patterns, etc) would trigger the same problem. If you have systems
crashing or hanging, something else (hardware or OS) is wrong.
>... Messages in the logs (not from amanda) indicate
>that the system is very busy (e.g. sendmail won't run the queue because the
>load average is too high.) ...
How high is the load average getting? Amanda is I/O bound, especially
on the server. It should not be generating significant load (w.r.t.
"load average"). Are you certain nothing else was going on at the time?
Do you have "top" to see what the heavy hitters are when it starts to
go wrong? Or there are other tools (even just a "ps") that do roughly
the same thing.
What kind of netstat numbers are you seeing during the bad times? Any
high error/collision counts or excessive packets?
Are all your systems up to reasonably recent Solaris patch levels?
Have you tried doing several ftp's of roughly dump image size from
the client to the server (they can go to /dev/null on the server as an
initial test)?
What is maxdumps set to? That would control how many backups were
running at one time on the client, which, in turn, would control how
many data streams were coming into the server.
How about inparallel? That will also throttle how many dumpers are
active.
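For a test run, both settings could be turned all the way down so that only
one data stream is active at a time. This is only a sketch: the dumptype
name is hypothetical and the values are chosen purely to serialize the
backups, not as recommended production settings.

```
inparallel 1                   # global setting: at most one dumper active on the server

define dumptype comp-user {    # hypothetical dumptype name
    # ... existing options left unchanged ...
    maxdumps 1                 # at most one backup running at a time on each client
}
```

If the hangs stop with everything serialized, raising the values one step
at a time should show roughly how much concurrent load it takes to trigger
the problem.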
Is anything special about the two subnets with the problem?
Any particular type of network card, connection, media or topology?
>Eva Freer
John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]