>We have a highly subnetted configuration of Solaris 8 and 2.6 boxes, mostly
>E220R's. The subnets are connected via firewalls. Each subnet has its own
>Amanda server with an Exabyte Mammoth tape drive. ...
Do the servers reach across the firewalls to back up clients "on the
other side"? Or is that the point of having a tape drive in each subnet,
so backups stay inside a given firewall?
>We use hardware compression only. The Amanda is 2.4.2p1 on most nodes.
>...
>Originally, we seemed to have a problem with only one subnet, with a Solaris
>2.6 server, 2 Solaris clients, and 1 Solaris 8 client. The server would hang
>during the backup and required a poweroff reboot. ...
Please believe me that I'm not just trying to pass the buck :-), but
Amanda cannot be the root of this problem. Put another way, anything
you do to Amanda that gets this going is, at best, a workaround and
the real problem will still be there, waiting to bite you at the worst
possible time.
Amanda is pure application-level code. Any program that generates the
same set of circumstances (e.g. high network load, particular data
patterns, etc.) would trigger the same problem. If you have systems
crashing or hanging, something else (hardware or OS) is wrong.
>... Messages in the logs (not from amanda) indicate
>that the system is very busy (e.g. sendmail won't run the queue because the
>load average is too high.) ...
How high is the load average getting? Amanda is I/O bound, especially
on the server. It should not be generating significant load (w.r.t.
"load average"). Are you certain nothing else was going on at the time?
Do you have "top" to see what the heavy hitters are when it starts to
go wrong? Failing that, other tools (even just a "ps") do roughly the
same thing.
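
If "top" is not installed, a rough Python sketch along these lines would
record the load average and the busiest processes. It is only an
illustration: the names heavy_hitters and snapshot are made up here, and
the "ps -e -o pid,pcpu,comm" invocation is an assumption that may need
adjusting for your ps.

```python
import os
import subprocess

def heavy_hitters(ps_text, count=5):
    """Return the `count` busiest (pcpu, pid, command) rows from ps output.

    Expects one header line followed by "PID %CPU COMMAND" rows, as printed
    by `ps -e -o pid,pcpu,comm` (an assumed invocation; adjust as needed).
    """
    rows = []
    for line in ps_text.splitlines()[1:]:      # skip the header line
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, pcpu, command = fields
        try:
            rows.append((float(pcpu), int(pid), command.strip()))
        except ValueError:
            continue                            # ignore rows that do not parse
    rows.sort(reverse=True)                     # highest %CPU first
    return rows[:count]

def snapshot():
    """Print the current load averages and the busiest processes."""
    one, five, fifteen = os.getloadavg()
    print("load average: %.2f %.2f %.2f" % (one, five, fifteen))
    ps_text = subprocess.run(["ps", "-e", "-o", "pid,pcpu,comm"],
                             capture_output=True, text=True,
                             check=True).stdout
    for pcpu, pid, command in heavy_hitters(ps_text):
        print("%6.1f%% %6d %s" % (pcpu, pid, command))
```

Calling snapshot() once a minute during the backup window and saving the
output would show what is running when the load starts to climb.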
What kind of netstat numbers are you seeing during the bad times? Any
high error/collision counts or excessive packets?
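
To turn the raw netstat counters into something comparable between
machines, a small sketch like this could help. It assumes "netstat -i"
output whose header names the columns Ipkts, Ierrs, Opkts, Oerrs and
Collis (the layout I would expect on Solaris); the function name is
hypothetical and other systems may label the columns differently.

```python
def interface_error_rates(netstat_text):
    """Parse `netstat -i` style output into per-interface percentages.

    Assumes the first non-blank line is a header containing the column
    names Name, Ipkts, Ierrs, Opkts, Oerrs and Collis.
    """
    lines = [line for line in netstat_text.splitlines() if line.strip()]
    header = lines[0].split()
    wanted = ("Name", "Ipkts", "Ierrs", "Opkts", "Oerrs", "Collis")
    col = {name: header.index(name) for name in wanted}

    def percent(part, whole):
        # Avoid dividing by zero on interfaces that have seen no traffic.
        return 100.0 * part / whole if whole else 0.0

    rates = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < len(header):
            continue                            # skip rows with missing columns
        try:
            ipkts, ierrs, opkts, oerrs, collis = (
                int(fields[col[name]]) for name in wanted[1:])
        except ValueError:
            continue                            # skip rows with non-numeric counters
        rates[fields[col["Name"]]] = {
            "ierr_pct": percent(ierrs, ipkts),    # input errors per input packet
            "oerr_pct": percent(oerrs, opkts),    # output errors per output packet
            "collis_pct": percent(collis, opkts), # collisions per output packet
        }
    return rates
```

Since the counters are cumulative, taking one reading before the backup
and one during the bad period, then comparing the two, says more than a
single reading.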
Are all your systems up to reasonably recent Solaris patch levels?
Have you tried doing several ftp's of roughly dump image size from
the client to the server (they can go to /dev/null on the server as an
initial test)?
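
If ftp is awkward to script, the same kind of test can be approximated
with a plain TCP socket. This sketch swaps ftp for raw sockets and sends
blocks of zeros, so it is not a realistic data pattern; sink, send_zeros
and CHUNK are names invented for the example.

```python
import socket
import time

CHUNK = 64 * 1024   # bytes per send/recv call (arbitrary choice)

def sink(listener, result):
    """Accept one connection and discard everything sent, like /dev/null."""
    conn, _ = listener.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:                        # sender closed the connection
                break
            total += len(data)
    result["received"] = total

def send_zeros(host, port, nbytes):
    """Send nbytes of zeros to host:port; return (bytes sent, seconds)."""
    block = bytes(CHUNK)
    sent = 0
    start = time.time()
    with socket.create_connection((host, port)) as conn:
        while sent < nbytes:
            piece = block[:min(CHUNK, nbytes - sent)]
            conn.sendall(piece)
            sent += len(piece)
    return sent, time.time() - start
```

Run sink() on the server with a listening socket and send_zeros() on each
client, sized roughly like a dump image, starting several at once to
mimic parallel dumps. If the server hangs with this running and Amanda
idle, that points away from Amanda.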
What is maxdumps set to? That would control how many backups were
running at one time on the client, which, in turn, would control how
many data streams were coming into the server.
How about inparallel? That will also throttle how many dumpers are
active.
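
As a rough illustration of how the two settings interact, here is the
arithmetic as a tiny Python sketch. It gives an upper bound only and
ignores any other limits; the function name and the numbers are made up
for the example, not taken from your configuration.

```python
def max_incoming_streams(inparallel, maxdumps, clients):
    """Upper bound on simultaneous dump streams arriving at the server.

    Each client runs at most `maxdumps` dumps at once, and the server runs
    at most `inparallel` dumpers in total, so the smaller limit wins.
    """
    return min(inparallel, maxdumps * clients)

# Three clients, illustrative settings:
print(max_incoming_streams(inparallel=4, maxdumps=1, clients=3))  # 3
print(max_incoming_streams(inparallel=4, maxdumps=2, clients=3))  # 4
```

Dropping both values to 1 for a night would show whether the hang
depends on how many streams hit the server at once.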
Is anything special about the two subnets with the problem?
Any particular type of network card, connection, media or topology?
>Eva Freer
John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]