Amanda Users:

I didn't get any responses yesterday, so I am trying again.

We have a highly subnetted configuration of Solaris 8 and 2.6 boxes, mostly
E220Rs. The subnets are connected via firewalls. Each subnet has its own
Amanda server with an Exabyte Mammoth tape drive. We use hardware
compression only. Amanda is version 2.4.2p1 on most nodes.

Originally, we seemed to have a problem with only one subnet, with a Solaris
2.6 server, 2 Solaris clients, and 1 Solaris 8 client. The server would hang
during the backup and required a power-off reboot. Part of the backup would
complete, but files would be left in the holding area. Analysis of the
problem indicated that the server would get slower and slower until nothing
was happening. This appeared to happen when backing up the Solaris 8 client.
(This was the only Solaris 2.6 server with a Solaris 8 client.) To solve the
problem (I thought), I moved the tape drive to the Solaris 8 machine and
configured it to be the Amanda server. Shortly after this, we converted the
machines to use BSM and encountered problems. We applied the workaround
given in this group and all seemed well for a few days.

The problem now affects at least 2 of the subnets. In both cases, the Amanda
server is Solaris 8 with 1 Solaris 8 client and 2 Solaris 2.6 clients. One
server hangs every night while the other is intermittent. Both are
configured to use two holding partitions of about 1 GB each. Eliminating the
holding partitions did not prevent the hang. The largest disk backed up
contains slightly more than the capacity of one holding partition. The server
that hangs every night and its clients have been upgraded to Amanda 2.4.2p2,
but the problem persists. Messages in the logs (not from amanda) indicate
that the system is very busy (e.g., sendmail won't run the queue because the
load average is too high). Amanda is the only thing really happening other
than the usual OS stuff. The 2.6 clients are dual processor Sun E220R
webservers with no activity during the backup period. The Solaris 8 client and
server are single processor E220R LDAP servers with no activity during the
backup period. Perfmeter analysis indicates that the CPU usage goes to 100%
shortly after the backup starts and stays there.

This system is supposed to go in production soon, so I need to get this
fixed ASAP or develop an alternate backup plan. Any help would be
appreciated.

TIA,
Eva Freer
Development Engineer
Oak Ridge National Laboratory
[EMAIL PROTECTED]


