>I've had two instances this year (the last one just this week) on one
>of my backup clients of a file system on that client becoming locked
>seemingly due to Amanda's estimate run.
>...
>Mail doesn't go through and processes get stuck and virtual memory fills up.
>
>When I manage to get into the machine I've found both times that
>there are a number of amanda processes running - sorry I forgot to grab
>the exact ps output but I'm pretty sure sendsize and killpgrp were
>there.  When I kill these off the system gets _real_ busy for
>a while as it catches up with things but then settles down.  ...

That's not a lot to go on.  But killpgrp is such a trivial program that,
if you saw it still running, my guess is that one of the ufsdump
processes is hung (e.g. in disk wait) and cannot be killed, which leads
to the pile-up you describe.  Then either something loosens up, or just
killing the killpgrp process (or sendsize) lets Amanda continue on.
You may still have ufsdump (or even killpgrp) processes stranded.

In any case, this sounds like an OS hang problem rather than an Amanda
issue.

While I understand perfectly the "make it stop hurting" feeling :-),
next time it happens (if there is one), see if you can grab a "ps -lu
<amanda>" and "ps -fu <amanda>".  Then kill things off one at a time
until it starts going.  Start with the ufsdump processes, and within
those start at the "bottom" of the parent/child chain (the last child
spawned).  Also watch whether kill actually has any effect on each
process you hit.
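
A rough sketch of the "kill one, then check whether it actually went
away" step, in plain sh.  The try_kill name, the 5 second default wait
and the return codes are all made up for illustration; nothing here is
part of Amanda.

```shell
# try_kill PID [SECONDS]  (hypothetical helper, not an Amanda tool)
# Send SIGTERM, wait a bit, then report whether the process is gone.
# Returns 0 if it went away, 1 if it is still there, 2 if the signal
# could not be sent at all (no such process, or not permitted).
try_kill() {
    pid="$1"
    wait_secs="${2:-5}"     # assumed default; adjust to taste
    if ! kill "$pid" 2>/dev/null; then
        echo "pid $pid: not running, or not ours to signal"
        return 2
    fi
    sleep "$wait_secs"
    # kill -0 sends no signal; it only tests whether the pid still exists.
    if kill -0 "$pid" 2>/dev/null; then
        echo "pid $pid: still present after SIGTERM"
        return 1
    fi
    echo "pid $pid: gone"
    return 0
}
```

Run it against one pid at a time, taken from the ps output, e.g.
"try_kill 1234".  A process that is still present afterwards is a
candidate for the one stuck in the kernel.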

It might also be useful to grab a copy of /tmp/amanda/killpgrp*debug
**before** you start killing things to see how far it got before it hung.
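
Something along these lines would collect all of that in one go.  It
is only a sketch: the "amanda" user name, the snapshot_amanda function
and the output file names are assumptions, so adjust them for your
site.

```shell
#!/bin/sh
# Snapshot Amanda process state before killing anything.
# AMANDA_USER is an assumption: use whatever user Amanda runs as.
AMANDA_USER="${AMANDA_USER:-amanda}"
# Directory holding the killpgrp*debug files mentioned above.
DEBUG_DIR="${DEBUG_DIR:-/tmp/amanda}"

# snapshot_amanda DEST  (hypothetical helper)
# Saves both ps listings and copies any killpgrp debug files into DEST.
snapshot_amanda() {
    dest="$1"
    mkdir -p "$dest" || return 1
    # Long and full listings; errors are captured in the file, not fatal.
    ps -lu "$AMANDA_USER" > "$dest/ps-l.txt" 2>&1 || true
    ps -fu "$AMANDA_USER" > "$dest/ps-f.txt" 2>&1 || true
    # Copy the debug files, if there are any, keeping their timestamps.
    for f in "$DEBUG_DIR"/killpgrp*debug; do
        [ -f "$f" ] && cp -p "$f" "$dest/"
    done
    return 0
}
```

For example: snapshot_amanda /var/tmp/amanda-hang.1 (the path is just
an example).  On Solaris /tmp is normally swap-backed, so with virtual
memory filling up it is probably safer to write the snapshot somewhere
on disk such as /var/tmp.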

>Does anyone recognise these symptoms?  Any ideas on whether it's an Amanda
>problem (which might go away if I update my installation to 2.4.2p2 which
>I should probably do anyway) or something to do with ufsdump?

We run Solaris 2.6, 2.7 and 2.8 here and I don't think we've ever seen
this.  You might also want to make sure you're up to the latest Sun
patches both for the kernel/OS and ufsdump/ufsrestore.

Also, there have been essentially no changes to killpgrp since 2.4.2
(just a couple of minor message formatting things), so I don't expect
an upgrade to help with this particular problem.

>Paul

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]
