>Thanks John, I think you are absolutely right to question the dump
>program.  ...
>It's 0.4b19 version ...

Well, I'd be a lot happier if it were ancient so I could blame it :-).
You're right that that seems pretty recent, so it may not be the culprit.

>The inconsistency may be due to the dump command, but it's more
>likely that the processes it depends on are the main cause, and I don't know
>how to find out what these processes are. 

"ps" is your friend.  Find all the "dump" processes, use "ps -fp <PID>"
to display each one's parent (the PPID column), and keep working your way up the process tree
(there may be some program that does this on your OS, but I don't know
what it is).  Draw a picture as you go.

As I recall, there will be three or more dump processes at the lowest layer
with no children.  Above them will be one or more single dump processes.
The parent of the last one will be sendbackup.  Sendbackup may have
other children (e.g. gzip).  You do not need to go back any further than
sendbackup -- it's the one doing the network I/O.
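If you'd rather not do the climbing by hand, here's a rough Python sketch of the same walk.  It's only an illustration.  The helper names are made up, it assumes a ps that accepts the POSIX "-eo pid=,ppid=,args=" form, and it simply stops at the first ancestor whose command line contains "sendbackup".

```python
import os
import subprocess


def ps_snapshot():
    """Run POSIX ps: -e selects every process, "name=" suppresses each header."""
    return subprocess.run(["ps", "-eo", "pid=,ppid=,args="],
                          capture_output=True, text=True, check=True).stdout


def read_process_table(ps_output):
    """Parse 'PID PPID ARGS' lines into {pid: (ppid, args)}."""
    table = {}
    for line in ps_output.splitlines():
        fields = line.split(None, 2)  # also strips ps's leading padding
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        args = fields[2] if len(fields) == 3 else ""
        table[int(fields[0])] = (int(fields[1]), args)
    return table


def is_dump(args):
    """True if the program name ends in "dump" (dump, ufsdump, ...)."""
    words = args.split()
    return bool(words) and os.path.basename(words[0]).endswith("dump")


def ancestry(table, pid, stop_at="sendbackup"):
    """List (pid, args) from pid upward, ending at the first ancestor whose
    command line contains stop_at, or wherever the chain runs out."""
    chain, seen = [], set()
    while pid in table and pid not in seen:  # seen guards against loops
        seen.add(pid)
        ppid, args = table[pid]
        chain.append((pid, args))
        if stop_at in args:
            break
        pid = ppid
    return chain
```

On a live client you'd build the table once with read_process_table(ps_snapshot()) and call ancestry() for every pid whose args pass is_dump(); each chain should end at the same sendbackup.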

Here's sample output from a program I have called "pstree":

 \-+- 01191 root /usr/sbin/inetd -s
   \-+- 02071 backup amandad
     \-+- 02072 backup /opt/amanda-2.4.2/libexec/sendbackup
       \-+- 05481 backup ufsdump 0sf 1048576 - /dev/rdsk/c0t0d0s0
         \-+- 05488 backup ufsdump 0sf 1048576 - /dev/rdsk/c0t0d0s0
           |--- 05489 backup ufsdump 0sf 1048576 - /dev/rdsk/c0t0d0s0
           |--- 05490 backup ufsdump 0sf 1048576 - /dev/rdsk/c0t0d0s0
           |--- 05492 backup ufsdump 0sf 1048576 - /dev/rdsk/c0t0d0s0
           \--- 05491 backup ufsdump 0sf 1048576 - /dev/rdsk/c0t0d0s0
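If your OS has no pstree, something along these lines can print a similar indented tree from a {pid: (ppid, args)} table built from ps output.  Again, this is a hypothetical sketch, not a real tool, and its layout is plainer than the one above.

```python
def children_map(table):
    """Invert {pid: (ppid, args)} into {ppid: [child pids, sorted]}."""
    kids = {}
    for pid, (ppid, _) in table.items():
        if pid != ppid:  # some systems list pid 0 as its own parent
            kids.setdefault(ppid, []).append(pid)
    for pids in kids.values():
        pids.sort()
    return kids


def render_tree(table, root, indent="  "):
    """Return one line per process under root, indented by depth."""
    kids = children_map(table)
    lines = []

    def walk(pid, depth):
        lines.append(f"{indent * depth}{pid:05d} {table[pid][1]}")
        for child in kids.get(pid, []):
            walk(child, depth + 1)

    walk(root, 0)
    return lines
```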

On the server side you need to find which dumper process is getting the
data (look at the amdump file or use something like lsof) and then the
two taper processes.
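For the lsof route, one possible sketch is to run something like "lsof -i TCP -n -P -F pcn" on the server and look for dumper processes that have a TCP endpoint on the client's address.  The parsing below assumes lsof's -F field output (one field per line, tagged p for PID, c for command, f for descriptor, n for name) and plain IPv4 addresses.  The function name is made up.

```python
def dumpers_talking_to(lsof_output, client_addr, command="dumper"):
    """Return sorted PIDs of `command` processes with a socket whose
    endpoint host (before the last ':') equals client_addr."""
    hits = set()
    pid, cmd = None, None
    for line in lsof_output.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p":  # a new process starts here
            pid, cmd = int(value), None
        elif tag == "c":
            cmd = value
        elif tag == "n" and cmd == command:
            # TCP names look like "local:port->remote:port"
            hosts = [end.rsplit(":", 1)[0] for end in value.split("->")]
            if client_addr in hosts:
                hits.add(pid)
    return sorted(hits)
```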

Run truss or the equivalent on each process and see if it is making any
progress.  If none of them are, try gcore-ing the Amanda ones (sendbackup,
dumper and the tapers), then run gdb on each binary and its core file to see
where they are stopped.  If Amanda wasn't built that way already, you might
want to rebuild it first with -g and without -O (and maybe without shared
libraries) so you get good traceback data.
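Before reaching for truss, a quick and admittedly crude triage is to sample each process's accumulated CPU time (e.g. "ps -o time= -p <PID>") twice, a minute or so apart.  A process whose time hasn't moved isn't necessarily hung, because one blocked on network I/O uses no CPU either.  Still, the check tells you which ones to truss first.  Here's a hypothetical sketch, assuming the [[dd-]hh:]mm:ss TIME format.

```python
def cpu_seconds(field):
    """Convert a ps TIME field like "mm:ss", "hh:mm:ss" or "dd-hh:mm:ss"."""
    days = 0
    if "-" in field:
        day_part, field = field.split("-", 1)
        days = int(day_part)
    seconds = 0
    for part in field.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return days * 86400 + seconds


def not_advancing(before, after):
    """before/after map pid -> TIME field from two samples.
    Return the pids seen in both samples whose CPU time did not increase."""
    common = before.keys() & after.keys()
    return sorted(pid for pid in common
                  if cpu_seconds(after[pid]) <= cpu_seconds(before[pid]))
```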

>> What happens if you do something like this:
>> 
>>   /sbin/dump 0f - > /dev/null /
>
>This is what I've got when it's done successfully:

OK, that tells us it isn't something fatally flawed, a.k.a. obvious :-).
It's probably some network congestion issue someplace.  Sigh.

>Hien.

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]
