Re: Failing on relatively large partition

John R. Jackson Thu, 08 Feb 2001 11:59:09 -0800
First, quickly grab the amandad*debug file for this run.  With luck,
it will still be for the backup of /home.  It has a command we'll use
shortly.

>So, sendbackup for /home begins at 23:08, amanda quits and sends out the
>MAIL REPORT 30 minutes later at 23:38 with the "[data timeout]" error,

Which makes sense.  That's the 30 minute "the client stopped talking"
timeout (dtimeout in amanda.conf).

>and sendbackup.debug records the "index tee cannot write [Broken pipe]"
>error at 00:16, well after amanda was done for the night. 

That's because the other end of the connection (dumper) was gone.  It is
interesting, though, that the client apparently kept running for a while,
finally tried to write something and only then got the error, rather
than hitting a timeout of its own on the client side.

>Is it possible I've got some sort of timeout value wrong someplace? Any
>suggestions on where to look to change it?

I'd sure like to know what the two tars (the one dumping and the one
generating the index) were doing all that time.

You might try changing dtimeout in amanda.conf (this is 2.4.2, right?)
to something over an hour, say 5400 (90 minutes).  The dump started at
23:08 and the broken pipe message came at 00:16, which is 68 minutes,
so 3600 (one hour) would still fall short.
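
As a rough check of the arithmetic, assuming the 23:08 start and the
00:16 broken pipe time quoted above belong to the same run, the gap can
be worked out in the shell.  The variable names are only for illustration:

```sh
# Minutes since midnight for the sendbackup start (23:08) and for the
# broken pipe message (00:16 the next day, so 24 hours are added).
start_min=$((23 * 60 + 8))
end_min=$((24 * 60 + 16))

elapsed_min=$((end_min - start_min))
elapsed_sec=$((elapsed_min * 60))

echo "gap: $elapsed_min minutes ($elapsed_sec seconds)"
```

That comes to 68 minutes, or 4080 seconds, so a dtimeout meant to span
the whole gap has to be larger than that.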

You can test this ahead of time like this (and maybe catch it in the
act of goofing off):

  * In amandad*debug will be some lines like this (yours will say
    GNUTAR, I think):

      SERVICE sendbackup
      OPTIONS hostname=fortress.cc.purdue.edu;
      DUMP /home/fortress/a 1 2001:2:5:4:48:48 OPTIONS |;bsd-auth;index;

  * Take the OPTIONS and DUMP/GNUTAR lines and copy them to a file on
    the client.  Add "no-record;" to the end of the DUMP/GNUTAR line
    you got above:

      DUMP /home/fortress/a 1 2001:2:5:4:48:48 OPTIONS |;bsd-auth;index;no-record;

    That will prevent sendbackup from updating the last dumped information.

  * Run sendbackup by hand as the Amanda user like this:

      /path/to/sendbackup -t < /the/file > /tmp/index.out

    The data stream (backup image) will go to /dev/null.  Stdout will
    get the index.  Stderr will get the messages stream (output from
    the backup program).

  * See below for how to run truss on the various processes.  That may
    give you an idea if they are really running or not.  Or you could
    run lsof to see if the file offsets are going up.
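
The "no-record;" edit from the steps above could be scripted instead of
done by hand.  This is only a sketch: add_no_record and the
/tmp/sendbackup.req file name are made up for illustration, and it
assumes the DUMP or GNUTAR keyword starts at the beginning of its line
in the file you copied.

```sh
# Hypothetical helper: read the request lines on stdin and append
# "no-record;" to the DUMP or GNUTAR line, leaving other lines alone.
add_no_record() {
  sed -e '/^DUMP /s/$/no-record;/' \
      -e '/^GNUTAR /s/$/no-record;/'
}

# Example with the request lines shown above, written to an
# illustrative file name.
req=/tmp/sendbackup.req
printf '%s\n' \
  'OPTIONS hostname=fortress.cc.purdue.edu;' \
  'DUMP /home/fortress/a 1 2001:2:5:4:48:48 OPTIONS |;bsd-auth;index;' \
  | add_no_record > "$req"
cat "$req"

# Then, as the Amanda user (the path is a placeholder, as above):
#   /path/to/sendbackup -t < "$req" > /tmp/index.out
```

Do not run the helper twice over the same file, since it does not check
whether "no-record;" is already there.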

If all this works, we need to go back to the server side and try from
there:

  * Find the FILE-DUMP line in amdump.<NN> for this file system (if it
    says PORT-DUMP, I'll have to rethink this).  Copy that line to a
    temp file and make the same "no-record;" change you did above.

  * The second argument on the line is the holding disk file name.
    Change that as needed.

  * Run dumper by hand as the Amanda user with the config name as an arg:

      /path/to/dumper <config> < /file/dump/file

  * Remove the "log" file that driver created or your next real Amanda
    run will be unhappy.
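
While dumper is running, one way to tell whether the holding disk file
is growing is to compare its size a few seconds apart.  A minimal
sketch, where file_is_growing is a made-up helper name and the path in
the example is a placeholder:

```sh
# Hypothetical helper: succeed if the file grew during the interval.
# Usage: file_is_growing <file> [seconds]
file_is_growing() {
  file=$1
  interval=${2:-10}
  # $(( ... )) also strips the padding some wc versions put before the count.
  before=$(( $(wc -c < "$file") ))
  sleep "$interval"
  after=$(( $(wc -c < "$file") ))
  [ "$after" -gt "$before" ]
}

# Example (the holding disk path is a placeholder):
#   if file_is_growing /path/to/holding/file 30; then
#     echo "still receiving data"
#   else
#     echo "no growth in 30 seconds"
#   fi
```

A single quiet interval does not prove much by itself, so it is worth
repeating the check a few times before deciding the dump is stuck.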

If it looks like nothing is happening, which would be indicated by
the holding disk file not growing, go over to the client and attach a
debugger to sendbackup, to the GNU tar doing the dump (the one with the
'c' flag) and to the GNU tar doing the catalogue (the one with the 't'
flag).  Using gdb, it would be:

  gdb /path/to/sendbackup <PID>     # or /path/to/GNUtar

Once inside, do a "where" and save (copy/paste) the stack traceback
for posting back here.  Then "detach" and "quit" to let it go again.

It would be good if both Amanda and GNU tar were compiled with -g for
this, otherwise the tracebacks may not be very useful.

You might also run truss (truss -o /tmp/xxx.out -p <PID>) on each of the
processes for a short period (enough to get some output) to see whether
they are doing anything, and if so, what.

>Chris Hobbs

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]