>I am trying to track down a problem with timeouts between my tape server
>and itself when I run amcheck (and amdump, too).  My amandad.debug file
>says:
>...

Thanks for sending it along.  The one line you omitted that is important
is the last one, showing how long the whole run ended up taking.

>error connecting to 128.95.205.92:139 (No route to host)
>Connection to honda failed
>error connecting to 128.95.205.92:139 (No route to host)
>Connection to honda failed
>error connecting to 128.95.205.92:139 (No route to host)
>Connection to honda failed
>error connecting to 128.95.205.38:139 (No route to host)
>Connection to trocar failed
>session request to TORTUGA failed (Called name not present)
>session request to *SMBSERVER failed (Called name not present)
>error connecting to 128.95.205.153:139 (No route to host)
>Connection to maui failed
>session request to KAWJALEIN failed (Called name not present)
>session request to *SMBSERVER failed (Called name not present)
>error connecting to 128.95.205.160:139 (No route to host)
>Connection to bombay failed
>error connecting to 128.95.205.160:139 (No route to host)
>Connection to bombay failed

Note that several PC's failed (which is kind of a redundant statement, but
I digress :-).  And each one of those probably takes some time to detect.

>...  Just because a PC is
>turned off, that shouldn't cause all of the backups on spinoza to fail,
>but that's the way it looks to me.  ...

I'd say having the PC off is the right thing to do no matter what,
but I digress yet again :-).

Now for my latest wild guess/theory.  Samba is somewhat of a tack-on
to Amanda and there are a few oddities.  If those disklist entries for
your Samba server were real disks instead of other machines (PC's), they
would probably either work right away or fail right away.  With Samba,
however, there is this whole connection to yet another host going on,
with all the possible problems that can happen along that way, not to
mention the normal time involved in getting the estimates themselves.

It might be better if Amanda broke the selfcheck and sendsize requests
into batches based on \\pc rather than one big batch for the Samba server.
Then individual PC's mis-behaving would not take the whole server down
with them.
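
To make the batching idea concrete, here is a rough sketch in Python.
It is not Amanda code. The function name batch_by_pc and the share names
are made up for illustration (only the PC names come from your log), and
it assumes the disklist entries look like //pc/share or \\pc\share.

```python
from collections import defaultdict


def batch_by_pc(shares):
    """Group Samba share names by the PC they live on (hypothetical helper)."""
    batches = defaultdict(list)
    for share in shares:
        # Accept //pc/share or \\pc\share; the PC is the first path component.
        parts = share.replace("\\", "/").strip("/").split("/")
        batches[parts[0].lower()].append(share)
    return dict(batches)


# Illustrative share names only; the real disklist entries are not shown above.
shares = ["//honda/c$", "//honda/d$", "//trocar/c$", "//maui/c$"]
batches = batch_by_pc(shares)
```

Each value in batches would then go out as its own request, so a PC that
is turned off only holds up its own batch.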

Actually, Alexandre and I have talked about how to redo amandad to fix
several problems, including large packets (which this is sort of related
to, or at least the fix for it would cover this), and that would be the
better, although much more extensive, way to go.

It would probably be interesting to find out how long it takes for Samba
(smbclient) to fail in a case like the above.  Double that, multiply by
the number of disks on PC's (plus whatever other disks are on the Samba
server itself), add some more fudge factor, and see if that isn't more
than the default 5 minutes/disk Amanda is using.  Then crank up etimeout
way beyond that :-).
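
Here is that arithmetic as a small Python sketch.  All of the numbers
are placeholders I made up (a 180 second smbclient failure, 7 PC disks,
2 local disks, 300 seconds of fudge), so plug in whatever you actually
measure.  The only value taken from above is the 300 second (5 minute)
per-disk default.

```python
import math


def needed_seconds(fail_seconds, pc_disks, local_disks, fudge_seconds):
    """Double the failure time, multiply by the disk count, add the fudge."""
    return 2 * fail_seconds * (pc_disks + local_disks) + fudge_seconds


# Placeholder inputs, not measurements from this site.
fail_seconds = 180      # assumed time for one smbclient attempt to give up
pc_disks = 7            # assumed number of disks that live on PC's
local_disks = 2         # assumed disks on the Samba server itself
fudge_seconds = 300     # extra slop

total_disks = pc_disks + local_disks
needed = needed_seconds(fail_seconds, pc_disks, local_disks, fudge_seconds)
allowed = 300 * total_disks          # default 5 minutes/disk

# Smallest per-disk timeout that would just cover the estimate.
suggested = math.ceil(needed / total_disks)

print("needed:", needed, "allowed:", allowed)
if needed > allowed:
    print("default is too short; etimeout should be well above", suggested)
```

If needed comes out larger than allowed, the default is too short, and
the per-disk figure it prints is a floor for etimeout rather than a
recommended value.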

If you're running 2.4.2, I think someone added a parameter to control
the selfcheck timeout, too (or maybe it's in 2.5.0).

>amandad: received dup packet, ACKing it
>...
>amandad: received other packet, NAKing it

These messages are just amandad fielding the repeated requests from the
server and sending back "I'm busy, leave me alone".

>Jeff Silverman

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]
