On Oct 15, 2014, at 6:29 AM, Jean-Louis Martineau <martin...@zmanda.com> wrote:
> On 10/14/2014 06:14 PM, Debra S Baddorf wrote: >> ( I’ve joined the amanda-hackers list too. Would this be better there? ) >> >> Since amanda 3.3.3 (when I started using bsdtcp and krb5 auth types, in >> addition to bsd), >> I’m getting a 3 minute timeout on connection to a node that is down. I’ve >> deduced that I >> can lower “connect-tries” from the default of 3 down to 2. This reduces my >> freeze time from >> 9 minutes to 6. I’m not sure lowering “connect-tries” to 1 is safe. >> >> Where is the 3 minute timeout coming from? > It come from the connect system call. Is there any place I can shorten that timeout? Deb > >> It doesn’t seem to be etimeout (which I’ve got set at 29 minutes) >> nor dtimeout (set to 30 minutes) >> nor ctimeout (set to 30 seconds, and I *did* try changing this one). >> >> Is this a non-amanda system level delay? My expert & I can’t think of one. >> >> Is there another timeout parameter in amanda that I can play with? >> >> I’ve got 39 nodes to backup, so I don’t always know when one is down for >> some reason. >> And with 39 clients, somehow the 9 minute freeze time (before I lowered >> the TRIES to 2) >> manages to make OTHER clients fail too, even though they are running fine. >> This is actually >> the part that bothers me! >> >> PS this 3 minute per try timeout was timed using amcheck but happens >> during amdump too. >> Both amcheck and amdump have other nodes failing if one node is down. >> >> PPS the connect failure is between these two lines (with debug-auth and >> debug-protocol both set to 5): >> KRB5 node down: >> Tue Oct 14 16:44:06 2014: thd-0x8fc6340: amcheck-clients: connect_port: Try >> port 50000: available - Success >> Tue Oct 14 16:47:15 2014: thd-0x8fc6340: amcheck-clients: connect_portrange: >> Connect from 0.0.0.0:50000 failed: Connection timed out >> BSDTCP node down: >> Tue Oct 14 16:35:20 2014: thd-0x86e8340: amcheck-clients: connect_port: Try >> port 517: available - Success >> Tue Oct 14 16:38:29 2014: thd-0x86e8340: amcheck-clients: connect_portrange: >> Connect from 0.0.0.0:517 failed: Connection timed out >> >> Deb Baddorf >> Fermilab >