>You can tune things by setting the tcp-timeout probably..I don't
>know exactly where to set this..

Aha, found it.  /proc/sys/net/ipv4/tcp_syn_retries

I am a victim of the exponential retry falloff, it would seem.  syn_retries 
of 1 takes a few seconds, 3 takes less than half a minute, and 5 takes 
several minutes.  The default value of 10 is what's giving me the problem 
(something like 20 minutes to time out, according to my earlier tests.)

Then the fact that ssh then re-attempts the connection four times before 
actually failing is where I got my hour and change timeout.  ("ssh -v -v -v" 
comes in handy...)

Fun.

Can we change the default value for this to something more sane, like 5?  
Exponential falloff is not good when your order of magnitude hits double 
digits.

> You probably don't want all tcp to time out at 15 seconds anyway, so

Just connection initiation.  (If their ip stack hasn't replied to me by then, 
I doubt it's going to.)

> I'd suggest either using non-blocking connect (if you have the code
> that does the connect), or just set a timer (or use sigalarm) when you
> start the attempt, and fail the attempt if the timer or alarm signal
> goes off.

Except I'm using off-the-shelf ssh.  (I asked them about this problem a month 
ago, and there was some discussion of a workaround on their mailing list, but 
2.9 came out and still had the same behavior.  Apparently, if it's not a 
problem in OpenBSD, it's not a problem in OpenSSH...)

> > If it's glibc I'm probably better off writing a wrapper to ping the
> > destination before trying to connect, or killing the connection after a
> > timeout with no traffic.  But both of those are really ugly solutions.
>
> Ugly is relative, and don't use ping because there is still a race
> condition (ping worked, but by the time you try tcp, the box is down.)

Yeah, but it would eventually time out and recover, I've got ten threads out 
querying boxes, that's a really rare race condition.  And I already 
acknowledged it was ugly. :)

So the problem is just that tcp_syn_retries' default value of 10 is way too 
high due to the exponentially increasing gap between each retry.

Rob
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to