Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

2005-09-02 Thread David S. Miller
From: John Heffner [EMAIL PROTECTED]
Date: Thu, 1 Sep 2005 22:51:48 -0400

 I have an idea why this is going on.  Packets are pre-allocated by the 
 driver to be a max packet size, so when you send small packets, it 
 wastes a lot of memory.  Currently Linux uses the packets at the 
 beginning of a connection to make a guess at how best to advertise its 
 window so as not to overflow the socket's memory bounds.  Since you 
 start out with big segments then go to small ones, this is defeating 
 that mechanism.  It's actually documented in the comments in 
 tcp_input.c. :)
 
   * The scheme does not work when sender sends good segments opening
   * window and then starts to feed us spagetti. But it should work
   * in common situations. Otherwise, we have to rely on queue collapsing.

That's a strong possibility, good catch John.

Although, I'm still not ruling out some box in the middle
even though I consider it less likely than your theory.

So you're suggesting that tcp_prune_queue() should do the:

	if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf)
		tcp_clamp_window(sk, tp);

check after attempting to collapse the queue.

But, that window clamping should fix the problem, as we recalculate
the window to advertise.
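
The ordering under discussion can be sketched in user space. This is a toy model and not kernel code: `struct rx_sock`, `collapse_queue()` and `prune_queue_model()` are hypothetical names, and collapsing is modelled as simply giving back a known amount of slack (which is assumed not to exceed `rmem_alloc`).

```c
#include <stdbool.h>

/* Toy stand-in for receive-side socket state (hypothetical, not struct sock). */
struct rx_sock {
    unsigned int rmem_alloc;   /* bytes currently charged to the socket */
    unsigned int rcvbuf;       /* receive buffer limit */
    unsigned int reclaimable;  /* slack that collapsing could free */
    bool clamped;              /* set when the window would be clamped */
};

/* Model of queue collapsing: repacking small segments frees the slack
 * wasted by storing them in full-size packet buffers. */
void collapse_queue(struct rx_sock *sk)
{
    sk->rmem_alloc -= sk->reclaimable;
    sk->reclaimable = 0;
}

/* Collapse first; clamp only if the socket is still over its limit. */
void prune_queue_model(struct rx_sock *sk)
{
    collapse_queue(sk);
    if (sk->rmem_alloc >= sk->rcvbuf)
        sk->clamped = true;    /* the kernel would call tcp_clamp_window() here */
}
```

With this ordering, a socket whose overshoot is entirely slack is never clamped, while one that is still over budget after collapsing is.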
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

2005-09-02 Thread Ion Badulescu

On Fri, 2 Sep 2005, Guillaume Autran wrote:

I experienced the very same problem but with window size going all the way 
down to just a few bytes (14 bytes). dump files available upon requests :)
Ion, how were you able to reproduce the issue? Can the same type of traffic 
always reproduce the issue or is it more intermittent?


I have no problem whatsoever reproducing it, at least with the kind of 
traffic I described. I had 4 flows like that running yesterday, and all 4 
had TCP window sizes smaller than 500 bytes on the receiver by mid-day.


-Ion


Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

2005-09-02 Thread Alexey Kuznetsov
Hello!

 If you overflow the socket's memory bound, it ends up calling 
 tcp_clamp_window().  (I'm not sure this is really the right thing to do 
 here before trying to collapse the queue.)

Collapsing is too expensive procedure, it is rather an emergency measure.
So, tcp collapses queue, when it is necessary, but it must reduce window
as well.

Alexey


Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

2005-09-02 Thread Alexey Kuznetsov
Hello!

 I experienced the very same problem but with window size going all the 
 way down to just a few bytes (14 bytes). dump files available upon 
 requests :)

I do request.

TCP is not allowed to reduce window to a value less than 2*MSS no matter
how hard network device or peer try to confuse it. :-)

Alexey


Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

2005-09-02 Thread John Heffner

On Sep 2, 2005, at 10:05 AM, [EMAIL PROTECTED] wrote:

This particular Win2k sender sends _only_ real-time data, it's not 
capable of rewinding. So it's always sending small packets, from start 
to finish, yet the problem still occurs.


Note that even real-time data can end up generating a stream of 
full-size packets occasionally. It's just very unlikely they would 
occur at the start of the flow, as market data is very thin in the 
pre-market open hours.


The rcv_ssthresh growth can actually take place anywhere in the flow, 
not just at the beginning.





But, that window clamping should fix the problem, as we recalculate
the window to advertise.


Patches for testing are very much welcome...


Have you tried increasing the size of the receive buffer yet?

  -John



Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

2005-09-02 Thread John Heffner

On Sep 2, 2005, at 9:52 AM, Alexey Kuznetsov wrote:


 Hello!

  I experienced the very same problem but with window size going all the
  way down to just a few bytes (14 bytes). dump files available upon
  requests :)

 I do request.

 TCP is not allowed to reduce window to a value less than 2*MSS no matter
 how hard network device or peer try to confuse it. :-)


You're right, that doesn't make sense...



Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

2005-09-02 Thread John Heffner

On Sep 2, 2005, at 9:48 AM, Alexey Kuznetsov wrote:


 Hello!

  If you overflow the socket's memory bound, it ends up calling
  tcp_clamp_window().  (I'm not sure this is really the right thing to do
  here before trying to collapse the queue.)

 Collapsing is too expensive procedure, it is rather an emergency measure.
 So, tcp collapses queue, when it is necessary, but it must reduce window
 as well.


Right.

I wonder if clamping the window though is too harsh.  Maybe just 
setting the rcv_ssthresh down is better?  Why the distinction between 
in-order and out-of-order data?  Because you expect in-order data to be 
a persistent case?


  -John



Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

2005-09-02 Thread John Heffner

On Sep 2, 2005, at 10:33 AM, [EMAIL PROTECTED] wrote:


On Fri, 2 Sep 2005, John Heffner wrote:


Have you tried increasing the size of the receive buffer yet?


Actually, I just did. I changed rmem_max and rmem_default to 4MB and 
tcp_rmem to 64k 4MB 4MB. It did seem to help, but I'm wondering if 
that's simply because it has a _lot_ of memory now to leak before it 
starts eating up into the window size.


If it is window clamping, then you should be asymptotically approaching 
a ratio between receive buffer and window that corresponds (with a 
fudge factor) to the ratio between TCP segment data size and allocated 
packet size.  If you make the receive buffer large enough, then the 
clamped window should still end up big enough.  Also, since you have 
real time data, a larger receive buffer should probably be adequate 
to eliminate this problem, since it only occurs when the receiving 
application falls behind for a while, and a bigger receive buffer 
allows it to fall behind more without triggering the window clamping.
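
The ratio argument above can be put into numbers with a small helper. This is only a back-of-the-envelope sketch: `sustainable_window()` is a hypothetical name, the fudge factor is ignored, and the 2048-byte allocation per packet used in the example below is an illustrative assumption, not a measured value.

```c
#include <stdint.h>

/* Rough bound on the window that fits in rcvbuf when every segment
 * carrying payload_bytes of data is charged truesize_bytes of memory.
 * Assumes payload_bytes <= truesize_bytes. */
uint32_t sustainable_window(uint32_t rcvbuf, uint32_t payload_bytes,
                            uint32_t truesize_bytes)
{
    if (truesize_bytes == 0)
        return 0;
    return (uint32_t)(((uint64_t)rcvbuf * payload_bytes) / truesize_bytes);
}
```

For example, with 100-byte segments held in 2048-byte allocations, `sustainable_window(87380, 100, 2048)` gives 4266 bytes, while a 4 MB buffer (`sustainable_window(4194304, 100, 2048)`) gives 204800 bytes, which is why a larger buffer can leave the clamped window big enough.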


  -John



Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

2005-09-02 Thread Alexey Kuznetsov
Hello!

 The server socket sockopt are all default, except for the 
 TCP_WINDOW_CLAMP which is set to 1400 (application specific).

It is definitely not all. If you do not fiddle with SO_RCVBUF also,
you will always have receiver advertising window of 1400.


11:15:00.922119 IP 10.10.10.3.1150 > 10.10.10.2.3200: S 2246605788:2246605788(0) win 6144 <mss 1460,nop,wscale 0,nop,nop,timestamp 3248699 0>
11:15:00.922791 IP 10.10.10.2.3200 > 10.10.10.3.1150: S 3863556410:3863556410(0) ack 2246605789 win 1400 <mss 1460,nop,nop,timestamp 268188460 3248699,nop,wscale 0>
11:15:00.923118 IP 10.10.10.3.1150 > 10.10.10.2.3200: . ack 1 win 6144 <nop,nop,timestamp 3248699 268188460>
11:15:00.923486 IP 10.10.10.3.1150 > 10.10.10.2.3200: P 1:7(6) ack 1 win 6144 <nop,nop,timestamp 3248699 268188460>
11:15:00.924143 IP 10.10.10.2.3200 > 10.10.10.3.1150: . ack 7 win 1394 <nop,nop,timestamp 268188460 3248699>

cannot happen. SO_RCVBUF is not default.
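
For reference, the two socket options mentioned here are set with `setsockopt()`. The following is a minimal sketch for Linux (`TCP_WINDOW_CLAMP` is Linux-specific); `open_configured_receiver()` is a hypothetical helper name, and the values passed in are up to the application. Reading `SO_RCVBUF` back shows what the kernel actually applied, since Linux doubles the requested value and caps the request at `rmem_max`.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket, request a receive buffer and a window clamp,
 * and report the receive buffer size the kernel actually applied.
 * Returns the socket fd, or -1 on error. */
int open_configured_receiver(int rcvbuf_bytes, int clamp_bytes,
                             int *effective_rcvbuf)
{
    socklen_t len = sizeof(*effective_rcvbuf);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &rcvbuf_bytes, sizeof(rcvbuf_bytes)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_WINDOW_CLAMP,
                   &clamp_bytes, sizeof(clamp_bytes)) < 0 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   effective_rcvbuf, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Comparing the reported buffer size against the default is a quick way to check whether an application has changed `SO_RCVBUF` in addition to the window clamp.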

Alexey


Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

2005-09-02 Thread Ion Badulescu

On Fri, 2 Sep 2005, John Heffner wrote:

If it is window clamping, then you should be asymptotically approaching a 
ratio between receive buffer and window that corresponds (with a fudge 
factor) to the ratio between TCP segment data size and allocated packet size. 
If you make the receive buffer large enough, then the clamped window should 
still end up big enough.


For what it's worth, running with a 512k receive buffer still caused the 
clamping to occur, though it took longer than with the normal buffer size. 
The window went down from a maximum of 12291 (times 2^4 due to window 
scaling) to 3190 currently. That's still enough for our purposes, but I'll 
keep monitoring it to see if it shrinks any further. It could be a viable 
work-around for the time being.


Is this a bug, though, or a feature? :)

Also, since you have real time data, a larger 
receive buffer should probably be adequate to eliminate this problem, since 
it only occurs when the receiving application falls behind for a while, and a 
bigger receive buffer allows it to fall behind more without triggering the 
window clamping.


Correct. I noticed too while experimenting that the clamping never occurs 
if the application is fast enough to keep the socket buffer empty. It's 
when data is allowed to accumulate in the buffer that the window shrinks, 
and then it never grows back, as if a portion of the buffer got lost 
permanently.


-Ion


Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

2005-09-02 Thread Alexey Kuznetsov
Hello!

 This is where things start going bad. The window starts shrinking from 
 15340 all the way down to 2355 over the course of 0.3 seconds. Notice the 
 many duplicate acks that serve no purpose

These are not duplicate, TCP_NODELAY sender just starts flooding
tiny segments, and those are normal ACKs acking those segments, note
ACK field is not the same.

 Five minutes later the TCP window is still at 2355,

 We are kind of stumped at this point, and it's proving to be a 
 show-stopping bug for our purposes, especially over WAN links that have 
 higher latency (for obvious reasons). Any kind of assistance would be 
 greatly appreciated.

I still do not know how the value of 184 is possible in your case,
I would expect 730 as an absolute possible minimum. I see 9420 (2355*4).
Anyway, ignoring this puzzle, the following patch for 2.4 should help.


--- net/ipv4/tcp_input.c.orig	2003-02-20 20:38:39.0 +0300
+++ net/ipv4/tcp_input.c	2005-09-02 22:28:00.845952888 +0400
@@ -343,8 +343,6 @@
 			app_win -= tp->ack.rcv_mss;
 		app_win = max(app_win, 2U*tp->advmss);
 
-		if (!ofo_win)
-			tp->window_clamp = min(tp->window_clamp, app_win);
 		tp->rcv_ssthresh = min(tp->window_clamp, 2U*tp->advmss);
 	}
 }
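
The effect of the patch above can be illustrated with a simplified user-space model. This is not the kernel function: `struct rx_model` and `clamp_window_model()` are hypothetical, the steps before the lines visible in the hunk are an assumption about how `app_win` is derived, and the receive-buffer growth for out-of-order data is left out.

```c
#include <stdbool.h>

/* Simplified receive-side state (hypothetical model, not kernel structs). */
struct rx_model {
    unsigned int rmem_alloc;    /* memory charged to the socket */
    unsigned int rcvbuf;        /* receive buffer limit */
    unsigned int unread;        /* in-order bytes not yet read by the app */
    unsigned int rcv_mss;       /* estimated size of incoming segments */
    unsigned int advmss;        /* MSS we advertised */
    unsigned int window_clamp;  /* upper bound on the advertised window */
    unsigned int rcv_ssthresh;  /* current window growth threshold */
};

static unsigned int umin(unsigned int a, unsigned int b) { return a < b ? a : b; }
static unsigned int umax(unsigned int a, unsigned int b) { return a > b ? a : b; }

/* patched == false keeps the two lines that the patch removes. */
void clamp_window_model(struct rx_model *m, unsigned int ofo_bytes, bool patched)
{
    unsigned int app_win;

    if (m->rmem_alloc <= m->rcvbuf)
        return;                         /* not over budget, nothing to do */

    app_win = m->unread + ofo_bytes;    /* assumed derivation of app_win */
    if (m->rmem_alloc >= 2 * m->rcvbuf)
        app_win >>= 1;
    if (app_win > m->rcv_mss)
        app_win -= m->rcv_mss;
    app_win = umax(app_win, 2U * m->advmss);

    if (!patched && ofo_bytes == 0)
        m->window_clamp = umin(m->window_clamp, app_win);  /* only ever shrinks */
    m->rcv_ssthresh = umin(m->window_clamp, 2U * m->advmss);
}
```

In the unpatched path `window_clamp` is only ever lowered, which matches the observation that the window shrinks and never grows back. With the two lines removed, only `rcv_ssthresh` is reset, so the window can grow again once the application catches up.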



Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

2005-09-02 Thread Alexey Kuznetsov
Hello!

 Well, take a look at the double acks for 84439343, 84440447 and 84441059, 
 they seem pretty much identical to me.

It is just a little tcpdump glitch.

19:34:54.532271 10.2.20.246.33060 > 65.171.224.182.8700: . 44:44(0) ack 84439343 win 24544 <nop,nop,timestamp 226080638 99717832> (DF) (ttl 64, id 60946)
19:34:54.532432 10.2.20.246.33060 > 65.171.224.182.8700: . 44:44(0) ack 84439343 win 24544 <nop,nop,timestamp 226080638 99717832> (DF) (ttl 64, id 60946)

It is one ACK (look at IP ID), shown twice. This happens sometimes
with our packet socket.


 I still do not know how the value of 184 is possible in your case,
 I would expect 730 as an absolute possible minimum. I see 9420 (2355*4).
 
 The numbers I mentioned are straight from the tcpdump and are not scaled, 

I understood. I took it to mean 184*4 when you said 184. But the minimum
is still 730 on the wire (unscaled: 2*1460 = 2920 bytes). If you really saw
values lower than 730, there is another, more severe problem and the
suggested patch will not solve it.
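
The arithmetic behind these numbers is easy to check. The sketch below assumes a window scale of 2 (the factor of 4 used in this thread) and an MSS of 1460; `window_bytes()` and `below_two_mss()` are hypothetical helper names.

```c
#include <stdint.h>

/* Window in bytes represented by a 16-bit on-the-wire window field. */
uint32_t window_bytes(uint16_t wire_window, unsigned int wscale)
{
    return (uint32_t)wire_window << wscale;
}

/* Nonzero if the advertised window is below the 2*MSS floor. */
int below_two_mss(uint16_t wire_window, unsigned int wscale, uint32_t mss)
{
    return window_bytes(wire_window, wscale) < 2 * mss;
}
```

With these inputs, a wire value of 2355 is 9420 bytes and 730 is exactly 2*1460 = 2920 bytes, whereas 184 is only 736 bytes, well under the floor.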

Alexey


Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

2005-09-01 Thread Jesper Juhl
On 9/2/05, Ion Badulescu [EMAIL PROTECTED] wrote:
 Hi David,
 
 On Thu, 1 Sep 2005, David S. Miller wrote:
 
  Thanks for the empty posting.  Please provide the content you
  intended to post, and furthermore please post it to the network
  developer mailing list, netdev@vger.kernel.org
 
 First of all, thanks for the reply (even to an empty posting :).
 
 The posting wasn't actually empty, it was probably too long (94K according

Two solutions are commonly applied to that problem:

 - put the big file(s) online somewhere and include a URL in the email
 - compress the file(s) and attach the compressed files to the email

-- 
Jesper Juhl [EMAIL PROTECTED]
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html


Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

2005-09-01 Thread John Heffner

On Sep 1, 2005, at 6:53 PM, Ion Badulescu wrote:


A few minutes later it has finally caught up to present time and it 
starts receiving smaller packets containing real-time data. The TCP 
window is still 16534 at this point.


[tcpdump output removed]

This is where things start going bad. The window starts shrinking from 
15340 all the way down to 2355 over the course of 0.3 seconds. Notice 
the many duplicate acks that serve no purpose (there are no lost 
packets and the tcpdump is taken on the receiver, so there are no 
packets/acks crossed in flight).


I have an idea why this is going on.  Packets are pre-allocated by the 
driver to be a max packet size, so when you send small packets, it 
wastes a lot of memory.  Currently Linux uses the packets at the 
beginning of a connection to make a guess at how best to advertise its 
window so as not to overflow the socket's memory bounds.  Since you 
start out with big segments then go to small ones, this is defeating 
that mechanism.  It's actually documented in the comments in 
tcp_input.c. :)


 * The scheme does not work when sender sends good segments opening
 * window and then starts to feed us spagetti. But it should work
 * in common situations. Otherwise, we have to rely on queue collapsing.

If you overflow the socket's memory bound, it ends up calling 
tcp_clamp_window().  (I'm not sure this is really the right thing to do 
here before trying to collapse the queue.)  If the receiving 
application doesn't fall too far behind, it might help you to set a 
much larger receiver buffer.


  -John
