Re: TCP's initial cwnd setting correct?...

2007-08-08 Thread Ilpo Järvinen
On Tue, 7 Aug 2007, David Miller wrote:

 From: Ilpo_Järvinen [EMAIL PROTECTED]
 Date: Mon, 6 Aug 2007 15:37:15 +0300 (EEST)
 
  ...Another thing that makes me wonder, is the tp-mss_cache  1460 check 
  as based on rfc3390 (and also it's precursor rfc2414) with up to 2190 
  bytes MSS TCP can use 3 as initial cwnd...
 
 I did the research and my memory was at least partially right.
 
 Below is an old bogus change of mine and the later revert with
 Alexey's explanation.
 
 This seems to be dealing with receive window calculation issues,
 rather than snd_cwnd.  But they might be related and you should
 consider this very seriously.

...Thanks about the info, I'll study it carefully and see what's
relevant in here too. And anyway we're currently on the safer side
as the potential issues only make it more conservative than the RFC 
allows, so there's no hurry to get it in until the solution is
acceptable if there's indeed a need for a change.


-- 
 i.

Re: TCP's initial cwnd setting correct?...

2007-08-08 Thread John Heffner

That sounds right to me.

  -John


Ilpo Järvinen wrote:

On Mon, 6 Aug 2007, Ilpo Järvinen wrote:

...Goto logic could be cleaner (somebody has any suggestion for better 
way to structure it?)


...I could probably move the setting of snd_cwnd earlier to avoid 
this problem if this seems a valid fix at all.




-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP's initial cwnd setting correct?...

2007-08-08 Thread John Heffner
I believe the current calculation is correct.  The RFC specifies a 
window of no more than 4380 bytes unless 2*MSS  4380.  If you change 
the code in this way, then MSS=1461 will give you an initial window of 
3*MSS == 4383, violating the spec.  Reading the pseudocode in the RFC 
3390 is a bit misleading because they use a clamp at 4380 bytes rather 
than use a multiplier in the relevant range.


  -John


David Miller wrote:

From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Mon, 6 Aug 2007 15:37:15 +0300 (EEST)


@@ -805,13 +805,13 @@ void tcp_update_metrics(struct sock *sk)
}
 }
 
-/* Numbers are taken from RFC2414.  */

+/* Numbers are taken from RFC3390.  */
 __u32 tcp_init_cwnd(struct tcp_sock *tp, struct dst_entry *dst)
 {
__u32 cwnd = (dst ? dst_metric(dst, RTAX_INITCWND) : 0);
 
 	if (!cwnd) {

-   if (tp-mss_cache  1460)
+   if (tp-mss_cache = 2190)
cwnd = 2;
else
cwnd = (tp-mss_cache  1095) ? 3 : 4;


I remember suggesting something similar about 5 or 6 years
ago and Alexey Kuznetsov at the time explained the numbers
which are there and why they should not be changed.

I forget the reasons though, and I'll try to do the research.

These numbers have been like this forever, FWIW.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP's initial cwnd setting correct?...

2007-08-08 Thread David Miller
From: John Heffner [EMAIL PROTECTED]
Date: Wed, 08 Aug 2007 11:20:05 -0400

 I believe the current calculation is correct.  The RFC specifies a 
 window of no more than 4380 bytes unless 2*MSS  4380.  If you change 
 the code in this way, then MSS=1461 will give you an initial window of 
 3*MSS == 4383, violating the spec.  Reading the pseudocode in the RFC 
 3390 is a bit misleading because they use a clamp at 4380 bytes rather 
 than use a multiplier in the relevant range.

Thanks for this excellent clarification John.

Because this has tripped up enough people, not once but on
multiple occaisions, I'm going to add some comments to
tcp_init_cwnd() to save the next person who is confused
by what seems to be an incorrect implementation of the RFC :-)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP's initial cwnd setting correct?...

2007-08-07 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Mon, 6 Aug 2007 15:37:15 +0300 (EEST)

 @@ -805,13 +805,13 @@ void tcp_update_metrics(struct sock *sk)
   }
  }
  
 -/* Numbers are taken from RFC2414.  */
 +/* Numbers are taken from RFC3390.  */
  __u32 tcp_init_cwnd(struct tcp_sock *tp, struct dst_entry *dst)
  {
   __u32 cwnd = (dst ? dst_metric(dst, RTAX_INITCWND) : 0);
  
   if (!cwnd) {
 - if (tp-mss_cache  1460)
 + if (tp-mss_cache = 2190)
   cwnd = 2;
   else
   cwnd = (tp-mss_cache  1095) ? 3 : 4;

I remember suggesting something similar about 5 or 6 years
ago and Alexey Kuznetsov at the time explained the numbers
which are there and why they should not be changed.

I forget the reasons though, and I'll try to do the research.

These numbers have been like this forever, FWIW.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP's initial cwnd setting correct?...

2007-08-07 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Mon, 6 Aug 2007 15:37:15 +0300 (EEST)

 ...Another thing that makes me wonder, is the tp-mss_cache  1460 check 
 as based on rfc3390 (and also it's precursor rfc2414) with up to 2190 
 bytes MSS TCP can use 3 as initial cwnd...

I did the research and my memory was at least partially right.

Below is an old bogus change of mine and the later revert with
Alexey's explanation.

This seems to be dealing with receive window calculation issues,
rather than snd_cwnd.  But they might be related and you should
consider this very seriously.

commit 6b251858d377196b8cea20e65cae60f584a42735
Author: David S. Miller [EMAIL PROTECTED]
Date:   Wed Sep 28 16:31:48 2005 -0700

[TCP]: Fix init_cwnd calculations in tcp_select_initial_window()

Match it up to what RFC2414 really specifies.
Noticed by Rick Jones.

Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d6e3d26..caf2e2c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -190,15 +190,16 @@ void tcp_select_initial_window(int __space, __u32 mss,
}
 
/* Set initial window to value enough for senders,
-* following RFC1414. Senders, not following this RFC,
+* following RFC2414. Senders, not following this RFC,
 * will be satisfied with 2.
 */
if (mss  (1*rcv_wscale)) {
-   int init_cwnd = 4;
-   if (mss  1460*3)
+   int init_cwnd;
+
+   if (mss  1460)
init_cwnd = 2;
-   else if (mss  1460)
-   init_cwnd = 3;
+   else
+   init_cwnd = (mss  1095) ? 3 : 4;
if (*rcv_wnd  init_cwnd*mss)
*rcv_wnd = init_cwnd*mss;
}

commit 01ff367e62f0474e4d39aa5812cbe2a30d96e1e9
Author: David S. Miller [EMAIL PROTECTED]
Date:   Thu Sep 29 17:07:20 2005 -0700

[TCP]: Revert 6b251858d377196b8cea20e65cae60f584a42735

But retain the comment fix.

Alexey Kuznetsov has explained the situation as follows:



I think the fix is incorrect. Look, the RFC function init_cwnd(mss) is
not continuous: f.e. for mss=1095 it needs initial window 1095*4, but
for mss=1096 it is 1096*3. We do not know exactly what mss sender used
for calculations. If we advertised 1096 (and calculate initial window
3*1096), the sender could limit it to some value  1096 and then it
will need window his_mss*4  3*1096 to send initial burst.

See?

So, the honest function for inital rcv_wnd derived from
tcp_init_cwnd() is:

init_rcv_wnd(mss)=
  min { init_cwnd(mss1)*mss1 for mss1 = mss }

It is something sort of:

if (mss  1096)
return mss*4;
if (mss  1096*2)
return 1096*4;
return mss*2;

(I just scrablled a graph of piece of paper, it is difficult to see or
to explain without this)

I selected it differently giving more window than it is strictly
required.  Initial receive window must be large enough to allow sender
following to the rfc (or just setting initial cwnd to 2) to send
initial burst.  But besides that it is arbitrary, so I decided to give
slack space of one segment.

Actually, the logic was:

If mss is low/normal (=ethernet), set window to receive more than
initial burst allowed by rfc under the worst conditions
i.e. mss*4. This gives slack space of 1 segment for ethernet frames.

For msses slighlty more than ethernet frame, take 3. Try to give slack
space of 1 frame again.

If mss is huge, force 2*mss. No slack space.

Value 1460*3 is really confusing. Minimal one is 1096*2, but besides
that it is an arbitrary value. It was meant to be ~4096. 1460*3 is
just the magic number from RFC, 1460*3 = 1095*4 is the magic :-), so
that I guess hands typed this themselves.



Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index caf2e2c..c5b911f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -194,12 +194,11 @@ void tcp_select_initial_window(int __space, __u32 mss,
 * will be satisfied with 2.
 */
if (mss  (1*rcv_wscale)) {
-   int init_cwnd;
-
-   if (mss  1460)
+   int init_cwnd = 4;
+   if (mss  1460*3)
init_cwnd = 2;
-   else
-   init_cwnd = (mss  1095) ? 3 : 4;
+   else if (mss  1460)
+   init_cwnd = 3;
if (*rcv_wnd  init_cwnd*mss)
*rcv_wnd = init_cwnd*mss;
}
-
To unsubscribe from this list: send the line unsubscribe netdev in
the 

Re: TCP's initial cwnd setting correct?...

2007-08-06 Thread Ilpo Järvinen
On Mon, 6 Aug 2007, Ilpo Järvinen wrote:

 ...Goto logic could be cleaner (somebody has any suggestion for better 
 way to structure it?)

...I could probably move the setting of snd_cwnd earlier to avoid 
this problem if this seems a valid fix at all.

-- 
 i.