Re: TCP's initial cwnd setting correct?...
On Tue, 7 Aug 2007, David Miller wrote: From: Ilpo_Järvinen [EMAIL PROTECTED] Date: Mon, 6 Aug 2007 15:37:15 +0300 (EEST) ...Another thing that makes me wonder, is the tp-mss_cache 1460 check as based on rfc3390 (and also it's precursor rfc2414) with up to 2190 bytes MSS TCP can use 3 as initial cwnd... I did the research and my memory was at least partially right. Below is an old bogus change of mine and the later revert with Alexey's explanation. This seems to be dealing with receive window calculation issues, rather than snd_cwnd. But they might be related and you should consider this very seriously. ...Thanks about the info, I'll study it carefully and see what's relevant in here too. And anyway we're currently on the safer side as the potential issues only make it more conservative than the RFC allows, so there's no hurry to get it in until the solution is acceptable if there's indeed a need for a change. -- i.
Re: TCP's initial cwnd setting correct?...
That sounds right to me. -John Ilpo Järvinen wrote: On Mon, 6 Aug 2007, Ilpo Järvinen wrote: ...Goto logic could be cleaner (somebody has any suggestion for better way to structure it?) ...I could probably move the setting of snd_cwnd earlier to avoid this problem if this seems a valid fix at all. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TCP's initial cwnd setting correct?...
I believe the current calculation is correct. The RFC specifies a window of no more than 4380 bytes unless 2*MSS 4380. If you change the code in this way, then MSS=1461 will give you an initial window of 3*MSS == 4383, violating the spec. Reading the pseudocode in the RFC 3390 is a bit misleading because they use a clamp at 4380 bytes rather than use a multiplier in the relevant range. -John David Miller wrote: From: Ilpo_Järvinen [EMAIL PROTECTED] Date: Mon, 6 Aug 2007 15:37:15 +0300 (EEST) @@ -805,13 +805,13 @@ void tcp_update_metrics(struct sock *sk) } } -/* Numbers are taken from RFC2414. */ +/* Numbers are taken from RFC3390. */ __u32 tcp_init_cwnd(struct tcp_sock *tp, struct dst_entry *dst) { __u32 cwnd = (dst ? dst_metric(dst, RTAX_INITCWND) : 0); if (!cwnd) { - if (tp-mss_cache 1460) + if (tp-mss_cache = 2190) cwnd = 2; else cwnd = (tp-mss_cache 1095) ? 3 : 4; I remember suggesting something similar about 5 or 6 years ago and Alexey Kuznetsov at the time explained the numbers which are there and why they should not be changed. I forget the reasons though, and I'll try to do the research. These numbers have been like this forever, FWIW. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TCP's initial cwnd setting correct?...
From: John Heffner [EMAIL PROTECTED] Date: Wed, 08 Aug 2007 11:20:05 -0400 I believe the current calculation is correct. The RFC specifies a window of no more than 4380 bytes unless 2*MSS 4380. If you change the code in this way, then MSS=1461 will give you an initial window of 3*MSS == 4383, violating the spec. Reading the pseudocode in the RFC 3390 is a bit misleading because they use a clamp at 4380 bytes rather than use a multiplier in the relevant range. Thanks for this excellent clarification John. Because this has tripped up enough people, not once but on multiple occaisions, I'm going to add some comments to tcp_init_cwnd() to save the next person who is confused by what seems to be an incorrect implementation of the RFC :-) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TCP's initial cwnd setting correct?...
From: Ilpo_Järvinen [EMAIL PROTECTED] Date: Mon, 6 Aug 2007 15:37:15 +0300 (EEST) @@ -805,13 +805,13 @@ void tcp_update_metrics(struct sock *sk) } } -/* Numbers are taken from RFC2414. */ +/* Numbers are taken from RFC3390. */ __u32 tcp_init_cwnd(struct tcp_sock *tp, struct dst_entry *dst) { __u32 cwnd = (dst ? dst_metric(dst, RTAX_INITCWND) : 0); if (!cwnd) { - if (tp-mss_cache 1460) + if (tp-mss_cache = 2190) cwnd = 2; else cwnd = (tp-mss_cache 1095) ? 3 : 4; I remember suggesting something similar about 5 or 6 years ago and Alexey Kuznetsov at the time explained the numbers which are there and why they should not be changed. I forget the reasons though, and I'll try to do the research. These numbers have been like this forever, FWIW. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TCP's initial cwnd setting correct?...
From: Ilpo_Järvinen [EMAIL PROTECTED] Date: Mon, 6 Aug 2007 15:37:15 +0300 (EEST) ...Another thing that makes me wonder, is the tp-mss_cache 1460 check as based on rfc3390 (and also it's precursor rfc2414) with up to 2190 bytes MSS TCP can use 3 as initial cwnd... I did the research and my memory was at least partially right. Below is an old bogus change of mine and the later revert with Alexey's explanation. This seems to be dealing with receive window calculation issues, rather than snd_cwnd. But they might be related and you should consider this very seriously. commit 6b251858d377196b8cea20e65cae60f584a42735 Author: David S. Miller [EMAIL PROTECTED] Date: Wed Sep 28 16:31:48 2005 -0700 [TCP]: Fix init_cwnd calculations in tcp_select_initial_window() Match it up to what RFC2414 really specifies. Noticed by Rick Jones. Signed-off-by: David S. Miller [EMAIL PROTECTED] diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index d6e3d26..caf2e2c 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -190,15 +190,16 @@ void tcp_select_initial_window(int __space, __u32 mss, } /* Set initial window to value enough for senders, -* following RFC1414. Senders, not following this RFC, +* following RFC2414. Senders, not following this RFC, * will be satisfied with 2. */ if (mss (1*rcv_wscale)) { - int init_cwnd = 4; - if (mss 1460*3) + int init_cwnd; + + if (mss 1460) init_cwnd = 2; - else if (mss 1460) - init_cwnd = 3; + else + init_cwnd = (mss 1095) ? 3 : 4; if (*rcv_wnd init_cwnd*mss) *rcv_wnd = init_cwnd*mss; } commit 01ff367e62f0474e4d39aa5812cbe2a30d96e1e9 Author: David S. Miller [EMAIL PROTECTED] Date: Thu Sep 29 17:07:20 2005 -0700 [TCP]: Revert 6b251858d377196b8cea20e65cae60f584a42735 But retain the comment fix. Alexey Kuznetsov has explained the situation as follows: I think the fix is incorrect. Look, the RFC function init_cwnd(mss) is not continuous: f.e. for mss=1095 it needs initial window 1095*4, but for mss=1096 it is 1096*3. We do not know exactly what mss sender used for calculations. If we advertised 1096 (and calculate initial window 3*1096), the sender could limit it to some value 1096 and then it will need window his_mss*4 3*1096 to send initial burst. See? So, the honest function for inital rcv_wnd derived from tcp_init_cwnd() is: init_rcv_wnd(mss)= min { init_cwnd(mss1)*mss1 for mss1 = mss } It is something sort of: if (mss 1096) return mss*4; if (mss 1096*2) return 1096*4; return mss*2; (I just scrablled a graph of piece of paper, it is difficult to see or to explain without this) I selected it differently giving more window than it is strictly required. Initial receive window must be large enough to allow sender following to the rfc (or just setting initial cwnd to 2) to send initial burst. But besides that it is arbitrary, so I decided to give slack space of one segment. Actually, the logic was: If mss is low/normal (=ethernet), set window to receive more than initial burst allowed by rfc under the worst conditions i.e. mss*4. This gives slack space of 1 segment for ethernet frames. For msses slighlty more than ethernet frame, take 3. Try to give slack space of 1 frame again. If mss is huge, force 2*mss. No slack space. Value 1460*3 is really confusing. Minimal one is 1096*2, but besides that it is an arbitrary value. It was meant to be ~4096. 1460*3 is just the magic number from RFC, 1460*3 = 1095*4 is the magic :-), so that I guess hands typed this themselves. Signed-off-by: David S. Miller [EMAIL PROTECTED] diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index caf2e2c..c5b911f 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -194,12 +194,11 @@ void tcp_select_initial_window(int __space, __u32 mss, * will be satisfied with 2. */ if (mss (1*rcv_wscale)) { - int init_cwnd; - - if (mss 1460) + int init_cwnd = 4; + if (mss 1460*3) init_cwnd = 2; - else - init_cwnd = (mss 1095) ? 3 : 4; + else if (mss 1460) + init_cwnd = 3; if (*rcv_wnd init_cwnd*mss) *rcv_wnd = init_cwnd*mss; } - To unsubscribe from this list: send the line unsubscribe netdev in the
Re: TCP's initial cwnd setting correct?...
On Mon, 6 Aug 2007, Ilpo Järvinen wrote: ...Goto logic could be cleaner (somebody has any suggestion for better way to structure it?) ...I could probably move the setting of snd_cwnd earlier to avoid this problem if this seems a valid fix at all. -- i.