Re: [PATCH] fix limited slow start bug
On 2/25/07, Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> wrote: On 2/25/07, David Miller <[EMAIL PROTECTED]> wrote: > From: Roger While <[EMAIL PROTECTED]> > Date: Sun, 25 Feb 2007 09:55:34 +0100 > > > Was anything done about size/member alignment of struct tcp_sock per > > mail from last year - > > http://marc.theaimsgroup.com/?l=linux-netdev&m=114318857102290&w=2 > > > > (I have no idea what current size is) > > Nothing has been done yet but I've been thinking about it a lot > over the past year and I've had some discussions with other > developers such as Arnaldo. > > It's just a matter of me being backlogged, so I never get to > it as often as I would like :) Attached goes a current (DaveM's net-2.6 git tree build) pahole picture of tcp_sock on UP, 32bits, summary: }; /* size: 1288, cachelines: 21 */ /* last cacheline: 8 bytes */ and for the really curious, take a look at: http://oops.ghostprotocols.net:81/acme/dwarves/tcp_sock.pahole.expand_types.txt All the types are expanded, makes a pretty big picture :-) And looking at it I saw I have Ingo's timer debugging option enabled, which makes struct timer_list a bit bigger: struct timer_list { struct list_head { struct list_head * next; /* 0 4 */ struct list_head * prev; /* 4 4 */ } entry; /* 0 8 */ long unsigned int expires; /* 8 4 */ void (*function)(long unsigned int); /*12 4 */ long unsigned int data;/*16 4 */ struct tvec_t_base_s * base; /*20 4 */ void * start_site; /*24 4 */ char start_comm[16]; /*2816 */ intstart_pid; /*44 4 */ } icsk_delack_timer; /* 66848 */ The three last members are related to debugging, so discount 24 bytes times 3, as there are tree struct timer_list inside struct tcp_sock. - Arnaldo - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix limited slow start bug
On 2/25/07, David Miller <[EMAIL PROTECTED]> wrote: From: Roger While <[EMAIL PROTECTED]> Date: Sun, 25 Feb 2007 09:55:34 +0100 > Was anything done about size/member alignment of struct tcp_sock per > mail from last year - > http://marc.theaimsgroup.com/?l=linux-netdev&m=114318857102290&w=2 > > (I have no idea what current size is) Nothing has been done yet but I've been thinking about it a lot over the past year and I've had some discussions with other developers such as Arnaldo. It's just a matter of me being backlogged, so I never get to it as often as I would like :) Attached goes a current (DaveM's net-2.6 git tree build) pahole picture of tcp_sock on UP, 32bits, summary: }; /* size: 1288, cachelines: 21 */ /* last cacheline: 8 bytes */ and for the really curious, take a look at: http://oops.ghostprotocols.net:81/acme/dwarves/tcp_sock.pahole.expand_types.txt All the types are expanded, makes a pretty big picture :-) - Arnaldo [EMAIL PROTECTED] net-2.6]$ pahole ../OUTPUT/qemu/net-2.6/vmlinux tcp_sock /* /home/acme/git/net-2.6/include/linux/tcp.h:227 */ struct tcp_sock { struct inet_connection_sock inet_conn; /* 0 844 */ /* --- cacheline 13 boundary (832 bytes) was 12 bytes ago --- */ u16tcp_header_len; /* 844 2 */ u16xmit_size_goal; /* 846 2 */ __be32 pred_flags; /* 848 4 */ u32rcv_nxt; /* 852 4 */ u32snd_nxt; /* 856 4 */ u32snd_una; /* 860 4 */ u32snd_sml; /* 864 4 */ u32rcv_tstamp; /* 868 4 */ u32lsndtime; /* 872 4 */ struct { struct sk_buff_head prequeue;/* 028 */ struct task_struct * task; /*28 4 */ struct iovec * iov; /*32 4 */ intmemory; /*36 4 */ intlen; /*40 4 */ } ucopy; /* 87644 */ /* --- cacheline 14 boundary (896 bytes) was 24 bytes ago --- */ u32snd_wl1; /* 920 4 */ u32snd_wnd; /* 924 4 */ u32max_window; /* 928 4 */ u32mss_cache;/* 932 4 */ u32window_clamp; /* 936 4 */ u32rcv_ssthresh; /* 940 4 */ u32frto_highmark;/* 944 4 */ u8 reordering; /* 948 1 */ u8 frto_counter; /* 949 1 */ u8 nonagle; /* 950 1 */ u8 keepalive_probes; /* 951 1 */ u32srtt; /* 952 4 */ u32mdev; /* 956 4 */ /* --- cacheline 15 boundary (960 bytes) --- */ u32mdev_max; /* 960 4 */ u32rttvar; /* 964 4 */ u32rtt_seq; /* 968 4 */ u32packets_out; /* 972 4 */ u32left_out; /* 976 4 */ u32retrans_out; /* 980 4 */ struct tcp_options_received rx_opt; /* 98424 */ u32snd_ssthresh; /* 1008 4 */ u32snd_cwnd; /* 1012 4 */ u16snd_cwnd_cnt; /* 1016 2 */ u16snd_cwnd_clamp; /* 1018 2 */ u32snd_cwnd_used;/* 1020 4 */ /* --- cacheline 16 boundary (1024 bytes) --- */ u32snd_cwnd_stamp; /* 1024 4 */ struct sk_buff_headout_of_order_queue; /* 102828 */ u32rcv_wnd; /* 1056 4 */ u32rcv_wup; /* 1060 4 */ u32write_seq;/* 1064 4 */ u32pushed_seq; /* 1068 4 */ u32copied_seq; /* 1072 4 */ struct t
Re: [PATCH] fix limited slow start bug
From: Roger While <[EMAIL PROTECTED]> Date: Sun, 25 Feb 2007 09:55:34 +0100 > Was anything done about size/member alignment of struct tcp_sock per > mail from last year - > http://marc.theaimsgroup.com/?l=linux-netdev&m=114318857102290&w=2 > > (I have no idea what current size is) Nothing has been done yet but I've been thinking about it a lot over the past year and I've had some discussions with other developers such as Arnaldo. It's just a matter of me being backlogged, so I never get to it as often as I would like :) - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix limited slow start bug
Dave M wrote : diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 415193e..18a468d 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -302,7 +302,7 @@ struct tcp_sock { u32 snd_ssthresh; /* Slow start size threshold*/ u32 snd_cwnd; /* Sending congestion window*/ u16 snd_cwnd_cnt; /* Linear increase counter */ - u16 snd_cwnd_clamp; /* Do not allow snd_cwnd to grow above this */ + u32 snd_cwnd_clamp; /* Do not allow snd_cwnd to grow above this */ u32 snd_cwnd_used; u32 snd_cwnd_stamp; Was anything done about size/member alignment of struct tcp_sock per mail from last year - http://marc.theaimsgroup.com/?l=linux-netdev&m=114318857102290&w=2 (I have no idea what current size is) - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix limited slow start bug
From: John Heffner <[EMAIL PROTECTED]> Date: Thu, 22 Feb 2007 16:52:03 -0500 > I think it's not unreasonable to change clamp to 32 bits now, since with > 1500 byte packets, this corresponds to a max cwnd of ~94MB. This is > pretty big, but we are currently right at this limit with 10 GigE. Agreed, and done in tcp-2.6.git as below. What should we do about that 65535 assignment in hybla? commit cedfa95566512730202bb4abed5d9118e74bab30 Author: David S. Miller <[EMAIL PROTECTED]> Date: Thu Feb 22 22:52:59 2007 -0800 [TCP]: Make snd_cwnd_clamp a u32. Signed-off-by: David S. Miller <[EMAIL PROTECTED]> diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 415193e..18a468d 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -302,7 +302,7 @@ struct tcp_sock { u32 snd_ssthresh; /* Slow start size threshold*/ u32 snd_cwnd; /* Sending congestion window*/ u16 snd_cwnd_cnt; /* Linear increase counter */ - u16 snd_cwnd_clamp; /* Do not allow snd_cwnd to grow above this */ + u32 snd_cwnd_clamp; /* Do not allow snd_cwnd to grow above this */ u32 snd_cwnd_used; u32 snd_cwnd_stamp; - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix limited slow start bug
John, what tree did you diff this against? I can tell you didn't create this patch against anything actually in any of my trees, because: diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c index 7fd2910..a0c894f 100644 --- a/net/ipv4/tcp_cong.c +++ b/net/ipv4/tcp_cong.c @@ -303,9 +303,9 @@ void tcp_slow_start(struct tcp_sock *tp) tp->snd_cwnd_cnt += cnt; See that line right before the snd_cwnd_cnt increment? There is a tab there in your patch on that empty line. Yoshifuji eliminated all extraneous trailing whitespaces, and spaces that should be tabs, across the entire networking, several weeks ago. Including the tab on that empty line in your patch. This means that you applied the YeaH and your own patch to another source tree, perhaps 2.6.20 or similar, and then generated your fix against that. Please don't do things like that. Instead please produce patches against the tree they will end up being applied to so that they will apply cleanly for me. I can't believe how much time I spend getting people to produce correct patches :-/ Anyways, I applied this by hand, but next time I definitely am not going to. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix limited slow start bug
Ilpo Järvinen wrote: BTW, while looking this patch, I noticed that snd_cwnd_clamp is only u16 while snd_cwnd is u32, which seems rather strange since snd_cwnd is being limited by the clamp value here and there?!?! And tcp_highspeed.c is clearly assuming even more than this (but the problem is hidden as snd_cwnd_clamp is feed back to the min_t and the used 32-bit constant could be safely cut to 16-bits anyway): tp->snd_cwnd_clamp = min_t(u32, tp->snd_cwnd_clamp, 0x/128); Has the type being changed somewhere in the past or why is this so? It's been that way as long as I can remember. It's always been a mystery to me as well. I suspect the tcp_highspeed code is that way because this patch originally came out of the Web100-patched kernel, which at one point was using a 32 bit snd_cwnd_clamp IIRC. I think it's not unreasonable to change clamp to 32 bits now, since with 1500 byte packets, this corresponds to a max cwnd of ~94MB. This is pretty big, but we are currently right at this limit with 10 GigE. -John - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix limited slow start bug
On Thu, 22 Feb 2007, John Heffner wrote: > diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c > index 7fd2910..a0c894f 100644 > --- a/net/ipv4/tcp_cong.c > +++ b/net/ipv4/tcp_cong.c > @@ -303,9 +303,9 @@ void tcp_slow_start(struct tcp_sock *tp) > > tp->snd_cwnd_cnt += cnt; > while (tp->snd_cwnd_cnt >= tp->snd_cwnd) { > + tp->snd_cwnd_cnt -= tp->snd_cwnd; > if (tp->snd_cwnd < tp->snd_cwnd_clamp) > tp->snd_cwnd++; > - tp->snd_cwnd_cnt -= tp->snd_cwnd; > } > } > EXPORT_SYMBOL_GPL(tcp_slow_start); ACK. I was going to track this down tomorrow as I noticed strange behavior during slow start while testing tcp-2.6 today but I don't have to anymore :-). The problem I saw is now obvious, whole congestion control was practically disabled due to underflow of the unsigned type that occurs without this patch, causing TCP to be limited only by the receiver advertized window during slow-start. BTW, while looking this patch, I noticed that snd_cwnd_clamp is only u16 while snd_cwnd is u32, which seems rather strange since snd_cwnd is being limited by the clamp value here and there?!?! And tcp_highspeed.c is clearly assuming even more than this (but the problem is hidden as snd_cwnd_clamp is feed back to the min_t and the used 32-bit constant could be safely cut to 16-bits anyway): tp->snd_cwnd_clamp = min_t(u32, tp->snd_cwnd_clamp, 0x/128); Has the type being changed somewhere in the past or why is this so? -- i. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] fix limited slow start bug
Fix arithmetic order bug in limited slow start. The subtraction needs to be done before snd_cwnd is incremented. Signed-off-by: John Heffner <[EMAIL PROTECTED]> --- commit 244e7411d99443df7b7ae849ba6ebbec4c2342bc tree e6d5985a22448f59f8bef393542e1d5497ee5684 parent 97033fa201705e6cfc68ce66f34ede3277c3d645 author John Heffner <[EMAIL PROTECTED]> Thu, 22 Feb 2007 13:54:01 -0500 committer John Heffner <[EMAIL PROTECTED]> Thu, 22 Feb 2007 13:54:01 -0500 net/ipv4/tcp_cong.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c index 7fd2910..a0c894f 100644 --- a/net/ipv4/tcp_cong.c +++ b/net/ipv4/tcp_cong.c @@ -303,9 +303,9 @@ void tcp_slow_start(struct tcp_sock *tp) tp->snd_cwnd_cnt += cnt; while (tp->snd_cwnd_cnt >= tp->snd_cwnd) { + tp->snd_cwnd_cnt -= tp->snd_cwnd; if (tp->snd_cwnd < tp->snd_cwnd_clamp) tp->snd_cwnd++; - tp->snd_cwnd_cnt -= tp->snd_cwnd; } } EXPORT_SYMBOL_GPL(tcp_slow_start);