Re: [PATCH] fix limited slow start bug

2007-02-25 Thread Roger While


Dave M wrote :

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 415193e..18a468d 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -302,7 +302,7 @@ struct tcp_sock {
u32 snd_ssthresh;   /* Slow start size threshold*/
u32 snd_cwnd;   /* Sending congestion window*/
u16 snd_cwnd_cnt;   /* Linear increase counter  */
-   u16 snd_cwnd_clamp; /* Do not allow snd_cwnd to grow above this */
+   u32 snd_cwnd_clamp; /* Do not allow snd_cwnd to grow above this */
u32 snd_cwnd_used;
u32 snd_cwnd_stamp;


Was anything done about size/member alignment of struct tcp_sock per
mail from last year  -
http://marc.theaimsgroup.com/?l=linux-netdevm=114318857102290w=2

(I have no idea what current size is)


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix limited slow start bug

2007-02-25 Thread David Miller
From: Roger While [EMAIL PROTECTED]
Date: Sun, 25 Feb 2007 09:55:34 +0100

 Was anything done about size/member alignment of struct tcp_sock per
 mail from last year  -
 http://marc.theaimsgroup.com/?l=linux-netdevm=114318857102290w=2
 
 (I have no idea what current size is)

Nothing has been done yet but I've been thinking about it a lot
over the past year and I've had some discussions with other
developers such as Arnaldo.

It's just a matter of me being backlogged, so I never get to
it as often as I would like :)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix limited slow start bug

2007-02-25 Thread Arnaldo Carvalho de Melo

On 2/25/07, David Miller [EMAIL PROTECTED] wrote:

From: Roger While [EMAIL PROTECTED]
Date: Sun, 25 Feb 2007 09:55:34 +0100

 Was anything done about size/member alignment of struct tcp_sock per
 mail from last year  -
 http://marc.theaimsgroup.com/?l=linux-netdevm=114318857102290w=2

 (I have no idea what current size is)

Nothing has been done yet but I've been thinking about it a lot
over the past year and I've had some discussions with other
developers such as Arnaldo.

It's just a matter of me being backlogged, so I never get to
it as often as I would like :)


Attached goes a current (DaveM's net-2.6 git tree build) pahole
picture of tcp_sock on UP, 32bits, summary:

}; /* size: 1288, cachelines: 21 */
  /* last cacheline: 8 bytes */

and for the really curious, take a look at:

http://oops.ghostprotocols.net:81/acme/dwarves/tcp_sock.pahole.expand_types.txt

All the types are expanded, makes a pretty big picture :-)

- Arnaldo
[EMAIL PROTECTED] net-2.6]$ pahole ../OUTPUT/qemu/net-2.6/vmlinux tcp_sock
/* e40138 /home/acme/git/net-2.6/include/linux/tcp.h:227 */
struct tcp_sock {
struct inet_connection_sock inet_conn;   /* 0   844 */
/* --- cacheline 13 boundary (832 bytes) was 12 bytes ago --- */
u16tcp_header_len;   /*   844 2 */
u16xmit_size_goal;   /*   846 2 */
__be32 pred_flags;   /*   848 4 */
u32rcv_nxt;  /*   852 4 */
u32snd_nxt;  /*   856 4 */
u32snd_una;  /*   860 4 */
u32snd_sml;  /*   864 4 */
u32rcv_tstamp;   /*   868 4 */
u32lsndtime; /*   872 4 */
struct {
struct sk_buff_head prequeue;/* 028 */
struct task_struct * task;   /*28 4 */
struct iovec * iov;  /*32 4 */
intmemory;   /*36 4 */
intlen;  /*40 4 */
} ucopy; /*   87644 */
/* --- cacheline 14 boundary (896 bytes) was 24 bytes ago --- */
u32snd_wl1;  /*   920 4 */
u32snd_wnd;  /*   924 4 */
u32max_window;   /*   928 4 */
u32mss_cache;/*   932 4 */
u32window_clamp; /*   936 4 */
u32rcv_ssthresh; /*   940 4 */
u32frto_highmark;/*   944 4 */
u8 reordering;   /*   948 1 */
u8 frto_counter; /*   949 1 */
u8 nonagle;  /*   950 1 */
u8 keepalive_probes; /*   951 1 */
u32srtt; /*   952 4 */
u32mdev; /*   956 4 */
/* --- cacheline 15 boundary (960 bytes) --- */
u32mdev_max; /*   960 4 */
u32rttvar;   /*   964 4 */
u32rtt_seq;  /*   968 4 */
u32packets_out;  /*   972 4 */
u32left_out; /*   976 4 */
u32retrans_out;  /*   980 4 */
struct tcp_options_received rx_opt;  /*   98424 */
u32snd_ssthresh; /*  1008 4 */
u32snd_cwnd; /*  1012 4 */
u16snd_cwnd_cnt; /*  1016 2 */
u16snd_cwnd_clamp;   /*  1018 2 */
u32snd_cwnd_used;/*  1020 4 */
/* --- cacheline 16 boundary (1024 bytes) --- */
u32snd_cwnd_stamp;   /*  1024 4 */
struct sk_buff_headout_of_order_queue;   /*  102828 */
u32rcv_wnd;  /*  1056 4 */
u32rcv_wup;  /*  1060 4 */
u32write_seq;/*  1064 4 */
u32pushed_seq;   /*  1068 4 */
u32copied_seq;   /*  1072 4 */
struct 

Re: [PATCH] fix limited slow start bug

2007-02-25 Thread Arnaldo Carvalho de Melo

On 2/25/07, Arnaldo Carvalho de Melo [EMAIL PROTECTED] wrote:

On 2/25/07, David Miller [EMAIL PROTECTED] wrote:
 From: Roger While [EMAIL PROTECTED]
 Date: Sun, 25 Feb 2007 09:55:34 +0100

  Was anything done about size/member alignment of struct tcp_sock per
  mail from last year  -
  http://marc.theaimsgroup.com/?l=linux-netdevm=114318857102290w=2
 
  (I have no idea what current size is)

 Nothing has been done yet but I've been thinking about it a lot
 over the past year and I've had some discussions with other
 developers such as Arnaldo.

 It's just a matter of me being backlogged, so I never get to
 it as often as I would like :)

Attached goes a current (DaveM's net-2.6 git tree build) pahole
picture of tcp_sock on UP, 32bits, summary:

}; /* size: 1288, cachelines: 21 */
   /* last cacheline: 8 bytes */

and for the really curious, take a look at:

http://oops.ghostprotocols.net:81/acme/dwarves/tcp_sock.pahole.expand_types.txt

All the types are expanded, makes a pretty big picture :-)



And looking at it I saw I have Ingo's timer debugging option enabled,
which makes struct timer_list a bit bigger:

struct timer_list {
   struct list_head {
   struct list_head * next;   /* 0 4 */
   struct list_head * prev;   /* 4 4 */
   } entry;   /* 0 8 */
   long unsigned int expires; /* 8 4 */
   void   (*function)(long unsigned int); /*12 4 */
   long unsigned int data;/*16 4 */
   struct tvec_t_base_s * base;   /*20 4 */

   void * start_site; /*24 4 */
   char   start_comm[16]; /*2816 */
   intstart_pid;  /*44 4 */
} icsk_delack_timer; /*   66848 */

The three last members are related to debugging, so discount 24 bytes
times 3, as there are tree struct timer_list inside struct tcp_sock.

- Arnaldo
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fix limited slow start bug

2007-02-22 Thread John Heffner
 Fix arithmetic order bug in limited slow start.  The subtraction needs to be
done before snd_cwnd is incremented.

Signed-off-by: John Heffner [EMAIL PROTECTED]

---
commit 244e7411d99443df7b7ae849ba6ebbec4c2342bc
tree e6d5985a22448f59f8bef393542e1d5497ee5684
parent 97033fa201705e6cfc68ce66f34ede3277c3d645
author John Heffner [EMAIL PROTECTED] Thu, 22 Feb 2007 13:54:01 -0500
committer John Heffner [EMAIL PROTECTED] Thu, 22 Feb 2007 13:54:01 -0500

 net/ipv4/tcp_cong.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 7fd2910..a0c894f 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -303,9 +303,9 @@ void tcp_slow_start(struct tcp_sock *tp)

tp-snd_cwnd_cnt += cnt;
while (tp-snd_cwnd_cnt = tp-snd_cwnd) {
+   tp-snd_cwnd_cnt -= tp-snd_cwnd;
if (tp-snd_cwnd  tp-snd_cwnd_clamp)
tp-snd_cwnd++;
-   tp-snd_cwnd_cnt -= tp-snd_cwnd;
}
 }
 EXPORT_SYMBOL_GPL(tcp_slow_start);


Re: [PATCH] fix limited slow start bug

2007-02-22 Thread Ilpo Järvinen
On Thu, 22 Feb 2007, John Heffner wrote:

 diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
 index 7fd2910..a0c894f 100644
 --- a/net/ipv4/tcp_cong.c
 +++ b/net/ipv4/tcp_cong.c
 @@ -303,9 +303,9 @@ void tcp_slow_start(struct tcp_sock *tp)
   
   tp-snd_cwnd_cnt += cnt;
   while (tp-snd_cwnd_cnt = tp-snd_cwnd) {
 + tp-snd_cwnd_cnt -= tp-snd_cwnd;
   if (tp-snd_cwnd  tp-snd_cwnd_clamp)
   tp-snd_cwnd++;
 - tp-snd_cwnd_cnt -= tp-snd_cwnd;
   }
 }
 EXPORT_SYMBOL_GPL(tcp_slow_start);


ACK. I was going to track this down tomorrow as I noticed strange
behavior during slow start while testing tcp-2.6 today but I don't have to 
anymore :-). The problem I saw is now obvious, whole congestion control 
was practically disabled due to underflow of the unsigned type that occurs 
without this patch, causing TCP to be limited only by the receiver 
advertized window during slow-start.


BTW, while looking this patch, I noticed that snd_cwnd_clamp is only u16 
while snd_cwnd is u32, which seems rather strange since snd_cwnd is being 
limited by the clamp value here and there?!?! And tcp_highspeed.c is 
clearly assuming even more than this (but the problem is hidden as 
snd_cwnd_clamp is feed back to the min_t and the used 32-bit constant 
could be safely cut to 16-bits anyway):

  tp-snd_cwnd_clamp = min_t(u32, tp-snd_cwnd_clamp, 0x/128);

Has the type being changed somewhere in the past or why is this so?


-- 
 i.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix limited slow start bug

2007-02-22 Thread John Heffner

Ilpo Järvinen wrote:
BTW, while looking this patch, I noticed that snd_cwnd_clamp is only u16 
while snd_cwnd is u32, which seems rather strange since snd_cwnd is being 
limited by the clamp value here and there?!?! And tcp_highspeed.c is 
clearly assuming even more than this (but the problem is hidden as 
snd_cwnd_clamp is feed back to the min_t and the used 32-bit constant 
could be safely cut to 16-bits anyway):


  tp-snd_cwnd_clamp = min_t(u32, tp-snd_cwnd_clamp, 0x/128);

Has the type being changed somewhere in the past or why is this so?


It's been that way as long as I can remember.  It's always been a 
mystery to me as well.  I suspect the tcp_highspeed code is that way 
because this patch originally came out of the Web100-patched kernel, 
which at one point was using a 32 bit snd_cwnd_clamp IIRC.


I think it's not unreasonable to change clamp to 32 bits now, since with 
1500 byte packets, this corresponds to a max cwnd of ~94MB.  This is 
pretty big, but we are currently right at this limit with 10 GigE.


  -John
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix limited slow start bug

2007-02-22 Thread David Miller

John, what tree did you diff this against?

I can tell you didn't create this patch against anything
actually in any of my trees, because:

diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 7fd2910..a0c894f 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -303,9 +303,9 @@ void tcp_slow_start(struct tcp_sock *tp)

tp-snd_cwnd_cnt += cnt;

See that line right before the snd_cwnd_cnt increment?

There is a tab there in your patch on that empty line.

Yoshifuji eliminated all extraneous trailing whitespaces,
and spaces that should be tabs, across the entire networking,
several weeks ago.  Including the tab on that empty line in
your patch.

This means that you applied the YeaH and your own patch to
another source tree, perhaps 2.6.20 or similar, and then
generated your fix against that.

Please don't do things like that.  Instead please produce patches
against the tree they will end up being applied to so that they will
apply cleanly for me.

I can't believe how much time I spend getting people to produce
correct patches :-/

Anyways, I applied this by hand, but next time I definitely
am not going to.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix limited slow start bug

2007-02-22 Thread David Miller
From: John Heffner [EMAIL PROTECTED]
Date: Thu, 22 Feb 2007 16:52:03 -0500

 I think it's not unreasonable to change clamp to 32 bits now, since with 
 1500 byte packets, this corresponds to a max cwnd of ~94MB.  This is 
 pretty big, but we are currently right at this limit with 10 GigE.

Agreed, and done in tcp-2.6.git as below.

What should we do about that 65535 assignment in hybla?

commit cedfa95566512730202bb4abed5d9118e74bab30
Author: David S. Miller [EMAIL PROTECTED]
Date:   Thu Feb 22 22:52:59 2007 -0800

[TCP]: Make snd_cwnd_clamp a u32.

Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 415193e..18a468d 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -302,7 +302,7 @@ struct tcp_sock {
u32 snd_ssthresh;   /* Slow start size threshold*/
u32 snd_cwnd;   /* Sending congestion window*/
u16 snd_cwnd_cnt;   /* Linear increase counter  */
-   u16 snd_cwnd_clamp; /* Do not allow snd_cwnd to grow above this */
+   u32 snd_cwnd_clamp; /* Do not allow snd_cwnd to grow above this */
u32 snd_cwnd_used;
u32 snd_cwnd_stamp;
 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html