Re: [RFC] TCP limited slow start

2006-06-05 Thread Stephen Hemminger
On Sat, 03 Jun 2006 12:46:57 -0400
John Heffner [EMAIL PROTECTED] wrote:

> Stephen Hemminger wrote:
> > Rolled my sleeves up and gave this a try...
> >
> > This is an implementation of Sally Floyd's Limited Slow Start
> > for Large Congestion Windows.
>
> Limited slow start is useful as a work-around for bottleneck queues that
> are inappropriately short.  I don't think it's good to run it all the
> time by default (with a max_ssthresh < infinity), because it slows down
> flows on healthy paths, and introduces another non-scalable parameter to
> TCP.
>
> I see it as potentially useful as a per-route parameter, where you set
> it deliberately to work around some known problematic path.  A sysctl
> with a default value of infinity might be okay as well.
>
> Practically speaking, we've had this in the Web100 patch for a long time
> (and still do, look for WAD_MaxSsthresh), but I've never found it all
> that useful.  If the bottleneck queue is too short, you usually end up
> getting screwed other ways too.
>
> -John

I moved it into tcp_highspeed.c only. That seems appropriate because
that is where you put the related RFC.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] TCP limited slow start

2006-06-03 Thread John Heffner

Stephen Hemminger wrote:

> Rolled my sleeves up and gave this a try...
>
> This is an implementation of Sally Floyd's Limited Slow Start
> for Large Congestion Windows.


Limited slow start is useful as a work-around for bottleneck queues that 
are inappropriately short.  I don't think it's good to run it all the 
time by default (with a max_ssthresh < infinity), because it slows down 
flows on healthy paths, and introduces another non-scalable parameter to 
TCP.


I see it as potentially useful as a per-route parameter, where you set 
it deliberately to work around some known problematic path.  A sysctl 
with a default value of infinity might be okay as well.
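
To put a rough number on the slowdown on healthy paths, and on why a default of infinity costs nothing, here is a small C sketch that counts round trips of slow start under an idealized model (one window per RTT, no loss, no delayed ACKs). The function name `slow_start_rtts`, the use of 0 to mean "infinity", and the example values are assumptions made for illustration; none of this is taken from the patch or from Web100.

```c
#include <stdint.h>

/*
 * Idealized model, not kernel code: count round trips of slow start
 * needed to grow cwnd from `start` to at least `target` segments.
 * max_ssthresh == 0 stands in for "infinity" (limiting disabled);
 * that convention is an assumption of this sketch.
 */
static unsigned int slow_start_rtts(uint64_t start, uint64_t target,
                                    uint64_t max_ssthresh)
{
	uint64_t cwnd = start ? start : 1;	/* avoid a stuck loop at 0 */
	uint64_t step = max_ssthresh / 2;
	unsigned int rtts = 0;

	if (step == 0)
		step = 1;			/* only used when limiting is on */

	while (cwnd < target) {
		if (max_ssthresh == 0 || cwnd <= max_ssthresh)
			cwnd *= 2;		/* standard: doubles each RTT */
		else
			cwnd += step;		/* limited: max_ssthresh/2 per RTT */
		rtts++;
	}
	return rtts;
}
```

Under these assumptions, growing from 1 to 10000 segments takes 14 round trips with limiting off (`max_ssthresh` = 0) and 205 round trips with `max_ssthresh` = 100.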


Practically speaking, we've had this in the Web100 patch for a long time 
(and still do, look for WAD_MaxSsthresh), but I've never found it all 
that useful.  If the bottleneck queue is too short, you usually end up 
getting screwed other ways too.


  -John


[RFC] TCP limited slow start

2006-06-02 Thread Stephen Hemminger
Rolled my sleeves up and gave this a try...

This is an implementation of Sally Floyd's Limited Slow Start
for Large Congestion Windows.

Summary from RFC 3742:
   Limited Slow-Start introduces a parameter, max_ssthresh, and
   modifies the slow-start mechanism for values of the congestion window
   where cwnd is greater than max_ssthresh.  That is, during Slow-
   Start, when

      cwnd <= max_ssthresh,

   cwnd is increased by one MSS (MAXIMUM SEGMENT SIZE) for every
   arriving ACK (acknowledgement) during slow-start, as is always the
   case.  During Limited Slow-Start, when

      max_ssthresh < cwnd <= ssthresh,

   the invariant is maintained so that the congestion window is
   increased during slow-start by at most max_ssthresh/2 MSS per round-
   trip time.  This is done as follows:

      For each arriving ACK in slow-start:
        If (cwnd <= max_ssthresh)
           cwnd += MSS;
        else
           K = int(cwnd/(0.5 max_ssthresh));
           cwnd += int(MSS/K);

   Thus, during Limited Slow-Start the window is increased by 1/K MSS
   for each arriving ACK, for K = int(cwnd/(0.5 max_ssthresh)), instead
   of by 1 MSS as in standard slow-start [RFC2581].
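
For readers who want to step through the pseudocode above, here is a minimal C sketch of the per-ACK update, with cwnd, MSS and max_ssthresh all in bytes as in the RFC text. The function name `limited_slow_start_ack` and its signature are made up for illustration. It is not the code in the patch, which works on the kernel's packet-counted snd_cwnd.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: returns the new cwnd (bytes)
 * after one ACK arrives while the sender is in slow start.
 * Assumes mss > 0 and max_ssthresh > 0.
 */
static uint64_t limited_slow_start_ack(uint64_t cwnd, uint64_t mss,
                                       uint64_t max_ssthresh)
{
	uint64_t k;

	if (cwnd <= max_ssthresh)
		return cwnd + mss;	/* standard slow start: one MSS per ACK */

	/* K = int(cwnd / (0.5 * max_ssthresh)); always >= 2 on this branch */
	k = (2 * cwnd) / max_ssthresh;
	return cwnd + mss / k;		/* 1/K MSS per ACK */
}
```

With an assumed MSS of 1000 bytes and max_ssthresh of 100000 bytes, a cwnd of 400000 bytes gives K = 8, so each ACK adds 125 bytes instead of 1000.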

---

 Documentation/networking/ip-sysctl.txt |    8 +-
 include/linux/sysctl.h                 |    1 +
 include/net/tcp.h                      |    1 +
 net/ipv4/sysctl_net_ipv4.c             |    8 ++
 net/ipv4/tcp_cong.c                    |   46 
 net/ipv4/tcp_input.c                   |    1 +
 6 files changed, 47 insertions(+), 18 deletions(-)

0884f45c9f21c50dd9117b2fc02bf5436be3c3bf
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index f12007b..9869298 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -103,9 +103,15 @@ TCP variables: 
 
 tcp_abc - INTEGER
 	Controls Appropriate Byte Count defined in RFC3465. If set to
-	0 then does congestion avoid once per ack. 1 is conservative
+	0 then does congestion avoid once per ack. 1 (default) is conservative
 	value, and 2 is more agressive.
 
+tcp_limited_ssthresh - INTEGER
+	Controls the increase of the congestion window during slow start as
+	defined in RFC3742. The purpose is to slow the growth of the congestion
+	window on high delay networks where aggressive growth can cause losses
+	of 1000's of packets. Default is 100 packets.
+
 tcp_syn_retries - INTEGER
 	Number of times initial SYNs for an active TCP connection attempt
 	will be retransmitted. Should not be higher than 255. Default value
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 76eaeff..a455165 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -403,6 +403,7 @@ enum
 	NET_TCP_MTU_PROBING=113,
 	NET_TCP_BASE_MSS=114,
 	NET_IPV4_TCP_WORKAROUND_SIGNED_WINDOWS=115,
+	NET_TCP_LIMITED_SSTHRESH=116,
 };
 
 enum {
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 575636f..3a14861 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -225,6 +225,7 @@ extern int sysctl_tcp_abc;
 extern int sysctl_tcp_mtu_probing;
 extern int sysctl_tcp_base_mss;
 extern int sysctl_tcp_workaround_signed_windows;
+extern int sysctl_tcp_limited_ssthresh;
 
 extern atomic_t tcp_memory_allocated;
 extern atomic_t tcp_sockets_allocated;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 6b6c3ad..d1358d3 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -688,6 +688,14 @@ #endif
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.ctl_name	= NET_TCP_LIMITED_SSTHRESH,
+		.procname	= "tcp_max_ssthresh",
+		.data		= &sysctl_tcp_limited_ssthresh,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
 	{ .ctl_name = 0 }
 };
 
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 857eefc..a27c792 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -180,25 +180,37 @@ int tcp_set_congestion_control(struct so
  */
 void tcp_slow_start(struct tcp_sock *tp)
 {
-	if (sysctl_tcp_abc) {
-		/* RFC3465: Slow Start
-		 * TCP sender SHOULD increase cwnd by the number of
-		 * previously unacknowledged bytes ACKed by each incoming
-		 * acknowledgment, provided the increase is not more than L
-		 */
-		if (tp->bytes_acked < tp->mss_cache)
-			return;
-
-		/* We MAY increase by 2 if discovered delayed ack */
-		if (sysctl_tcp_abc > 1 && tp->bytes_acked > 2*tp->mss_cache) {
-			if (tp->snd_cwnd < tp->snd_cwnd_clamp)
-				tp->snd_cwnd++;
-		}
+   /* RFC3465: