Re: [PATCH 2/2][TCP] YeAH-TCP: limited slow start exported function

2007-02-22 Thread Angelo P. Castellani

Ok, I missed this bit in the submitting-patches documentation; sorry
about that.

2007/2/22, David Miller [EMAIL PROTECTED]:

Please never submit patches like this, submit the infrastructure
FIRST, then submit the stuff that uses it.  When a sequence of patches
is applied, in sequence, the tree should build properly (even with all
available new options enabled) at each step along the way.

Otherwise we have the situation we have now, in that YeAH is in my
tree but doesn't build successfully.

What I'm going to do to fix this is yank the YeAH implementation out
of my tree, add this second patch first, then add the YeAH patch
back.

Please never do this again.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2][TCP] YeAH-TCP: limited slow start exported function

2007-02-20 Thread Angelo P. Castellani

John Heffner ha scritto:
Sorry for the confusion.  The patch I attached to my message was 
compile-tested only.
Well, I read your reply at night and didn't notice that you had
attached a patch. Sorry about that.


Kind regards,
Angelo


Re: [PATCH 2/2][TCP] YeAH-TCP: limited slow start exported function

2007-02-19 Thread Angelo P. Castellani

Forgot the patch...

Angelo P. Castellani ha scritto:

From: Angelo P. Castellani [EMAIL PROTECTED]

RFC3742: limited slow start

See http://www.ietf.org/rfc/rfc3742.txt

Signed-off-by: Angelo P. Castellani [EMAIL PROTECTED]
---

To allow code reuse, I've added the limited slow start 
procedure as an exported symbol of the Linux TCP congestion control code.


On large-BDP networks, canonical slow start should be avoided because 
it requires large packet losses to converge, whereas at lower BDPs 
slow start and limited slow start are identical. A large BDP is defined 
through the max_ssthresh variable.


I think limited slow start could safely replace the canonical slow 
start procedure in Linux.


Regards,
Angelo P. Castellani

P.S.: the attached patch adds an exported function currently 
used only by YeAH TCP.


include/net/tcp.h   |1 +
net/ipv4/tcp_cong.c |   23 +++
2 files changed, 24 insertions(+)




diff -uprN linux-2.6.20-a/include/net/tcp.h linux-2.6.20-c/include/net/tcp.h
--- linux-2.6.20-a/include/net/tcp.h	2007-02-04 19:44:54.0 +0100
+++ linux-2.6.20-c/include/net/tcp.h	2007-02-19 10:54:10.0 +0100
@@ -669,6 +669,7 @@ extern void tcp_get_allowed_congestion_c
 extern int tcp_set_allowed_congestion_control(char *allowed);
 extern int tcp_set_congestion_control(struct sock *sk, const char *name);
 extern void tcp_slow_start(struct tcp_sock *tp);
+extern void tcp_limited_slow_start(struct tcp_sock *tp);
 
 extern struct tcp_congestion_ops tcp_init_congestion_ops;
 extern u32 tcp_reno_ssthresh(struct sock *sk);
diff -uprN linux-2.6.20-a/net/ipv4/tcp_cong.c linux-2.6.20-c/net/ipv4/tcp_cong.c
--- linux-2.6.20-a/net/ipv4/tcp_cong.c	2007-02-04 19:44:54.0 +0100
+++ linux-2.6.20-c/net/ipv4/tcp_cong.c	2007-02-19 10:54:10.0 +0100
@@ -297,6 +297,29 @@ void tcp_slow_start(struct tcp_sock *tp)
 }
 EXPORT_SYMBOL_GPL(tcp_slow_start);
 
+void tcp_limited_slow_start(struct tcp_sock *tp)
+{
+	/* RFC3742: limited slow start
+	 * the window is increased by 1/K MSS for each arriving ACK,
+	 * for K = int(cwnd/(0.5 max_ssthresh))
+	 */
+
+	const int max_ssthresh = 100;
+
+	if (max_ssthresh > 0 && tp->snd_cwnd > max_ssthresh) {
+		u32 k = max(tp->snd_cwnd / (max_ssthresh >> 1), 1U);
+		if (++tp->snd_cwnd_cnt >= k) {
+			if (tp->snd_cwnd < tp->snd_cwnd_clamp)
+				tp->snd_cwnd++;
+			tp->snd_cwnd_cnt = 0;
+		}
+	} else {
+		if (tp->snd_cwnd < tp->snd_cwnd_clamp)
+			tp->snd_cwnd++;
+	}
+}
+EXPORT_SYMBOL_GPL(tcp_limited_slow_start);
+
 /*
  * TCP Reno congestion control
  * This is special case used for fallback as well.


[PATCH 1/2][TCP] YeAH-TCP: algorithm implementation

2007-02-19 Thread Angelo P. Castellani

From: Angelo P. Castellani [EMAIL PROTECTED]

YeAH-TCP is a sender-side, high-speed-enabled TCP congestion control 
algorithm that uses a mixed loss/delay approach to compute the 
congestion window. Its design goals target high efficiency; internal, 
RTT, and Reno fairness; and resilience to link loss, while keeping 
the load on network elements as low as possible.


For further details look here:
   http://wil.cs.caltech.edu/pfldnet2007/paper/YeAH_TCP.pdf

Signed-off-by: Angelo P. Castellani [EMAIL PROTECTED]

---

This is the implementation of the YeAH-TCP algorithm presented at 
PFLDnet 2007 (http://wil.cs.caltech.edu/pfldnet2007/).


Regards,
Angelo P. Castellani

Kconfig|   14 ++
Makefile   |1
tcp_yeah.c |  288 
+

tcp_yeah.h |  134 
4 files changed, 437 insertions(+)

diff -uprN linux-2.6.20-a/net/ipv4/Kconfig linux-2.6.20-b/net/ipv4/Kconfig
--- linux-2.6.20-a/net/ipv4/Kconfig	2007-02-04 19:44:54.0 +0100
+++ linux-2.6.20-b/net/ipv4/Kconfig	2007-02-19 10:52:46.0 +0100
@@ -574,6 +574,20 @@ config TCP_CONG_VENO
 	loss packets.
 	See http://www.ntu.edu.sg/home5/ZHOU0022/papers/CPFu03a.pdf
 
+config TCP_CONG_YEAH
+	tristate "YeAH TCP"
+	depends on EXPERIMENTAL
+	default n
+	---help---
+	YeAH-TCP is a sender-side, high-speed-enabled TCP congestion control
+	algorithm that uses a mixed loss/delay approach to compute the
+	congestion window. Its design goals target high efficiency;
+	internal, RTT, and Reno fairness; and resilience to link loss,
+	while keeping the load on network elements as low as possible.
+	
+	For further details look here:
+	  http://wil.cs.caltech.edu/pfldnet2007/paper/YeAH_TCP.pdf
+
 choice
 	prompt "Default TCP congestion control"
 	default DEFAULT_CUBIC
diff -uprN linux-2.6.20-a/net/ipv4/Makefile linux-2.6.20-b/net/ipv4/Makefile
--- linux-2.6.20-a/net/ipv4/Makefile	2007-02-04 19:44:54.0 +0100
+++ linux-2.6.20-b/net/ipv4/Makefile	2007-02-19 10:52:46.0 +0100
@@ -49,6 +49,7 @@ obj-$(CONFIG_TCP_CONG_VEGAS) += tcp_vega
 obj-$(CONFIG_TCP_CONG_VENO) += tcp_veno.o
 obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o
 obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o
+obj-$(CONFIG_TCP_CONG_YEAH) += tcp_yeah.o
 obj-$(CONFIG_NETLABEL) += cipso_ipv4.o
 
 obj-$(CONFIG_XFRM) += xfrm4_policy.o xfrm4_state.o xfrm4_input.o \
diff -uprN linux-2.6.20-a/net/ipv4/tcp_yeah.c linux-2.6.20-b/net/ipv4/tcp_yeah.c
--- linux-2.6.20-a/net/ipv4/tcp_yeah.c	1970-01-01 01:00:00.0 +0100
+++ linux-2.6.20-b/net/ipv4/tcp_yeah.c	2007-02-19 10:52:46.0 +0100
@@ -0,0 +1,288 @@
+/*
+ *
+ *   YeAH TCP
+ *
+ * For further details look at:
+ *http://wil.cs.caltech.edu/pfldnet2007/paper/YeAH_TCP.pdf
+ *
+ */
+
+#include "tcp_yeah.h"
+
+/* Default values of the Vegas variables, in fixed-point representation
+ * with V_PARAM_SHIFT bits to the right of the binary point.
+ */
+#define V_PARAM_SHIFT 1
+
+#define TCP_YEAH_ALPHA       80 //lin number of packets queued at the bottleneck
+#define TCP_YEAH_GAMMA        1 //lin fraction of queue to be removed per rtt
+#define TCP_YEAH_DELTA        3 //log minimum fraction of cwnd to be removed on loss
+#define TCP_YEAH_EPSILON      1 //log maximum fraction to be removed on early decongestion
+#define TCP_YEAH_PHY          8 //lin maximum delta from base
+#define TCP_YEAH_RHO         16 //lin minimum number of consecutive rtt to consider competition on loss
+#define TCP_YEAH_ZETA        50 //lin minimum number of state switches to reset reno_count
+
+#define TCP_SCALABLE_AI_CNT	 100U
+
+/* YeAH variables */
+struct yeah {
+	/* Vegas */
+	u32	beg_snd_nxt;	/* right edge during last RTT */
+	u32	beg_snd_una;	/* left edge  during last RTT */
+	u32	beg_snd_cwnd;	/* saves the size of the cwnd */
+	u8	doing_vegas_now;/* if true, do vegas for this RTT */
+	u16	cntRTT;		/* # of RTTs measured within last RTT */
+	u32	minRTT;		/* min of RTTs measured within last RTT (in usec) */
+	u32	baseRTT;	/* the min of all Vegas RTT measurements seen (in usec) */
+	
+	/* YeAH */
+	u32 lastQ;
+	u32 doing_reno_now;
+
+	u32 reno_count;
+	u32 fast_count;
+
+	u32 pkts_acked;
+};
+
+static void tcp_yeah_init(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct yeah *yeah = inet_csk_ca(sk);
+
+	tcp_vegas_init(sk);
+
+	yeah->doing_reno_now = 0;
+	yeah->lastQ = 0;
+
+	yeah->reno_count = 2;
+
+	/* Ensure the MD arithmetic works.  This is somewhat pedantic,
+	 * since I don't think we will see a cwnd this large. :) */
+	tp->snd_cwnd_clamp = min_t(u32, tp->snd_cwnd_clamp, 0xffffffff/128);
+
+}
+
+
+static void tcp_yeah_pkts_acked(struct sock *sk, u32 pkts_acked)
+{
+	const struct inet_connection_sock *icsk = inet_csk(sk);
+	struct yeah *yeah = inet_csk_ca(sk);
+
+	if (icsk->icsk_ca_state == TCP_CA_Open)
+		yeah->pkts_acked = pkts_acked;
+}
+
+/* 64bit divisor, dividend and result. dynamic precision */
+static inline u64 div64_64(u64 dividend, u64 divisor)
+{
+	u32 d = divisor;
+
+	if (divisor > 0xffffffffULL


Re: [PATCH 1/2][TCP] YeAH-TCP: algorithm implementation

2007-02-19 Thread Angelo P. Castellani

The patch.


Re: [PATCH 2/2][TCP] YeAH-TCP: limited slow start exported function

2007-02-19 Thread Angelo P. Castellani

John Heffner ha scritto:
Note the patch is compile-tested only!  I can do some real testing if 
you'd like to apply this Dave.
The date you see on the patch is due to the fact that I split this 
patchset into two diff files. It isn't compile-tested only: I've been 
using this piece of code for about 3 months.


However more testing is good and welcome.

Regards,
Angelo P. Castellani


[TCP] window update during recovery (continuing on window reduction)

2006-07-05 Thread Angelo P. Castellani

During a recovery, we should always reduce the send window if the host
is advertising a window reduction.

This is needed because in the recovery phase the host has to buffer
the packets between the beginning of the recovery and the data we're
sending forward within the ssthresh window plus the sacked_out count.

So it isn't buffering just the in_flight packets, but a number of
packets that could be much higher.

If the host advertises a window reduction, its buffer is filling up,
and if we ignore the reduction, when the buffer is full all the
packets arriving at the host will be dropped.

When the first packet is dropped the host will begin advertising a
zero window; this request will eventually be granted, but in the
meantime we will have lost a full window of packets.

However, we cannot set FLAG_WIN_UPDATE either, otherwise the ACK
will not be considered a DUPACK and, when using Reno, the sacked_out
count will not be updated.

Regards,
Angelo P. Castellani
diff -urd linux-2.6.16-orig/net/ipv4/tcp_input.c linux-2.6.16-winupdate/net/ipv4/tcp_input.c
--- linux-2.6.16-orig/net/ipv4/tcp_input.c	2006-05-16 14:53:02.0 +0200
+++ linux-2.6.16-winupdate/net/ipv4/tcp_input.c	2006-07-05 15:38:08.0 +0200
@@ -2365,12 +2365,44 @@
 {
 	int flag = 0;
 	u32 nwin = ntohs(skb->h.th->window);
+	struct inet_connection_sock *icsk = inet_csk(sk);
+	int silent_update = 0;
 
 	if (likely(!skb->h.th->syn))
 		nwin <<= tp->rx_opt.snd_wscale;
 
-	if (tcp_may_update_window(tp, ack, ack_seq, nwin)) {
-		flag |= FLAG_WIN_UPDATE;
+	/*
+	 * During a recovery, we should always reduce the send window if
+	 * the host is advertising a window reduction.
+	 *
+	 * This is needed because in the recovery phase the host has to
+	 * buffer the packets between the beginning of the recovery
+	 * and the data we're sending forward within the ssthresh window
+	 * plus the sacked_out count.
+	 *
+	 * So it isn't buffering just the in_flight packets, but a number
+	 * of packets that could be much higher.
+	 *
+	 * If the host advertises a window reduction, its buffer is filling
+	 * up, and if we ignore the reduction, when the buffer is full all
+	 * the packets arriving at the host will be dropped.
+	 *
+	 * When the first packet is dropped the host will begin advertising
+	 * a zero window; this request will eventually be granted, but in
+	 * the meantime we will have lost a full window of packets.
+	 *
+	 * However we cannot set FLAG_WIN_UPDATE either, otherwise the
+	 * ack will not be considered a DUPACK and when using Reno the
+	 * sacked_out count will not be updated.
+	 */
+	if (icsk->icsk_ca_state == TCP_CA_Recovery &&
+	    nwin < tp->snd_wnd)
+		silent_update = 1;
+
+	if (silent_update || tcp_may_update_window(tp, ack, ack_seq, nwin)) {
+		if (!silent_update)
+			flag |= FLAG_WIN_UPDATE;
 		tcp_update_wl(tp, ack, ack_seq);
 
 		if (tp-snd_wnd != nwin) {


[TCP] rfc strict recovery

2006-07-05 Thread Angelo P. Castellani

This patch contains a collection of changes that bring Linux TCP
recovery closer to the RFC standard.

A kernel with this patch still defaults to the standard Linux
recovery; when net.ipv4.tcp_rfcstrict_recovery=1 the changes in this
patch are enabled.

I've already discussed something like this here, and I don't think
the best use of time is to debate whether Linux should or should not
stay close to the RFCs.

I'm sending this patch only because, in my tests, these changes have
helped me obtain the expected (and better-performing) results.

The rfcstrict recovery proves far more performant during Reno
recovery from large network drops.

Regards,
Angelo P. Castellani
diff -urd linux-2.6.16-orig/include/linux/sysctl.h linux-2.6.16-stdrecovery/include/linux/sysctl.h
--- linux-2.6.16-orig/include/linux/sysctl.h	2006-05-16 14:53:02.0 +0200
+++ linux-2.6.16-stdrecovery/include/linux/sysctl.h	2006-07-05 17:05:24.0 +0200
@@ -397,6 +397,7 @@
 	NET_TCP_CONG_CONTROL=110,
 	NET_TCP_ABC=111,
 	NET_IPV4_IPFRAG_MAX_DIST=112,
+	NET_TCP_RFCSTRICT_RECOVERY,
 };
 
 enum {
diff -urd linux-2.6.16-orig/include/net/tcp.h linux-2.6.16-stdrecovery/include/net/tcp.h
--- linux-2.6.16-orig/include/net/tcp.h	2006-05-16 14:53:02.0 +0200
+++ linux-2.6.16-stdrecovery/include/net/tcp.h	2006-07-05 17:06:41.0 +0200
@@ -219,6 +219,7 @@
 extern int sysctl_tcp_moderate_rcvbuf;
 extern int sysctl_tcp_tso_win_divisor;
 extern int sysctl_tcp_abc;
+extern int sysctl_tcp_rfcstrict_recovery;
 
 extern atomic_t tcp_memory_allocated;
 extern atomic_t tcp_sockets_allocated;
diff -urd linux-2.6.16-orig/net/ipv4/sysctl_net_ipv4.c linux-2.6.16-stdrecovery/net/ipv4/sysctl_net_ipv4.c
--- linux-2.6.16-orig/net/ipv4/sysctl_net_ipv4.c	2006-05-16 14:53:02.0 +0200
+++ linux-2.6.16-stdrecovery/net/ipv4/sysctl_net_ipv4.c	2006-07-05 17:08:31.0 +0200
@@ -664,6 +664,14 @@
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+	{
+		.ctl_name	= NET_TCP_RFCSTRICT_RECOVERY,
+		.procname	= "tcp_rfcstrict_recovery",
+		.data		= &sysctl_tcp_rfcstrict_recovery,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
 
 	{ .ctl_name = 0 }
 };
diff -urd linux-2.6.16-orig/net/ipv4/tcp_input.c linux-2.6.16-stdrecovery/net/ipv4/tcp_input.c
--- linux-2.6.16-orig/net/ipv4/tcp_input.c	2006-05-16 14:53:02.0 +0200
+++ linux-2.6.16-stdrecovery/net/ipv4/tcp_input.c	2006-07-05 17:26:41.0 +0200
@@ -91,6 +91,8 @@
 int sysctl_tcp_moderate_rcvbuf = 1;
 int sysctl_tcp_abc = 1;
 
+int sysctl_tcp_rfcstrict_recovery = 0;
+
 #define FLAG_DATA		0x01 /* Incoming frame contained data.		*/
 #define FLAG_WIN_UPDATE		0x02 /* Incoming ACK was a window update.	*/
 #define FLAG_DATA_ACKED		0x04 /* This ACK acknowledged new data.		*/
@@ -854,7 +856,8 @@
   const int ts)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	if (metric > tp->reordering) {
+	// rfcstrict: no dynamic reordering metric
+	if (!sysctl_tcp_rfcstrict_recovery && metric > tp->reordering) {
 		tp->reordering = min(TCP_MAX_REORDERING, metric);
 
 		/* This exciting event is worth to be remembered. 8) */
@@ -1784,7 +1787,10 @@
 		/* Hold old state until something *above* high_seq
 		 * is ACKed. For Reno it is MUST to prevent false
 		 * fast retransmits (RFC2582). SACK TCP is safe. */
-		tcp_moderate_cwnd(tp);
+		// rfcstrict: a tcp_moderate_cwnd at the end of the recovery
+		// already solves any kind of burstiness issue
+		if (!sysctl_tcp_rfcstrict_recovery)
+			tcp_moderate_cwnd(tp);
 		return 1;
 	}
 	tcp_set_ca_state(sk, TCP_CA_Open);
@@ -2039,6 +2045,10 @@
 			if (!(flag & FLAG_ECE))
 				tp->prior_ssthresh = tcp_current_ssthresh(sk);
 			tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
+			// rfcstrict: standard rule cwnd = ssthresh + 3
+			// note: tp->reordering segments have already been added to sacked_out
+			if (sysctl_tcp_rfcstrict_recovery)
+				tp->snd_cwnd = tp->snd_ssthresh;
 			TCP_ECN_queue_cwr(tp);
 		}
 
@@ -2049,7 +2059,9 @@
 
 	if (is_dupack || tcp_head_timedout(sk, tp))
 		tcp_update_scoreboard(sk, tp);
-	tcp_cwnd_down(sk);
+	// rfcstrict: no further reduction other than cwnd = ssthresh + 3
+	if (!sysctl_tcp_rfcstrict_recovery || icsk->icsk_ca_state == TCP_CA_CWR)
+		tcp_cwnd_down(sk);
 	tcp_xmit_retransmit_queue(sk);
 }
 


[PATCH] TCP Compound: dwnd=0 on ssthresh

2006-07-05 Thread Angelo P. Castellani

In the TCP Compound article used as a reference for this
implementation, we read: "If a retransmission timeout occurs, dwnd
should be reset to zero and the delay-based component is disabled."
(page 5 of ftp://ftp.research.microsoft.com/pub/tr/TR-2005-86.pdf)

The attached patch implements this requirement.

Regards,
Angelo P. Castellani
diff -urd a/net/ipv4/tcp_compound.c b/net/ipv4/tcp_compound.c
--- a/net/ipv4/tcp_compound.c	2006-07-05 17:19:28.0 +0200
+++ b/net/ipv4/tcp_compound.c	2006-07-05 17:20:42.0 +0200
@@ -221,12 +221,9 @@
 		tcp_compound_init(sk);
 }
 
-static void tcp_compound_cong_avoid(struct sock *sk, u32 ack,
-				    u32 seq_rtt, u32 in_flight, int flag)
-{
+static inline void tcp_compound_synch(struct sock *sk) {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct compound *vegas = inet_csk_ca(sk);
-	u8 inc = 0;
 
 	if (vegas->cwnd + vegas->dwnd > tp->snd_cwnd) {
 		if (vegas->cwnd > tp->snd_cwnd || vegas->dwnd > tp->snd_cwnd) {
@@ -234,9 +231,19 @@
 			vegas->dwnd = 0;
 		} else
 			vegas->cwnd = tp->snd_cwnd - vegas->dwnd;
-
 	}
 
+}
+
+static void tcp_compound_cong_avoid(struct sock *sk, u32 ack,
+				    u32 seq_rtt, u32 in_flight, int flag)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct compound *vegas = inet_csk_ca(sk);
+	u8 inc = 0;
+
+	tcp_compound_synch(sk);
+
 	if (!tcp_is_cwnd_limited(sk, in_flight))
 		return;
 
@@ -415,9 +422,21 @@
 	}
 }
 
+static u32 tcp_compound_ssthresh(struct sock *sk) {
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct compound *vegas = inet_csk_ca(sk);
+
+	tcp_compound_synch(sk);
+
+	vegas->dwnd = 0;
+	tp->snd_cwnd = vegas->cwnd;
+
+	return tcp_reno_ssthresh(sk);
+}
+
 static struct tcp_congestion_ops tcp_compound = {
 	.init		= tcp_compound_init,
-	.ssthresh	= tcp_reno_ssthresh,
+	.ssthresh	= tcp_compound_ssthresh,
 	.cong_avoid	= tcp_compound_cong_avoid,
 	.min_cwnd	= tcp_reno_min_cwnd,
 	.rtt_sample	= tcp_compound_rtt_calc,


[PATCH] TCP Compound

2006-05-25 Thread Angelo P. Castellani

From: Angelo P. Castellani [EMAIL PROTECTED]

TCP Compound is a sender-side only change to TCP that uses
a mixed Reno/Vegas approach to calculate the cwnd.

For further details look here:
 ftp://ftp.research.microsoft.com/pub/tr/TR-2005-86.pdf

Signed-off-by: Angelo P. Castellani [EMAIL PROTECTED]

---

This new revision of the TCP Compound implementation fixes some issues
present in the previous patch and has been reverted to a stand-alone
file (thanks to Stephen's suggestion).

Regards,
Angelo P. Castellani
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 326676b..e577eb8 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -542,6 +542,16 @@ config TCP_CONG_LP
 	``fair share`` of bandwidth as targeted by TCP.
 	See http://www-ece.rice.edu/networks/TCP-LP/
 
+config TCP_CONG_COMPOUND
+	tristate "TCP Compound"
+	depends on EXPERIMENTAL
+	default n
+	---help---
+	TCP Compound is a sender-side only change to TCP that uses 
+	a mixed Reno/Vegas approach to calculate the cwnd.
+	For further details look here:
+	  ftp://ftp.research.microsoft.com/pub/tr/TR-2005-86.pdf
+
 endmenu
 
 config TCP_CONG_BIC
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index 5c65487..f0697c4 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -43,6 +43,7 @@ obj-$(CONFIG_TCP_CONG_HTCP) += tcp_htcp.
 obj-$(CONFIG_TCP_CONG_VEGAS) += tcp_vegas.o
 obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o
 obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o
+obj-$(CONFIG_TCP_CONG_COMPOUND) += tcp_compound.o
 
 obj-$(CONFIG_XFRM) += xfrm4_policy.o xfrm4_state.o xfrm4_input.o \
 		  xfrm4_output.o


/*
 * TCP Vegas congestion control
 *
 * This is based on the congestion detection/avoidance scheme described in
 *Lawrence S. Brakmo and Larry L. Peterson.
 *TCP Vegas: End to end congestion avoidance on a global internet.
 *IEEE Journal on Selected Areas in Communication, 13(8):1465--1480,
 *October 1995. Available from:
 *	ftp://ftp.cs.arizona.edu/xkernel/Papers/jsac.ps
 *
 * See http://www.cs.arizona.edu/xkernel/ for their implementation.
 * The main aspects that distinguish this implementation from the
 * Arizona Vegas implementation are:
 *   o We do not change the loss detection or recovery mechanisms of
 * Linux in any way. Linux already recovers from losses quite well,
 * using fine-grained timers, NewReno, and FACK.
 *   o To avoid the performance penalty imposed by increasing cwnd
 * only every-other RTT during slow start, we increase during
 * every RTT during slow start, just like Reno.
 *   o Largely to allow continuous cwnd growth during slow start,
 * we use the rate at which ACKs come back as the actual
 * rate, rather than the rate at which data is sent.
 *   o To speed convergence to the right rate, we set the cwnd
 * to achieve the right (actual) rate when we exit slow start.
 *   o To filter out the noise caused by delayed ACKs, we use the
 * minimum RTT sample observed during the last RTT to calculate
 * the actual rate.
 *   o When the sender re-starts from idle, it waits until it has
 * received ACKs for an entire flight of new data before making
 * a cwnd adjustment decision. The original Vegas implementation
 * assumed senders never went idle.
 * 
 *
 *   TCP Compound based on TCP Vegas
 *
 *   further details can be found here:
 *  ftp://ftp.research.microsoft.com/pub/tr/TR-2005-86.pdf
 */

#include <linux/config.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/skbuff.h>
#include <linux/inet_diag.h>

#include <net/tcp.h>

/* Default values of the Vegas variables, in fixed-point representation
 * with V_PARAM_SHIFT bits to the right of the binary point.
 */
#define V_PARAM_SHIFT 1

#define TCP_COMPOUND_ALPHA          3U
#define TCP_COMPOUND_BETA           1U
#define TCP_COMPOUND_KAPPA_POW      3
#define TCP_COMPOUND_KAPPA_NSQRT    2
#define TCP_COMPOUND_GAMMA         30
#define TCP_COMPOUND_ZETA           1

/* TCP compound variables */
struct compound {
	u32 beg_snd_nxt;	/* right edge during last RTT */
	u32 beg_snd_una;	/* left edge  during last RTT */
	u32 beg_snd_cwnd;	/* saves the size of the cwnd */
	u8 doing_vegas_now;	/* if true, do vegas for this RTT */
	u16 cntRTT;		/* # of RTTs measured within last RTT */
	u32 minRTT;		/* min of RTTs measured within last RTT (in usec) */
	u32 baseRTT;		/* the min of all Vegas RTT measurements seen (in usec) */

	u32 cwnd;
	u32 dwnd;
};

/* There are several situations when we must re-start Vegas:
 *
 *  o when a connection is established
 *  o after an RTO
 *  o after fast recovery
 *  o when we send a packet and there is no outstanding
 *unacknowledged data (restarting an idle connection)
 *
 * In these circumstances we cannot do a Vegas calculation at the
 * end of the first RTT, because any calculation we do is using
 * stale info -- both the saved cwnd and congestion feedback are
 * stale.
 *
 * Instead we must wait until the completion of an RTT during
 * which we actually receive ACKs

[PATCH] reno sacked_out count fix

2006-05-16 Thread Angelo P. Castellani

Using NewReno, if an sk_buff is timed out and accounted as lost_out,
it should also be removed from sacked_out.

This is necessary because recovery using NewReno fast retransmit can
take many RTTs, so an sk_buff's RTO can expire without the packet
actually being lost.

left_out = sacked_out + lost_out
in_flight = packets_out - left_out + retrans_out

Using NewReno without this patch, on very large network losses,
left_out becomes bigger than packets_out + retrans_out (!!).

For this reason the unsigned integer in_flight wraps around to a value
just below 2^32.

Regards,
Angelo P. Castellani
diff -urd ../linux-2.6.16-orig/net/ipv4/tcp_input.c ./net/ipv4/tcp_input.c
--- ../linux-2.6.16-orig/net/ipv4/tcp_input.c	2006-05-15 15:42:39.0 +0200
+++ ./net/ipv4/tcp_input.c	2006-05-16 11:18:21.0 +0200
@@ -1676,6 +1676,8 @@
 			if (!(TCP_SKB_CB(skb)->sacked & TCPCB_TAGBITS)) {
 				TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
 				tp->lost_out += tcp_skb_pcount(skb);
+				if (IsReno(tp))
+					tcp_remove_reno_sacks(sk, tp, tcp_skb_pcount(skb) + 1);
 
 				/* clear xmit_retrans hint */
 				if (tp->retransmit_skb_hint &&


[PATCH] Enabling standard compliant behaviour in the Linux TCP implementation

2006-05-16 Thread Angelo P. Castellani

Hi all,
I'm a student doing a thesis about TCP performance over high BDP links
and so about congestion control in TCP.

To do this work I've built a testbed using the latest Linux release (2.6.16).

Anyway, I've come across the fact that the Linux TCP implementation
isn't fully standards-compliant.

Even if the choices made to deviate from the standards were wisely
thought out, I think it should be possible to disable these
Linuxisms.

Surely this can help all the people using Linux to evaluate a
standards-compliant environment.

Moreover, it permits comparing the pros and cons of the Linux
implementation against the standard one.

So I've disabled the first two Linux-specific mechanisms I've found:
- rate halving
- dynamic reordering metric (dynamic DupThresh)

These are disabled as long as net.ipv4.tcp_standard_compliant=1 (default: 0).

However, I don't exclude that there are more non-standard details, so
I hope that somebody can point out some more differences between
Linux and the RFCs.

Moreover, NewReno is implemented in the Impatient variant (which
resets the retransmit timer only on the first partial ACK); with
net.ipv4.tcp_slow_but_steady=1 (default: 0) you can enable the
Slow-but-Steady variant (which resets the retransmit timer on every
partial ACK).

Hoping that this can be useful, I attach the patch.

Regards,
Angelo P. Castellani
diff -urd ../linux-2.6.16-orig/include/linux/sysctl.h ./include/linux/sysctl.h
--- ../linux-2.6.16-orig/include/linux/sysctl.h	2006-05-16 14:53:02.0 +0200
+++ ./include/linux/sysctl.h	2006-05-16 14:54:50.0 +0200
@@ -397,6 +397,8 @@
 	NET_TCP_CONG_CONTROL=110,
 	NET_TCP_ABC=111,
 	NET_IPV4_IPFRAG_MAX_DIST=112,
+	NET_TCP_STANDARD_COMPLIANT,
+	NET_TCP_SLOW_BUT_STEADY,
 };
 
 enum {
diff -urd ../linux-2.6.16-orig/include/net/tcp.h ./include/net/tcp.h
--- ../linux-2.6.16-orig/include/net/tcp.h	2006-05-16 14:53:02.0 +0200
+++ ./include/net/tcp.h	2006-05-16 14:55:43.0 +0200
@@ -219,6 +219,8 @@
 extern int sysctl_tcp_moderate_rcvbuf;
 extern int sysctl_tcp_tso_win_divisor;
 extern int sysctl_tcp_abc;
+extern int sysctl_tcp_standard_compliant;
+extern int sysctl_tcp_slow_but_steady;
 
 extern atomic_t tcp_memory_allocated;
 extern atomic_t tcp_sockets_allocated;
diff -urd ../linux-2.6.16-orig/net/ipv4/sysctl_net_ipv4.c ./net/ipv4/sysctl_net_ipv4.c
--- ../linux-2.6.16-orig/net/ipv4/sysctl_net_ipv4.c	2006-05-16 14:53:02.0 +0200
+++ ./net/ipv4/sysctl_net_ipv4.c	2006-05-16 14:57:23.0 +0200
@@ -664,6 +664,22 @@
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec,
 	},
+	{
+		.ctl_name	= NET_TCP_STANDARD_COMPLIANT,
+		.procname	= "tcp_standard_compliant",
+		.data		= &sysctl_tcp_standard_compliant,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+	{
+		.ctl_name	= NET_TCP_SLOW_BUT_STEADY,
+		.procname	= "tcp_slow_but_steady",
+		.data		= &sysctl_tcp_slow_but_steady,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
 
 	{ .ctl_name = 0 }
 };
diff -urd ../linux-2.6.16-orig/net/ipv4/tcp_input.c ./net/ipv4/tcp_input.c
--- ../linux-2.6.16-orig/net/ipv4/tcp_input.c	2006-05-16 14:53:02.0 +0200
+++ ./net/ipv4/tcp_input.c	2006-05-16 14:52:43.0 +0200
@@ -81,6 +81,7 @@
 int sysctl_tcp_dsack = 1;
 int sysctl_tcp_app_win = 31;
 int sysctl_tcp_adv_win_scale = 2;
+int sysctl_tcp_standard_compliant = 0;
 
 int sysctl_tcp_stdurg;
 int sysctl_tcp_rfc1337;
@@ -854,7 +855,7 @@
   const int ts)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	if (metric > tp->reordering) {
+	if (!sysctl_tcp_standard_compliant && metric > tp->reordering) {
 		tp->reordering = min(TCP_MAX_REORDERING, metric);
 
 		/* This exciting event is worth to be remembered. 8) */
@@ -2039,6 +2040,8 @@
 			if (!(flag & FLAG_ECE))
 				tp->prior_ssthresh = tcp_current_ssthresh(sk);
 			tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
+			if (sysctl_tcp_standard_compliant)
+				tp->snd_cwnd = tp->snd_ssthresh; // tp->reordering segments should've been already added to sacked_out
 			TCP_ECN_queue_cwr(tp);
 		}
 
@@ -2049,7 +2052,8 @@
 
 	if (is_dupack || tcp_head_timedout(sk, tp))
 		tcp_update_scoreboard(sk, tp);
-	tcp_cwnd_down(sk);
+	if (!sysctl_tcp_standard_compliant || icsk->icsk_ca_state == TCP_CA_CWR)
+		tcp_cwnd_down(sk);
 	tcp_xmit_retransmit_queue(sk);
 }
 
diff -urd ../linux-2.6.16-orig/net/ipv4/tcp_output.c ./net/ipv4/tcp_output.c
--- ../linux-2.6.16-orig/net/ipv4/tcp_output.c	2006-05-16 14:53:02.0 +0200
+++ ./net/ipv4/tcp_output.c	2006-05-16 14:52:43.0 +0200
@@ -51,6 +51,9 @@
  */
 int sysctl_tcp_tso_win_divisor = 3;
 
+/* Enables the Slow-but-Steady variant of NewReno (cfr. RFC2582 Ch.4) */
+int sysctl_tcp_slow_but_steady = 0;
+
 static void update_send_head(struct sock *sk, struct tcp_sock *tp,
 			 struct sk_buff *skb)
 {
@@ -1604,7 +1607,7 @@
 	else
 		NET_INC_STATS_BH(LINUX_MIB_TCPSLOWSTARTRETRANS);
 
-	if (skb ==
+	if (sysctl_tcp_slow_but_steady || skb ==
 	    skb_peek(&sk->sk_write_queue

Re: tcp compound

2006-05-10 Thread Angelo P. Castellani

On 5/9/06, Stephen Hemminger [EMAIL PROTECTED] wrote:

Moved discussion over to netdev mailing list..

Could you export symbols in tcp_vegas (and change config dependencies) to
allow code reuse rather than having to copy/paste everything from vegas?


I hope I've done that properly.


tcp_compound.patch.gz
Description: GNU Zip compressed data