Re: TCP Pacing

2006-09-19 Thread Daniele Lacamera
On Saturday 16 September 2006 02:41, Xiaoliang (David) Wei wrote:
 Hi Daniele,
  Thank you very much for the patch and the reference summary. For
 the implementation and performance of pacing, I just have a few
 suggestions/clarifications/supporting data:
 
 First, in the implementation in the patch, it seems to me that the
 paced gap is set to RTT/cwnd in CA_Open state. This might lead to
 slower growth of the congestion window. See our simulation results at
 http://www.cs.caltech.edu/~weixl/technical/ns2pacing/index.html

Hi David.
Thank you for pointing this out. It's very interesting.
Actually, we already knew about delta calculation based on the expected 
congestion window. Carlo and Rosario studied this matter in depth, 
considered different options (VTC04), and came to the conclusion that 
although the rtt/cwnd solution slows down cwnd growth, the difference is 
not very relevant, so we have preferred to implement the most 
conservative one, which is slightly simpler and fits all the congestion 
control algorithms.

 If this pacing algorithm is used in a network with non-paced flows, it
 is very likely to lose its fair share of bandwidth. So, I'd suggest
 using a pacing gap of RTT/max{cwnd+1, min{ssthresh, cwnd*2}}, where
 max{cwnd+1, min{ssthresh, cwnd*2}} is the expected congestion window
 in the *next RTT*. As shown in our simulation results, this
 modification will eliminate the slower-growth problem.

The expected window value depends on the congestion control algorithm: 
the formula you suggest fits NewReno increments, while other congestion 
control options may have a different cwnd_expected. 
I don't exclude that we may add a 'plug' in each congestion control 
module for the pacing delta calculation, if this makes sense.

  * Main reference:
  -
 This main reference (Infocom 2000) does not say pacing always
 improves performance. In fact, it says pacing might perform worse, in
 terms of average throughput, than non-paced flows in many cases.

I have proposed to use this as the main reference because it gives a 
general description and is one of the most cited papers on the topic.

 For TCP Hybla, we do have some simulation results showing that Hybla
 introduces huge losses in the start-up phase if pacing is not deployed.
 (Look for the figures of hybla at
 http://www.cs.caltech.edu/~weixl/technical/ns2linux/index.html)

The initial overshoot in Hybla is a known issue. Cwnd increments are 
calculated on the RTT, so the longer the RTT, the bigger the initial 
burstiness. 
The way to counteract the overshoot is to use both pacing and an initial 
slow-start threshold estimation, like the one suggested in [1]. 
This is what we have been using for all our tests, in simulation (ns-2), 
emulation (Linux + NISTNet), and on satellites. (See [2] and [3].)
As for pacing, I'd like to have the bandwidth estimation feature 
included in future versions of the Hybla module as soon as we can 
consider it stable.

HAND.

-- 
Daniele
 
[1] J. Hoe, Improving the Start-up Behavior of a Congestion Control 
Scheme for TCP, ACM Sigcomm, Aug. 1996.

[2] C. Caini, R. Firrincieli and D. Lacamera, TCP Performance 
Evaluation: Methodologies and Applications, SPECTS 2005, Philadelphia, 
July 2005.

[3] C. Caini, R. Firrincieli and D. Lacamera, A Linux Based Multi TCP 
Implementation for Experimental Evaluation of TCP Enhancements, SPECTS 
2005, Philadelphia, July 2005.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP Pacing

2006-09-13 Thread Daniele Lacamera
On Tuesday 12 September 2006 23:26, Ian McDonald wrote:
 Where is the published research? If you are going to mention research
 you need URLs to papers and please put this in source code too so
 people can check.

I added the main reference to the code. I am going to give you all the 
pointers on this research, mainly recent congestion control proposals 
that include pacing.

 I agree with Arnaldo's comments and would also add that I don't like
 having to select 1000 as the HZ unit. Something is wrong if you need
 this, as I can run higher-resolution timers without having to do this.

I removed that select from Kconfig; I agree it doesn't make sense at 
all, for portability reasons. However, pacing works with 1 ms 
resolution, so maybe a 'depends on HZ_1000' is still required. (How do 
you run 1 ms timers with HZ != 1000?)

Thanks

-- 
Daniele Lacamera
[EMAIL PROTECTED]


Re: TCP Pacing

2006-09-13 Thread Daniele Lacamera
On Wednesday 13 September 2006 05:41, Stephen Hemminger wrote:
 Pacing in itself isn't a bad idea, but:
 [cut]
 * Since it is most useful over long delay links, maybe it should be a 
route parameter.

What does this mean? Should I move the sysctl switch elsewhere?

A new (cleaner) patch follows.
Thanks to you all for your attention & advice.

Signed-off-by: Daniele Lacamera [EMAIL PROTECTED]
--- 

diff -ruN linux-2.6.18-rc6/Documentation/networking/ip-sysctl.txt linux-pacing/Documentation/networking/ip-sysctl.txt
--- linux-2.6.18-rc6/Documentation/networking/ip-sysctl.txt	2006-09-04 04:19:48.0 +0200
+++ linux-pacing/Documentation/networking/ip-sysctl.txt	2006-09-12 16:38:14.0 +0200
@@ -369,6 +369,12 @@
 	be timed out after an idle period.
 	Default: 1
 
+tcp_pacing - BOOLEAN
+	If set, enable time-based TCP segment sending, instead of the
+	normal ack-based sending. A software timer is set every time a
+	new ack is received, then packets are spread across the
+	round-trip time.
+	Default: 0
+
 IP Variables:
 
 ip_local_port_range - 2 INTEGERS
diff -ruN linux-2.6.18-rc6/include/linux/sysctl.h linux-pacing/include/linux/sysctl.h
--- linux-2.6.18-rc6/include/linux/sysctl.h	2006-09-04 04:19:48.0 +0200
+++ linux-pacing/include/linux/sysctl.h	2006-09-12 18:13:38.0 +0200
@@ -411,6 +411,7 @@
 	NET_IPV4_TCP_WORKAROUND_SIGNED_WINDOWS=115,
 	NET_TCP_DMA_COPYBREAK=116,
 	NET_TCP_SLOW_START_AFTER_IDLE=117,
+	NET_TCP_PACING=118,
 };
 
 enum {
diff -ruN linux-2.6.18-rc6/include/linux/tcp.h linux-pacing/include/linux/tcp.h
--- linux-2.6.18-rc6/include/linux/tcp.h	2006-09-04 04:19:48.0 +0200
+++ linux-pacing/include/linux/tcp.h	2006-09-12 16:45:32.0 +0200
@@ -356,6 +356,17 @@
 		__u32		  probe_seq_start;
 		__u32		  probe_seq_end;
 	} mtu_probe;
+	
+#ifdef CONFIG_TCP_PACING
+/* TCP Pacing structure */
+	struct {
+		struct timer_list timer;
+		__u16	count;
+		__u16	burst;
+		__u8	lock;
+		__u8	delta;
+	} pacing;
+#endif
 };
 
 static inline struct tcp_sock *tcp_sk(const struct sock *sk)
diff -ruN linux-2.6.18-rc6/include/net/tcp.h linux-pacing/include/net/tcp.h
--- linux-2.6.18-rc6/include/net/tcp.h	2006-09-04 04:19:48.0 +0200
+++ linux-pacing/include/net/tcp.h	2006-09-13 09:33:02.0 +0200
@@ -449,6 +449,58 @@
 extern unsigned int tcp_sync_mss(struct sock *sk, u32 pmtu);
 extern unsigned int tcp_current_mss(struct sock *sk, int large);
 
+#ifdef CONFIG_TCP_PACING
+extern int sysctl_tcp_pacing;
+extern void __tcp_pacing_recalc_delta(struct sock *sk);
+extern void __tcp_pacing_reset_timer(struct sock *sk);
+static inline void tcp_pacing_recalc_delta(struct sock *sk)
+{
+	if (sysctl_tcp_pacing) 
+		__tcp_pacing_recalc_delta(sk);
+}
+
+static inline void tcp_pacing_reset_timer(struct sock *sk)
+{
+	if (sysctl_tcp_pacing)
+		__tcp_pacing_reset_timer(sk);
+}
+
+static inline void tcp_pacing_lock_tx(struct sock *sk)
+{
+	if (sysctl_tcp_pacing)
+		tcp_sk(sk)->pacing.lock = 1;
+}
+
+static inline int tcp_pacing_locked(struct sock *sk)
+{
+	if (sysctl_tcp_pacing)
+		return tcp_sk(sk)->pacing.lock;
+	else
+		return 0;
+}
+
+static inline int tcp_pacing_enabled(struct sock *sk)
+{
+	return sysctl_tcp_pacing;
+}
+
+static inline int tcp_pacing_burst(struct sock *sk)
+{
+	if (sysctl_tcp_pacing)
+		return tcp_sk(sk)->pacing.burst;
+	else
+		return 0;
+}
+	
+#else
+static inline void tcp_pacing_recalc_delta(struct sock *sk) {};
+static inline void tcp_pacing_reset_timer(struct sock *sk) {};
+static inline void tcp_pacing_lock_tx(struct sock *sk) {};
+#define tcp_pacing_locked(sk) 0 
+#define tcp_pacing_enabled(sk) 0
+#define tcp_pacing_burst(sk) 0
+#endif
+
 /* tcp.c */
 extern void tcp_get_info(struct sock *, struct tcp_info *);
 
diff -ruN linux-2.6.18-rc6/net/ipv4/Kconfig linux-pacing/net/ipv4/Kconfig
--- linux-2.6.18-rc6/net/ipv4/Kconfig	2006-09-04 04:19:48.0 +0200
+++ linux-pacing/net/ipv4/Kconfig	2006-09-13 09:31:27.0 +0200
@@ -572,6 +572,19 @@
 	loss packets.
 	See http://www.ntu.edu.sg/home5/ZHOU0022/papers/CPFu03a.pdf
 
+config TCP_PACING
+	bool "TCP Pacing"
+	depends on EXPERIMENTAL
+	default n
+	---help---
+	Many researchers have observed that TCP's congestion control mechanisms 
+	can lead to bursty traffic flows on modern high-speed networks, with a 
+	negative impact on overall network efficiency. A proposed solution to this 
+	problem is to evenly space, or pace, data sent into the network over an 
+	entire round-trip time, so that data is not sent in a burst.
+	To enable this feature, please refer to Documentation/networking/ip-sysctl.txt.
+	If unsure, say N.
+	
 endmenu
 
 config TCP_CONG_BIC
diff -ruN linux-2.6.18-rc6/net/ipv4/sysctl_net_ipv4.c linux-pacing/net/ipv4/sysctl_net_ipv4.c
--- linux-2.6.18-rc6/net/ipv4/sysctl_net_ipv4.c	2006-09-04 04:19:48.0 +0200
+++ linux-pacing/net/ipv4/sysctl_net_ipv4.c	2006-09-12 18:33:36.0 +0200
@@ -697,6 +697,16 @@
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+#ifdef CONFIG_TCP_PACING
+	

Re: TCP Pacing

2006-09-13 Thread Daniele Lacamera
As Ian requested, here are some of the papers published on pacing.

* Main reference:
-

Amit Aggarwal, Stefan Savage, and Thomas Anderson.   
Understanding the Performance of TCP Pacing. 
Proc. of the IEEE INFOCOM 2000 Conference on Computer Communications, 
March 2000, pages 1157 - 1165.

* IETF RFC:
---

H. Balakrishnan, V. N. Padmanabhan, G. Fairhurst, M.Sooriyabandara, 
TCP Performance Implications of Network Path Asymmetry,
IETF RFC 3449, December 2002.

* Other works:
--

C. Caini, R. Firrincieli, 
Packet spreading techniques to avoid bursty traffic in satellite TCP 
connections. 
In Proceedings of IEEE VTC Spring ’04.

Q.Ye, M.H. MacGregor, 
Pacing to Improve SACK TCP Resilience, 
2005 Spring Simulation Multiconference, DASD, pp. 39-45, 2005

Young-Soo Choi; Kong-Won Lee; Tae-Man Han; You-Ze Cho;
High-speed TCP protocols with pacing for fairness and TCP friendliness
TENCON 2004. 2004 IEEE Region 10 Conference
Volume C,  21-24 Nov. 2004 Page(s):13 - 16 Vol. 3 

Razdan, A.; Nandan, A.; Wang, R.; Sanadidi, M.Y.; Gerla, M.;
Enhancing TCP performance in networks with small buffers
Computer Communications and Networks, 2002. Proceedings. Eleventh 
International Conference on
14-16 Oct. 2002 Page(s):39 - 44 

Moonsoo Kang; Jeonghoon Mo;
On the Pacing Technique for High Speed TCP Over Optical Burst Switching 
Networks
Advanced Communication Technology, 2006. ICACT 2006. The 8th 
International Conference
Volume 2, 20-22 Feb. 2006 Page(s):1421 - 1424 

Mark Allman, Ethan Blanton
Notes on burst mitigation for transport protocols, 
April 2005 ACM SIGCOMM Computer Communication Review, Volume 35 Issue 2 
Publisher: ACM Press

J. Kulik, R. Coulter, D. Rockwell, and C. Partridge, 
A Simulation Study of Paced TCP, 
BBN Technical Memorandum No. 1218, 1999. 


* Congestion Control proposals that include Pacing:
---

G. Marfia, C. Palazzi, G. Pau, M. Gerla, M. Sanadidi and M. Roccetti, 
TCP Libra: Balancing Flows over Heterogeneous
Propagation Scenarios, submitted for publication in Proceedings of ACM 
SIGMETRICS/Performance 2006. 

Carlo Caini and Rosario Firrincieli, 
TCP Hybla: a TCP enhancement for heterogeneous networks, 
International Journal of Satellite Communications and Networking, 2004; 
22:547-566.

D. X. Wei, C. Jin, S. H. Low and S. Hegde,
FAST TCP: Motivation, Architecture, Algorithms, Performance,
IEEE/ACM Transactions on Networking, to appear in 2007.


-- 
Daniele Lacamera
root{at}danielinux.net


Re: TCP Pacing

2006-09-13 Thread Ian McDonald

On 9/13/06, Daniele Lacamera [EMAIL PROTECTED] wrote:

On Tuesday 12 September 2006 23:26, Ian McDonald wrote:
 Where is the published research? If you are going to mention research
 you need URLs to papers and please put this in source code too so
 people can check.

I added the main reference to the code. I am going to give you all the
pointers on this research, mainly recent congestion control proposals
that include pacing.


Thanks


 I agree with Arnaldo's comments and would also add that I don't like
 having to select 1000 as the HZ unit. Something is wrong if you need
 this, as I can run higher-resolution timers without having to do this.

I removed that select from Kconfig; I agree it doesn't make sense at
all, for portability reasons. However, pacing works with 1 ms
resolution, so maybe a 'depends on HZ_1000' is still required. (How do
you run 1 ms timers with HZ != 1000?)


HZ refers to the number of time slices per second, mostly for user
space - e.g. how often to task-switch.
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: TCP Pacing

2006-09-13 Thread Stephen Hemminger
On Wed, 13 Sep 2006 10:18:31 +0200
Daniele Lacamera [EMAIL PROTECTED] wrote:

 On Wednesday 13 September 2006 05:41, Stephen Hemminger wrote:
  Pacing in itself isn't a bad idea, but:
 [cut]
  * Since it is most useful over long delay links, maybe it should be a 
 route parameter.


Look into rtnetlink and how we keep track of route metrics, and
add a new per route state variable. Need to update iproute2 (ip command)
as well.

 What does this mean? Should I move the sysctl switch elsewhere?
 
 A new (cleaner) patch follows.
 Thanks to you all for your attention & advice.
 
 Signed-off-by: Daniele Lacamera [EMAIL PROTECTED]

You may also want to look into the high-resolution timer (hrtimer) API;
the resolution doesn't get finer than HZ without using the -rt patches.
But the ktime interface is cleaner than the normal timer math.


Re: TCP Pacing

2006-09-12 Thread Arnaldo Carvalho de Melo

On 9/12/06, Daniele Lacamera [EMAIL PROTECTED] wrote:

Hello,

Please let me insist once again on the importance of adding a TCP pacing
mechanism to our TCP, as many people are including this algorithm in
their congestion control proposals. Recent research has found that it
really can help improve performance in different scenarios, such as
satellites and long-delay high-speed channels (100 ms RTT, Gbit). The
Hybla module itself is crippled without this feature in its natural
scenario.

The following patch is totally non-invasive: it has a config option and
a sysctl switch, both turned off by default. When the config option is
enabled, it adds only 6B to the tcp_sock.

Signed-off by: Daniele Lacamera [EMAIL PROTECTED]
---


diff -ruN linux-2.6.18-rc6/net/ipv4/tcp_input.c
linux-pacing/net/ipv4/tcp_input.c
--- linux-2.6.18-rc6/net/ipv4/tcp_input.c   2006-09-04 04:19:48.0 
+0200
+++ linux-pacing/net/ipv4/tcp_input.c   2006-09-12 17:11:38.0 +0200
@@ -2569,6 +2569,11 @@
tcp_cong_avoid(sk, ack, seq_rtt, prior_in_flight, 1);
}

Without getting into the merits of the pacing technique:

+#ifdef CONFIG_TCP_PACING
+   if(sysctl_tcp_pacing)
+   tcp_pacing_recalc_delta(sk);
+#endif

Please rewrite the patch so as to avoid adding that many #ifdefs to
the common code, replacing above code with:

tcp_pacing_recalc_delta(sk);

That is defined in a header (net/tcp.h) as:

#ifdef CONFIG_TCP_PACING
extern void __tcp_pacing_recalc_delta(struct sock *sk);
extern int sysctl_tcp_pacing;

static inline void tcp_pacing_recalc_delta(struct sock *sk)
{
   if (sysctl_tcp_pacing) /* notice the space after ( */
  __tcp_pacing_recalc_delta(sk);
}
#else
static inline void tcp_pacing_recalc_delta(struct sock *sk) {};
#endif

Thanks,

- Arnaldo


Re: TCP Pacing

2006-09-12 Thread Ian McDonald

On 9/13/06, Daniele Lacamera [EMAIL PROTECTED] wrote:

Hello,

Please let me insist once again on the importance of adding a TCP pacing
mechanism to our TCP, as many people are including this algorithm in
their congestion control proposals. Recent research has found that it
really can help improve performance in different scenarios, such as
satellites and long-delay high-speed channels (100 ms RTT, Gbit). The
Hybla module itself is crippled without this feature in its natural
scenario.


Where is the published research? If you are going to mention research
you need URLs to papers and please put this in source code too so
people can check.


The following patch is totally non-invasive: it has a config option and
a sysctl switch, both turned off by default. When the config option is
enabled, it adds only 6B to the tcp_sock.


I agree with Arnaldo's comments and would also add that I don't like
having to select 1000 as the HZ unit. Something is wrong if you need
this, as I can run higher-resolution timers without having to do this.

Haven't reviewed the rest of the code or tested.

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: TCP Pacing

2006-09-12 Thread Stephen Hemminger
On Tue, 12 Sep 2006 19:58:21 +0200
Daniele Lacamera [EMAIL PROTECTED] wrote:

 Hello,
 
 Please let me insist once again on the importance of adding a TCP pacing 
 mechanism to our TCP, as many people are including this algorithm in 
 their congestion control proposals. Recent research has found that it 
 really can help improve performance in different scenarios, such as 
 satellites and long-delay high-speed channels (100 ms RTT, Gbit). The 
 Hybla module itself is crippled without this feature in its natural 
 scenario. 
 
 The following patch is totally non-invasive: it has a config option and 
 a sysctl switch, both turned off by default. When the config option is 
 enabled, it adds only 6B to the tcp_sock.

Yes, but tcp_sock is already greater than 1024 on 64 bit, and needs
a diet.

 
 Signed-off by: Daniele Lacamera [EMAIL PROTECTED]

Pacing in itself isn't a bad idea, but:
  * Code needs to follow standard whitespace rules
    - blanks around operators
    - blank after keyword
    - Avoid (needless) parentheses
Bad:
	if( (state==TCP_CA_Recovery) && (tp->snd_cwnd < tp->snd_ssthresh))
		window=(tp->snd_ssthresh)>>3;
Good:
	if (state == TCP_CA_Recovery && tp->snd_cwnd < tp->snd_ssthresh)
		window = tp->snd_ssthresh >> 3;

  * Since it is most useful over long delay links, maybe it should
be a route parameter.
