Re: TCP Pacing
On Saturday 16 September 2006 02:41, Xiaoliang (David) Wei wrote:
> Hi Daniel,
>     Thank you very much for the patch and the reference summary. For the
> implementation and performance of pacing, I just have a few
> suggestions/clarifications/supporting data:
>
> First, in the implementation in the patch, it seems to me that the paced
> gap is set to RTT/cwnd in the CA_Open state. This might lead to slower
> growth of the congestion window. See our simulation results at
> http://www.cs.caltech.edu/~weixl/technical/ns2pacing/index.html

Hi David. Thank you for having pointed this out. It's very interesting. Actually, we already knew about delta calculation based on the expected congestion window. Carlo and Rosario studied this matter in depth and considered different options (VTC04). They came to the conclusion that although the rtt/cwnd solution slows down cwnd growth, the difference is not very relevant, so we have preferred to implement the most conservative one, which is slightly simpler and fits all the congestion control algorithms.

> If this pacing algorithm is used in a network with non-paced flows, it is
> very likely to lose its fair share of bandwidth. So, I'd suggest to use a
> pacing gap of
>
>     RTT/max{cwnd+1, min{ssthresh, cwnd*2}}
>
> where max{cwnd+1, min{ssthresh, cwnd*2}} is the expected congestion
> window in the *next RTT*. As shown in our simulation results, this
> modification will eliminate the slower-growth problem.

The expected window value depends on the congestion control algorithm: the formula you suggest fits NewReno increments, while other congestion control options may have a different cwnd_expected. I don't exclude that we may add a 'plug' in each congestion control module for the pacing delta calculation, if this makes sense.

> * Main reference:
> - This main reference (Infocom2000) does not say pacing is always an
> improvement. In fact, it says pacing might have poorer performance, in
> terms of average throughput, than non-paced flows in many cases.
I have proposed to use this as the main reference because it gives a general description and it is one of the most cited papers on the subject.

> For TCP Hybla, we do have some simulation results showing that Hybla
> introduces huge losses in the start-up phase if pacing is not deployed.
> (Look for the figures for Hybla at
> http://www.cs.caltech.edu/~weixl/technical/ns2linux/index.html)

The initial overshoot in Hybla is a known issue. Cwnd increments are calculated on the RTT, so the longer the RTT, the bigger the initial burstiness. The way to counteract the overshoot is to use both pacing and an initial slow-start threshold estimation, like the one suggested in [1]. This is what we have been using for all our tests: in simulation (ns-2), emulation (Linux + NistNet), and over satellites. (See [2] and [3].)

As for pacing, I'd like to have the bandwidth estimation feature included in future versions of the Hybla module as soon as we can consider it stable.

HAND.
--
Daniele

[1] J. Hoe, "Improving the Start-up Behavior of a Congestion Control Scheme for TCP", ACM SIGCOMM, Aug. 1996.
[2] C. Caini, R. Firrincieli and D. Lacamera, "TCP Performance Evaluation: Methodologies and Applications", SPECTS 2005, Philadelphia, July 2005.
[3] C. Caini, R. Firrincieli and D. Lacamera, "A Linux Based Multi TCP Implementation for Experimental Evaluation of TCP Enhancements", SPECTS 2005, Philadelphia, July 2005.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
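[Editor's aside: the next-RTT pacing-gap formula discussed in this exchange can be computed in a few lines. The following is an illustrative userspace sketch with names of my own choosing, not the patch's code, assuming NewReno-style window growth.]

```c
#include <assert.h>

/* Illustrative sketch, not the patch code: pacing gap from smoothed RTT
 * and the congestion window expected in the *next* RTT, as suggested:
 *     expected = max(cwnd + 1, min(ssthresh, cwnd * 2))
 * In slow start (cwnd < ssthresh) the window doubles per RTT; in
 * congestion avoidance it grows by one segment (NewReno assumption). */
static unsigned int expected_cwnd(unsigned int cwnd, unsigned int ssthresh)
{
	unsigned int ss = ssthresh < cwnd * 2 ? ssthresh : cwnd * 2;

	return ss > cwnd + 1 ? ss : cwnd + 1;
}

static unsigned int pacing_gap_us(unsigned int srtt_us, unsigned int cwnd,
				  unsigned int ssthresh)
{
	return srtt_us / expected_cwnd(cwnd, ssthresh);
}
```

With srtt = 100 ms and cwnd = 10 in slow start (ssthresh = 64), the gap is 100000/20 = 5000 us, i.e. half the plain RTT/cwnd gap, which is what lets the window keep doubling despite pacing.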
Re: TCP Pacing
On Tuesday 12 September 2006 23:26, Ian McDonald wrote:
> Where is the published research? If you are going to mention research
> you need URLs to papers and please put this in source code too so
> people can check.

I added the main reference to the code. I am going to give you all the pointers on this research, mainly recent congestion control proposals that include pacing.

> I agree with Arnaldo's comments and also would add I don't like having
> to select 1000 as HZ unit. Something is wrong if you need this as I can
> run higher resolution timers without having to do this

I removed that select in Kconfig; I agree it doesn't make sense at all, for portability. However, pacing works with 1 ms resolution, so maybe a "depends on HZ_1000" is still required. (How do you run 1 ms timers with HZ != 1000?)

Thanks
--
Daniele Lacamera
[EMAIL PROTECTED]
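[Editor's aside: to make the granularity concern concrete, a jiffies-based pacing timer quantizes the gap to whole ticks, so at HZ=100 any gap shorter than 10 ms truncates to zero. A minimal userspace sketch of the truncating conversion, illustrative arithmetic only, not kernel code:]

```c
/* Convert a pacing gap from microseconds to timer ticks, truncating the
 * way integer jiffies math does.  At HZ=1000 a 5 ms gap is 5 ticks; at
 * HZ=100 the same gap truncates to 0 ticks and the pacing is lost. */
static unsigned long gap_us_to_ticks(unsigned long gap_us, unsigned long hz)
{
	return gap_us * hz / 1000000UL;
}
```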
Re: TCP Pacing
On Wednesday 13 September 2006 05:41, Stephen Hemminger wrote:
> Pacing in itself isn't a bad idea, but:
[cut]
> * Since it is most useful over long delay links, maybe it should be
>   a route parameter.

What does this mean? Should I move the sysctl switch elsewhere?

A new (cleaner) patch follows. Thanks to you all for your attention and advice.

Signed-off by: Daniele Lacamera [EMAIL PROTECTED]
---
diff -ruN linux-2.6.18-rc6/Documentation/networking/ip-sysctl.txt linux-pacing/Documentation/networking/ip-sysctl.txt
--- linux-2.6.18-rc6/Documentation/networking/ip-sysctl.txt	2006-09-04 04:19:48.0 +0200
+++ linux-pacing/Documentation/networking/ip-sysctl.txt	2006-09-12 16:38:14.0 +0200
@@ -369,6 +369,12 @@
 	be timed out after an idle period.
 	Default: 1
 
+tcp_pacing - BOOLEAN
+	If set, enable time-based TCP segment sending, instead of normal
+	ack-based sending. A software timer is set every time a new ack
+	is received, then packets are spread across the round-trip time.
+	Default: 0
+
 IP Variables:
 
 ip_local_port_range - 2 INTEGERS
diff -ruN linux-2.6.18-rc6/include/linux/sysctl.h linux-pacing/include/linux/sysctl.h
--- linux-2.6.18-rc6/include/linux/sysctl.h	2006-09-04 04:19:48.0 +0200
+++ linux-pacing/include/linux/sysctl.h	2006-09-12 18:13:38.0 +0200
@@ -411,6 +411,7 @@
 	NET_IPV4_TCP_WORKAROUND_SIGNED_WINDOWS=115,
 	NET_TCP_DMA_COPYBREAK=116,
 	NET_TCP_SLOW_START_AFTER_IDLE=117,
+	NET_TCP_PACING=118,
 };
 
 enum {
diff -ruN linux-2.6.18-rc6/include/linux/tcp.h linux-pacing/include/linux/tcp.h
--- linux-2.6.18-rc6/include/linux/tcp.h	2006-09-04 04:19:48.0 +0200
+++ linux-pacing/include/linux/tcp.h	2006-09-12 16:45:32.0 +0200
@@ -356,6 +356,17 @@
 		__u32	probe_seq_start;
 		__u32	probe_seq_end;
 	} mtu_probe;
+
+#ifdef CONFIG_TCP_PACING
+	/* TCP Pacing structure */
+	struct {
+		struct timer_list timer;
+		__u16	count;
+		__u16	burst;
+		__u8	lock;
+		__u8	delta;
+	} pacing;
+#endif
 };
 
 static inline struct tcp_sock *tcp_sk(const struct sock *sk)
diff -ruN linux-2.6.18-rc6/include/net/tcp.h linux-pacing/include/net/tcp.h
--- linux-2.6.18-rc6/include/net/tcp.h	2006-09-04 04:19:48.0 +0200
+++ linux-pacing/include/net/tcp.h	2006-09-13 09:33:02.0 +0200
@@ -449,6 +449,58 @@
 extern unsigned int tcp_sync_mss(struct sock *sk, u32 pmtu);
 extern unsigned int tcp_current_mss(struct sock *sk, int large);
 
+#ifdef CONFIG_TCP_PACING
+extern int sysctl_tcp_pacing;
+extern void __tcp_pacing_recalc_delta(struct sock *sk);
+extern void __tcp_pacing_reset_timer(struct sock *sk);
+
+static inline void tcp_pacing_recalc_delta(struct sock *sk)
+{
+	if (sysctl_tcp_pacing)
+		__tcp_pacing_recalc_delta(sk);
+}
+
+static inline void tcp_pacing_reset_timer(struct sock *sk)
+{
+	if (sysctl_tcp_pacing)
+		__tcp_pacing_reset_timer(sk);
+}
+
+static inline void tcp_pacing_lock_tx(struct sock *sk)
+{
+	if (sysctl_tcp_pacing)
+		tcp_sk(sk)->pacing.lock = 1;
+}
+
+static inline int tcp_pacing_locked(struct sock *sk)
+{
+	if (sysctl_tcp_pacing)
+		return tcp_sk(sk)->pacing.lock;
+	else
+		return 0;
+}
+
+static inline int tcp_pacing_enabled(struct sock *sk)
+{
+	return sysctl_tcp_pacing;
+}
+
+static inline int tcp_pacing_burst(struct sock *sk)
+{
+	if (sysctl_tcp_pacing)
+		return tcp_sk(sk)->pacing.burst;
+	else
+		return 0;
+}
+
+#else
+static inline void tcp_pacing_recalc_delta(struct sock *sk) {};
+static inline void tcp_pacing_reset_timer(struct sock *sk) {};
+static inline void tcp_pacing_lock_tx(struct sock *sk) {};
+#define tcp_pacing_locked(sk) 0
+#define tcp_pacing_enabled(sk) 0
+#define tcp_pacing_burst(sk) 0
+#endif
+
 /* tcp.c */
 extern void tcp_get_info(struct sock *, struct tcp_info *);
diff -ruN linux-2.6.18-rc6/net/ipv4/Kconfig linux-pacing/net/ipv4/Kconfig
--- linux-2.6.18-rc6/net/ipv4/Kconfig	2006-09-04 04:19:48.0 +0200
+++ linux-pacing/net/ipv4/Kconfig	2006-09-13 09:31:27.0 +0200
@@ -572,6 +572,19 @@
 	  loss packets.
 	  See http://www.ntu.edu.sg/home5/ZHOU0022/papers/CPFu03a.pdf
 
+config TCP_PACING
+	bool "TCP Pacing"
+	depends on EXPERIMENTAL
+	default n
+	---help---
+	  Many researchers have observed that TCP's congestion control
+	  mechanisms can lead to bursty traffic flows on modern high-speed
+	  networks, with a negative impact on overall network efficiency.
+	  A proposed solution to this problem is to evenly space, or pace,
+	  data sent into the network over an entire round-trip time, so
+	  that data is not sent in a burst.
+	  To enable this feature, please refer to
+	  Documentation/networking/ip-sysctl.txt.
+	  If unsure, say N.
+
 endmenu
 
 config TCP_CONG_BIC
diff -ruN linux-2.6.18-rc6/net/ipv4/sysctl_net_ipv4.c linux-pacing/net/ipv4/sysctl_net_ipv4.c
--- linux-2.6.18-rc6/net/ipv4/sysctl_net_ipv4.c	2006-09-04 04:19:48.0 +0200
+++ linux-pacing/net/ipv4/sysctl_net_ipv4.c	2006-09-12 18:33:36.0 +0200
@@ -697,6 +697,16 @@
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+#ifdef CONFIG_TCP_PACING
+
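[Editor's aside: for completeness, toggling the switch at runtime would look like the following. This is a hypothetical usage sketch, assuming a kernel patched and built with CONFIG_TCP_PACING=y; the sysctl name follows the patch.]

```shell
# Hypothetical usage, assuming a kernel patched and built with
# CONFIG_TCP_PACING=y (not mainline): toggle pacing via the new sysctl.
sysctl -w net.ipv4.tcp_pacing=1   # switch to timer-paced sending
sysctl -w net.ipv4.tcp_pacing=0   # back to normal ack-clocked sending
```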
Re: TCP Pacing
As Ian requested, some of the papers published about pacing.

* Main reference:
- Amit Aggarwal, Stefan Savage, and Thomas Anderson, "Understanding the Performance of TCP Pacing", Proc. IEEE INFOCOM 2000, March 2000, pp. 1157-1165.

* IETF RFC:
- H. Balakrishnan, V. N. Padmanabhan, G. Fairhurst, M. Sooriyabandara, "TCP Performance Implications of Network Path Asymmetry", IETF RFC 3449, December 2002.

* Other works:
- C. Caini, R. Firrincieli, "Packet spreading techniques to avoid bursty traffic in satellite TCP connections", Proc. IEEE VTC Spring '04.
- Q. Ye, M. H. MacGregor, "Pacing to Improve SACK TCP Resilience", 2005 Spring Simulation Multiconference, DASD, pp. 39-45, 2005.
- Young-Soo Choi, Kong-Won Lee, Tae-Man Han, You-Ze Cho, "High-speed TCP protocols with pacing for fairness and TCP friendliness", TENCON 2004, IEEE Region 10 Conference, Vol. C, 21-24 Nov. 2004, pp. 13-16.
- A. Razdan, A. Nandan, R. Wang, M. Y. Sanadidi, M. Gerla, "Enhancing TCP performance in networks with small buffers", Proc. 11th International Conference on Computer Communications and Networks, 14-16 Oct. 2002, pp. 39-44.
- Moonsoo Kang, Jeonghoon Mo, "On the Pacing Technique for High Speed TCP Over Optical Burst Switching Networks", ICACT 2006, 8th International Conference on Advanced Communication Technology, Vol. 2, 20-22 Feb. 2006, pp. 1421-1424.
- Mark Allman, Ethan Blanton, "Notes on burst mitigation for transport protocols", ACM SIGCOMM Computer Communication Review, Vol. 35, Issue 2, April 2005.
- J. Kulik, R. Coulter, D. Rockwell, and C. Partridge, "A Simulation Study of Paced TCP", BBN Technical Memorandum No. 1218, 1999.

* Congestion control proposals that include pacing:
- G. Marfia, C. Palazzi, G. Pau, M. Gerla, M. Sanadidi and M. Roccetti, "TCP Libra: Balancing Flows over Heterogeneous Propagation Scenarios", submitted for publication in Proc. ACM SIGMETRICS/Performance 2006.
- Carlo Caini and Rosario Firrincieli, "TCP Hybla: a TCP enhancement for heterogeneous networks", International Journal of Satellite Communications and Networking 2004; 22:547-566.
- D. X. Wei, C. Jin, S. H. Low and S. Hegde, "FAST TCP: motivation, architecture, algorithms, performance", IEEE/ACM Transactions on Networking, to appear in 2007.

--
Daniele Lacamera
root{at}danielinux.net
Re: TCP Pacing
On 9/13/06, Daniele Lacamera [EMAIL PROTECTED] wrote:
> On Tuesday 12 September 2006 23:26, Ian McDonald wrote:
> > Where is the published research? If you are going to mention research
> > you need URLs to papers and please put this in source code too so
> > people can check.
>
> I added the main reference to the code. I am going to give you all the
> pointers on this research, mainly recent congestion control proposals
> that include pacing.

Thanks.

> > I agree with Arnaldo's comments and also would add I don't like having
> > to select 1000 as HZ unit. Something is wrong if you need this as I
> > can run higher resolution timers without having to do this
>
> I removed that select in Kconfig; I agree it doesn't make sense at all,
> for portability. However, pacing works with 1 ms resolution, so maybe a
> "depends on HZ_1000" is still required. (How do you run 1 ms timers
> with HZ != 1000?)

HZ refers to time slices per second, mostly for user space - e.g. how often to task switch.

--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
Re: TCP Pacing
On Wed, 13 Sep 2006 10:18:31 +0200 Daniele Lacamera [EMAIL PROTECTED] wrote:
> On Wednesday 13 September 2006 05:41, Stephen Hemminger wrote:
> > Pacing in itself isn't a bad idea, but:
> [cut]
> > * Since it is most useful over long delay links, maybe it should be
> >   a route parameter.

Look into rtnetlink and how we keep track of route metrics, and add a new per-route state variable. You would need to update iproute2 (the ip command) as well.

> What does this mean? Should I move the sysctl switch elsewhere?
>
> A new (cleaner) patch follows. Thanks to you all for your attention and
> advice.
>
> Signed-off by: Daniele Lacamera [EMAIL PROTECTED]

You may also want to look into the high-resolution timer (hrtimer); the resolution doesn't get finer than HZ without the -rt patches, but the ktime interface is cleaner than the normal timer math.
Re: TCP Pacing
On 9/12/06, Daniele Lacamera [EMAIL PROTECTED] wrote:
> Hello,
> Please let me insist once again on the importance of adding a TCP
> pacing mechanism to our TCP, as many people are including this
> algorithm in their congestion control proposals. Recent research has
> found that it really can help improve performance in different
> scenarios, like satellites and long-delay high-speed channels (100 ms
> RTT, Gbit). The Hybla module itself is crippled without this feature
> in its natural scenario.
>
> The following patch is totally non-invasive: it has a config option
> and a sysctl switch, both turned off by default. When the config
> option is enabled, it adds only 6B to the tcp_sock.
>
> Signed-off by: Daniele Lacamera [EMAIL PROTECTED]
> ---
> diff -ruN linux-2.6.18-rc6/net/ipv4/tcp_input.c linux-pacing/net/ipv4/tcp_input.c
> --- linux-2.6.18-rc6/net/ipv4/tcp_input.c	2006-09-04 04:19:48.0 +0200
> +++ linux-pacing/net/ipv4/tcp_input.c	2006-09-12 17:11:38.0 +0200
> @@ -2569,6 +2569,11 @@
>  		tcp_cong_avoid(sk, ack, seq_rtt, prior_in_flight, 1);
>  	}

Without getting into the merits of the pacing technique:

> +#ifdef CONFIG_TCP_PACING
> +	if(sysctl_tcp_pacing)
> +		tcp_pacing_recalc_delta(sk);
> +#endif

Please rewrite the patch so as to avoid adding that many #ifdefs to the common code, replacing the above code with:

	tcp_pacing_recalc_delta(sk);

That is defined in a header (net/tcp.h) as:

#ifdef CONFIG_TCP_PACING
extern void __tcp_pacing_recalc_delta(struct sock *sk);
extern int sysctl_tcp_pacing;

static inline void tcp_pacing_recalc_delta(struct sock *sk)
{
	if (sysctl_tcp_pacing) /* notice the space after ( */
		__tcp_pacing_recalc_delta(sk);
}
#else
static inline void tcp_pacing_recalc_delta(struct sock *sk) {};
#endif

Thanks,
- Arnaldo
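[Editor's aside: the compile-out wrapper pattern Arnaldo describes can be exercised outside the kernel too. The following is a userspace sketch with a stand-in feature macro and illustrative counters, not kernel code: the inline wrapper checks the runtime switch, and the whole call disappears when the feature macro is absent.]

```c
#include <assert.h>

/* Userspace sketch of the wrapper pattern: a runtime sysctl-like switch
 * guarded by an inline wrapper, which compiles to nothing when the
 * feature macro is not defined.  Names are illustrative only. */
#define CONFIG_TCP_PACING 1

static int sysctl_tcp_pacing;	/* runtime switch, off by default */
static int recalc_calls;	/* counts real invocations, for the demo */

#ifdef CONFIG_TCP_PACING
static void __tcp_pacing_recalc_delta(void)
{
	recalc_calls++;		/* stand-in for the real recalculation */
}

static inline void tcp_pacing_recalc_delta(void)
{
	if (sysctl_tcp_pacing)
		__tcp_pacing_recalc_delta();
}
#else
static inline void tcp_pacing_recalc_delta(void) {}
#endif
```

Common code then calls tcp_pacing_recalc_delta() unconditionally, and both the config-off and sysctl-off cases cost (almost) nothing.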
Re: TCP Pacing
On 9/13/06, Daniele Lacamera [EMAIL PROTECTED] wrote:
> Hello,
> Please let me insist once again on the importance of adding a TCP
> pacing mechanism to our TCP, as many people are including this
> algorithm in their congestion control proposals. Recent research has
> found that it really can help improve performance in different
> scenarios, like satellites and long-delay high-speed channels (100 ms
> RTT, Gbit). The Hybla module itself is crippled without this feature
> in its natural scenario.

Where is the published research? If you are going to mention research you need URLs to papers, and please put this in the source code too so people can check.

> The following patch is totally non-invasive: it has a config option
> and a sysctl switch, both turned off by default. When the config
> option is enabled, it adds only 6B to the tcp_sock.

I agree with Arnaldo's comments and would also add that I don't like having to select 1000 as the HZ unit. Something is wrong if you need this, as I can run higher-resolution timers without having to do this.

Haven't reviewed the rest of the code or tested.

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
Re: TCP Pacing
On Tue, 12 Sep 2006 19:58:21 +0200 Daniele Lacamera [EMAIL PROTECTED] wrote:
> Hello,
> Please let me insist once again on the importance of adding a TCP
> pacing mechanism to our TCP, as many people are including this
> algorithm in their congestion control proposals. Recent research has
> found that it really can help improve performance in different
> scenarios, like satellites and long-delay high-speed channels (100 ms
> RTT, Gbit). The Hybla module itself is crippled without this feature
> in its natural scenario.
>
> The following patch is totally non-invasive: it has a config option
> and a sysctl switch, both turned off by default. When the config
> option is enabled, it adds only 6B to the tcp_sock.

Yes, but tcp_sock is already greater than 1024 bytes on 64-bit, and needs a diet.

> Signed-off by: Daniele Lacamera [EMAIL PROTECTED]

Pacing in itself isn't a bad idea, but:

* Code needs to follow standard whitespace rules:
  - blanks around operators
  - blank after keyword
  - avoid (needless) parentheses

  Bad:
	if( (state==TCP_CA_Recovery) && (tp->snd_cwnd < tp->snd_ssthresh))
		window=(tp->snd_ssthresh)>>3;

  Good:
	if (state == TCP_CA_Recovery && tp->snd_cwnd < tp->snd_ssthresh)
		window = tp->snd_ssthresh >> 3;

* Since it is most useful over long delay links, maybe it should be a route parameter.