Re: [PATCH] NET : No need to update last_rx in loopback driver
From: Eric Dumazet [EMAIL PROTECTED] Date: Fri, 10 Feb 2006 11:00:03 +0100 Linux took 5 years to get out of BKL (Big Kernel Lock), so I think things can evolve step by step. In a few months I will use RCU for the x_tables problem if nobody else did the job before me :) Your work is appreciated, there is no doubt. It's just that I think we can do things like this without removing functionality. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET : No need to update last_rx in loopback driver
David S. Miller a écrit : From: Eric Dumazet [EMAIL PROTECTED] Date: Wed, 08 Feb 2006 16:06:31 +0100 loopback driver carefully uses per_cpu storage for statistics but updates loopback_dev.last_rx This has been discussed before, this is an attribute every driver must keep uptodate. Things like bonding use it, for example. Yes, loopback isn't usable with bonding, but it's a bad precedence to set that a useful metric is set for some devices and not for others. Find a cleaner way to fix this SMP cacheline pingpong, thanks. OK David, thanks for your (always excellent :) ) feedback. What do you think of this patch then ? [PATCH] : NET : Introduce netif_last_rx_update() method last_rx is changed each time a packet is received on an net device. This can cause cache line ping pongs in SMP machines, reducing performance. This effect is particularly bad on loopback device since no IRQ affinity can solve the problem (Ethernet devices can be tuned so that all incoming frames are handled by one CPU, avoiding this cache line ping pong. But loopback driver is used by user processes on all cpus) Instead of letting each driver doing plain dev-last_rx = jiffies; we can do some SMP optimizations once in include/linux/netdevice.h, and adapt this method if necessary. (Current uses of last_rx are limited to bonding, but this could change in the future, or for debugging purposes) This patch : - Defines netif_last_rx_update() inline function. Currently, we avoid a change on last_rx if the device is not a slave, or if last_rx value is already 'jiffies'. For slave devices, the number of ping-pongs is maxed to NR_CPUS*HZ per second. Loopback is not a slave candidate, and IRQ affinity should be the answer for ethernet drivers. - Changes drivers/net/loopback.c to : -- use this function. -- (and last_rx should be updated even if LOOPBACK_TSO is defined.) -- kmalloc/kzalloc conversion in loopback_init() Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- a/include/linux/netdevice.h 2006-02-07 11:55:42.0 +0100 +++ b/include/linux/netdevice.h 2006-02-09 09:23:15.0 +0100 @@ -649,6 +649,19 @@ return test_bit(__LINK_STATE_START, dev-state); } +static inline void netif_last_rx_update(struct net_device *dev) +{ +#ifdef CONFIG_SMP + /* +* In order to avoid cache line ping pongs on last_rx, we check +* if the device is a slave, +* and if last_rx really has to be modified +*/ + if (!(dev-flags IFF_SLAVE) || (dev-last_rx == jiffies)) + return; +#endif + dev-last_rx = jiffies; +} /* Use this variant when it is known for sure that it * is executing from interrupt context. --- a/drivers/net/loopback.c2006-02-07 12:10:55.0 +0100 +++ b/drivers/net/loopback.c2006-02-09 09:40:23.0 +0100 @@ -138,6 +138,7 @@ skb-ip_summed = CHECKSUM_UNNECESSARY; #endif + netif_last_rx_update(dev); #ifdef LOOPBACK_TSO if (skb_shinfo(skb)-tso_size) { BUG_ON(skb-protocol != htons(ETH_P_IP)); @@ -147,7 +148,6 @@ return 0; } #endif - dev-last_rx = jiffies; lb_stats = per_cpu(loopback_stats, get_cpu()); lb_stats-rx_bytes += skb-len; @@ -226,9 +226,8 @@ struct net_device_stats *stats; /* Can survive without statistics */ - stats = kmalloc(sizeof(struct net_device_stats), GFP_KERNEL); + stats = kzalloc(sizeof(struct net_device_stats), GFP_KERNEL); if (stats) { - memset(stats, 0, sizeof(struct net_device_stats)); loopback_dev.priv = stats; loopback_dev.get_stats = get_stats; }
Re: [PATCH] NET : No need to update last_rx in loopback driver
Eric Dumazet wrote: Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- a/include/linux/netdevice.h 2006-02-07 11:55:42.0 +0100 +++ b/include/linux/netdevice.h 2006-02-09 09:23:15.0 +0100 @@ -649,6 +649,19 @@ return test_bit(__LINK_STATE_START, dev-state); } One minor suggestion I would have is to just load jiffies once (it is volatile). But it gets a little messy if you want to be tricky and still load it zero times in the case that the first test evaluates true. Maybe not worth the fuss. { unsigned long now; #ifndef CONFIG_SMP now = jiffies #else if (!(dev-flags IFF_SLAVE) || ((now = jiffies) == dev-last_rx)) return; #endif dev-last_rx = jiffies; } +static inline void netif_last_rx_update(struct net_device *dev) +{ +#ifdef CONFIG_SMP + /* +* In order to avoid cache line ping pongs on last_rx, we check +* if the device is a slave, +* and if last_rx really has to be modified +*/ + if (!(dev-flags IFF_SLAVE) || (dev-last_rx == jiffies)) + return; +#endif + dev-last_rx = jiffies; +} -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET : No need to update last_rx in loopback driver
From: Eric Dumazet [EMAIL PROTECTED] Date: Thu, 09 Feb 2006 09:00:21 +0100 +#ifdef CONFIG_SMP + /* + * In order to avoid cache line ping pongs on last_rx, we check + * if the device is a slave, + * and if last_rx really has to be modified + */ + if (!(dev-flags IFF_SLAVE) || (dev-last_rx == jiffies)) + return; +#endif I said it's supposed to be generally available that the dev-last_rx value is updated and accurate for all network devices. Bypassing the setting by checking for IFF_SLAVE totally defeats that and makes this check a bonding specific optimization. Like I said, please find another way. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET : No need to update last_rx in loopback driver
David S. Miller a écrit : From: Eric Dumazet [EMAIL PROTECTED] Date: Thu, 09 Feb 2006 09:00:21 +0100 +#ifdef CONFIG_SMP + /* +* In order to avoid cache line ping pongs on last_rx, we check +* if the device is a slave, +* and if last_rx really has to be modified +*/ + if (!(dev-flags IFF_SLAVE) || (dev-last_rx == jiffies)) + return; +#endif I said it's supposed to be generally available that the dev-last_rx value is updated and accurate for all network devices. Bypassing the setting by checking for IFF_SLAVE totally defeats that and makes this check a bonding specific optimization. I'm sorry but last_rx is not available to User Land. You need a kernel debugger to access it, or mess with /proc/kcore If it's really usefull, then we miss accessors to this valuable data. Once these accessors are in, then we can delete the (dev-flags IFF_SLAVE) test. We can put a lot of metrics in kernel, for example the jiffies of last syscall, but if those metrics are not exported, it seems quite wrong to perform expensive metrics in production kernels. As I said in my patch description, we could in DEBUG mode do all the metrics you want : Jiffies of last_rx, smp_processor_id() of last rx, and all sort of things. Eric - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET : No need to update last_rx in loopback driver
From: Eric Dumazet [EMAIL PROTECTED] Date: Thu, 09 Feb 2006 10:51:00 +0100 I'm sorry but last_rx is not available to User Land. You need a kernel debugger to access it, or mess with /proc/kcore There are tons of pieces of state tracked for net devices which we don't export to userland. This one is generally useful. Right now, it may be the case that bonding is the only consumer in the kernel, but that might not always be true. I can very easily envison uses by the packet scheduler and classifiers. This optimization is truly erroneous, and I think people are going waaay over the top wrt. optimizing the loopback interface on SMP. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET : No need to update last_rx in loopback driver
On Thursday 09 February 2006 12:05, David S. Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Thu, 09 Feb 2006 10:51:00 +0100 I'm sorry but last_rx is not available to User Land. You need a kernel debugger to access it, or mess with /proc/kcore There are tons of pieces of state tracked for net devices which we don't export to userland. This one is generally useful. Right now, it may be the case that bonding is the only consumer in the kernel, but that might not always be true. I can very easily envison uses by the packet scheduler and classifiers. Are you sure? I remember at some point talking about using skb-stamp for TCP RTTs, but I think Alexey flamed me for it stating that ignoring the delays added by internal queueing in the stack is wrong. And he was right of course. All these estimates have to be done at the time the packet finally arrives, not when it happens to hit the driver. Same would probably apply to schedulers and classifiers. So I cannot imagine a good use of this field. What does bonding use it for anyways? -Andi - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET : No need to update last_rx in loopback driver
On Thursday 09 February 2006 14:53, John W. Linville wrote: On Thu, Feb 09, 2006 at 12:33:11PM +0100, Andi Kleen wrote: Same would probably apply to schedulers and classifiers. So I cannot imagine a good use of this field. What does bonding use it for anyways? ARP-based monitoring of link activity. That's a pretty obscure use to justify a precious full field in sk_buff -Andi - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET : No need to update last_rx in loopback driver
Andi Kleen a écrit : On Thursday 09 February 2006 14:53, John W. Linville wrote: On Thu, Feb 09, 2006 at 12:33:11PM +0100, Andi Kleen wrote: Same would probably apply to schedulers and classifiers. So I cannot imagine a good use of this field. What does bonding use it for anyways? ARP-based monitoring of link activity. That's a pretty obscure use to justify a precious full field in sk_buff Your question was about bonding and its last_rx use. This field is not part of sk_buff but 'struct netdevice' :) If it were in sk_buff, I would not desesperatly try to convince David. Eric - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET : No need to update last_rx in loopback driver
On Thu, 2006-09-02 at 15:21 +0100, Eric Dumazet wrote: Andi Kleen a écrit : On Thursday 09 February 2006 14:53, John W. Linville wrote: On Thu, Feb 09, 2006 at 12:33:11PM +0100, Andi Kleen wrote: Same would probably apply to schedulers and classifiers. So I cannot imagine a good use of this field. What does bonding use it for anyways? ARP-based monitoring of link activity. That's a pretty obscure use to justify a precious full field in sk_buff Your question was about bonding and its last_rx use. This field is not part of sk_buff but 'struct netdevice' :) If it were in sk_buff, I would not desesperatly try to convince David. I think it is useful -as a stat- to know when the last time something was received or sent on a netdevice. I would actually be suprised if theres no SNMP MIB that requires this information. cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET : No need to update last_rx in loopback driver
David S. Miller a écrit : There are tons of pieces of state tracked for net devices which we don't export to userland. This one is generally useful. Right now, it may be the case that bonding is the only consumer in the kernel, but that might not always be true. I can very easily envison uses by the packet scheduler and classifiers. This optimization is truly erroneous, and I think people are going waaay over the top wrt. optimizing the loopback interface on SMP. I had this idea after reading recent Van Jacobson paper about net channels. I'm surprised you see the benefit of net channels (which first assumption is about the 50-100 ns cost of a memory cache miss) but keep this obvious cache line ping pong in loopback driver. I did a benchmark test, using the volano benchmark (http://www.volano.com/report/index.html) Platform : Dell PowerEdge 1600 SC with two Intel Xeon HT 2.0GHz cpu (total of 4 logical cpus), FSB 533MHz, 2GB of ram. java version 1.5.0_06 Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05) Java HotSpot(TM) Server VM (build 1.5.0_06-b05, mixed mode) Normal 2.6.16-rc2(with dev-last_rx=jiffies in loopback driver) VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 27.239 seconds Average throughput = 14685 messages per second VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 24.871 seconds Average throughput = 16083 messages per second VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 28.698 seconds Average throughput = 13938 messages per second VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 24.302 seconds Average throughput = 16460 messages per second total time = 108.956 Patched 2.6.16-rc2 kernel (without dev-last_rx=jiffies; in loopback driver) VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 26.3 seconds Average throughput = 15209 messages per second VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 25.056 seconds Average throughput = 15964 messages per second VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 24.62 seconds Average throughput = 16247 messages per second VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 23.609 seconds Average throughput = 16943 messages per second total time = 105.2 If I add the dont refcount loopback netdevice patch : VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 29.571 seconds Average throughput = 13527 messages per second VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 23.993 seconds Average throughput = 16672 messages per second VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 24.111 seconds Average throughput = 16590 messages per second VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 24.68 seconds Average throughput = 16207 messages per second total = 102.355 Of course this is a benchmark, but normally used to test the OS capacity to switch threads, not 'loopback performance' - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET : No need to update last_rx in loopback driver
On Thu, Feb 09, 2006 at 04:16:54PM +0100, Eric Dumazet ([EMAIL PROTECTED]) wrote: I had this idea after reading recent Van Jacobson paper about net channels. I'm surprised you see the benefit of net channels (which first assumption is about the 50-100 ns cost of a memory cache miss) but keep this obvious cache line ping pong in loopback driver. Net channels are not about optimisations, but about completely new system. And resulted optimisations are just results of design, but not implementation. What about creating per-cpu jiffies variables, which could be read from such hot pathes, and it's values could be updated from some heavyweight functions like schedule()? I did a benchmark test, using the volano benchmark (http://www.volano.com/report/index.html) Platform : Dell PowerEdge 1600 SC with two Intel Xeon HT 2.0GHz cpu (total of 4 logical cpus), FSB 533MHz, 2GB of ram. java version 1.5.0_06 Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05) Java HotSpot(TM) Server VM (build 1.5.0_06-b05, mixed mode) Normal 2.6.16-rc2(with dev-last_rx=jiffies in loopback driver) VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 27.239 seconds Average throughput = 14685 messages per second VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 24.871 seconds Average throughput = 16083 messages per second VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 28.698 seconds Average throughput = 13938 messages per second VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 24.302 seconds Average throughput = 16460 messages per second total time = 108.956 Patched 2.6.16-rc2 kernel (without dev-last_rx=jiffies; in loopback driver) VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 26.3 seconds Average throughput = 15209 messages per second VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 25.056 seconds Average throughput = 15964 messages per second VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 24.62 seconds Average throughput = 16247 messages per second VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 23.609 seconds Average throughput = 16943 messages per second total time = 105.2 If I add the dont refcount loopback netdevice patch : VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 29.571 seconds Average throughput = 13527 messages per second VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 23.993 seconds Average throughput = 16672 messages per second VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 24.111 seconds Average throughput = 16590 messages per second VolanoMark version = 2.5.0.9 Messages sent = 2 Messages received = 38 Total messages = 40 Elapsed time = 24.68 seconds Average throughput = 16207 messages per second total = 102.355 Of course this is a benchmark, but normally used to test the OS capacity to switch threads, not 'loopback performance' - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- Evgeniy Polyakov - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET : No need to update last_rx in loopback driver
On Thu, 9 Feb 2006 19:42:17 +0300 Evgeniy Polyakov [EMAIL PROTECTED] wrote: On Thu, Feb 09, 2006 at 04:16:54PM +0100, Eric Dumazet ([EMAIL PROTECTED]) wrote: I had this idea after reading recent Van Jacobson paper about net channels. I'm surprised you see the benefit of net channels (which first assumption is about the 50-100 ns cost of a memory cache miss) but keep this obvious cache line ping pong in loopback driver. Net channels are not about optimisations, but about completely new system. And resulted optimisations are just results of design, but not implementation. What about creating per-cpu jiffies variables, which could be read from such hot pathes, and it's values could be updated from some heavyweight functions like schedule()? Quit with the loopback benchmark masturbation. If you what you are talking about could be shown to be important for a real application or over real devices then it would be worth investing some design effort. -- Stephen Hemminger [EMAIL PROTECTED] OSDL http://developer.osdl.org/~shemminger - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET : No need to update last_rx in loopback driver
From: Eric Dumazet [EMAIL PROTECTED] Date: Thu, 09 Feb 2006 16:16:54 +0100 I had this idea after reading recent Van Jacobson paper about net channels. I'm surprised you see the benefit of net channels (which first assumption is about the 50-100 ns cost of a memory cache miss) but keep this obvious cache line ping pong in loopback driver. Loopback is the wrong thing to optimize. We have several efficient mechanisms for local system communications. If you don't necessarily need socket semantics, use pipes (named or unnamed), else you can use AF_UNIX sockets. Van Jacobson shows a 6X improvement for real over the wire communications, and you're arguing over a sub-percentile improvement over loopback. There is a major difference. Patches exactly like your's have been proposed before by the SGI folks in the past, and I'm arguing against it using with the same positions I held at that time. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] NET : No need to update last_rx in loopback driver
loopback driver carefully uses per_cpu storage for statistics but updates loopback_dev.last_rx As last_rx is only used by bond driver to detect link activity, and loopback dev wont be used as a bond slave, can we avoid this write access that slowdown SMP platforms, because of cache ping pongs ? If this patch is not correct, I suspect we should update trans_start in loopback driver since the condition in the bond driver is bidirectional : if (((jiffies - slave-dev-trans_start) = delta_in_ticks) ((jiffies - slave-dev-last_rx) = delta_in_ticks)) { Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- a/drivers/net/loopback.c2006-02-07 12:10:55.0 +0100 +++ b/drivers/net/loopback.c2006-02-08 16:49:08.0 +0100 @@ -147,7 +147,6 @@ return 0; } #endif - dev-last_rx = jiffies; lb_stats = per_cpu(loopback_stats, get_cpu()); lb_stats-rx_bytes += skb-len;
Re: [PATCH] NET : No need to update last_rx in loopback driver
From: Eric Dumazet [EMAIL PROTECTED] Date: Wed, 08 Feb 2006 16:06:31 +0100 loopback driver carefully uses per_cpu storage for statistics but updates loopback_dev.last_rx This has been discussed before, this is an attribute every driver must keep uptodate. Things like bonding use it, for example. Yes, loopback isn't usable with bonding, but it's a bad precedence to set that a useful metric is set for some devices and not for others. Find a cleaner way to fix this SMP cacheline pingpong, thanks. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html