Re: [PATCH v2 net-next] softirq: reduce latencies
[ QUOTE ] From: Eric Dumazet Date: Thu, 03 Jan 2013 23:49:40 -0800 > From: Eric Dumazet > > In various network workloads, __do_softirq() latencies can be up > to 20 ms if HZ=1000, and 200 ms if HZ=100. > > This is because we iterate 10 times in the softirq dispatcher, > and some actions can consume a lot of cycles. > > This patch changes the fallback to ksoftirqd condition to : > > - A time limit of 2 ms. > - need_resched() being set on current task > > When one of this condition is met, we wakeup ksoftirqd for further > softirq processing if we still have pending softirqs. ... > Signed-off-by: Eric Dumazet Acked-by: David S. Miller [ /QUOTE ] Tested-by: Sedat Dilek (against Linux v3.8-rc2) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 net-next] softirq: reduce latencies
From: Eric Dumazet Date: Thu, 03 Jan 2013 23:49:40 -0800 > From: Eric Dumazet > > In various network workloads, __do_softirq() latencies can be up > to 20 ms if HZ=1000, and 200 ms if HZ=100. > > This is because we iterate 10 times in the softirq dispatcher, > and some actions can consume a lot of cycles. > > This patch changes the fallback to ksoftirqd condition to : > > - A time limit of 2 ms. > - need_resched() being set on current task > > When one of this condition is met, we wakeup ksoftirqd for further > softirq processing if we still have pending softirqs. ... > Signed-off-by: Eric Dumazet Acked-by: David S. Miller -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 net-next] softirq: reduce latencies
On Fri, 2013-01-04 at 01:12 -0800, Joe Perches wrote: > On Fri, 2013-01-04 at 00:23 -0800, Eric Dumazet wrote: > > On Fri, 2013-01-04 at 00:15 -0800, Joe Perches wrote: > > > Perhaps MAX_SOFTIRQ_TIME should be > > > #define MAX_SOFTIRQ_TIME msecs_to_jiffies(2) > > > though it would be nicer if it were a compile time constant. > > > > If you send a patch to convert msecs_to_jiffies() to an inline function > > when HZ = 1000, I will gladly use it instead of (2*HZ/1000) > > > > Right now, max(1, msecs_to_jiffies(2)) uses way too many instructions, > > while it should be the constant 2, known at compile time. > > Something like this might work. > > This is incomplete, it just does msecs_to_jiffies, > and it should convert usecs_to_jiffies and the > jiffies_to_ types too. > > Maybe it's worthwhile. > > It does reduce object size by 16 bytes per call site > (x86-32) when the argument is a constant. There are > about 800 of these jiffies conversions in kernel sources. > > What do you think? > I think this is something to discuss in another thread, and definitely worth to do, at least for msecs_to_jiffies() We have many HZ references everywhere that could be cleaned up using this. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 net-next] softirq: reduce latencies
On Fri, 2013-01-04 at 00:23 -0800, Eric Dumazet wrote: > On Fri, 2013-01-04 at 00:15 -0800, Joe Perches wrote: > > Perhaps MAX_SOFTIRQ_TIME should be > > #define MAX_SOFTIRQ_TIME msecs_to_jiffies(2) > > though it would be nicer if it were a compile time constant. > > If you send a patch to convert msecs_to_jiffies() to an inline function > when HZ = 1000, I will gladly use it instead of (2*HZ/1000) > > Right now, max(1, msecs_to_jiffies(2)) uses way too many instructions, > while it should be the constant 2, known at compile time. Something like this might work. This is incomplete, it just does msecs_to_jiffies, and it should convert usecs_to_jiffies and the jiffies_to_ types too. Maybe it's worthwhile. It does reduce object size by 16 bytes per call site (x86-32) when the argument is a constant. There are about 800 of these jiffies conversions in kernel sources. What do you think? include/linux/jiffies.h | 49 - kernel/time.c | 42 +++--- 2 files changed, 51 insertions(+), 40 deletions(-) diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h index 82ed068..c67ddcf 100644 --- a/include/linux/jiffies.h +++ b/include/linux/jiffies.h @@ -291,7 +291,54 @@ extern unsigned long preset_lpj; */ extern unsigned int jiffies_to_msecs(const unsigned long j); extern unsigned int jiffies_to_usecs(const unsigned long j); -extern unsigned long msecs_to_jiffies(const unsigned int m); +extern unsigned long __msecs_to_jiffies(const unsigned int m); + +static inline unsigned long __inline_msecs_to_jiffies(const unsigned int m) +{ + /* +* Negative value, means infinite timeout: +*/ + if ((int)m < 0) + return MAX_JIFFY_OFFSET; + +#if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ) + /* +* HZ is equal to or smaller than 1000, and 1000 is a nice +* round multiple of HZ, divide with the factor between them, +* but round upwards: +*/ + return (m + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ); +#elif HZ > MSEC_PER_SEC && !(HZ % MSEC_PER_SEC) + /* +* HZ is larger than 1000, and HZ is a nice round multiple of +* 1000 - simply multiply with the factor between them. +* +* But first make sure the multiplication result cannot +* overflow: +*/ + if (m > jiffies_to_msecs(MAX_JIFFY_OFFSET)) + return MAX_JIFFY_OFFSET; + + return m * (HZ / MSEC_PER_SEC); +#else + /* +* Generic case - multiply, round and divide. But first +* check that if we are doing a net multiplication, that +* we wouldn't overflow: +*/ + if (HZ > MSEC_PER_SEC && m > jiffies_to_msecs(MAX_JIFFY_OFFSET)) + return MAX_JIFFY_OFFSET; + + return (MSEC_TO_HZ_MUL32 * m + MSEC_TO_HZ_ADJ32) + >> MSEC_TO_HZ_SHR32; +#endif +} + +#define msecs_to_jiffies(x)\ + (__builtin_constant_p(x) ? \ +__inline_msecs_to_jiffies(x) : \ +__msecs_to_jiffies(x)) + extern unsigned long usecs_to_jiffies(const unsigned int u); extern unsigned long timespec_to_jiffies(const struct timespec *value); extern void jiffies_to_timespec(const unsigned long jiffies, diff --git a/kernel/time.c b/kernel/time.c index d226c6a..231f2ac 100644 --- a/kernel/time.c +++ b/kernel/time.c @@ -425,47 +425,11 @@ EXPORT_SYMBOL(ns_to_timeval); * * We must also be careful about 32-bit overflows. */ -unsigned long msecs_to_jiffies(const unsigned int m) +unsigned long __msecs_to_jiffies(const unsigned int m) { - /* -* Negative value, means infinite timeout: -*/ - if ((int)m < 0) - return MAX_JIFFY_OFFSET; - -#if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ) - /* -* HZ is equal to or smaller than 1000, and 1000 is a nice -* round multiple of HZ, divide with the factor between them, -* but round upwards: -*/ - return (m + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ); -#elif HZ > MSEC_PER_SEC && !(HZ % MSEC_PER_SEC) - /* -* HZ is larger than 1000, and HZ is a nice round multiple of -* 1000 - simply multiply with the factor between them. -* -* But first make sure the multiplication result cannot -* overflow: -*/ - if (m > jiffies_to_msecs(MAX_JIFFY_OFFSET)) - return MAX_JIFFY_OFFSET; - - return m * (HZ / MSEC_PER_SEC); -#else - /* -* Generic case - multiply, round and divide. But first -* check that if we are doing a net multiplication, that -* we wouldn't overflow: -*/ - if (HZ > MSEC_PER_SEC && m > jiffies_to_msecs(MAX_JIFFY_OFFSET)) - return MAX_JIFFY_OFFSET; - - return (MSEC_TO_HZ_MUL32 * m + MSEC_TO_HZ_ADJ32) - >> MSEC_TO_HZ_SHR32; -#endif + return __inline_msecs_to_
Re: [PATCH v2 net-next] softirq: reduce latencies
On Fri, 2013-01-04 at 00:15 -0800, Joe Perches wrote: > On Thu, 2013-01-03 at 23:49 -0800, Eric Dumazet wrote: > > In various network workloads, __do_softirq() latencies can be up > > to 20 ms if HZ=1000, and 200 ms if HZ=100. > > This patch changes the fallback to ksoftirqd condition to : > > - A time limit of 2 ms. > > [] > > diff --git a/kernel/softirq.c b/kernel/softirq.c > [] > > +#define MAX_SOFTIRQ_TIME max(1, (2*HZ/1000)) > > And if HZ is 1? > Then its OK. 2*1/1000 -> 20 ticks -> 2 ms > > asmlinkage void __do_softirq(void) > > { > [] > > + unsigned long end = jiffies + MAX_SOFTIRQ_TIME; > > Perhaps MAX_SOFTIRQ_TIME should be > > #define MAX_SOFTIRQ_TIME msecs_to_jiffies(2) > > though it would be nicer if it were a compile time constant. If you send a patch to convert msecs_to_jiffies() to an inline function when HZ = 1000, I will gladly use it instead of (2*HZ/1000) Right now, max(1, msecs_to_jiffies(2)) uses way too many instructions, while it should be the constant 2, known at compile time. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 net-next] softirq: reduce latencies
On Thu, 2013-01-03 at 23:49 -0800, Eric Dumazet wrote: > In various network workloads, __do_softirq() latencies can be up > to 20 ms if HZ=1000, and 200 ms if HZ=100. > This patch changes the fallback to ksoftirqd condition to : > - A time limit of 2 ms. [] > diff --git a/kernel/softirq.c b/kernel/softirq.c [] > +#define MAX_SOFTIRQ_TIME max(1, (2*HZ/1000)) And if HZ is 1? > asmlinkage void __do_softirq(void) > { [] > + unsigned long end = jiffies + MAX_SOFTIRQ_TIME; Perhaps MAX_SOFTIRQ_TIME should be #define MAX_SOFTIRQ_TIME msecs_to_jiffies(2) though it would be nicer if it were a compile time constant. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2 net-next] softirq: reduce latencies
From: Eric Dumazet In various network workloads, __do_softirq() latencies can be up to 20 ms if HZ=1000, and 200 ms if HZ=100. This is because we iterate 10 times in the softirq dispatcher, and some actions can consume a lot of cycles. This patch changes the fallback to ksoftirqd condition to : - A time limit of 2 ms. - need_resched() being set on current task When one of this condition is met, we wakeup ksoftirqd for further softirq processing if we still have pending softirqs. Using need_resched() as the only condition can trigger RCU stalls, as we can keep BH disabled for too long. I ran several benchmarks and got no significant difference in throughput, but a very significant reduction of latencies (one order of magnitude) : In following bench, 200 antagonist "netperf -t TCP_RR" are started in background, using all available cpus. Then we start one "netperf -t TCP_RR", bound to the cpu handling the NIC IRQ (hard+soft) Before patch : # netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind RT_LATENCY=550110.424 MIN_LATENCY=146858 MAX_LATENCY=997109 P50_LATENCY=305000 P90_LATENCY=55 P99_LATENCY=71 MEAN_LATENCY=376989.12 STDDEV_LATENCY=184046.92 After patch : # netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind RT_LATENCY=40545.492 MIN_LATENCY=9834 MAX_LATENCY=78366 P50_LATENCY=33583 P90_LATENCY=59000 P99_LATENCY=69000 MEAN_LATENCY=38364.67 STDDEV_LATENCY=12865.26 Signed-off-by: Eric Dumazet Cc: David Miller Cc: Tom Herbert Cc: Ben Hutchings --- v2: min(1, (2*HZ/1000)) -> max(1, (2*HZ/1000)), as spotted by Ben kernel/softirq.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/kernel/softirq.c b/kernel/softirq.c index ed567ba..8d5e4be 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -195,21 +195,21 @@ void local_bh_enable_ip(unsigned long ip) EXPORT_SYMBOL(local_bh_enable_ip); /* - * We restart softirq processing MAX_SOFTIRQ_RESTART times, - * and we fall back to softirqd after that. + * We restart softirq processing for at most 2 ms, + * and if need_resched() is not set. * - * This number has been established via experimentation. + * These limits have been established via experimentation. * The two things to balance is latency against fairness - * we want to handle softirqs as soon as possible, but they * should not be able to lock up the box. */ -#define MAX_SOFTIRQ_RESTART 10 +#define MAX_SOFTIRQ_TIME max(1, (2*HZ/1000)) asmlinkage void __do_softirq(void) { struct softirq_action *h; __u32 pending; - int max_restart = MAX_SOFTIRQ_RESTART; + unsigned long end = jiffies + MAX_SOFTIRQ_TIME; int cpu; unsigned long old_flags = current->flags; @@ -264,11 +264,12 @@ restart: local_irq_disable(); pending = local_softirq_pending(); - if (pending && --max_restart) - goto restart; + if (pending) { + if (time_before(jiffies, end) && !need_resched()) + goto restart; - if (pending) wakeup_softirqd(); + } lockdep_softirq_exit(); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/