While trying to find a target for a non-pinned timer, use the following logic:
- Use the closest (from a sched domain POV) busy CPU that is not
  full dynticks
- If none, use the closest idle CPU that is not full dynticks.

So this is biased toward isolation over powersaving. This is a quick
hack until we provide a way for the user to tune that policy. A CPU
mask affinity for non pinned timers could be such a solution.

Original-patch-by: Thomas Gleixner <t...@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweis...@gmail.com>
Cc: Alessio Igor Bogani <abog...@kernel.org>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Chris Metcalf <cmetc...@tilera.com>
Cc: Christoph Lameter <c...@linux.com>
Cc: Geoff Levand <ge...@infradead.org>
Cc: Gilad Ben Yossef <gi...@benyossef.com>
Cc: Hakan Akkan <hakanak...@gmail.com>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Paul E. McKenney <paul...@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortma...@windriver.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Steven Rostedt <rost...@goodmis.org>
Cc: Thomas Gleixner <t...@linutronix.de>
---
 kernel/hrtimer.c    |    3 ++-
 kernel/sched/core.c |   26 +++++++++++++++++++++++---
 kernel/timer.c      |    3 ++-
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 6db7a5e..f5da6fb 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -159,7 +159,8 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
 static int hrtimer_get_target(int this_cpu, int pinned)
 {
 #ifdef CONFIG_NO_HZ
-	if (!pinned && get_sysctl_timer_migration() && idle_cpu(this_cpu))
+	if (!pinned && get_sysctl_timer_migration() &&
+	    (idle_cpu(this_cpu) || tick_nohz_full_cpu(this_cpu)))
 		return get_nohz_timer_target();
 #endif
 	return this_cpu;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7b6156a..e2884c5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -560,22 +560,42 @@ void resched_cpu(int cpu)
  */
 int get_nohz_timer_target(void)
 {
-	int cpu = smp_processor_id();
 	int i;
 	struct sched_domain *sd;
+	int cpu = smp_processor_id();
+	int target = -1;
 
 	rcu_read_lock();
 	for_each_domain(cpu, sd) {
 		for_each_cpu(i, sched_domain_span(sd)) {
+			/*
+			 * This is biased toward CPU isolation usecase:
+			 * try to migrate the timer to a busy non-full-nohz
+			 * CPU. If there is none, then prefer an idle CPU
+			 * than a full nohz one.
+			 * We shouldn't do policy here (isolation VS powersaving)
+			 * so this is a temporary hack. Being able to affine
+			 * non-pinned timers would be a better thing.
+			 */
+			if (tick_nohz_full_cpu(i))
+				continue;
+
 			if (!idle_cpu(i)) {
-				cpu = i;
+				target = i;
 				goto unlock;
 			}
+
+			if (target == -1)
+				target = i;
 		}
 	}
+	/* Fallback in case of NULL domain */
+	if (target == -1)
+		target = cpu;
 unlock:
 	rcu_read_unlock();
-	return cpu;
+
+	return target;
 }
 /*
  * When add_timer_on() enqueues a timer into the timer wheel of an
diff --git a/kernel/timer.c b/kernel/timer.c
index 970b57d..51dd02b 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -738,7 +738,8 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
 	cpu = smp_processor_id();
 
 #if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
-	if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu))
+	if (!pinned && get_sysctl_timer_migration() &&
+	    (idle_cpu(cpu) || tick_nohz_full_cpu(cpu)))
 		cpu = get_nohz_timer_target();
 #endif
 	new_base = per_cpu(tvec_bases, cpu);
-- 
1.7.5.4
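
For anyone who wants to try the selection order outside the kernel, here is a minimal userspace sketch of the policy described in the changelog. It is only an illustration under stated assumptions, not the kernel code: struct cpu_state, pick_timer_target() and the order[] array are hypothetical names. order[] is assumed to list CPU ids from closest to farthest, standing in for the sched domain walk, and the two booleans stand in for idle_cpu() and tick_nohz_full_cpu().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU snapshot; this is not a kernel structure. */
struct cpu_state {
	bool idle;      /* stands in for idle_cpu(cpu) */
	bool full_nohz; /* stands in for tick_nohz_full_cpu(cpu) */
};

/*
 * Pick a target CPU for a non-pinned timer.
 *
 * state[] is indexed by CPU id. order[] holds n CPU ids sorted from
 * closest to farthest from this_cpu (an assumption that replaces the
 * for_each_domain()/for_each_cpu() walk).
 *
 * Preference: closest busy CPU that is not full dynticks, else the
 * closest idle CPU that is not full dynticks, else this_cpu.
 */
static int pick_timer_target(const struct cpu_state *state,
			     const int *order, size_t n, int this_cpu)
{
	int target = -1;
	size_t k;

	for (k = 0; k < n; k++) {
		int i = order[k];

		/* Never migrate a timer onto a full dynticks CPU. */
		if (state[i].full_nohz)
			continue;

		/* A busy candidate wins immediately. */
		if (!state[i].idle)
			return i;

		/* Remember only the closest idle candidate. */
		if (target == -1)
			target = i;
	}

	/* No eligible CPU at all: stay where we are. */
	return target == -1 ? this_cpu : target;
}
```

With CPUs 0 and 2 idle, CPU 1 full dynticks and CPU 3 busy, calling this from CPU 1 with order {0, 1, 2, 3} picks CPU 3. If CPU 3 were idle as well, it would fall back to CPU 0, the closest idle CPU that is not full dynticks.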