[tip:locking/urgent] locking/rtmutex: Remove unnecessary priority adjustment
Commit-ID:  69f0d429c413fe96db2c187475cebcc6e3a8c7f5
Gitweb:     http://git.kernel.org/tip/69f0d429c413fe96db2c187475cebcc6e3a8c7f5
Author:     Alex Shi
AuthorDate: Thu, 13 Jul 2017 14:18:24 +0800
Committer:  Ingo Molnar
CommitDate: Thu, 13 Jul 2017 11:44:06 +0200

locking/rtmutex: Remove unnecessary priority adjustment

We don't need to adjust priority before adding a new pi_waiter, the
priority only needs to be updated after pi_waiter change or task
priority change.

Steven Rostedt pointed out:

  "Interesting, I did some git mining and this was added with the original
   entry of the rtmutex.c (23f78d4a03c5). Looking at even that version, I
   don't see the purpose of adjusting the task prio here. It is done
   before anything changes in the task."

Signed-off-by: Alex Shi
Reviewed-by: Steven Rostedt (VMware)
Acked-by: Peter Zijlstra (Intel)
Cc: Juri Lelli
Cc: Linus Torvalds
Cc: Mathieu Poirier
Cc: Sebastian Siewior
Cc: Steven Rostedt
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1499926704-28841-1-git-send-email-alex@linaro.org
[ Enhance the changelog. ]
Signed-off-by: Ingo Molnar
---
 kernel/locking/rtmutex.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 7806989..649dc9d 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -963,7 +963,6 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
 		return -EDEADLK;

 	raw_spin_lock(&task->pi_lock);
-	rt_mutex_adjust_prio(task);
 	waiter->task = task;
 	waiter->lock = lock;
 	waiter->prio = task->prio;
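To illustrate why the removed call had no effect, here is a minimal sketch in plain C of the priority-inheritance rule. It is not the kernel's implementation: `struct pi_task`, `effective_prio()` and `adjust_prio()` are hypothetical names, and the model assumes that a lower number means a higher priority and that only the single best waiter is tracked.

```c
/*
 * Hypothetical, simplified model of priority inheritance.
 * Lower numeric value means higher priority.
 */
struct pi_task {
	int normal_prio;     /* priority without any boosting */
	int prio;            /* effective (possibly boosted) priority */
	int top_waiter_prio; /* best priority among waiters, or -1 if none */
};

/* Effective priority: the better of normal_prio and the top waiter. */
static int effective_prio(const struct pi_task *t)
{
	if (t->top_waiter_prio >= 0 && t->top_waiter_prio < t->normal_prio)
		return t->top_waiter_prio;
	return t->normal_prio;
}

/* Recompute the effective priority; returns 1 if it changed, else 0. */
static int adjust_prio(struct pi_task *t)
{
	int new_prio = effective_prio(t);
	int changed = (new_prio != t->prio);

	t->prio = new_prio;
	return changed;
}
```

In this model, calling `adjust_prio()` before a waiter has been recorded recomputes the value the task already has and reports no change. The call only has an effect once `top_waiter_prio` or `normal_prio` has been modified, which matches the reasoning in the changelog.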
[tip:sched/core] sched: Clean up the task_hot() function
Commit-ID:  6037dd1a49f95092824fa8ba75c717ff7805e317
Gitweb:     http://git.kernel.org/tip/6037dd1a49f95092824fa8ba75c717ff7805e317
Author:     Alex Shi
AuthorDate: Wed, 12 Mar 2014 14:51:51 +0800
Committer:  Ingo Molnar
CommitDate: Wed, 12 Mar 2014 10:49:01 +0100

sched: Clean up the task_hot() function

task_hot() doesn't need the 'sched_domain' parameter, so remove it.

Signed-off-by: Alex Shi
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1394607111-1904-1-git-send-email-alex@linaro.org
Signed-off-by: Ingo Molnar
---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b301918..7e9bd0b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5037,7 +5037,7 @@ static void move_task(struct task_struct *p, struct lb_env *env)
  * Is this task likely cache-hot:
  */
 static int
-task_hot(struct task_struct *p, u64 now, struct sched_domain *sd)
+task_hot(struct task_struct *p, u64 now)
 {
 	s64 delta;

@@ -5198,7 +5198,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	 * 2) task is cache cold, or
 	 * 3) too many balance attempts have failed.
 	 */
-	tsk_cache_hot = task_hot(p, rq_clock_task(env->src_rq), env->sd);
+	tsk_cache_hot = task_hot(p, rq_clock_task(env->src_rq));
 	if (!tsk_cache_hot)
 		tsk_cache_hot = migrate_degrades_locality(p, env);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
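For readers unfamiliar with the check being cleaned up, the following is a rough sketch of a cache-hot test that needs only the task and the current time. It is a simplification under stated assumptions, not the code in kernel/sched/fair.c: `struct toy_task`, `toy_task_hot()` and the `migration_cost_ns` value are hypothetical, and the real function applies further conditions that are omitted here.

```c
#include <stdint.h>

/* Hypothetical stand-in for a tunable migration cost, in nanoseconds. */
static const int64_t migration_cost_ns = 500000;

struct toy_task {
	uint64_t exec_start_ns; /* when the task last started running */
};

/*
 * Returns 1 if the task ran recently enough to be treated as cache-hot.
 * Only the task and the current time are consulted, so no scheduling
 * domain argument is required.
 */
static int toy_task_hot(const struct toy_task *p, uint64_t now_ns)
{
	int64_t delta = (int64_t)(now_ns - p->exec_start_ns);

	return delta < migration_cost_ns;
}
```

The point of the sketch is only that the decision depends on per-task timing data, which is consistent with the changelog's statement that the 'sched_domain' parameter is not needed.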
[tip:sched/core] sched: Add statistic for newidle load balance cost
Commit-ID:  37e6bae8395a94b4dd934c92b02b9408be992365
Gitweb:     http://git.kernel.org/tip/37e6bae8395a94b4dd934c92b02b9408be992365
Author:     Alex Shi
AuthorDate: Thu, 23 Jan 2014 18:39:54 +0800
Committer:  Ingo Molnar
CommitDate: Tue, 11 Feb 2014 09:58:18 +0100

sched: Add statistic for newidle load balance cost

Tracking rq->max_idle_balance_cost and sd->max_newidle_lb_cost.
It's useful to know these values in debug mode.

Signed-off-by: Alex Shi
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/52e0f3bf.5020...@linaro.org
Signed-off-by: Ingo Molnar
---
 kernel/sched/core.c  | 9 ++++++---
 kernel/sched/debug.c | 1 +
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3068f37..fb9764f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4811,7 +4811,7 @@ set_table_entry(struct ctl_table *entry,
 static struct ctl_table *
 sd_alloc_ctl_domain_table(struct sched_domain *sd)
 {
-	struct ctl_table *table = sd_alloc_ctl_entry(13);
+	struct ctl_table *table = sd_alloc_ctl_entry(14);

 	if (table == NULL)
 		return NULL;
@@ -4839,9 +4839,12 @@ sd_alloc_ctl_domain_table(struct sched_domain *sd)
 		sizeof(int), 0644, proc_dointvec_minmax, false);
 	set_table_entry(&table[10], "flags", &sd->flags,
 		sizeof(int), 0644, proc_dointvec_minmax, false);
-	set_table_entry(&table[11], "name", sd->name,
+	set_table_entry(&table[11], "max_newidle_lb_cost",
+		&sd->max_newidle_lb_cost,
+		sizeof(long), 0644, proc_doulongvec_minmax, false);
+	set_table_entry(&table[12], "name", sd->name,
 		CORENAME_MAX_SIZE, 0444, proc_dostring, false);
-	/* &table[12] is terminator */
+	/* &table[13] is terminator */

 	return table;
 }
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 31b908d..f3344c3 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -321,6 +321,7 @@ do {									\
 	P(sched_goidle);
 #ifdef CONFIG_SMP
 	P64(avg_idle);
+	P64(max_idle_balance_cost);
 #endif

 	P(ttwu_count);
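The core.c hunk bumps the allocation from 13 to 14 entries because the table ends with an empty terminator slot, so adding one real entry means allocating one more. A minimal user-space sketch of that sizing rule follows. The names `struct toy_entry`, `toy_alloc_table()` and `toy_table_len()` are hypothetical, and the sketch assumes a zero-initialised allocation whose last entry is left untouched as the terminator.

```c
#include <stdlib.h>

/* Hypothetical table entry; a NULL name marks the terminator. */
struct toy_entry {
	const char *name;
};

/* Allocate n zeroed entries; the caller leaves the last one empty. */
static struct toy_entry *toy_alloc_table(size_t n)
{
	return calloc(n, sizeof(struct toy_entry));
}

/* Count entries up to, but not including, the terminator. */
static size_t toy_table_len(const struct toy_entry *table)
{
	size_t len = 0;

	while (table[len].name != NULL)
		len++;
	return len;
}
```

With three named entries the table must be allocated with four slots. Filling all allocated slots would leave no terminator, and a walker like `toy_table_len()` would read past the end of the allocation.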
[tip:timers/urgent] nohz_full: fix code style issue of tick_nohz_full_stop_tick
Commit-ID:  e9a2eb403bd953788cd2abfd0d2646d43bd22671
Gitweb:     http://git.kernel.org/tip/e9a2eb403bd953788cd2abfd0d2646d43bd22671
Author:     Alex Shi
AuthorDate: Thu, 28 Nov 2013 14:27:11 +0800
Committer:  Frederic Weisbecker
CommitDate: Wed, 15 Jan 2014 23:07:11 +0100

nohz_full: fix code style issue of tick_nohz_full_stop_tick

Code usually starts with 'tab' instead of 7 'space' in kernel

Signed-off-by: Alex Shi
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Alex Shi
Cc: Steven Rostedt
Cc: Paul E. McKenney
Cc: John Stultz
Cc: Kevin Hilman
Link: http://lkml.kernel.org/r/1386074112-30754-2-git-send-email-alex@linaro.org
Signed-off-by: Frederic Weisbecker
---
 kernel/time/tick-sched.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 68331d1..d603bad 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -679,18 +679,18 @@ out:
 static void tick_nohz_full_stop_tick(struct tick_sched *ts)
 {
 #ifdef CONFIG_NO_HZ_FULL
-       int cpu = smp_processor_id();
+	int cpu = smp_processor_id();

-       if (!tick_nohz_full_cpu(cpu) || is_idle_task(current))
-               return;
+	if (!tick_nohz_full_cpu(cpu) || is_idle_task(current))
+		return;

-       if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
-               return;
+	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
+		return;

-       if (!can_stop_full_tick())
-               return;
+	if (!can_stop_full_tick())
+		return;

-       tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
+	tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
 #endif
 }

[tip:sched/urgent] sched: Remove unused variable in 'struct sched_domain'
Commit-ID:  b972fc308c2763096b61b62169f2167ee0ca5a19
Gitweb:     http://git.kernel.org/tip/b972fc308c2763096b61b62169f2167ee0ca5a19
Author:     Alex Shi
AuthorDate: Tue, 19 Nov 2013 17:21:52 +0800
Committer:  Ingo Molnar
CommitDate: Tue, 19 Nov 2013 17:01:17 +0100

sched: Remove unused variable in 'struct sched_domain'

The 'u64 last_update' variable isn't used now, remove it to save a bit
of space.

Signed-off-by: Alex Shi
Signed-off-by: Peter Zijlstra
Cc: morten.rasmus...@arm.com
Cc: linaro-ker...@lists.linaro.org
Link: http://lkml.kernel.org/r/1384852912-24791-1-git-send-email-alex@linaro.org
Signed-off-by: Ingo Molnar
---
 include/linux/sched.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f7efc86..b122395 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -823,8 +823,6 @@ struct sched_domain {
 	unsigned int balance_interval;	/* initialise to 1. units in ms. */
 	unsigned int nr_balance_failed; /* initialise to 0 */

-	u64 last_update;
-
 	/* idle_balance() stats */
 	u64 max_newidle_lb_cost;
 	unsigned long next_decay_max_lb_cost;
[tip:sched/core] sched/debug: Remove CONFIG_FAIR_GROUP_SCHED mask
Commit-ID:  333bb864f192015a53b5060b829089decd0220ef
Gitweb:     http://git.kernel.org/tip/333bb864f192015a53b5060b829089decd0220ef
Author:     Alex Shi
AuthorDate: Fri, 28 Jun 2013 19:10:35 +0800
Committer:  Ingo Molnar
CommitDate: Fri, 28 Jun 2013 13:17:17 +0200

sched/debug: Remove CONFIG_FAIR_GROUP_SCHED mask

Now that we are using runnable load avg in sched balance, we don't
need to keep it under CONFIG_FAIR_GROUP_SCHED.

Also align the code style to #ifdef instead of #if defined() and
reorder the tg output info.

Signed-off-by: Alex Shi
Cc: p...@google.com
Cc: kamal...@linux.vnet.ibm.com
Cc: pet...@infradead.org
Link: http://lkml.kernel.org/r/1372417835-4698-1-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar
---
 kernel/sched/debug.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 1595614..e076bdd 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -209,22 +209,24 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 			cfs_rq->nr_spread_over);
 	SEQ_printf(m, "  .%-30s: %d\n", "nr_running", cfs_rq->nr_running);
 	SEQ_printf(m, "  .%-30s: %ld\n", "load", cfs_rq->load.weight);
-#ifdef CONFIG_FAIR_GROUP_SCHED
 #ifdef CONFIG_SMP
 	SEQ_printf(m, "  .%-30s: %ld\n", "runnable_load_avg",
 			cfs_rq->runnable_load_avg);
 	SEQ_printf(m, "  .%-30s: %ld\n", "blocked_load_avg",
 			cfs_rq->blocked_load_avg);
-	SEQ_printf(m, "  .%-30s: %ld\n", "tg_load_avg",
-			atomic_long_read(&cfs_rq->tg->load_avg));
+#ifdef CONFIG_FAIR_GROUP_SCHED
 	SEQ_printf(m, "  .%-30s: %ld\n", "tg_load_contrib",
 			cfs_rq->tg_load_contrib);
 	SEQ_printf(m, "  .%-30s: %d\n", "tg_runnable_contrib",
 			cfs_rq->tg_runnable_contrib);
+	SEQ_printf(m, "  .%-30s: %ld\n", "tg_load_avg",
+			atomic_long_read(&cfs_rq->tg->load_avg));
 	SEQ_printf(m, "  .%-30s: %d\n", "tg->runnable_avg",
 			atomic_read(&cfs_rq->tg->runnable_avg));
 #endif
+#endif

+#ifdef CONFIG_FAIR_GROUP_SCHED
 	print_cfs_group_stats(m, cpu, cfs_rq->tg);
 #endif
 }
@@ -567,7 +569,7 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
 		   "nr_involuntary_switches", (long long)p->nivcsw);

 	P(se.load.weight);
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
 	P(se.avg.runnable_avg_sum);
 	P(se.avg.runnable_avg_period);
 	P(se.avg.load_avg_contrib);
[tip:sched/core] Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking"
Commit-ID:  141965c7494d984b2bf24efd361a3125278869c6
Gitweb:     http://git.kernel.org/tip/141965c7494d984b2bf24efd361a3125278869c6
Author:     Alex Shi
AuthorDate: Wed, 26 Jun 2013 13:05:39 +0800
Committer:  Ingo Molnar
CommitDate: Thu, 27 Jun 2013 10:07:22 +0200

Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking"

Remove CONFIG_FAIR_GROUP_SCHED that covers the runnable info, then
we can use runnable load variables.

Also remove 2 CONFIG_FAIR_GROUP_SCHED setting which is not in reverted
patch(introduced in 9ee474f), but also need to revert.

Signed-off-by: Alex Shi
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/51ca76a3.3050...@intel.com
Signed-off-by: Ingo Molnar
---
 include/linux/sched.h |  7 +------
 kernel/sched/core.c   |  7 +------
 kernel/sched/fair.c   | 17 ++++-------------
 kernel/sched/sched.h  | 19 ++-----------------
 4 files changed, 8 insertions(+), 42 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 178a8d9..0019bef 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -994,12 +994,7 @@ struct sched_entity {
 	struct cfs_rq		*my_q;
 #endif

-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
 	/* Per-entity load-tracking */
 	struct sched_avg	avg;
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ceeaf0f..0241b1b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1611,12 +1611,7 @@ static void __sched_fork(struct task_struct *p)
 	p->se.vruntime			= 0;
 	INIT_LIST_HEAD(&p->se.group_node);

-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
 	p->se.avg.runnable_avg_period = 0;
 	p->se.avg.runnable_avg_sum = 0;
 #endif
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c0ac2c3..36eadaa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1128,8 +1128,7 @@ static inline void update_cfs_shares(struct cfs_rq *cfs_rq)
 }
 #endif /* CONFIG_FAIR_GROUP_SCHED */

-/* Only depends on SMP, FAIR_GROUP_SCHED may be removed when useful in lb */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
 /*
  * We choose a half-life close to 1 scheduling period.
  * Note: The tables below are dependent on this value.
@@ -3431,12 +3430,6 @@ unlock:
 }

 /*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#ifdef CONFIG_FAIR_GROUP_SCHED
-/*
  * Called immediately before a task is migrated to a new cpu; task_cpu(p) and
  * cfs_rq_of(p) references at time of call are still valid and identify the
  * previous cpu.  However, the caller only guarantees p->pi_lock is held; no
@@ -3459,7 +3452,6 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
 		atomic64_add(se->avg.load_avg_contrib, &cfs_rq->removed_load);
 	}
 }
-#endif
 #endif /* CONFIG_SMP */

 static unsigned long
@@ -5861,7 +5853,7 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
 		se->vruntime -= cfs_rq->min_vruntime;
 	}

-#if defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_SMP)
+#ifdef CONFIG_SMP
 	/*
 	 * Remove our load from contribution when we leave sched_fair
 	 * and ensure we don't carry in an old decay_count if we
@@ -5920,7 +5912,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 #ifndef CONFIG_64BIT
 	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
 #endif
-#if defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_SMP)
+#ifdef CONFIG_SMP
 	atomic64_set(&cfs_rq->decay_counter, 1);
 	atomic64_set(&cfs_rq->removed_load, 0);
 #endif
@@ -6162,9 +6154,8 @@ const struct sched_class fair_sched_class = {

 #ifdef CONFIG_SMP
 	.select_task_rq		= select_task_rq_fair,
-#ifdef CONFIG_FAIR_GROUP_SCHED
 	.migrate_task_rq	= migrate_task_rq_fair,
-#endif
+
 	.rq_online		= rq_online_fair,
 	.rq_offline		= rq_offline_fair,

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 029601a..77ce668 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -269,12 +269,6 @@ struct cfs_rq {
 #endif

 #ifdef CONFIG_SMP
-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#ifdef CONFIG_FAIR_GROUP_SCHED
 	/*
 	 * CFS Load tracking
 	 * Under CFS, load is tracked on a per-entity
[tip:sched/core] sched: Set an initial value of runnable avg for new forked task
Commit-ID:  a75cdaa915e42ef0e6f38dc7f2a6a1deca91d648
Gitweb:     http://git.kernel.org/tip/a75cdaa915e42ef0e6f38dc7f2a6a1deca91d648
Author:     Alex Shi
AuthorDate: Thu, 20 Jun 2013 10:18:47 +0800
Committer:  Ingo Molnar
CommitDate: Thu, 27 Jun 2013 10:07:30 +0200

sched: Set an initial value of runnable avg for new forked task

We need to initialize the se.avg.{decay_count, load_avg_contrib} for a
new forked task. Otherwise random values of above variables cause a
mess when a new task is enqueued:

    enqueue_task_fair
        enqueue_entity
            enqueue_entity_load_avg

and make fork balancing imbalance due to incorrect load_avg_contrib.

Further more, Morten Rasmussen notice some tasks were not launched at
once after created. So Paul and Peter suggest giving a start value for
new task runnable avg time same as sched_slice().

PeterZ said:

> So the 'problem' is that our running avg is a 'floating' average; ie. it
> decays with time. Now we have to guess about the future of our newly
> spawned task -- something that is nigh impossible seeing these CPU
> vendors keep refusing to implement the crystal ball instruction.
>
> So there's two asymptotic cases we want to deal well with; 1) the case
> where the newly spawned program will be 'nearly' idle for its lifetime;
> and 2) the case where its cpu-bound.
>
> Since we have to guess, we'll go for worst case and assume its
> cpu-bound; now we don't want to make the avg so heavy adjusting to the
> near-idle case takes forever. We want to be able to quickly adjust and
> lower our running avg.
>
> Now we also don't want to make our avg too light, such that it gets
> decremented just for the new task not having had a chance to run yet --
> even if when it would run, it would be more cpu-bound than not.
>
> So what we do is we make the initial avg of the same duration as that we
> guess it takes to run each task on the system at least once -- aka
> sched_slice().
>
> Of course we can defeat this with wakeup/fork bombs, but in the 'normal'
> case it should be good enough.

Paul also contributed most of the code comments in this commit.

Signed-off-by: Alex Shi
Reviewed-by: Gu Zheng
Reviewed-by: Paul Turner
[peterz; added explanation of sched_slice() usage]
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1371694737-29336-4-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar
---
 kernel/sched/core.c  |  6 ++----
 kernel/sched/fair.c  | 24 ++++++++++++++++++++++++
 kernel/sched/sched.h |  2 ++
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0241b1b..729e7fc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1611,10 +1611,6 @@ static void __sched_fork(struct task_struct *p)
 	p->se.vruntime			= 0;
 	INIT_LIST_HEAD(&p->se.group_node);

-#ifdef CONFIG_SMP
-	p->se.avg.runnable_avg_period = 0;
-	p->se.avg.runnable_avg_sum = 0;
-#endif
 #ifdef CONFIG_SCHEDSTATS
 	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
 #endif
@@ -1758,6 +1754,8 @@ void wake_up_new_task(struct task_struct *p)
 	set_task_cpu(p, select_task_rq(p, SD_BALANCE_FORK, 0));
 #endif

+	/* Initialize new task's runnable average */
+	init_task_runnable_average(p);
 	rq = __task_rq_lock(p);
 	activate_task(rq, p, 0);
 	p->on_rq = 1;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 36eadaa..e1602a0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -680,6 +680,26 @@ static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	return calc_delta_fair(sched_slice(cfs_rq, se), se);
 }

+#ifdef CONFIG_SMP
+static inline void __update_task_entity_contrib(struct sched_entity *se);
+
+/* Give new task start runnable values to heavy its load in infant time */
+void init_task_runnable_average(struct task_struct *p)
+{
+	u32 slice;
+
+	p->se.avg.decay_count = 0;
+	slice = sched_slice(task_cfs_rq(p), &p->se) >> 10;
+	p->se.avg.runnable_avg_sum = slice;
+	p->se.avg.runnable_avg_period = slice;
+	__update_task_entity_contrib(&p->se);
+}
+#else
+void init_task_runnable_average(struct task_struct *p)
+{
+}
+#endif
+
 /*
  * Update the current task's runtime statistics. Skip current tasks that
  * are not in our scheduling class.
@@ -1527,6 +1547,10 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
 	 * We track migrations using entity decay_count <= 0, on a wake-up
 	 * migration we use a negative decay count to track the remote decays
 	 * accumulated while sleeping.
+	 *
+	 * Newly forked tasks are enqueued with se->avg.decay_count == 0, they
+	 * are seen by enqueue_entity_load_avg() as a migration with an already
+	 * constructed load_avg_contrib.
 	 */
 	if (unlikely(se->avg.decay_count <= 0)) {
 		se->avg.last_runnable_update = rq_clock_task(rq_of(cfs_rq));
diff --git
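The initialisation in init_task_runnable_average() can be tried out in user space with a small model. The sketch below is an approximation under stated assumptions, not the kernel code: `struct toy_avg` and `toy_init_runnable_average()` are hypothetical names, the slice is passed in rather than computed by sched_slice(), and the contribution formula (sum times weight, divided by period plus one) is an assumed stand-in for what __update_task_entity_contrib() does.

```c
#include <stdint.h>

/* Hypothetical user-space model of the per-entity average fields. */
struct toy_avg {
	int64_t  decay_count;
	uint32_t runnable_avg_sum;
	uint32_t runnable_avg_period;
	uint64_t load_avg_contrib;
};

/*
 * Seed a new task's average as if it had been runnable for one full
 * slice. slice_ns is the slice length in nanoseconds; shifting right
 * by 10 converts it to units of 1024 ns, as in the patch.
 */
static void toy_init_runnable_average(struct toy_avg *avg,
				      uint64_t slice_ns,
				      uint64_t weight)
{
	uint32_t slice = (uint32_t)(slice_ns >> 10);

	avg->decay_count = 0;
	avg->runnable_avg_sum = slice;
	avg->runnable_avg_period = slice;

	/* Assumed contribution formula: sum * weight / (period + 1). */
	avg->load_avg_contrib = (uint64_t)avg->runnable_avg_sum * weight /
				((uint64_t)avg->runnable_avg_period + 1);
}
```

Because sum and period start out equal, the new task's contribution is close to its full weight, which corresponds to the "assume it is cpu-bound" guess described in the quoted explanation. A short history of one slice also means the average can drop quickly if the task turns out to be mostly idle.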
[tip:sched/core] sched: Update cpu load after task_tick
Commit-ID:  83dfd5235ebd66c284b97befe6eabff7132333e6
Gitweb:     http://git.kernel.org/tip/83dfd5235ebd66c284b97befe6eabff7132333e6
Author:     Alex Shi
AuthorDate: Thu, 20 Jun 2013 10:18:49 +0800
Committer:  Ingo Molnar
CommitDate: Thu, 27 Jun 2013 10:07:33 +0200

sched: Update cpu load after task_tick

To get the latest runnable info, we need do this cpuload update after
task_tick.

Signed-off-by: Alex Shi
Reviewed-by: Paul Turner
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1371694737-29336-6-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 729e7fc..08746cc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2165,8 +2165,8 @@ void scheduler_tick(void)
 
 	raw_spin_lock(&rq->lock);
 	update_rq_clock(rq);
-	update_cpu_load_active(rq);
 	curr->sched_class->task_tick(rq, curr, 0);
+	update_cpu_load_active(rq);
 	raw_spin_unlock(&rq->lock);
 
 	perf_event_task_tick();
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[tip:sched/core] sched: Change get_rq_runnable_load() to static and inline
Commit-ID:  a9dc5d0e33c677619e4b97a38c23db1a42857905
Gitweb:     http://git.kernel.org/tip/a9dc5d0e33c677619e4b97a38c23db1a42857905
Author:     Alex Shi
AuthorDate: Thu, 20 Jun 2013 10:18:57 +0800
Committer:  Ingo Molnar
CommitDate: Thu, 27 Jun 2013 10:07:44 +0200

sched: Change get_rq_runnable_load() to static and inline

Based-on-patch-by: Fengguang Wu
Signed-off-by: Alex Shi
Tested-by: Vincent Guittot
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1371694737-29336-14-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar
---
 kernel/sched/proc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
index ce5cd48..16f5a30 100644
--- a/kernel/sched/proc.c
+++ b/kernel/sched/proc.c
@@ -502,12 +502,12 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
 }
 
 #ifdef CONFIG_SMP
-unsigned long get_rq_runnable_load(struct rq *rq)
+static inline unsigned long get_rq_runnable_load(struct rq *rq)
 {
 	return rq->cfs.runnable_load_avg;
 }
 #else
-unsigned long get_rq_runnable_load(struct rq *rq)
+static inline unsigned long get_rq_runnable_load(struct rq *rq)
 {
 	return rq->load.weight;
 }
[tip:sched/core] sched/tg: Use 'unsigned long' for load variable in task group
Commit-ID:  bf5b986ed4d20428eeec3df4a03dbfebb9b6538c
Gitweb:     http://git.kernel.org/tip/bf5b986ed4d20428eeec3df4a03dbfebb9b6538c
Author:     Alex Shi
AuthorDate: Thu, 20 Jun 2013 10:18:54 +0800
Committer:  Ingo Molnar
CommitDate: Thu, 27 Jun 2013 10:07:40 +0200

sched/tg: Use 'unsigned long' for load variable in task group

Since tg->load_avg is smaller than tg->load_weight, we don't need a
atomic64_t variable for load_avg in 32 bit machine.
The same reason for cfs_rq->tg_load_contrib.

The atomic_long_t/unsigned long variable type are more efficient and
convenience for them.

Signed-off-by: Alex Shi
Tested-by: Vincent Guittot
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1371694737-29336-11-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar
---
 kernel/sched/debug.c |  6 +++---
 kernel/sched/fair.c  | 12 ++--
 kernel/sched/sched.h |  4 ++--
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 160afdc..d803989 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -215,9 +215,9 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 			cfs_rq->runnable_load_avg);
 	SEQ_printf(m, " .%-30s: %ld\n", "blocked_load_avg",
 			cfs_rq->blocked_load_avg);
-	SEQ_printf(m, " .%-30s: %lld\n", "tg_load_avg",
-			(unsigned long long)atomic64_read(&cfs_rq->tg->load_avg));
-	SEQ_printf(m, " .%-30s: %lld\n", "tg_load_contrib",
+	SEQ_printf(m, " .%-30s: %ld\n", "tg_load_avg",
+			atomic_long_read(&cfs_rq->tg->load_avg));
+	SEQ_printf(m, " .%-30s: %ld\n", "tg_load_contrib",
 			cfs_rq->tg_load_contrib);
 	SEQ_printf(m, " .%-30s: %d\n", "tg_runnable_contrib",
 			cfs_rq->tg_runnable_contrib);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f19772d..30ccc37 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1075,7 +1075,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
 	 * to gain a more accurate current total weight. See
 	 * update_cfs_rq_load_contribution().
 	 */
-	tg_weight = atomic64_read(&tg->load_avg);
+	tg_weight = atomic_long_read(&tg->load_avg);
 	tg_weight -= cfs_rq->tg_load_contrib;
 	tg_weight += cfs_rq->load.weight;
 
@@ -1356,13 +1356,13 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
 						 int force_update)
 {
 	struct task_group *tg = cfs_rq->tg;
-	s64 tg_contrib;
+	long tg_contrib;
 
 	tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
 	tg_contrib -= cfs_rq->tg_load_contrib;
 
-	if (force_update || abs64(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
-		atomic64_add(tg_contrib, &tg->load_avg);
+	if (force_update || abs(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
+		atomic_long_add(tg_contrib, &tg->load_avg);
 		cfs_rq->tg_load_contrib += tg_contrib;
 	}
 }
@@ -1397,8 +1397,8 @@ static inline void __update_group_entity_contrib(struct sched_entity *se)
 	u64 contrib;
 
 	contrib = cfs_rq->tg_load_contrib * tg->shares;
-	se->avg.load_avg_contrib = div64_u64(contrib,
-					     atomic64_read(&tg->load_avg) + 1);
+	se->avg.load_avg_contrib = div_u64(contrib,
+					 atomic_long_read(&tg->load_avg) + 1);
 
 	/*
 	 * For group entities we need to compute a correction term in the case
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9eb12d9..5585eb2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -150,7 +150,7 @@ struct task_group {
 
 	atomic_t load_weight;
 #ifdef CONFIG_SMP
-	atomic64_t load_avg;
+	atomic_long_t load_avg;
 	atomic_t runnable_avg;
 #endif
 #endif
@@ -284,7 +284,7 @@ struct cfs_rq {
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	/* Required to track per-cpu representation of a task_group */
 	u32 tg_runnable_contrib;
-	u64 tg_load_contrib;
+	unsigned long tg_load_contrib;
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 
 	/*
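The update rule touched by the fair.c hunk above can be sketched in userspace to show why a plain long is enough for the delta. This is only a model under stated assumptions: struct model_tg, struct model_cfs_rq and model_update_tg_load_contrib() are hypothetical names, the group total is a plain long rather than an atomic_long_t, and labs() stands in for the kernel's abs().

```c
#include <stdlib.h>

/* Hypothetical, non-atomic stand-ins for the fields the patch retypes. */
struct model_tg {
	long load_avg;
};

struct model_cfs_rq {
	unsigned long runnable_load_avg;
	unsigned long blocked_load_avg;
	unsigned long tg_load_contrib;
};

/*
 * Folds this runqueue's change in load into the group total only when
 * the change exceeds 1/8 of the last published contribution (or when
 * forced). Returns 1 if the group total was updated, 0 otherwise.
 */
static int model_update_tg_load_contrib(struct model_tg *tg,
					struct model_cfs_rq *cfs_rq,
					int force_update)
{
	long tg_contrib;

	tg_contrib = (long)(cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg);
	tg_contrib -= (long)cfs_rq->tg_load_contrib;

	if (force_update ||
	    (unsigned long)labs(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
		tg->load_avg += tg_contrib;
		cfs_rq->tg_load_contrib += tg_contrib;
		return 1;
	}
	return 0;
}
```

In this model a published contribution of 800 tolerates drifts of up to 100 before the shared total is touched, so the shared counter is written far less often than the per-runqueue averages change.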
[tip:sched/core] sched: Change cfs_rq load avg to unsigned long
Commit-ID:  72a4cf20cb71a327c636c7042fdacc25abffc87c
Gitweb:     http://git.kernel.org/tip/72a4cf20cb71a327c636c7042fdacc25abffc87c
Author:     Alex Shi
AuthorDate: Thu, 20 Jun 2013 10:18:53 +0800
Committer:  Ingo Molnar
CommitDate: Thu, 27 Jun 2013 10:07:38 +0200

sched: Change cfs_rq load avg to unsigned long

Since the 'u64 runnable_load_avg, blocked_load_avg' in cfs_rq struct are
smaller than 'unsigned long' cfs_rq->load.weight. We don't need u64
vaiables to describe them. unsigned long is more efficient and convenience.

Signed-off-by: Alex Shi
Reviewed-by: Paul Turner
Tested-by: Vincent Guittot
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1371694737-29336-10-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar
---
 kernel/sched/debug.c | 4 ++--
 kernel/sched/fair.c  | 7 ++-
 kernel/sched/sched.h | 2 +-
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 75024a6..160afdc 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -211,9 +211,9 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 	SEQ_printf(m, " .%-30s: %ld\n", "load", cfs_rq->load.weight);
 #ifdef CONFIG_FAIR_GROUP_SCHED
 #ifdef CONFIG_SMP
-	SEQ_printf(m, " .%-30s: %lld\n", "runnable_load_avg",
+	SEQ_printf(m, " .%-30s: %ld\n", "runnable_load_avg",
 			cfs_rq->runnable_load_avg);
-	SEQ_printf(m, " .%-30s: %lld\n", "blocked_load_avg",
+	SEQ_printf(m, " .%-30s: %ld\n", "blocked_load_avg",
 			cfs_rq->blocked_load_avg);
 	SEQ_printf(m, " .%-30s: %lld\n", "tg_load_avg",
 			(unsigned long long)atomic64_read(&cfs_rq->tg->load_avg));
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7948bb8..f19772d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4181,12 +4181,9 @@ static int tg_load_down(struct task_group *tg, void *data)
 	if (!tg->parent) {
 		load = cpu_rq(cpu)->avg.load_avg_contrib;
 	} else {
-		unsigned long tmp_rla;
-		tmp_rla = tg->parent->cfs_rq[cpu]->runnable_load_avg + 1;
-
 		load = tg->parent->cfs_rq[cpu]->h_load;
-		load *= tg->se[cpu]->avg.load_avg_contrib;
-		load /= tmp_rla;
+		load = div64_ul(load * tg->se[cpu]->avg.load_avg_contrib,
+				tg->parent->cfs_rq[cpu]->runnable_load_avg + 1);
 	}
 
 	tg->cfs_rq[cpu]->h_load = load;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9c65d46..9eb12d9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -277,7 +277,7 @@ struct cfs_rq {
 	 * This allows for the description of both thread and group usage (in
 	 * the FAIR_GROUP_SCHED case).
 	 */
-	u64 runnable_load_avg, blocked_load_avg;
+	unsigned long runnable_load_avg, blocked_load_avg;
 	atomic64_t decay_counter, removed_load;
 	u64 last_decay;
 
[tip:sched/core] sched/tg: Remove tg.load_weight
Commit-ID:  a9cef46a10cc1b84bf2cdf4060766d858c0439d8
Gitweb:     http://git.kernel.org/tip/a9cef46a10cc1b84bf2cdf4060766d858c0439d8
Author:     Alex Shi
AuthorDate: Thu, 20 Jun 2013 10:18:56 +0800
Committer:  Ingo Molnar
CommitDate: Thu, 27 Jun 2013 10:07:43 +0200

sched/tg: Remove tg.load_weight

Since no one use it.

Signed-off-by: Alex Shi
Reviewed-by: Paul Turner
Tested-by: Vincent Guittot
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1371694737-29336-13-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar
---
 kernel/sched/sched.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7059919..ef0a7b2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -148,7 +148,6 @@ struct task_group {
 	struct cfs_rq **cfs_rq;
 	unsigned long shares;
 
-	atomic_t load_weight;
 #ifdef CONFIG_SMP
 	atomic_long_t load_avg;
 	atomic_t runnable_avg;
[tip:sched/core] sched/cfs_rq: Change atomic64_t removed_load to atomic_long_t
Commit-ID:  2509940fd71c2e2915a05052bbdbf2d478364184
Gitweb:     http://git.kernel.org/tip/2509940fd71c2e2915a05052bbdbf2d478364184
Author:     Alex Shi
AuthorDate: Thu, 20 Jun 2013 10:18:55 +0800
Committer:  Ingo Molnar
CommitDate: Thu, 27 Jun 2013 10:07:41 +0200

sched/cfs_rq: Change atomic64_t removed_load to atomic_long_t

Similar to runnable_load_avg, blocked_load_avg variable, long type is
enough for removed_load in 64 bit or 32 bit machine.

Then we avoid the expensive atomic64 operations on 32 bit machine.

Signed-off-by: Alex Shi
Reviewed-by: Paul Turner
Tested-by: Vincent Guittot
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1371694737-29336-12-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar
---
 kernel/sched/fair.c  | 10 ++
 kernel/sched/sched.h |  3 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 30ccc37..b43474a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1517,8 +1517,9 @@ static void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq, int force_update)
 	if (!decays && !force_update)
 		return;
 
-	if (atomic64_read(&cfs_rq->removed_load)) {
-		u64 removed_load = atomic64_xchg(&cfs_rq->removed_load, 0);
+	if (atomic_long_read(&cfs_rq->removed_load)) {
+		unsigned long removed_load;
+		removed_load = atomic_long_xchg(&cfs_rq->removed_load, 0);
 		subtract_blocked_load_contrib(cfs_rq, removed_load);
 	}
 
@@ -3480,7 +3481,8 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
 	 */
 	if (se->avg.decay_count) {
 		se->avg.decay_count = -__synchronize_entity_decay(se);
-		atomic64_add(se->avg.load_avg_contrib, &cfs_rq->removed_load);
+		atomic_long_add(se->avg.load_avg_contrib,
+						&cfs_rq->removed_load);
 	}
 }
 #endif /* CONFIG_SMP */
@@ -5942,7 +5944,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 #endif
 #ifdef CONFIG_SMP
 	atomic64_set(&cfs_rq->decay_counter, 1);
-	atomic64_set(&cfs_rq->removed_load, 0);
+	atomic_long_set(&cfs_rq->removed_load, 0);
 #endif
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 5585eb2..7059919 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -278,8 +278,9 @@ struct cfs_rq {
 	 * the FAIR_GROUP_SCHED case).
 	 */
 	unsigned long runnable_load_avg, blocked_load_avg;
-	atomic64_t decay_counter, removed_load;
+	atomic64_t decay_counter;
 	u64 last_decay;
+	atomic_long_t removed_load;
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	/* Required to track per-cpu representation of a task_group */
[tip:sched/core] sched: Consider runnable load average in move_tasks()
Commit-ID:  a003a25b227d59ded9197ced109517f037d01c27
Gitweb:     http://git.kernel.org/tip/a003a25b227d59ded9197ced109517f037d01c27
Author:     Alex Shi
AuthorDate: Thu, 20 Jun 2013 10:18:51 +0800
Committer:  Ingo Molnar
CommitDate: Thu, 27 Jun 2013 10:07:36 +0200

sched: Consider runnable load average in move_tasks()

Aside from using runnable load average in background, move_tasks is
also the key function in load balance. We need consider the runnable
load average in it in order to make it an apple to apple load
comparison.

Morten had caught a div u64 bug on ARM, thanks!

Thanks-to: Morten Rasmussen
Signed-off-by: Alex Shi
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1371694737-29336-8-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar
---
 kernel/sched/fair.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e6d82ca..7948bb8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4179,11 +4179,14 @@ static int tg_load_down(struct task_group *tg, void *data)
 	long cpu = (long)data;
 
 	if (!tg->parent) {
-		load = cpu_rq(cpu)->load.weight;
+		load = cpu_rq(cpu)->avg.load_avg_contrib;
 	} else {
+		unsigned long tmp_rla;
+		tmp_rla = tg->parent->cfs_rq[cpu]->runnable_load_avg + 1;
+
 		load = tg->parent->cfs_rq[cpu]->h_load;
-		load *= tg->se[cpu]->load.weight;
-		load /= tg->parent->cfs_rq[cpu]->load.weight + 1;
+		load *= tg->se[cpu]->avg.load_avg_contrib;
+		load /= tmp_rla;
 	}
 
 	tg->cfs_rq[cpu]->h_load = load;
@@ -4209,12 +4212,9 @@ static void update_h_load(long cpu)
 static unsigned long task_h_load(struct task_struct *p)
 {
 	struct cfs_rq *cfs_rq = task_cfs_rq(p);
-	unsigned long load;
-
-	load = p->se.load.weight;
-	load = div_u64(load * cfs_rq->h_load, cfs_rq->load.weight + 1);
 
-	return load;
+	return div64_ul(p->se.avg.load_avg_contrib * cfs_rq->h_load,
+			cfs_rq->runnable_load_avg + 1);
 }
 #else
 static inline void update_blocked_averages(int cpu)
@@ -4227,7 +4227,7 @@ static inline void update_h_load(long cpu)
 
 static unsigned long task_h_load(struct task_struct *p)
 {
-	return p->se.load.weight;
+	return p->se.avg.load_avg_contrib;
 }
 #endif
 
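The arithmetic in the new task_h_load() can be tried out in userspace with a small sketch. The helper name model_task_h_load() is hypothetical, and widening the product to 64 bits before dividing is a choice of this sketch rather than a statement about what the kernel does on every architecture.

```c
#include <stdint.h>

/*
 * Models the patched task_h_load(): the task's share of its runqueue's
 * hierarchical load, i.e. contrib * h_load / (runnable_load_avg + 1).
 * The "+ 1" keeps the divisor non-zero on an empty runqueue.
 */
static unsigned long model_task_h_load(unsigned long load_avg_contrib,
				       unsigned long h_load,
				       unsigned long runnable_load_avg)
{
	uint64_t scaled = (uint64_t)load_avg_contrib * h_load;

	return (unsigned long)(scaled / ((uint64_t)runnable_load_avg + 1));
}
```

For instance, a task contributing 512 on a runqueue whose runnable load average is 1023 and whose hierarchical load is 2048 is seen by the balancer as carrying a load of 1024, so both sides of the comparison in move_tasks() are expressed in runnable-average terms.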
[tip:sched/core] sched: Fix sleep time double accounting in enqueue entity
Commit-ID:  282cf499f03ec1754b6c8c945c9674b02631fb0f
Gitweb:     http://git.kernel.org/tip/282cf499f03ec1754b6c8c945c9674b02631fb0f
Author:     Alex Shi
AuthorDate: Thu, 20 Jun 2013 10:18:48 +0800
Committer:  Ingo Molnar
CommitDate: Thu, 27 Jun 2013 10:07:32 +0200

sched: Fix sleep time double accounting in enqueue entity

The woken migrated task will __synchronize_entity_decay(se); in
migrate_task_rq_fair, then it needs to set
`se->avg.last_runnable_update -= (-se->avg.decay_count) << 20' before
update_entity_load_avg, in order to avoid sleep time is updated twice
for se.avg.load_avg_contrib in both __syncchronize and
update_entity_load_avg.

However if the sleeping task is woken up from the same cpu, it miss the
last_runnable_update before update_entity_load_avg(se, 0, 1), then the
sleep time was used twice in both functions. So we need to remove the
double sleep time accounting.

Paul also contributed some code comments in this commit.

Signed-off-by: Alex Shi
Reviewed-by: Paul Turner
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1371694737-29336-5-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar
---
 kernel/sched/fair.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e1602a0..9bbc303 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1571,7 +1571,13 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
 		}
 		wakeup = 0;
 	} else {
-		__synchronize_entity_decay(se);
+		/*
+		 * Task re-woke on same cpu (or else migrate_task_rq_fair()
+		 * would have made count negative); we must be careful to avoid
+		 * double-accounting blocked time after synchronizing decays.
+		 */
+		se->avg.last_runnable_update += __synchronize_entity_decay(se)
+							<< 20;
 	}
 
 	/* migrated tasks did not contribute to our blocked load */
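The bookkeeping behind the fix above can be sketched as follows. This is a userspace model, not the kernel code: struct model_se and model_resync_same_cpu() are hypothetical, and the sketch assumes that one decay period corresponds to 2^20 ns (about 1 ms), which is what the `<< 20' in the patch suggests.

```c
#include <stdint.h>

#define MODEL_PERIOD_SHIFT 20	/* assumed: one decay period = 2^20 ns */

/* Hypothetical stand-in holding only the timestamp the patch adjusts. */
struct model_se {
	uint64_t last_runnable_update;	/* in ns */
};

/*
 * After `decays` whole periods of sleep have already been applied to
 * the load contribution, move the timestamp forward by the same amount
 * of time. The return value is the time the following load-average
 * update still has to account for; without the adjustment it would see
 * the whole sleep again.
 */
static uint64_t model_resync_same_cpu(struct model_se *se, uint64_t now_ns,
				      uint64_t decays)
{
	se->last_runnable_update += decays << MODEL_PERIOD_SHIFT;

	return now_ns - se->last_runnable_update;
}
```

If a task slept for five full periods plus 300 ns, the model leaves only the 300 ns remainder for the later update, whereas leaving the timestamp untouched would make that update count all five periods a second time.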
[tip:sched/core] sched: Compute runnable load avg in cpu_load and cpu_avg_load_per_task
Commit-ID:  b92486cbf2aa230d00f160664858495c81d2b37b
Gitweb:     http://git.kernel.org/tip/b92486cbf2aa230d00f160664858495c81d2b37b
Author:     Alex Shi
AuthorDate: Thu, 20 Jun 2013 10:18:50 +0800
Committer:  Ingo Molnar
CommitDate: Thu, 27 Jun 2013 10:07:35 +0200

sched: Compute runnable load avg in cpu_load and cpu_avg_load_per_task

They are the base values in load balance, update them with rq runnable
load average, then the load balance will consider runnable load avg
naturally.

We also try to include the blocked_load_avg as cpu load in balancing,
but that cause kbuild performance drop 6% on every Intel machine, and
aim7/oltp drop on some of 4 CPU sockets machines.
Or only add blocked_load_avg into get_rq_runable_load, hackbench still
drop a little on NHM EX.

Signed-off-by: Alex Shi
Reviewed-by: Gu Zheng
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1371694737-29336-7-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar
---
 kernel/sched/fair.c |  5 +++--
 kernel/sched/proc.c | 17 +++--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9bbc303..e6d82ca 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2963,7 +2963,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 /* Used instead of source_load when we know the type == 0 */
 static unsigned long weighted_cpuload(const int cpu)
 {
-	return cpu_rq(cpu)->load.weight;
+	return cpu_rq(cpu)->cfs.runnable_load_avg;
 }
 
 /*
@@ -3008,9 +3008,10 @@ static unsigned long cpu_avg_load_per_task(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long nr_running = ACCESS_ONCE(rq->nr_running);
+	unsigned long load_avg = rq->cfs.runnable_load_avg;
 
 	if (nr_running)
-		return rq->load.weight / nr_running;
+		return load_avg / nr_running;
 
 	return 0;
 }
diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
index bb3a6a0..ce5cd48 100644
--- a/kernel/sched/proc.c
+++ b/kernel/sched/proc.c
@@ -501,6 +501,18 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
 	sched_avg_update(this_rq);
 }
 
+#ifdef CONFIG_SMP
+unsigned long get_rq_runnable_load(struct rq *rq)
+{
+	return rq->cfs.runnable_load_avg;
+}
+#else
+unsigned long get_rq_runnable_load(struct rq *rq)
+{
+	return rq->load.weight;
+}
+#endif
+
 #ifdef CONFIG_NO_HZ_COMMON
 /*
  * There is no sane way to deal with nohz on smp when using jiffies because the
@@ -522,7 +534,7 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
 void update_idle_cpu_load(struct rq *this_rq)
 {
 	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
-	unsigned long load = this_rq->load.weight;
+	unsigned long load = get_rq_runnable_load(this_rq);
 	unsigned long pending_updates;
 
 	/*
@@ -568,11 +580,12 @@ void update_cpu_load_nohz(void)
  */
 void update_cpu_load_active(struct rq *this_rq)
 {
+	unsigned long load = get_rq_runnable_load(this_rq);
 	/*
 	 * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
 	 */
 	this_rq->last_load_update_tick = jiffies;
-	__update_cpu_load(this_rq, this_rq->load.weight, 1);
+	__update_cpu_load(this_rq, load, 1);
 
 	calc_load_account_active(this_rq);
 }
[tip:sched/core] sched: Move a few runnable tg variables into CONFIG_SMP
Commit-ID:  fa6bddeb14d59d701f846b174b59c9982e926e66
Gitweb:     http://git.kernel.org/tip/fa6bddeb14d59d701f846b174b59c9982e926e66
Author:     Alex Shi
AuthorDate: Thu, 20 Jun 2013 10:18:46 +0800
Committer:  Ingo Molnar
CommitDate: Thu, 27 Jun 2013 10:07:29 +0200

sched: Move a few runnable tg variables into CONFIG_SMP

The following 2 variables are only used under CONFIG_SMP, so its
better to move their definiation into CONFIG_SMP too.

	atomic64_t load_avg;
	atomic_t runnable_avg;

Signed-off-by: Alex Shi
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1371694737-29336-3-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar
---
 kernel/sched/sched.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 77ce668..31d25f8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -149,9 +149,11 @@ struct task_group {
 	unsigned long shares;
 
 	atomic_t load_weight;
+#ifdef CONFIG_SMP
 	atomic64_t load_avg;
 	atomic_t runnable_avg;
 #endif
+#endif
 
 #ifdef CONFIG_RT_GROUP_SCHED
 	struct sched_rt_entity **rt_se;
[tip:sched/core] sched: Compute runnable load avg in cpu_load and cpu_avg_load_per_task
Commit-ID: b92486cbf2aa230d00f160664858495c81d2b37b Gitweb: http://git.kernel.org/tip/b92486cbf2aa230d00f160664858495c81d2b37b Author: Alex Shi alex@intel.com AuthorDate: Thu, 20 Jun 2013 10:18:50 +0800 Committer: Ingo Molnar mi...@kernel.org CommitDate: Thu, 27 Jun 2013 10:07:35 +0200 sched: Compute runnable load avg in cpu_load and cpu_avg_load_per_task They are the base values in load balance, update them with rq runnable load average, then the load balance will consider runnable load avg naturally. We also try to include the blocked_load_avg as cpu load in balancing, but that cause kbuild performance drop 6% on every Intel machine, and aim7/oltp drop on some of 4 CPU sockets machines. Or only add blocked_load_avg into get_rq_runable_load, hackbench still drop a little on NHM EX. Signed-off-by: Alex Shi alex@intel.com Reviewed-by: Gu Zheng guz.f...@cn.fujitsu.com Signed-off-by: Peter Zijlstra pet...@infradead.org Link: http://lkml.kernel.org/r/1371694737-29336-7-git-send-email-alex@intel.com Signed-off-by: Ingo Molnar mi...@kernel.org --- kernel/sched/fair.c | 5 +++-- kernel/sched/proc.c | 17 +++-- 2 files changed, 18 insertions(+), 4 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 9bbc303..e6d82ca 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2963,7 +2963,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags) /* Used instead of source_load when we know the type == 0 */ static unsigned long weighted_cpuload(const int cpu) { - return cpu_rq(cpu)-load.weight; + return cpu_rq(cpu)-cfs.runnable_load_avg; } /* @@ -3008,9 +3008,10 @@ static unsigned long cpu_avg_load_per_task(int cpu) { struct rq *rq = cpu_rq(cpu); unsigned long nr_running = ACCESS_ONCE(rq-nr_running); + unsigned long load_avg = rq-cfs.runnable_load_avg; if (nr_running) - return rq-load.weight / nr_running; + return load_avg / nr_running; return 0; } diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c index bb3a6a0..ce5cd48 
100644 --- a/kernel/sched/proc.c +++ b/kernel/sched/proc.c @@ -501,6 +501,18 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load, sched_avg_update(this_rq); } +#ifdef CONFIG_SMP +unsigned long get_rq_runnable_load(struct rq *rq) +{ + return rq-cfs.runnable_load_avg; +} +#else +unsigned long get_rq_runnable_load(struct rq *rq) +{ + return rq-load.weight; +} +#endif + #ifdef CONFIG_NO_HZ_COMMON /* * There is no sane way to deal with nohz on smp when using jiffies because the @@ -522,7 +534,7 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load, void update_idle_cpu_load(struct rq *this_rq) { unsigned long curr_jiffies = ACCESS_ONCE(jiffies); - unsigned long load = this_rq-load.weight; + unsigned long load = get_rq_runnable_load(this_rq); unsigned long pending_updates; /* @@ -568,11 +580,12 @@ void update_cpu_load_nohz(void) */ void update_cpu_load_active(struct rq *this_rq) { + unsigned long load = get_rq_runnable_load(this_rq); /* * See the mess around update_idle_cpu_load() / update_cpu_load_nohz(). */ this_rq-last_load_update_tick = jiffies; - __update_cpu_load(this_rq, this_rq-load.weight, 1); + __update_cpu_load(this_rq, load, 1); calc_load_account_active(this_rq); } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:sched/core] sched: Fix sleep time double accounting in enqueue entity
Commit-ID: 282cf499f03ec1754b6c8c945c9674b02631fb0f Gitweb: http://git.kernel.org/tip/282cf499f03ec1754b6c8c945c9674b02631fb0f Author: Alex Shi alex@intel.com AuthorDate: Thu, 20 Jun 2013 10:18:48 +0800 Committer: Ingo Molnar mi...@kernel.org CommitDate: Thu, 27 Jun 2013 10:07:32 +0200 sched: Fix sleep time double accounting in enqueue entity The woken migrated task will __synchronize_entity_decay(se); in migrate_task_rq_fair, then it needs to set `se-avg.last_runnable_update -= (-se-avg.decay_count) 20' before update_entity_load_avg, in order to avoid sleep time is updated twice for se.avg.load_avg_contrib in both __syncchronize and update_entity_load_avg. However if the sleeping task is woken up from the same cpu, it miss the last_runnable_update before update_entity_load_avg(se, 0, 1), then the sleep time was used twice in both functions. So we need to remove the double sleep time accounting. Paul also contributed some code comments in this commit. Signed-off-by: Alex Shi alex@intel.com Reviewed-by: Paul Turner p...@google.com Signed-off-by: Peter Zijlstra pet...@infradead.org Link: http://lkml.kernel.org/r/1371694737-29336-5-git-send-email-alex@intel.com Signed-off-by: Ingo Molnar mi...@kernel.org --- kernel/sched/fair.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index e1602a0..9bbc303 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1571,7 +1571,13 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq, } wakeup = 0; } else { - __synchronize_entity_decay(se); + /* +* Task re-woke on same cpu (or else migrate_task_rq_fair() +* would have made count negative); we must be careful to avoid +* double-accounting blocked time after synchronizing decays. 
+*/ + se-avg.last_runnable_update += __synchronize_entity_decay(se) +20; } /* migrated tasks did not contribute to our blocked load */ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:sched/core] sched: Move a few runnable tg variables into CONFIG_SMP
Commit-ID: fa6bddeb14d59d701f846b174b59c9982e926e66 Gitweb: http://git.kernel.org/tip/fa6bddeb14d59d701f846b174b59c9982e926e66 Author: Alex Shi alex@intel.com AuthorDate: Thu, 20 Jun 2013 10:18:46 +0800 Committer: Ingo Molnar mi...@kernel.org CommitDate: Thu, 27 Jun 2013 10:07:29 +0200 sched: Move a few runnable tg variables into CONFIG_SMP The following 2 variables are only used under CONFIG_SMP, so its better to move their definiation into CONFIG_SMP too. atomic64_t load_avg; atomic_t runnable_avg; Signed-off-by: Alex Shi alex@intel.com Signed-off-by: Peter Zijlstra pet...@infradead.org Link: http://lkml.kernel.org/r/1371694737-29336-3-git-send-email-alex@intel.com Signed-off-by: Ingo Molnar mi...@kernel.org --- kernel/sched/sched.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 77ce668..31d25f8 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -149,9 +149,11 @@ struct task_group { unsigned long shares; atomic_t load_weight; +#ifdef CONFIG_SMP atomic64_t load_avg; atomic_t runnable_avg; #endif +#endif #ifdef CONFIG_RT_GROUP_SCHED struct sched_rt_entity **rt_se; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:sched/core] sched: Consider runnable load average in move_tasks()
Commit-ID: a003a25b227d59ded9197ced109517f037d01c27 Gitweb: http://git.kernel.org/tip/a003a25b227d59ded9197ced109517f037d01c27 Author: Alex Shi alex@intel.com AuthorDate: Thu, 20 Jun 2013 10:18:51 +0800 Committer: Ingo Molnar mi...@kernel.org CommitDate: Thu, 27 Jun 2013 10:07:36 +0200 sched: Consider runnable load average in move_tasks() Aside from using runnable load average in background, move_tasks is also the key function in load balance. We need consider the runnable load average in it in order to make it an apple to apple load comparison. Morten had caught a div u64 bug on ARM, thanks! Thanks-to: Morten Rasmussen morten.rasmus...@arm.com Signed-off-by: Alex Shi alex@intel.com Signed-off-by: Peter Zijlstra pet...@infradead.org Link: http://lkml.kernel.org/r/1371694737-29336-8-git-send-email-alex@intel.com Signed-off-by: Ingo Molnar mi...@kernel.org --- kernel/sched/fair.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index e6d82ca..7948bb8 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4179,11 +4179,14 @@ static int tg_load_down(struct task_group *tg, void *data) long cpu = (long)data; if (!tg->parent) { - load = cpu_rq(cpu)->load.weight; + load = cpu_rq(cpu)->avg.load_avg_contrib; } else { + unsigned long tmp_rla; + tmp_rla = tg->parent->cfs_rq[cpu]->runnable_load_avg + 1; + load = tg->parent->cfs_rq[cpu]->h_load; - load *= tg->se[cpu]->load.weight; - load /= tg->parent->cfs_rq[cpu]->load.weight + 1; + load *= tg->se[cpu]->avg.load_avg_contrib; + load /= tmp_rla; } tg->cfs_rq[cpu]->h_load = load; @@ -4209,12 +4212,9 @@ static void update_h_load(long cpu) static unsigned long task_h_load(struct task_struct *p) { struct cfs_rq *cfs_rq = task_cfs_rq(p); - unsigned long load; - - load = p->se.load.weight; - load = div_u64(load * cfs_rq->h_load, cfs_rq->load.weight + 1); - return load; + return div64_ul(p->se.avg.load_avg_contrib * cfs_rq->h_load, + cfs_rq->runnable_load_avg + 1); } #else static inline void update_blocked_averages(int cpu) @@ -4227,7 +4227,7 @@ static inline void update_h_load(long cpu) static unsigned long task_h_load(struct task_struct *p) { - return p->se.load.weight; + return p->se.avg.load_avg_contrib; } #endif -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
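The scaling that task_h_load() performs in the patch above can be tried out in user space. The following is only a sketch under stated assumptions: scaled_task_load() and its parameter names are hypothetical stand-ins, not kernel functions, and the explicit widening to 64 bits is this sketch's way of sidestepping 32-bit overflow rather than a claim about what div64_ul() does internally.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative stand-in for the arithmetic in task_h_load(): the task's
 * load contribution is scaled by the ratio of the runqueue's hierarchical
 * load to its runnable load average. The "+ 1" keeps the divisor non-zero
 * when the runqueue is empty. All names here are hypothetical.
 */
static unsigned long scaled_task_load(unsigned long load_avg_contrib,
                                      unsigned long h_load,
                                      unsigned long runnable_load_avg)
{
    uint64_t product;

    /* Guard the one input for which "+ 1" would wrap the divisor to zero. */
    assert((uint64_t)runnable_load_avg != UINT64_MAX);

    /* Widen before multiplying so a 32-bit unsigned long cannot overflow. */
    product = (uint64_t)load_avg_contrib * h_load;

    return (unsigned long)(product / ((uint64_t)runnable_load_avg + 1));
}
```

With a contribution of 512, an h_load of 1024 and a runnable load average of 2047, the sketch yields 512 * 1024 / 2048 = 256.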
[tip:sched/core] sched: Change cfs_rq load avg to unsigned long
Commit-ID: 72a4cf20cb71a327c636c7042fdacc25abffc87c Gitweb: http://git.kernel.org/tip/72a4cf20cb71a327c636c7042fdacc25abffc87c Author: Alex Shi alex@intel.com AuthorDate: Thu, 20 Jun 2013 10:18:53 +0800 Committer: Ingo Molnar mi...@kernel.org CommitDate: Thu, 27 Jun 2013 10:07:38 +0200 sched: Change cfs_rq load avg to unsigned long Since the 'u64 runnable_load_avg, blocked_load_avg' in cfs_rq struct are smaller than 'unsigned long' cfs_rq->load.weight. We don't need u64 vaiables to describe them. unsigned long is more efficient and convenience. Signed-off-by: Alex Shi alex@intel.com Reviewed-by: Paul Turner p...@google.com Tested-by: Vincent Guittot vincent.guit...@linaro.org Signed-off-by: Peter Zijlstra pet...@infradead.org Link: http://lkml.kernel.org/r/1371694737-29336-10-git-send-email-alex@intel.com Signed-off-by: Ingo Molnar mi...@kernel.org --- kernel/sched/debug.c | 4 ++-- kernel/sched/fair.c | 7 ++- kernel/sched/sched.h | 2 +- 3 files changed, 5 insertions(+), 8 deletions(-) diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index 75024a6..160afdc 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -211,9 +211,9 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq) SEQ_printf(m, "  .%-30s: %ld\n", "load", cfs_rq->load.weight); #ifdef CONFIG_FAIR_GROUP_SCHED #ifdef CONFIG_SMP - SEQ_printf(m, "  .%-30s: %lld\n", "runnable_load_avg", + SEQ_printf(m, "  .%-30s: %ld\n", "runnable_load_avg", cfs_rq->runnable_load_avg); - SEQ_printf(m, "  .%-30s: %lld\n", "blocked_load_avg", + SEQ_printf(m, "  .%-30s: %ld\n", "blocked_load_avg", cfs_rq->blocked_load_avg); SEQ_printf(m, "  .%-30s: %lld\n", "tg_load_avg", (unsigned long long)atomic64_read(&cfs_rq->tg->load_avg)); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 7948bb8..f19772d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4181,12 +4181,9 @@ static int tg_load_down(struct task_group *tg, void *data) if (!tg->parent) { load = cpu_rq(cpu)->avg.load_avg_contrib; } else { - unsigned long tmp_rla; - tmp_rla = tg->parent->cfs_rq[cpu]->runnable_load_avg + 1; - load = tg->parent->cfs_rq[cpu]->h_load; - load *= tg->se[cpu]->avg.load_avg_contrib; - load /= tmp_rla; + load = div64_ul(load * tg->se[cpu]->avg.load_avg_contrib, + tg->parent->cfs_rq[cpu]->runnable_load_avg + 1); } tg->cfs_rq[cpu]->h_load = load; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 9c65d46..9eb12d9 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -277,7 +277,7 @@ struct cfs_rq { * This allows for the description of both thread and group usage (in * the FAIR_GROUP_SCHED case). */ - u64 runnable_load_avg, blocked_load_avg; + unsigned long runnable_load_avg, blocked_load_avg; atomic64_t decay_counter, removed_load; u64 last_decay;
[tip:sched/core] sched/tg: Remove tg.load_weight
Commit-ID: a9cef46a10cc1b84bf2cdf4060766d858c0439d8 Gitweb: http://git.kernel.org/tip/a9cef46a10cc1b84bf2cdf4060766d858c0439d8 Author: Alex Shi alex@intel.com AuthorDate: Thu, 20 Jun 2013 10:18:56 +0800 Committer: Ingo Molnar mi...@kernel.org CommitDate: Thu, 27 Jun 2013 10:07:43 +0200 sched/tg: Remove tg.load_weight Since no one use it. Signed-off-by: Alex Shi alex@intel.com Reviewed-by: Paul Turner p...@google.com Tested-by: Vincent Guittot vincent.guit...@linaro.org Signed-off-by: Peter Zijlstra pet...@infradead.org Link: http://lkml.kernel.org/r/1371694737-29336-13-git-send-email-alex@intel.com Signed-off-by: Ingo Molnar mi...@kernel.org --- kernel/sched/sched.h | 1 - 1 file changed, 1 deletion(-) diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 7059919..ef0a7b2 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -148,7 +148,6 @@ struct task_group { struct cfs_rq **cfs_rq; unsigned long shares; - atomic_t load_weight; #ifdef CONFIG_SMP atomic_long_t load_avg; atomic_t runnable_avg;
[tip:sched/core] sched/cfs_rq: Change atomic64_t removed_load to atomic_long_t
Commit-ID: 2509940fd71c2e2915a05052bbdbf2d478364184 Gitweb: http://git.kernel.org/tip/2509940fd71c2e2915a05052bbdbf2d478364184 Author: Alex Shi alex@intel.com AuthorDate: Thu, 20 Jun 2013 10:18:55 +0800 Committer: Ingo Molnar mi...@kernel.org CommitDate: Thu, 27 Jun 2013 10:07:41 +0200 sched/cfs_rq: Change atomic64_t removed_load to atomic_long_t Similar to runnable_load_avg, blocked_load_avg variable, long type is enough for removed_load in 64 bit or 32 bit machine. Then we avoid the expensive atomic64 operations on 32 bit machine. Signed-off-by: Alex Shi alex@intel.com Reviewed-by: Paul Turner p...@google.com Tested-by: Vincent Guittot vincent.guit...@linaro.org Signed-off-by: Peter Zijlstra pet...@infradead.org Link: http://lkml.kernel.org/r/1371694737-29336-12-git-send-email-alex@intel.com Signed-off-by: Ingo Molnar mi...@kernel.org --- kernel/sched/fair.c | 10 ++ kernel/sched/sched.h | 3 ++- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 30ccc37..b43474a 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1517,8 +1517,9 @@ static void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq, int force_update) if (!decays && !force_update) return; - if (atomic64_read(&cfs_rq->removed_load)) { - u64 removed_load = atomic64_xchg(&cfs_rq->removed_load, 0); + if (atomic_long_read(&cfs_rq->removed_load)) { + unsigned long removed_load; + removed_load = atomic_long_xchg(&cfs_rq->removed_load, 0); subtract_blocked_load_contrib(cfs_rq, removed_load); } @@ -3480,7 +3481,8 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu) */ if (se->avg.decay_count) { se->avg.decay_count = -__synchronize_entity_decay(se); - atomic64_add(se->avg.load_avg_contrib, &cfs_rq->removed_load); + atomic_long_add(se->avg.load_avg_contrib, + &cfs_rq->removed_load); } } #endif /* CONFIG_SMP */ @@ -5942,7 +5944,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq) #endif #ifdef CONFIG_SMP atomic64_set(&cfs_rq->decay_counter, 1); - atomic64_set(&cfs_rq->removed_load, 0); + atomic_long_set(&cfs_rq->removed_load, 0); #endif } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 5585eb2..7059919 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -278,8 +278,9 @@ struct cfs_rq { * the FAIR_GROUP_SCHED case). */ unsigned long runnable_load_avg, blocked_load_avg; - atomic64_t decay_counter, removed_load; + atomic64_t decay_counter; u64 last_decay; + atomic_long_t removed_load; #ifdef CONFIG_FAIR_GROUP_SCHED /* Required to track per-cpu representation of a task_group */
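The add-remotely, drain-locally pattern around removed_load in this patch can be approximated in user space with C11 atomics. This is a sketch, not kernel code: struct rq_sketch and the three helpers are hypothetical names, atomic_ulong stands in for atomic_long_t, and clamping at zero is this sketch's own choice for keeping the unsigned value from wrapping.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical, simplified stand-in for the relevant cfs_rq fields. */
struct rq_sketch {
    unsigned long blocked_load_avg; /* touched only by the owning CPU */
    atomic_ulong removed_load;      /* incremented by other CPUs */
};

static void rq_sketch_init(struct rq_sketch *rq, unsigned long blocked)
{
    assert(rq);
    rq->blocked_load_avg = blocked;
    atomic_init(&rq->removed_load, 0);
}

/* Remote side: record the contribution of a task that migrated away. */
static void note_removed_load(struct rq_sketch *rq, unsigned long contrib)
{
    atomic_fetch_add(&rq->removed_load, contrib);
}

/* Owner side: collect everything recorded so far in one atomic exchange. */
static void drain_removed_load(struct rq_sketch *rq)
{
    unsigned long removed;

    if (!atomic_load(&rq->removed_load))
        return;

    removed = atomic_exchange(&rq->removed_load, 0);

    /* Clamp at zero so the unsigned average cannot wrap around. */
    if (removed < rq->blocked_load_avg)
        rq->blocked_load_avg -= removed;
    else
        rq->blocked_load_avg = 0;
}
```

Because the counter is the native word size, each of these operations maps to a single-word atomic on both 32-bit and 64-bit targets, which is the saving the changelog describes.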
[tip:sched/core] sched/tg: Use 'unsigned long' for load variable in task group
Commit-ID: bf5b986ed4d20428eeec3df4a03dbfebb9b6538c Gitweb: http://git.kernel.org/tip/bf5b986ed4d20428eeec3df4a03dbfebb9b6538c Author: Alex Shi alex@intel.com AuthorDate: Thu, 20 Jun 2013 10:18:54 +0800 Committer: Ingo Molnar mi...@kernel.org CommitDate: Thu, 27 Jun 2013 10:07:40 +0200 sched/tg: Use 'unsigned long' for load variable in task group Since tg->load_avg is smaller than tg->load_weight, we don't need a atomic64_t variable for load_avg in 32 bit machine. The same reason for cfs_rq->tg_load_contrib. The atomic_long_t/unsigned long variable type are more efficient and convenience for them. Signed-off-by: Alex Shi alex@intel.com Tested-by: Vincent Guittot vincent.guit...@linaro.org Signed-off-by: Peter Zijlstra pet...@infradead.org Link: http://lkml.kernel.org/r/1371694737-29336-11-git-send-email-alex@intel.com Signed-off-by: Ingo Molnar mi...@kernel.org --- kernel/sched/debug.c | 6 +++--- kernel/sched/fair.c | 12 ++-- kernel/sched/sched.h | 4 ++-- 3 files changed, 11 insertions(+), 11 deletions(-) diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index 160afdc..d803989 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -215,9 +215,9 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq) cfs_rq->runnable_load_avg); SEQ_printf(m, "  .%-30s: %ld\n", "blocked_load_avg", cfs_rq->blocked_load_avg); - SEQ_printf(m, "  .%-30s: %lld\n", "tg_load_avg", - (unsigned long long)atomic64_read(&cfs_rq->tg->load_avg)); - SEQ_printf(m, "  .%-30s: %lld\n", "tg_load_contrib", + SEQ_printf(m, "  .%-30s: %ld\n", "tg_load_avg", + atomic_long_read(&cfs_rq->tg->load_avg)); + SEQ_printf(m, "  .%-30s: %ld\n", "tg_load_contrib", cfs_rq->tg_load_contrib); SEQ_printf(m, "  .%-30s: %d\n", "tg_runnable_contrib", cfs_rq->tg_runnable_contrib); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index f19772d..30ccc37 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1075,7 +1075,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq) * to gain a more accurate current total weight. See * update_cfs_rq_load_contribution(). */ - tg_weight = atomic64_read(&tg->load_avg); + tg_weight = atomic_long_read(&tg->load_avg); tg_weight -= cfs_rq->tg_load_contrib; tg_weight += cfs_rq->load.weight; @@ -1356,13 +1356,13 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq, int force_update) { struct task_group *tg = cfs_rq->tg; - s64 tg_contrib; + long tg_contrib; tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg; tg_contrib -= cfs_rq->tg_load_contrib; - if (force_update || abs64(tg_contrib) > cfs_rq->tg_load_contrib / 8) { - atomic64_add(tg_contrib, &tg->load_avg); + if (force_update || abs(tg_contrib) > cfs_rq->tg_load_contrib / 8) { + atomic_long_add(tg_contrib, &tg->load_avg); cfs_rq->tg_load_contrib += tg_contrib; } } @@ -1397,8 +1397,8 @@ static inline void __update_group_entity_contrib(struct sched_entity *se) u64 contrib; contrib = cfs_rq->tg_load_contrib * tg->shares; - se->avg.load_avg_contrib = div64_u64(contrib, -atomic64_read(&tg->load_avg) + 1); + se->avg.load_avg_contrib = div_u64(contrib, +atomic_long_read(&tg->load_avg) + 1); /* * For group entities we need to compute a correction term in the case diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 9eb12d9..5585eb2 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -150,7 +150,7 @@ struct task_group { atomic_t load_weight; #ifdef CONFIG_SMP - atomic64_t load_avg; + atomic_long_t load_avg; atomic_t runnable_avg; #endif #endif @@ -284,7 +284,7 @@ struct cfs_rq { #ifdef CONFIG_FAIR_GROUP_SCHED /* Required to track per-cpu representation of a task_group */ u32 tg_runnable_contrib; - u64 tg_load_contrib; + unsigned long tg_load_contrib; #endif /* CONFIG_FAIR_GROUP_SCHED */ /*
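The update rule in __update_cfs_rq_tg_load_contrib() only publishes a runqueue's contribution to the group-wide sum when it has drifted by more than one eighth, or when an update is forced. A single-threaded sketch of that rule follows; the struct and function names are hypothetical, and the group sum is a plain long here where the kernel uses an atomic type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, non-atomic stand-ins for the fields used by the patch. */
struct tg_sketch {
    long load_avg;                  /* group-wide sum of contributions */
};

struct cfs_rq_sketch {
    unsigned long runnable_load_avg;
    unsigned long blocked_load_avg;
    unsigned long tg_load_contrib;  /* what was last published */
};

/* Returns true when the group-wide sum was actually updated. */
static bool update_tg_load_contrib(struct tg_sketch *tg,
                                   struct cfs_rq_sketch *cfs_rq,
                                   bool force_update)
{
    long delta;

    assert(tg && cfs_rq);

    delta = (long)(cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg)
            - (long)cfs_rq->tg_load_contrib;

    /* Skip small drifts: they are not worth touching the shared sum. */
    if (!force_update &&
        (unsigned long)labs(delta) <= cfs_rq->tg_load_contrib / 8)
        return false;

    tg->load_avg += delta;
    cfs_rq->tg_load_contrib =
        (unsigned long)((long)cfs_rq->tg_load_contrib + delta);
    return true;
}
```

With a published contribution of 800 the threshold is 100, so a drift to 850 is ignored while a drift to 950 is pushed to the group sum.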
[tip:sched/core] sched: Change get_rq_runnable_load() to static and inline
Commit-ID: a9dc5d0e33c677619e4b97a38c23db1a42857905 Gitweb: http://git.kernel.org/tip/a9dc5d0e33c677619e4b97a38c23db1a42857905 Author: Alex Shi alex@intel.com AuthorDate: Thu, 20 Jun 2013 10:18:57 +0800 Committer: Ingo Molnar mi...@kernel.org CommitDate: Thu, 27 Jun 2013 10:07:44 +0200 sched: Change get_rq_runnable_load() to static and inline Based-on-patch-by: Fengguang Wu fengguang...@intel.com Signed-off-by: Alex Shi alex@intel.com Tested-by: Vincent Guittot vincent.guit...@linaro.org Signed-off-by: Peter Zijlstra pet...@infradead.org Link: http://lkml.kernel.org/r/1371694737-29336-14-git-send-email-alex@intel.com Signed-off-by: Ingo Molnar mi...@kernel.org --- kernel/sched/proc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c index ce5cd48..16f5a30 100644 --- a/kernel/sched/proc.c +++ b/kernel/sched/proc.c @@ -502,12 +502,12 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load, } #ifdef CONFIG_SMP -unsigned long get_rq_runnable_load(struct rq *rq) +static inline unsigned long get_rq_runnable_load(struct rq *rq) { return rq->cfs.runnable_load_avg; } #else -unsigned long get_rq_runnable_load(struct rq *rq) +static inline unsigned long get_rq_runnable_load(struct rq *rq) { return rq->load.weight; }
[tip:sched/core] sched: Update cpu load after task_tick
Commit-ID: 83dfd5235ebd66c284b97befe6eabff7132333e6 Gitweb: http://git.kernel.org/tip/83dfd5235ebd66c284b97befe6eabff7132333e6 Author: Alex Shi alex@intel.com AuthorDate: Thu, 20 Jun 2013 10:18:49 +0800 Committer: Ingo Molnar mi...@kernel.org CommitDate: Thu, 27 Jun 2013 10:07:33 +0200 sched: Update cpu load after task_tick To get the latest runnable info, we need do this cpuload update after task_tick. Signed-off-by: Alex Shi alex@intel.com Reviewed-by: Paul Turner p...@google.com Signed-off-by: Peter Zijlstra pet...@infradead.org Link: http://lkml.kernel.org/r/1371694737-29336-6-git-send-email-alex@intel.com Signed-off-by: Ingo Molnar mi...@kernel.org --- kernel/sched/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 729e7fc..08746cc 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2165,8 +2165,8 @@ void scheduler_tick(void) raw_spin_lock(&rq->lock); update_rq_clock(rq); - update_cpu_load_active(rq); curr->sched_class->task_tick(rq, curr, 0); + update_cpu_load_active(rq); raw_spin_unlock(&rq->lock); perf_event_task_tick();
[tip:sched/core] sched: Set an initial value of runnable avg for new forked task
Commit-ID: a75cdaa915e42ef0e6f38dc7f2a6a1deca91d648 Gitweb: http://git.kernel.org/tip/a75cdaa915e42ef0e6f38dc7f2a6a1deca91d648 Author: Alex Shi alex@intel.com AuthorDate: Thu, 20 Jun 2013 10:18:47 +0800 Committer: Ingo Molnar mi...@kernel.org CommitDate: Thu, 27 Jun 2013 10:07:30 +0200 sched: Set an initial value of runnable avg for new forked task We need to initialize the se.avg.{decay_count, load_avg_contrib} for a new forked task. Otherwise random values of above variables cause a mess when a new task is enqueued: enqueue_task_fair enqueue_entity enqueue_entity_load_avg and make fork balancing imbalance due to incorrect load_avg_contrib. Further more, Morten Rasmussen notice some tasks were not launched at once after created. So Paul and Peter suggest giving a start value for new task runnable avg time same as sched_slice(). PeterZ said: So the 'problem' is that our running avg is a 'floating' average; ie. it decays with time. Now we have to guess about the future of our newly spawned task -- something that is nigh impossible seeing these CPU vendors keep refusing to implement the crystal ball instruction. So there's two asymptotic cases we want to deal well with; 1) the case where the newly spawned program will be 'nearly' idle for its lifetime; and 2) the case where its cpu-bound. Since we have to guess, we'll go for worst case and assume its cpu-bound; now we don't want to make the avg so heavy adjusting to the near-idle case takes forever. We want to be able to quickly adjust and lower our running avg. Now we also don't want to make our avg too light, such that it gets decremented just for the new task not having had a chance to run yet -- even if when it would run, it would be more cpu-bound than not. So what we do is we make the initial avg of the same duration as that we guess it takes to run each task on the system at least once -- aka sched_slice(). Of course we can defeat this with wakeup/fork bombs, but in the 'normal' case it should be good enough. Paul also contributed most of the code comments in this commit. Signed-off-by: Alex Shi alex@intel.com Reviewed-by: Gu Zheng guz.f...@cn.fujitsu.com Reviewed-by: Paul Turner p...@google.com [peterz; added explanation of sched_slice() usage] Signed-off-by: Peter Zijlstra pet...@infradead.org Link: http://lkml.kernel.org/r/1371694737-29336-4-git-send-email-alex@intel.com Signed-off-by: Ingo Molnar mi...@kernel.org --- kernel/sched/core.c | 6 ++ kernel/sched/fair.c | 24 kernel/sched/sched.h | 2 ++ 3 files changed, 28 insertions(+), 4 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 0241b1b..729e7fc 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1611,10 +1611,6 @@ static void __sched_fork(struct task_struct *p) p->se.vruntime = 0; INIT_LIST_HEAD(&p->se.group_node); -#ifdef CONFIG_SMP - p->se.avg.runnable_avg_period = 0; - p->se.avg.runnable_avg_sum = 0; -#endif #ifdef CONFIG_SCHEDSTATS memset(&p->se.statistics, 0, sizeof(p->se.statistics)); #endif @@ -1758,6 +1754,8 @@ void wake_up_new_task(struct task_struct *p) set_task_cpu(p, select_task_rq(p, SD_BALANCE_FORK, 0)); #endif + /* Initialize new task's runnable average */ + init_task_runnable_average(p); rq = __task_rq_lock(p); activate_task(rq, p, 0); p->on_rq = 1; diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 36eadaa..e1602a0 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -680,6 +680,26 @@ static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se) return calc_delta_fair(sched_slice(cfs_rq, se), se); } +#ifdef CONFIG_SMP +static inline void __update_task_entity_contrib(struct sched_entity *se); + +/* Give new task start runnable values to heavy its load in infant time */ +void init_task_runnable_average(struct task_struct *p) +{ + u32 slice; + + p->se.avg.decay_count = 0; + slice = sched_slice(task_cfs_rq(p), &p->se) >> 10; + p->se.avg.runnable_avg_sum = slice; + p->se.avg.runnable_avg_period = slice; + __update_task_entity_contrib(&p->se); +} +#else +void init_task_runnable_average(struct task_struct *p) +{ +} +#endif + /* * Update the current task's runtime statistics. Skip current tasks that * are not in our scheduling class. @@ -1527,6 +1547,10 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq, * We track migrations using entity decay_count <= 0, on a wake-up * migration we use a negative decay count to track the remote decays * accumulated while sleeping. +* +* Newly forked tasks are enqueued with se->avg.decay_count == 0, they +* are seen by enqueue_entity_load_avg() as a migration with an already +* constructed load_avg_contrib. */ if (unlikely(se->avg.decay_count <=
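The effect of seeding both runnable_avg_sum and runnable_avg_period with the slice can be illustrated with a small user-space sketch. It rests on an assumption that is not visible in the patch text above: that the load contribution is computed as sum * weight / (period + 1). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-entity average fields. */
struct avg_sketch {
    uint32_t runnable_avg_sum;
    uint32_t runnable_avg_period;
    unsigned long load_avg_contrib;
};

/*
 * Seed a new task's average as if it had been runnable for one whole
 * slice. ASSUMPTION: contrib = sum * weight / (period + 1).
 */
static void init_runnable_average(struct avg_sketch *avg,
                                  uint64_t slice_ns, unsigned long weight)
{
    uint64_t slice = slice_ns >> 10; /* ns to units of 1024 ns, as in the patch */

    assert(slice <= UINT32_MAX);

    avg->runnable_avg_sum = (uint32_t)slice;
    avg->runnable_avg_period = (uint32_t)slice;
    avg->load_avg_contrib =
        (unsigned long)((slice * weight) / (slice + 1));
}
```

For a 6 ms slice and a weight of 1024, the seeded sum and period are both 5859 and the contribution comes out at 1023, that is, almost the full weight. That matches the changelog's choice to assume a CPU-bound task while keeping the history short enough to decay quickly.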
[tip:sched/core] Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking"
Commit-ID: 141965c7494d984b2bf24efd361a3125278869c6 Gitweb: http://git.kernel.org/tip/141965c7494d984b2bf24efd361a3125278869c6 Author: Alex Shi alex@intel.com AuthorDate: Wed, 26 Jun 2013 13:05:39 +0800 Committer: Ingo Molnar mi...@kernel.org CommitDate: Thu, 27 Jun 2013 10:07:22 +0200 Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking" Remove CONFIG_FAIR_GROUP_SCHED that covers the runnable info, then we can use runnable load variables. Also remove 2 CONFIG_FAIR_GROUP_SCHED setting which is not in reverted patch(introduced in 9ee474f), but also need to revert. Signed-off-by: Alex Shi alex@intel.com Signed-off-by: Peter Zijlstra pet...@infradead.org Link: http://lkml.kernel.org/r/51ca76a3.3050...@intel.com Signed-off-by: Ingo Molnar mi...@kernel.org --- include/linux/sched.h | 7 +-- kernel/sched/core.c | 7 +-- kernel/sched/fair.c | 17 - kernel/sched/sched.h | 19 ++- 4 files changed, 8 insertions(+), 42 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 178a8d9..0019bef 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -994,12 +994,7 @@ struct sched_entity { struct cfs_rq *my_q; #endif -/* - * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be - * removed when useful for applications beyond shares distribution (e.g. - * load-balance). - */ -#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED) +#ifdef CONFIG_SMP /* Per-entity load-tracking */ struct sched_avg avg; #endif diff --git a/kernel/sched/core.c b/kernel/sched/core.c index ceeaf0f..0241b1b 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1611,12 +1611,7 @@ static void __sched_fork(struct task_struct *p) p->se.vruntime = 0; INIT_LIST_HEAD(&p->se.group_node); -/* - * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be - * removed when useful for applications beyond shares distribution (e.g. - * load-balance). - */ -#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED) +#ifdef CONFIG_SMP p->se.avg.runnable_avg_period = 0; p->se.avg.runnable_avg_sum = 0; #endif diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index c0ac2c3..36eadaa 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1128,8 +1128,7 @@ static inline void update_cfs_shares(struct cfs_rq *cfs_rq) } #endif /* CONFIG_FAIR_GROUP_SCHED */ -/* Only depends on SMP, FAIR_GROUP_SCHED may be removed when useful in lb */ -#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED) +#ifdef CONFIG_SMP /* * We choose a half-life close to 1 scheduling period. * Note: The tables below are dependent on this value. @@ -3431,12 +3430,6 @@ unlock: } /* - * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be - * removed when useful for applications beyond shares distribution (e.g. - * load-balance). - */ -#ifdef CONFIG_FAIR_GROUP_SCHED -/* * Called immediately before a task is migrated to a new cpu; task_cpu(p) and * cfs_rq_of(p) references at time of call are still valid and identify the * previous cpu. However, the caller only guarantees p->pi_lock is held; no @@ -3459,7 +3452,6 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu) atomic64_add(se->avg.load_avg_contrib, &cfs_rq->removed_load); } } -#endif #endif /* CONFIG_SMP */ static unsigned long @@ -5861,7 +5853,7 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p) se->vruntime -= cfs_rq->min_vruntime; } -#if defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_SMP) +#ifdef CONFIG_SMP /* * Remove our load from contribution when we leave sched_fair * and ensure we don't carry in an old decay_count if we @@ -5920,7 +5912,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq) #ifndef CONFIG_64BIT cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime; #endif -#if defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_SMP) +#ifdef CONFIG_SMP atomic64_set(&cfs_rq->decay_counter, 1); atomic64_set(&cfs_rq->removed_load, 0); #endif @@ -6162,9 +6154,8 @@ const struct sched_class fair_sched_class = { #ifdef CONFIG_SMP .select_task_rq = select_task_rq_fair, -#ifdef CONFIG_FAIR_GROUP_SCHED .migrate_task_rq = migrate_task_rq_fair, -#endif + .rq_online = rq_online_fair, .rq_offline = rq_offline_fair, diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 029601a..77ce668 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -269,12 +269,6 @@ struct cfs_rq { #endif #ifdef CONFIG_SMP -/* - * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be - * removed when useful for applications beyond shares distribution (e.g. - * load-balance). - */ -#ifdef CONFIG_FAIR_GROUP_SCHED /* *
[tip:core/locking] rwsem: Implement writer lock-stealing for better scalability
Commit-ID: ce6711f3d196f09ca0ed29a24dfad42d83912b20 Gitweb: http://git.kernel.org/tip/ce6711f3d196f09ca0ed29a24dfad42d83912b20 Author: Alex Shi AuthorDate: Tue, 5 Feb 2013 21:11:55 +0800 Committer: Ingo Molnar CommitDate: Tue, 19 Feb 2013 08:42:43 +0100 rwsem: Implement writer lock-stealing for better scalability Commit 5a505085f043 ("mm/rmap: Convert the struct anon_vma::mutex to an rwsem") changed struct anon_vma::mutex to an rwsem, which caused aim7 fork_test performance to drop by 50%. Yuanhan Liu did the following excellent analysis: https://lkml.org/lkml/2013/1/29/84 and found that the regression is caused by strict, serialized, FIFO sequential write-ownership of rwsems. Ingo suggested implementing opportunistic lock-stealing for the front writer task in the waitqueue. Yuanhan Liu implemented lock-stealing for spinlock-rwsems, which indeed recovered much of the regression - confirming the analysis that the main factor in the regression was the FIFO writer-fairness of rwsems. In this patch we allow lock-stealing to happen when the first waiter is also writer. With that change in place the aim7 fork_test performance is fully recovered on my Intel NHM EP, NHM EX, SNB EP 2S and 4S test-machines. Reported-by: l...@linux.intel.com Reported-by: Yuanhan Liu Signed-off-by: Alex Shi Cc: David Howells Cc: Michel Lespinasse Cc: Linus Torvalds Cc: Andrew Morton Cc: Peter Zijlstra Cc: Anton Blanchard Cc: Arjan van de Ven Cc: paul.gortma...@windriver.com Link: https://lkml.org/lkml/2013/1/29/84 Link: http://lkml.kernel.org/r/1360069915-31619-1-git-send-email-alex@intel.com [ Small stylistic fixes, updated changelog. ] Signed-off-by: Ingo Molnar --- lib/rwsem.c | 75 + 1 file changed, 46 insertions(+), 29 deletions(-) diff --git a/lib/rwsem.c b/lib/rwsem.c index 8337e1b..ad5e0df 100644 --- a/lib/rwsem.c +++ b/lib/rwsem.c @@ -2,6 +2,8 @@ * * Written by David Howells (dhowe...@redhat.com). * Derived from arch/i386/kernel/semaphore.c + * + * Writer lock-stealing by Alex Shi */ #include <linux/rwsem.h> #include <linux/sched.h> @@ -60,7 +62,7 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wake_type) struct rwsem_waiter *waiter; struct task_struct *tsk; struct list_head *next; - signed long oldcount, woken, loop, adjustment; + signed long woken, loop, adjustment; waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list); if (!(waiter->flags & RWSEM_WAITING_FOR_WRITE)) @@ -72,30 +74,8 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wake_type) */ goto out; - /* There's a writer at the front of the queue - try to grant it the -* write lock. However, we only wake this writer if we can transition -* the active part of the count from 0 -> 1 -*/ - adjustment = RWSEM_ACTIVE_WRITE_BIAS; - if (waiter->list.next == &sem->wait_list) - adjustment -= RWSEM_WAITING_BIAS; - - try_again_write: - oldcount = rwsem_atomic_update(adjustment, sem) - adjustment; - if (oldcount & RWSEM_ACTIVE_MASK) - /* Someone grabbed the sem already */ - goto undo_write; - - /* We must be careful not to touch 'waiter' after we set ->task = NULL. -* It is an allocated on the waiter's stack and may become invalid at -* any time after that point (due to a wakeup from another source). -*/ - list_del(&waiter->list); - tsk = waiter->task; - smp_mb(); - waiter->task = NULL; - wake_up_process(tsk); - put_task_struct(tsk); + /* Wake up the writing waiter and let the task grab the sem: */ + wake_up_process(waiter->task); goto out; readers_only: @@ -157,12 +137,40 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wake_type) out: return sem; +} + +/* Try to get write sem, caller holds sem->wait_lock: */ +static int try_get_writer_sem(struct rw_semaphore *sem, + struct rwsem_waiter *waiter) +{ + struct rwsem_waiter *fwaiter; + long oldcount, adjustment; - /* undo the change to the active count, but check for a transition -* 1->0 */ - undo_write: + /* only steal when first waiter is writing */ + fwaiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list); + if (!(fwaiter->flags & RWSEM_WAITING_FOR_WRITE)) + return 0; + + adjustment = RWSEM_ACTIVE_WRITE_BIAS; + /* Only one waiter in the queue: */ + if (fwaiter == waiter && waiter->list.next == &sem->wait_list) + adjustment -= RWSEM_WAITING_BIAS; + +try_again_write: + oldcount = rwsem_atomic_update(adjustment, sem) - adjustment; + if (!(oldcount & RWSEM_ACTIVE_MASK)) { + /* No active lock: */ + struct task_struct *tsk = waiter->task; + + list_del(&waiter->list); +
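The counter arithmetic behind the steal attempt in try_get_writer_sem() can be explored with a user-space sketch built on C11 atomics. Several things here are assumptions rather than facts taken from the patch: the bias constants use a 16-bit active field chosen for portability, try_steal_write() is a hypothetical name, wait-list handling is reduced to a sole_waiter flag, and the undo-and-retry step is a guess at the part of the diff that is cut off above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative count layout: low 16 bits count active lockers. */
#define ACTIVE_MASK        0x0000ffffL
#define ACTIVE_BIAS        0x00000001L
#define WAITING_BIAS       (-0x00010000L)
#define ACTIVE_WRITE_BIAS  (WAITING_BIAS + ACTIVE_BIAS)

static_assert((WAITING_BIAS & ACTIVE_MASK) == 0,
              "waiting bias must not touch the active bits");

/*
 * Optimistically add the write bias, then look at the value the count
 * had before the add. If nobody was active, the lock is ours. Otherwise
 * undo the add; retry only if the lock became free in the meantime.
 */
static bool try_steal_write(atomic_long *count, bool sole_waiter)
{
    long adjustment = ACTIVE_WRITE_BIAS;
    long oldcount;

    /* The last waiter also removes the waiting bias from the count. */
    if (sole_waiter)
        adjustment -= WAITING_BIAS;

    for (;;) {
        oldcount = atomic_fetch_add(count, adjustment);
        if (!(oldcount & ACTIVE_MASK))
            return true;

        /* Someone holds the lock: back out (assumed undo path). */
        oldcount = atomic_fetch_add(count, -adjustment) - adjustment;
        if (oldcount & ACTIVE_MASK)
            return false;
    }
}
```

Starting from a count of WAITING_BIAS (waiters queued, nobody active), a sole waiter that steals the lock leaves the count at ACTIVE_WRITE_BIAS. If a reader is active, the attempt fails and the count is restored.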
[tip:core/urgent] rwsem: Implement writer lock-stealing for better scalability
Commit-ID:  3a15e0e0cdda5b401d0a36dd7e83406cd1ce0724
Gitweb:     http://git.kernel.org/tip/3a15e0e0cdda5b401d0a36dd7e83406cd1ce0724
Author:     Alex Shi
AuthorDate: Tue, 5 Feb 2013 21:11:55 +0800
Committer:  Ingo Molnar
CommitDate: Wed, 6 Feb 2013 12:41:43 +0100

rwsem: Implement writer lock-stealing for better scalability

Commit 5a505085f043 ("mm/rmap: Convert the struct anon_vma::mutex
to an rwsem") changed struct anon_vma::mutex to an rwsem, which
caused aim7 fork_test performance to drop by 50%.

Yuanhan Liu did the following excellent analysis:

    https://lkml.org/lkml/2013/1/29/84

and found that the regression is caused by strict, serialized,
FIFO sequential write-ownership of rwsems. Ingo suggested
implementing opportunistic lock-stealing for the front writer
task in the waitqueue.

Yuanhan Liu implemented lock-stealing for spinlock-rwsems,
which indeed recovered much of the regression - confirming
the analysis that the main factor in the regression was the
FIFO writer-fairness of rwsems.

In this patch we allow lock-stealing to happen when the first
waiter is also a writer. With that change in place the
aim7 fork_test performance is fully recovered on my
Intel NHM EP, NHM EX, SNB EP 2S and 4S test-machines.

Reported-by: l...@linux.intel.com
Reported-by: Yuanhan Liu
Signed-off-by: Alex Shi
Cc: David Howells
Cc: Michel Lespinasse
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: Peter Zijlstra
Cc: Anton Blanchard
Cc: Arjan van de Ven
Cc: paul.gortma...@windriver.com
Link: https://lkml.org/lkml/2013/1/29/84
Link: http://lkml.kernel.org/r/1360069915-31619-1-git-send-email-alex@intel.com
[ Small stylistic fixes, updated changelog. ]
Signed-off-by: Ingo Molnar
---
 lib/rwsem.c | 75 +
 1 file changed, 46 insertions(+), 29 deletions(-)

diff --git a/lib/rwsem.c b/lib/rwsem.c
index 8337e1b..ad5e0df 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -2,6 +2,8 @@
  *
  * Written by David Howells (dhowe...@redhat.com).
  * Derived from arch/i386/kernel/semaphore.c
+ *
+ * Writer lock-stealing by Alex Shi
  */
 #include <linux/rwsem.h>
 #include <linux/sched.h>
@@ -60,7 +62,7 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wake_type)
 	struct rwsem_waiter *waiter;
 	struct task_struct *tsk;
 	struct list_head *next;
-	signed long oldcount, woken, loop, adjustment;
+	signed long woken, loop, adjustment;

 	waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
 	if (!(waiter->flags & RWSEM_WAITING_FOR_WRITE))
@@ -72,30 +74,8 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wake_type)
 		 */
 		goto out;

-	/* There's a writer at the front of the queue - try to grant it the
-	 * write lock. However, we only wake this writer if we can transition
-	 * the active part of the count from 0 -> 1
-	 */
-	adjustment = RWSEM_ACTIVE_WRITE_BIAS;
-	if (waiter->list.next == &sem->wait_list)
-		adjustment -= RWSEM_WAITING_BIAS;
-
- try_again_write:
-	oldcount = rwsem_atomic_update(adjustment, sem) - adjustment;
-	if (oldcount & RWSEM_ACTIVE_MASK)
-		/* Someone grabbed the sem already */
-		goto undo_write;
-
-	/* We must be careful not to touch 'waiter' after we set ->task = NULL.
-	 * It is an allocated on the waiter's stack and may become invalid at
-	 * any time after that point (due to a wakeup from another source).
-	 */
-	list_del(&waiter->list);
-	tsk = waiter->task;
-	smp_mb();
-	waiter->task = NULL;
-	wake_up_process(tsk);
-	put_task_struct(tsk);
+	/* Wake up the writing waiter and let the task grab the sem: */
+	wake_up_process(waiter->task);
 	goto out;

  readers_only:
@@ -157,12 +137,40 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wake_type)

  out:
 	return sem;
+}
+
+/* Try to get write sem, caller holds sem->wait_lock: */
+static int try_get_writer_sem(struct rw_semaphore *sem,
+					struct rwsem_waiter *waiter)
+{
+	struct rwsem_waiter *fwaiter;
+	long oldcount, adjustment;

-	/* undo the change to the active count, but check for a transition
-	 * 1->0 */
- undo_write:
+	/* only steal when first waiter is writing */
+	fwaiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
+	if (!(fwaiter->flags & RWSEM_WAITING_FOR_WRITE))
+		return 0;
+
+	adjustment = RWSEM_ACTIVE_WRITE_BIAS;
+	/* Only one waiter in the queue: */
+	if (fwaiter == waiter && waiter->list.next == &sem->wait_list)
+		adjustment -= RWSEM_WAITING_BIAS;
+
+try_again_write:
+	oldcount = rwsem_atomic_update(adjustment, sem) - adjustment;
+	if (!(oldcount & RWSEM_ACTIVE_MASK)) {
+		/* No active lock: */
+		struct task_struct *tsk = waiter->task;
+
+		list_del(&waiter->list);
+		smp_mb();
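The stealing step described in the changelog can be illustrated outside the kernel. What follows is only a minimal user-space sketch, assuming C11 atomics stand in for rwsem_atomic_update(); every toy_/TOY_ name and the bias values are hypothetical and chosen for illustration, not taken from lib/rwsem.c. The waiter adds its write bias optimistically, keeps the lock if no holder was active, and otherwise undoes the change.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative count layout (assumed): low 16 bits count active holders,
 * a negative bias marks "waiters are queued". */
#define TOY_ACTIVE_BIAS   1L
#define TOY_ACTIVE_MASK   0xffffL
#define TOY_WAITING_BIAS  (-0x10000L)
#define TOY_WRITE_BIAS    (TOY_WAITING_BIAS + TOY_ACTIVE_BIAS)

struct toy_rwsem {
	atomic_long count;
};

/*
 * Try to take the write lock on behalf of a queued writer.
 * last_waiter: true if this writer is the only entry in the wait queue,
 * in which case the waiting bias already in the count is reused.
 * Returns true if the lock was taken.
 */
static bool toy_try_steal_write(struct toy_rwsem *sem, bool last_waiter)
{
	long adjustment = TOY_WRITE_BIAS;

	assert(sem);
	if (last_waiter)
		adjustment -= TOY_WAITING_BIAS;

	for (;;) {
		/* Optimistically add our bias; fetch_add returns the old count. */
		long oldcount = atomic_fetch_add(&sem->count, adjustment);

		if (!(oldcount & TOY_ACTIVE_MASK))
			return true;	/* nobody was active: the lock is ours */

		/* Someone holds the lock: undo, then look at the restored count. */
		long restored = atomic_fetch_sub(&sem->count, adjustment) - adjustment;

		if (restored & TOY_ACTIVE_MASK)
			return false;	/* still held: keep waiting */
		/* The holder left in between: retry. */
	}
}
```

In this sketch a caller would invoke toy_try_steal_write() after being woken and go back to sleep when it returns false, which is roughly the role the changelog gives to the front writer in the wait queue.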
[tip:sched/core] sched/nohz: Clean up select_nohz_load_balancer()
Commit-ID:  c1cc017c59c44d9ede7003631c43adc0cfdce2f9
Gitweb:     http://git.kernel.org/tip/c1cc017c59c44d9ede7003631c43adc0cfdce2f9
Author:     Alex Shi
AuthorDate: Mon, 10 Sep 2012 15:10:58 +0800
Committer:  Ingo Molnar
CommitDate: Thu, 13 Sep 2012 16:52:05 +0200

sched/nohz: Clean up select_nohz_load_balancer()

There is no load_balancer to be selected now. It just sets the
state of the nohz tick to stop.

So rename the function, pass the 'cpu' as a parameter and then
remove the useless call from tick_nohz_restart_sched_tick().

[ s/set_nohz_tick_stopped/nohz_balance_enter_idle/g
  s/clear_nohz_tick_stopped/nohz_balance_exit_idle/g ]

Signed-off-by: Alex Shi
Acked-by: Suresh Siddha
Cc: Venkatesh Pallipadi
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1347261059-24747-1-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar
---
 include/linux/sched.h    |    4 ++--
 kernel/sched/fair.c      |   25 ++---
 kernel/time/tick-sched.c |    3 +--
 3 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 60e5e38..8c38df0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -273,11 +273,11 @@ extern void init_idle_bootup_task(struct task_struct *idle);
 extern int runqueue_is_locked(int cpu);

 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
-extern void select_nohz_load_balancer(int stop_tick);
+extern void nohz_balance_enter_idle(int cpu);
 extern void set_cpu_sd_state_idle(void);
 extern int get_nohz_timer_target(void);
 #else
-static inline void select_nohz_load_balancer(int stop_tick) { }
+static inline void nohz_balance_enter_idle(int cpu) { }
 static inline void set_cpu_sd_state_idle(void) { }
 #endif

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9ae3a5b..de596a2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4603,7 +4603,7 @@ static void nohz_balancer_kick(int cpu)
 	return;
 }

-static inline void clear_nohz_tick_stopped(int cpu)
+static inline void nohz_balance_exit_idle(int cpu)
 {
 	if (unlikely(test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))) {
 		cpumask_clear_cpu(cpu, nohz.idle_cpus_mask);
@@ -4643,28 +4643,23 @@ void set_cpu_sd_state_idle(void)
 }

 /*
- * This routine will record that this cpu is going idle with tick stopped.
+ * This routine will record that the cpu is going idle with tick stopped.
  * This info will be used in performing idle load balancing in the future.
  */
-void select_nohz_load_balancer(int stop_tick)
+void nohz_balance_enter_idle(int cpu)
 {
-	int cpu = smp_processor_id();
-
 	/*
 	 * If this cpu is going down, then nothing needs to be done.
 	 */
 	if (!cpu_active(cpu))
 		return;

-	if (stop_tick) {
-		if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
-			return;
+	if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
+		return;

-		cpumask_set_cpu(cpu, nohz.idle_cpus_mask);
-		atomic_inc(&nohz.nr_cpus);
-		set_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu));
-	}
-	return;
+	cpumask_set_cpu(cpu, nohz.idle_cpus_mask);
+	atomic_inc(&nohz.nr_cpus);
+	set_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu));
 }

 static int __cpuinit sched_ilb_notifier(struct notifier_block *nfb,
@@ -4672,7 +4667,7 @@ static int __cpuinit sched_ilb_notifier(struct notifier_block *nfb,
 {
 	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_DYING:
-		clear_nohz_tick_stopped(smp_processor_id());
+		nohz_balance_exit_idle(smp_processor_id());
 		return NOTIFY_OK;
 	default:
 		return NOTIFY_DONE;
@@ -4833,7 +4828,7 @@ static inline int nohz_kick_needed(struct rq *rq, int cpu)
 	 * busy tick after returning from idle, we will update the busy stats.
 	 */
 	set_cpu_sd_state_busy();
-	clear_nohz_tick_stopped(cpu);
+	nohz_balance_exit_idle(cpu);

 	/*
 	 * None are in tickless mode and hence no need for NOHZ idle load
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 3a9e5d5..1a5ee90 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -372,7 +372,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	 * the scheduler tick in nohz_restart_sched_tick.
 	 */
 	if (!ts->tick_stopped) {
-		select_nohz_load_balancer(1);
+		nohz_balance_enter_idle(cpu);
 		calc_load_enter_idle();

 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
@@ -569,7 +569,6 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
 static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 {
 	/* Update jiffies first */
-	select_nohz_load_balancer(0);
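The renamed enter/exit pair can be pictured with a small user-space model. This is only a sketch under stated assumptions: it collapses the kernel's per-CPU NOHZ_TICK_STOPPED flag and nohz.idle_cpus_mask into one 64-bit mask, uses C11 atomics, and all toy_/TOY_ names are hypothetical. It shows the two properties the cleanup relies on: the CPU is passed in explicitly, and both calls are idempotent, so no stop_tick argument is needed.

```c
#include <assert.h>
#include <stdatomic.h>

#define TOY_NR_CPUS 64	/* assumed limit so one 64-bit mask is enough */

struct toy_nohz {
	atomic_ullong idle_cpus_mask;	/* bit n set: CPU n is idle, tick stopped */
	atomic_int nr_cpus;		/* number of bits set in the mask */
};

/* Record that 'cpu' goes idle with its tick stopped; repeat calls are no-ops. */
static void toy_balance_enter_idle(struct toy_nohz *nohz, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);

	unsigned long long bit = 1ULL << cpu;
	unsigned long long old = atomic_fetch_or(&nohz->idle_cpus_mask, bit);

	if (old & bit)
		return;		/* already recorded as idle */
	atomic_fetch_add(&nohz->nr_cpus, 1);
}

/* Record that 'cpu' is busy again; does nothing if it was not marked idle. */
static void toy_balance_exit_idle(struct toy_nohz *nohz, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);

	unsigned long long bit = 1ULL << cpu;
	unsigned long long old = atomic_fetch_and(&nohz->idle_cpus_mask, ~bit);

	if (old & bit)
		atomic_fetch_sub(&nohz->nr_cpus, 1);
}
```

Because the exit side is driven from the busy path (nohz_kick_needed() in the patch), the tick-restart path has nothing left to do here, which matches the removal of the select_nohz_load_balancer(0) call.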
[tip:sched/core] tile: Remove SD_PREFER_LOCAL leftover
Commit-ID:  c7660994ed6b44d17dad0aac0d156da1e0a2f003
Gitweb:     http://git.kernel.org/tip/c7660994ed6b44d17dad0aac0d156da1e0a2f003
Author:     Alex Shi
AuthorDate: Wed, 15 Aug 2012 08:14:36 +0800
Committer:  Thomas Gleixner
CommitDate: Wed, 15 Aug 2012 13:22:55 +0200

tile: Remove SD_PREFER_LOCAL leftover

commit (sched: recover SD_WAKE_AFFINE in select_task_rq_fair and code
clean up) removed SD_PREFER_LOCAL, but left a SD_PREFER_LOCAL usage in
arch/tile code. That breaks the arch/tile build.

Reported-by: Fengguang Wu
Signed-off-by: Alex Shi
Acked-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/502af3e6.3050...@intel.com
Signed-off-by: Thomas Gleixner
---
 arch/tile/include/asm/topology.h |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/tile/include/asm/topology.h b/arch/tile/include/asm/topology.h
index 7a7ce39..d5e86c9 100644
--- a/arch/tile/include/asm/topology.h
+++ b/arch/tile/include/asm/topology.h
@@ -69,7 +69,6 @@ static inline const struct cpumask *cpumask_of_node(int node)
 				| 1*SD_BALANCE_FORK			\
 				| 0*SD_BALANCE_WAKE			\
 				| 0*SD_WAKE_AFFINE			\
-				| 0*SD_PREFER_LOCAL			\
 				| 0*SD_SHARE_CPUPOWER			\
 				| 0*SD_SHARE_PKG_RESOURCES		\
 				| 0*SD_SERIALIZE			\
[tip:sched/core] sched: recover SD_WAKE_AFFINE in select_task_rq_fair and code clean up
Commit-ID:  f03542a7019c600163ac4441d8a826c92c1bd510
Gitweb:     http://git.kernel.org/tip/f03542a7019c600163ac4441d8a826c92c1bd510
Author:     Alex Shi
AuthorDate: Thu, 26 Jul 2012 08:55:34 +0800
Committer:  Thomas Gleixner
CommitDate: Mon, 13 Aug 2012 19:02:05 +0200

sched: recover SD_WAKE_AFFINE in select_task_rq_fair and code clean up

Since the power-saving code has been removed from the scheduler, the
code that implemented it in this function is dead and even interferes
with other logic: for example, 'want_sd' never has a chance to be set
to 0, which removes the effect of SD_WAKE_AFFINE here.

So clean up the obsolete code, including SD_PREFER_LOCAL.

Signed-off-by: Alex Shi
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/5028f431.6000...@intel.com
Signed-off-by: Thomas Gleixner
---
 include/linux/sched.h    |    1 -
 include/linux/topology.h |    2 --
 kernel/sched/core.c      |    1 -
 kernel/sched/fair.c      |   34 +++---
 4 files changed, 3 insertions(+), 35 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b8c8664..f3eebc1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -860,7 +860,6 @@ enum cpu_idle_type {
 #define SD_BALANCE_FORK		0x0008	/* Balance on fork, clone */
 #define SD_BALANCE_WAKE		0x0010	/* Balance on wakeup */
 #define SD_WAKE_AFFINE		0x0020	/* Wake task to waking CPU */
-#define SD_PREFER_LOCAL		0x0040	/* Prefer to keep tasks local to this domain */
 #define SD_SHARE_CPUPOWER	0x0080	/* Domain members share cpu power */
 #define SD_SHARE_PKG_RESOURCES	0x0200	/* Domain members share cpu pkg resources */
 #define SD_SERIALIZE		0x0400	/* Only a single load balancing instance */
diff --git a/include/linux/topology.h b/include/linux/topology.h
index fec12d6..d3cf0d6 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -129,7 +129,6 @@ int arch_update_cpu_topology(void);
 				| 1*SD_BALANCE_FORK			\
 				| 0*SD_BALANCE_WAKE			\
 				| 1*SD_WAKE_AFFINE			\
-				| 0*SD_PREFER_LOCAL			\
 				| 0*SD_SHARE_CPUPOWER			\
 				| 1*SD_SHARE_PKG_RESOURCES		\
 				| 0*SD_SERIALIZE			\
@@ -160,7 +159,6 @@ int arch_update_cpu_topology(void);
 				| 1*SD_BALANCE_FORK			\
 				| 0*SD_BALANCE_WAKE			\
 				| 1*SD_WAKE_AFFINE			\
-				| 0*SD_PREFER_LOCAL			\
 				| 0*SD_SHARE_CPUPOWER			\
 				| 0*SD_SHARE_PKG_RESOURCES		\
 				| 0*SD_SERIALIZE			\
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c9a3655..4376c9f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6622,7 +6622,6 @@ sd_numa_init(struct sched_domain_topology_level *tl, int cpu)
 					| 0*SD_BALANCE_FORK
 					| 0*SD_BALANCE_WAKE
 					| 0*SD_WAKE_AFFINE
-					| 0*SD_PREFER_LOCAL
 					| 0*SD_SHARE_CPUPOWER
 					| 0*SD_SHARE_PKG_RESOURCES
 					| 1*SD_SERIALIZE
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 287bfac..01d3eda 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2686,7 +2686,6 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 	int prev_cpu = task_cpu(p);
 	int new_cpu = cpu;
 	int want_affine = 0;
-	int want_sd = 1;
 	int sync = wake_flags & WF_SYNC;

 	if (p->nr_cpus_allowed == 1)
@@ -2704,48 +2703,21 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 			continue;

 		/*
-		 * If power savings logic is enabled for a domain, see if we
-		 * are not overloaded, if so, don't balance wider.
-		 */
-		if (tmp->flags & (SD_PREFER_LOCAL)) {
-			unsigned long power = 0;
-			unsigned long nr_running = 0;
-			unsigned long capacity;
-			int i;
-
-			for_each_cpu(i, sched_domain_span(tmp)) {
-				power += power_of(i);
-				nr_running += cpu_rq(i)->cfs.nr_running;
-			}
-
-			capacity = DIV_ROUND_CLOSEST(power,
[tip:sched/urgent] sched/numa: Add SD_PERFER_SIBLING to CPU domain
Commit-ID:  6956dc568f34107f1d02b24f87efe7250803fc87
Gitweb:     http://git.kernel.org/tip/6956dc568f34107f1d02b24f87efe7250803fc87
Author:     Alex Shi
AuthorDate: Fri, 20 Jul 2012 14:19:50 +0800
Committer:  Ingo Molnar
CommitDate: Thu, 26 Jul 2012 11:46:58 +0200

sched/numa: Add SD_PERFER_SIBLING to CPU domain

Commit 8e7fbcbc22c ("sched: Remove stale power aware scheduling
remnants and dysfunctional knobs") removed SD_PREFER_SIBLING from
the CPU domain.

On NUMA machines this means that load_balance() no longer prefers
logical CPUs in the same physical CPU package.

It causes some actual performance regressions on our NUMA machines
from Core2 to NHM and SNB.

Adding this domain flag again recovers the performance drop.

This change doesn't have any bad impact on any of my benchmarks:
specjbb, kbuild, fio, hackbench, etc., on all my machines.

Signed-off-by: Alex Shi
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1342765190-21540-1-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar
---
 include/linux/topology.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index e91cd43..fec12d6 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -164,6 +164,7 @@ int arch_update_cpu_topology(void);
 				| 0*SD_SHARE_CPUPOWER			\
 				| 0*SD_SHARE_PKG_RESOURCES		\
 				| 0*SD_SERIALIZE			\
+				| 1*SD_PREFER_SIBLING			\
 				,					\
 	.last_balance		= jiffies,				\
 	.balance_interval	= 1,					\
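The `1*FLAG | 0*FLAG` idiom in the hunk above keeps every known domain flag visible in the initializer, so switching one on or off is a one-character change. A small stand-alone sketch of the idiom follows. The TOY_ names are hypothetical; the numeric values mirror the SD_* defines quoted in this archive where available, and the value used for TOY_SD_PREFER_SIBLING is an assumption.

```c
#include <assert.h>

enum {
	TOY_SD_BALANCE_FORK        = 0x0008,
	TOY_SD_BALANCE_WAKE        = 0x0010,
	TOY_SD_WAKE_AFFINE         = 0x0020,
	TOY_SD_SHARE_CPUPOWER      = 0x0080,
	TOY_SD_SHARE_PKG_RESOURCES = 0x0200,
	TOY_SD_SERIALIZE           = 0x0400,
	TOY_SD_PREFER_SIBLING      = 0x1000,	/* assumed value */
};

/* Flags of the CPU-level domain after the patch: 1* enables, 0* documents. */
static unsigned int toy_cpu_domain_flags(void)
{
	return 1*TOY_SD_BALANCE_FORK
	     | 0*TOY_SD_BALANCE_WAKE
	     | 1*TOY_SD_WAKE_AFFINE
	     | 0*TOY_SD_SHARE_CPUPOWER
	     | 0*TOY_SD_SHARE_PKG_RESOURCES
	     | 0*TOY_SD_SERIALIZE
	     | 1*TOY_SD_PREFER_SIBLING;
}

/* Test a single flag; 'flag' must have exactly one bit set. */
static int toy_has_flag(unsigned int flags, unsigned int flag)
{
	assert(flag != 0 && (flag & (flag - 1)) == 0);
	return (flags & flag) != 0;
}
```

Multiplying by 0 contributes nothing to the mask, so the disabled entries cost nothing at run time and serve only as documentation of the choices made for that domain level.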
[tip:x86/mm] x86/tlb: Fix build warning and crash when building for !SMP
Commit-ID:  7efa1c87963d23cc57ba40c07316d3e28cc75a3a
Gitweb:     http://git.kernel.org/tip/7efa1c87963d23cc57ba40c07316d3e28cc75a3a
Author:     Alex Shi
AuthorDate: Fri, 20 Jul 2012 09:18:23 +0800
Committer:  H. Peter Anvin
CommitDate: Fri, 20 Jul 2012 15:01:48 -0700

x86/tlb: Fix build warning and crash when building for !SMP

The incompatible parameter of flush_tlb_mm_range cause build warning.
Fix it by correct parameter.

Ingo Molnar found that this could also cause a user space crash.

Reported-by: Tetsuo Handa
Reported-by: Ingo Molnar
Signed-off-by: Alex Shi
Link: http://lkml.kernel.org/r/1342747103-19765-1-git-send-email-alex@intel.com
Signed-off-by: H. Peter Anvin
---
 arch/x86/include/asm/tlbflush.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index b5a27bd..74a4433 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -105,10 +105,10 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
 		__flush_tlb();
 }
 
-static inline void flush_tlb_mm_range(struct vm_area_struct *vma,
+static inline void flush_tlb_mm_range(struct mm_struct *mm,
 	   unsigned long start, unsigned long end, unsigned long vmflag)
 {
-	if (vma->vm_mm == current->active_mm)
+	if (mm == current->active_mm)
 		__flush_tlb();
 }
 
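The fix above makes the !SMP stub take the mm directly and compare it with the current task's active mm. The following is a rough user-space sketch of that corrected logic, not kernel code. All demo_* types, variables and functions are hypothetical stand-ins for mm_struct, current and __flush_tlb(). The presumed failure mode of the old stub is that it read vma->vm_mm through a pointer that callers had really passed as an mm, so the comparison used unrelated memory.

```c
/* Hypothetical stand-ins for kernel types; illustration only. */
struct demo_mm {
	int id;
};

struct demo_task {
	struct demo_mm *active_mm;
};

static struct demo_task demo_current;	/* plays the role of "current" */
static int demo_flush_count;		/* counts simulated TLB flushes */

/* Stand-in for __flush_tlb(): just records that a flush happened. */
static void demo_flush_tlb(void)
{
	demo_flush_count++;
}

/* Shape of the fixed stub: the first parameter is the mm itself, so no
 * field is dereferenced before the comparison. On a uniprocessor build
 * the range arguments are unused and a full flush is done instead. */
static void demo_flush_tlb_mm_range(struct demo_mm *mm,
				    unsigned long start, unsigned long end,
				    unsigned long vmflag)
{
	(void)start;
	(void)end;
	(void)vmflag;

	if (mm == demo_current.active_mm)
		demo_flush_tlb();
}
```

In this sketch a flush is recorded only when the mm passed in is the one the current task is running on; any other mm is left alone.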