[tip:locking/urgent] locking/rtmutex: Remove unnecessary priority adjustment

2017-07-13 Thread tip-bot for Alex Shi
Commit-ID:  69f0d429c413fe96db2c187475cebcc6e3a8c7f5
Gitweb: http://git.kernel.org/tip/69f0d429c413fe96db2c187475cebcc6e3a8c7f5
Author: Alex Shi 
AuthorDate: Thu, 13 Jul 2017 14:18:24 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 13 Jul 2017 11:44:06 +0200

locking/rtmutex: Remove unnecessary priority adjustment

We don't need to adjust the priority before adding a new pi_waiter;
the priority only needs to be updated after the pi_waiter list or the
task's priority changes.

Steven Rostedt pointed out:

  "Interesting, I did some git mining and this was added with the original
   entry of the rtmutex.c (23f78d4a03c5). Looking at even that version, I
   don't see the purpose of adjusting the task prio here. It is done
   before anything changes in the task."

Signed-off-by: Alex Shi 
Reviewed-by: Steven Rostedt (VMware) 
Acked-by: Peter Zijlstra (Intel) 
Cc: Juri Lelli 
Cc: Linus Torvalds 
Cc: Mathieu Poirier 
Cc: Sebastian Siewior 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1499926704-28841-1-git-send-email-alex@linaro.org
[ Enhance the changelog. ]
Signed-off-by: Ingo Molnar 
---
 kernel/locking/rtmutex.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 7806989..649dc9d 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -963,7 +963,6 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
return -EDEADLK;
 
raw_spin_lock(&task->pi_lock);
-   rt_mutex_adjust_prio(task);
waiter->task = task;
waiter->lock = lock;
waiter->prio = task->prio;
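
As a rough userspace illustration of the ordering argument above (this is
not kernel code; the task/waiter structs and the adjust_prio()/add_waiter()
helpers are made up), recomputing the effective priority before the waiter
list changes cannot do anything, because nothing the computation depends on
has changed yet:

#include <stdio.h>

struct waiter { int prio; struct waiter *next; };
struct task { int normal_prio; int prio; struct waiter *pi_waiters; };

/* recompute the effective priority from the normal priority and all
 * queued PI waiters (lower value == higher priority, as in the kernel) */
static void adjust_prio(struct task *t)
{
    int best = t->normal_prio;
    struct waiter *w;

    for (w = t->pi_waiters; w; w = w->next)
        if (w->prio < best)
            best = w->prio;
    t->prio = best;
}

static void add_waiter(struct task *t, struct waiter *w)
{
    /* calling adjust_prio(t) here would be a no-op: neither the waiter
     * list nor the task's own priority has changed at this point */
    w->next = t->pi_waiters;
    t->pi_waiters = w;
    adjust_prio(t);    /* the effective prio can only change now */
}

int main(void)
{
    struct task t = { .normal_prio = 120, .prio = 120, .pi_waiters = NULL };
    struct waiter w = { .prio = 90, .next = NULL };

    add_waiter(&t, &w);
    printf("effective prio: %d\n", t.prio);    /* prints 90 */
    return 0;
}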


[tip:sched/core] sched: Clean up the task_hot() function

2014-03-12 Thread tip-bot for Alex Shi
Commit-ID:  6037dd1a49f95092824fa8ba75c717ff7805e317
Gitweb: http://git.kernel.org/tip/6037dd1a49f95092824fa8ba75c717ff7805e317
Author: Alex Shi 
AuthorDate: Wed, 12 Mar 2014 14:51:51 +0800
Committer:  Ingo Molnar 
CommitDate: Wed, 12 Mar 2014 10:49:01 +0100

sched: Clean up the task_hot() function

task_hot() doesn't need the 'sched_domain' parameter, so remove it.

Signed-off-by: Alex Shi 
Signed-off-by: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1394607111-1904-1-git-send-email-alex@linaro.org
Signed-off-by: Ingo Molnar 
---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b301918..7e9bd0b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5037,7 +5037,7 @@ static void move_task(struct task_struct *p, struct lb_env *env)
  * Is this task likely cache-hot:
  */
 static int
-task_hot(struct task_struct *p, u64 now, struct sched_domain *sd)
+task_hot(struct task_struct *p, u64 now)
 {
s64 delta;
 
@@ -5198,7 +5198,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 * 2) task is cache cold, or
 * 3) too many balance attempts have failed.
 */
-   tsk_cache_hot = task_hot(p, rq_clock_task(env->src_rq), env->sd);
+   tsk_cache_hot = task_hot(p, rq_clock_task(env->src_rq));
if (!tsk_cache_hot)
tsk_cache_hot = migrate_degrades_locality(p, env);
 

[tip:sched/core] sched: Add statistic for newidle load balance cost

2014-02-11 Thread tip-bot for Alex Shi
Commit-ID:  37e6bae8395a94b4dd934c92b02b9408be992365
Gitweb: http://git.kernel.org/tip/37e6bae8395a94b4dd934c92b02b9408be992365
Author: Alex Shi 
AuthorDate: Thu, 23 Jan 2014 18:39:54 +0800
Committer:  Ingo Molnar 
CommitDate: Tue, 11 Feb 2014 09:58:18 +0100

sched: Add statistic for newidle load balance cost

Track rq->max_idle_balance_cost and sd->max_newidle_lb_cost; it's
useful to know these values in debug mode.

Signed-off-by: Alex Shi 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/52e0f3bf.5020...@linaro.org
Signed-off-by: Ingo Molnar 
---
 kernel/sched/core.c  | 9 ++---
 kernel/sched/debug.c | 1 +
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3068f37..fb9764f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4811,7 +4811,7 @@ set_table_entry(struct ctl_table *entry,
 static struct ctl_table *
 sd_alloc_ctl_domain_table(struct sched_domain *sd)
 {
-   struct ctl_table *table = sd_alloc_ctl_entry(13);
+   struct ctl_table *table = sd_alloc_ctl_entry(14);
 
if (table == NULL)
return NULL;
@@ -4839,9 +4839,12 @@ sd_alloc_ctl_domain_table(struct sched_domain *sd)
sizeof(int), 0644, proc_dointvec_minmax, false);
set_table_entry(&table[10], "flags", &sd->flags,
sizeof(int), 0644, proc_dointvec_minmax, false);
-   set_table_entry(&table[11], "name", sd->name,
+   set_table_entry(&table[11], "max_newidle_lb_cost",
+   &sd->max_newidle_lb_cost,
+   sizeof(long), 0644, proc_doulongvec_minmax, false);
+   set_table_entry(&table[12], "name", sd->name,
CORENAME_MAX_SIZE, 0444, proc_dostring, false);
-   /* &table[12] is terminator */
+   /* &table[13] is terminator */
 
return table;
 }
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 31b908d..f3344c3 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -321,6 +321,7 @@ do { \
P(sched_goidle);
 #ifdef CONFIG_SMP
P64(avg_idle);
+   P64(max_idle_balance_cost);
 #endif
 
P(ttwu_count);
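
For readers unfamiliar with how these sysctl tables are sized:
sd_alloc_ctl_entry() returns a zeroed array whose last, empty slot acts as
the terminator, so adding one real entry means allocating one more slot and
moving the terminator from index 12 to 13. A small userspace sketch of the
same pattern (hypothetical names, not the kernel's ctl_table API):

#include <stdio.h>
#include <stdlib.h>

struct entry { const char *procname; };    /* stand-in for struct ctl_table */

static struct entry *alloc_entries(int n)
{
    /* calloc() zeroes the array, so the extra slot at the end
     * (procname == NULL) serves as the terminator */
    return calloc(n, sizeof(struct entry));
}

int main(void)
{
    const char *names[] = { "min_interval", "max_interval", "flags",
                            "max_newidle_lb_cost", "name" };
    int i, nr = sizeof(names) / sizeof(names[0]);
    struct entry *table = alloc_entries(nr + 1);    /* +1 for terminator */
    struct entry *e;

    if (!table)
        return 1;
    for (i = 0; i < nr; i++)
        table[i].procname = names[i];

    for (e = table; e->procname; e++)    /* walk until the sentinel */
        printf("%s\n", e->procname);

    free(table);
    return 0;
}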

[tip:timers/urgent] nohz_full: fix code style issue of tick_nohz_full_stop_tick

2014-01-25 Thread tip-bot for Alex Shi
Commit-ID:  e9a2eb403bd953788cd2abfd0d2646d43bd22671
Gitweb: http://git.kernel.org/tip/e9a2eb403bd953788cd2abfd0d2646d43bd22671
Author: Alex Shi 
AuthorDate: Thu, 28 Nov 2013 14:27:11 +0800
Committer:  Frederic Weisbecker 
CommitDate: Wed, 15 Jan 2014 23:07:11 +0100

nohz_full: fix code style issue of tick_nohz_full_stop_tick

Kernel code is usually indented with a tab instead of 7 spaces.

Signed-off-by: Alex Shi 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Alex Shi 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: John Stultz 
Cc: Kevin Hilman 
Link: 
http://lkml.kernel.org/r/1386074112-30754-2-git-send-email-alex@linaro.org
Signed-off-by: Frederic Weisbecker 
---
 kernel/time/tick-sched.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 68331d1..d603bad 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -679,18 +679,18 @@ out:
 static void tick_nohz_full_stop_tick(struct tick_sched *ts)
 {
 #ifdef CONFIG_NO_HZ_FULL
-   int cpu = smp_processor_id();
+   int cpu = smp_processor_id();
 
-   if (!tick_nohz_full_cpu(cpu) || is_idle_task(current))
-   return;
+   if (!tick_nohz_full_cpu(cpu) || is_idle_task(current))
+   return;
 
-   if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
-  return;
+   if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
+   return;
 
-   if (!can_stop_full_tick())
-   return;
+   if (!can_stop_full_tick())
+   return;
 
-   tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
+   tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
 #endif
 }
 

[tip:sched/urgent] sched: Remove unused variable in 'struct sched_domain'

2013-11-19 Thread tip-bot for Alex Shi
Commit-ID:  b972fc308c2763096b61b62169f2167ee0ca5a19
Gitweb: http://git.kernel.org/tip/b972fc308c2763096b61b62169f2167ee0ca5a19
Author: Alex Shi 
AuthorDate: Tue, 19 Nov 2013 17:21:52 +0800
Committer:  Ingo Molnar 
CommitDate: Tue, 19 Nov 2013 17:01:17 +0100

sched: Remove unused variable in 'struct sched_domain'

The 'u64 last_update' variable isn't used anymore; remove it to save a bit of space.

Signed-off-by: Alex Shi 
Signed-off-by: Peter Zijlstra 
Cc: morten.rasmus...@arm.com
Cc: linaro-ker...@lists.linaro.org
Link: 
http://lkml.kernel.org/r/1384852912-24791-1-git-send-email-alex@linaro.org
Signed-off-by: Ingo Molnar 
---
 include/linux/sched.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f7efc86..b122395 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -823,8 +823,6 @@ struct sched_domain {
unsigned int balance_interval;  /* initialise to 1. units in ms. */
unsigned int nr_balance_failed; /* initialise to 0 */
 
-   u64 last_update;
-
/* idle_balance() stats */
u64 max_newidle_lb_cost;
unsigned long next_decay_max_lb_cost;

[tip:sched/core] sched/debug: Remove CONFIG_FAIR_GROUP_SCHED mask

2013-06-28 Thread tip-bot for Alex Shi
Commit-ID:  333bb864f192015a53b5060b829089decd0220ef
Gitweb: http://git.kernel.org/tip/333bb864f192015a53b5060b829089decd0220ef
Author: Alex Shi 
AuthorDate: Fri, 28 Jun 2013 19:10:35 +0800
Committer:  Ingo Molnar 
CommitDate: Fri, 28 Jun 2013 13:17:17 +0200

sched/debug: Remove CONFIG_FAIR_GROUP_SCHED mask

Now that we are using runnable load avg in sched balance, we don't
need to keep it under CONFIG_FAIR_GROUP_SCHED.

Also align the code style to #ifdef instead of #if defined() and
reorder the tg output info.

Signed-off-by: Alex Shi 
Cc: p...@google.com
Cc: kamal...@linux.vnet.ibm.com
Cc: pet...@infradead.org
Link: 
http://lkml.kernel.org/r/1372417835-4698-1-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/debug.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 1595614..e076bdd 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -209,22 +209,24 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
cfs_rq->nr_spread_over);
SEQ_printf(m, "  .%-30s: %d\n", "nr_running", cfs_rq->nr_running);
SEQ_printf(m, "  .%-30s: %ld\n", "load", cfs_rq->load.weight);
-#ifdef CONFIG_FAIR_GROUP_SCHED
 #ifdef CONFIG_SMP
SEQ_printf(m, "  .%-30s: %ld\n", "runnable_load_avg",
cfs_rq->runnable_load_avg);
SEQ_printf(m, "  .%-30s: %ld\n", "blocked_load_avg",
cfs_rq->blocked_load_avg);
-   SEQ_printf(m, "  .%-30s: %ld\n", "tg_load_avg",
-   atomic_long_read(&cfs_rq->tg->load_avg));
+#ifdef CONFIG_FAIR_GROUP_SCHED
SEQ_printf(m, "  .%-30s: %ld\n", "tg_load_contrib",
cfs_rq->tg_load_contrib);
SEQ_printf(m, "  .%-30s: %d\n", "tg_runnable_contrib",
cfs_rq->tg_runnable_contrib);
+   SEQ_printf(m, "  .%-30s: %ld\n", "tg_load_avg",
+   atomic_long_read(&cfs_rq->tg->load_avg));
SEQ_printf(m, "  .%-30s: %d\n", "tg->runnable_avg",
atomic_read(&cfs_rq->tg->runnable_avg));
 #endif
+#endif
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
print_cfs_group_stats(m, cpu, cfs_rq->tg);
 #endif
 }
@@ -567,7 +569,7 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
   "nr_involuntary_switches", (long long)p->nivcsw);
 
P(se.load.weight);
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
P(se.avg.runnable_avg_sum);
P(se.avg.runnable_avg_period);
P(se.avg.load_avg_contrib);
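
A minimal sketch of what the #ifdef reshuffle buys (the macros and strings
below are hypothetical stand-ins, not the real debug output): with the SMP
block no longer nested under CONFIG_FAIR_GROUP_SCHED, the per-rq runnable
and blocked averages are printed on any SMP build, while the task-group
statistics stay conditional:

#include <stdio.h>

#define CONFIG_SMP 1
/* #define CONFIG_FAIR_GROUP_SCHED 1 */    /* toggle to see the difference */

static void print_cfs_stats(void)
{
#ifdef CONFIG_SMP
    puts("runnable_load_avg / blocked_load_avg");    /* always on SMP now */
#ifdef CONFIG_FAIR_GROUP_SCHED
    puts("tg_load_contrib / tg_runnable_contrib / tg_load_avg");
#endif
#endif
}

int main(void)
{
    print_cfs_stats();
    return 0;
}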

[tip:sched/core] Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking"

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  141965c7494d984b2bf24efd361a3125278869c6
Gitweb: http://git.kernel.org/tip/141965c7494d984b2bf24efd361a3125278869c6
Author: Alex Shi 
AuthorDate: Wed, 26 Jun 2013 13:05:39 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 27 Jun 2013 10:07:22 +0200

Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for 
load-tracking"

Remove the CONFIG_FAIR_GROUP_SCHED guard that covers the runnable info,
so that we can use the runnable load variables.

Also remove two CONFIG_FAIR_GROUP_SCHED settings which are not in the
reverted patch (they were introduced in 9ee474f) but also need to be
reverted.

Signed-off-by: Alex Shi 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/51ca76a3.3050...@intel.com
Signed-off-by: Ingo Molnar 
---
 include/linux/sched.h |  7 +--
 kernel/sched/core.c   |  7 +--
 kernel/sched/fair.c   | 17 -
 kernel/sched/sched.h  | 19 ++-
 4 files changed, 8 insertions(+), 42 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 178a8d9..0019bef 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -994,12 +994,7 @@ struct sched_entity {
struct cfs_rq   *my_q;
 #endif
 
-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
/* Per-entity load-tracking */
struct sched_avgavg;
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ceeaf0f..0241b1b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1611,12 +1611,7 @@ static void __sched_fork(struct task_struct *p)
p->se.vruntime  = 0;
INIT_LIST_HEAD(&p->se.group_node);
 
-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
p->se.avg.runnable_avg_period = 0;
p->se.avg.runnable_avg_sum = 0;
 #endif
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c0ac2c3..36eadaa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1128,8 +1128,7 @@ static inline void update_cfs_shares(struct cfs_rq *cfs_rq)
 }
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 
-/* Only depends on SMP, FAIR_GROUP_SCHED may be removed when useful in lb */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
 /*
  * We choose a half-life close to 1 scheduling period.
  * Note: The tables below are dependent on this value.
@@ -3431,12 +3430,6 @@ unlock:
 }
 
 /*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#ifdef CONFIG_FAIR_GROUP_SCHED
-/*
  * Called immediately before a task is migrated to a new cpu; task_cpu(p) and
  * cfs_rq_of(p) references at time of call are still valid and identify the
  * previous cpu.  However, the caller only guarantees p->pi_lock is held; no
@@ -3459,7 +3452,6 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
atomic64_add(se->avg.load_avg_contrib, &cfs_rq->removed_load);
}
 }
-#endif
 #endif /* CONFIG_SMP */
 
 static unsigned long
@@ -5861,7 +5853,7 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
se->vruntime -= cfs_rq->min_vruntime;
}
 
-#if defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_SMP)
+#ifdef CONFIG_SMP
/*
* Remove our load from contribution when we leave sched_fair
* and ensure we don't carry in an old decay_count if we
@@ -5920,7 +5912,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 #ifndef CONFIG_64BIT
cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
 #endif
-#if defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_SMP)
+#ifdef CONFIG_SMP
atomic64_set(&cfs_rq->decay_counter, 1);
atomic64_set(&cfs_rq->removed_load, 0);
 #endif
@@ -6162,9 +6154,8 @@ const struct sched_class fair_sched_class = {
 
 #ifdef CONFIG_SMP
.select_task_rq = select_task_rq_fair,
-#ifdef CONFIG_FAIR_GROUP_SCHED
.migrate_task_rq= migrate_task_rq_fair,
-#endif
+
.rq_online  = rq_online_fair,
.rq_offline = rq_offline_fair,
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 029601a..77ce668 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -269,12 +269,6 @@ struct cfs_rq {
 #endif
 
 #ifdef CONFIG_SMP
-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#ifdef CONFIG_FAIR_GROUP_SCHED
/*
 * CFS Load tracking
 * Under CFS, load is tracked on a per-entity 

[tip:sched/core] sched: Set an initial value of runnable avg for new forked task

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  a75cdaa915e42ef0e6f38dc7f2a6a1deca91d648
Gitweb: http://git.kernel.org/tip/a75cdaa915e42ef0e6f38dc7f2a6a1deca91d648
Author: Alex Shi 
AuthorDate: Thu, 20 Jun 2013 10:18:47 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 27 Jun 2013 10:07:30 +0200

sched: Set an initial value of runnable avg for new forked task

We need to initialize se.avg.{decay_count, load_avg_contrib} for a
newly forked task. Otherwise random values in these variables cause a
mess when the new task is enqueued:

enqueue_task_fair
enqueue_entity
enqueue_entity_load_avg

and make fork balancing imbalanced due to an incorrect load_avg_contrib.

Furthermore, Morten Rasmussen noticed that some tasks were not launched
right after being created. So Paul and Peter suggested giving new tasks
a start value for the runnable avg time equal to sched_slice().

PeterZ said:

> So the 'problem' is that our running avg is a 'floating' average; ie. it
> decays with time. Now we have to guess about the future of our newly
> spawned task -- something that is nigh impossible seeing these CPU
> vendors keep refusing to implement the crystal ball instruction.
>
> So there's two asymptotic cases we want to deal well with; 1) the case
> where the newly spawned program will be 'nearly' idle for its lifetime;
> and 2) the case where its cpu-bound.
>
> Since we have to guess, we'll go for worst case and assume its
> cpu-bound; now we don't want to make the avg so heavy adjusting to the
> near-idle case takes forever. We want to be able to quickly adjust and
> lower our running avg.
>
> Now we also don't want to make our avg too light, such that it gets
> decremented just for the new task not having had a chance to run yet --
> even if when it would run, it would be more cpu-bound than not.
>
> So what we do is we make the initial avg of the same duration as that we
> guess it takes to run each task on the system at least once -- aka
> sched_slice().
>
> Of course we can defeat this with wakeup/fork bombs, but in the 'normal'
> case it should be good enough.

Paul also contributed most of the code comments in this commit.

Signed-off-by: Alex Shi 
Reviewed-by: Gu Zheng 
Reviewed-by: Paul Turner 
[peterz; added explanation of sched_slice() usage]
Signed-off-by: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1371694737-29336-4-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/core.c  |  6 ++
 kernel/sched/fair.c  | 24 
 kernel/sched/sched.h |  2 ++
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0241b1b..729e7fc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1611,10 +1611,6 @@ static void __sched_fork(struct task_struct *p)
p->se.vruntime  = 0;
INIT_LIST_HEAD(&p->se.group_node);
 
-#ifdef CONFIG_SMP
-   p->se.avg.runnable_avg_period = 0;
-   p->se.avg.runnable_avg_sum = 0;
-#endif
 #ifdef CONFIG_SCHEDSTATS
memset(&p->se.statistics, 0, sizeof(p->se.statistics));
 #endif
@@ -1758,6 +1754,8 @@ void wake_up_new_task(struct task_struct *p)
set_task_cpu(p, select_task_rq(p, SD_BALANCE_FORK, 0));
 #endif
 
+   /* Initialize new task's runnable average */
+   init_task_runnable_average(p);
rq = __task_rq_lock(p);
activate_task(rq, p, 0);
p->on_rq = 1;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 36eadaa..e1602a0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -680,6 +680,26 @@ static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se)
return calc_delta_fair(sched_slice(cfs_rq, se), se);
 }
 
+#ifdef CONFIG_SMP
+static inline void __update_task_entity_contrib(struct sched_entity *se);
+
+/* Give new task start runnable values to heavy its load in infant time */
+void init_task_runnable_average(struct task_struct *p)
+{
+   u32 slice;
+
+   p->se.avg.decay_count = 0;
+   slice = sched_slice(task_cfs_rq(p), &p->se) >> 10;
+   p->se.avg.runnable_avg_sum = slice;
+   p->se.avg.runnable_avg_period = slice;
+   __update_task_entity_contrib(&p->se);
+}
+#else
+void init_task_runnable_average(struct task_struct *p)
+{
+}
+#endif
+
 /*
  * Update the current task's runtime statistics. Skip current tasks that
  * are not in our scheduling class.
@@ -1527,6 +1547,10 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
 * We track migrations using entity decay_count <= 0, on a wake-up
 * migration we use a negative decay count to track the remote decays
 * accumulated while sleeping.
+*
+* Newly forked tasks are enqueued with se->avg.decay_count == 0, they
+* are seen by enqueue_entity_load_avg() as a migration with an already
+* constructed load_avg_contrib.
 */
if (unlikely(se->avg.decay_count <= 0)) {
se->avg.last_runnable_update = rq_clock_task(rq_of(cfs_rq));
diff --git 
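
A toy userspace model of the decaying average being seeded here may help.
Assumptions: y^32 = 0.5 as in the kernel's load tracking, 1024us accounting
periods, a made-up ~4ms slice and a task that turns out to run only 25% of
the time; this is a floating-point model of the idea, not the kernel's
fixed-point code:

#include <stdio.h>
#include <math.h>

int main(void)
{
    const double y = pow(0.5, 1.0 / 32.0);    /* per-period decay factor */
    const double weight = 1024.0;             /* nice-0 task weight */
    double sum = 4000.0, period = 4000.0;     /* seeded with a ~4ms slice */
    int t;

    printf("at fork: contrib ~ %.0f\n", weight * sum / period);
    for (t = 1; t <= 64; t++) {
        int running = (t % 4 == 0);           /* 25% duty cycle */

        sum    = sum * y + (running ? 1024.0 : 0.0);
        period = period * y + 1024.0;
        if (t % 16 == 0)
            printf("after %2d periods: contrib ~ %.0f\n",
                   t, weight * sum / period);
    }
    /* contrib starts at 1024 (looks fully CPU-bound) and converges toward
     * the ~25% duty cycle: "heavy in infant time, adjusts quickly" */
    return 0;
}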

[tip:sched/core] sched: Update cpu load after task_tick

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  83dfd5235ebd66c284b97befe6eabff7132333e6
Gitweb: http://git.kernel.org/tip/83dfd5235ebd66c284b97befe6eabff7132333e6
Author: Alex Shi 
AuthorDate: Thu, 20 Jun 2013 10:18:49 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 27 Jun 2013 10:07:33 +0200

sched: Update cpu load after task_tick

To get the latest runnable info, we need to do this CPU load update
after task_tick().

Signed-off-by: Alex Shi 
Reviewed-by: Paul Turner 
Signed-off-by: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1371694737-29336-6-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 729e7fc..08746cc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2165,8 +2165,8 @@ void scheduler_tick(void)
 
raw_spin_lock(&rq->lock);
update_rq_clock(rq);
-   update_cpu_load_active(rq);
curr->sched_class->task_tick(rq, curr, 0);
+   update_cpu_load_active(rq);
raw_spin_unlock(&rq->lock);
 
perf_event_task_tick();

[tip:sched/core] sched: Change get_rq_runnable_load() to static and inline

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  a9dc5d0e33c677619e4b97a38c23db1a42857905
Gitweb: http://git.kernel.org/tip/a9dc5d0e33c677619e4b97a38c23db1a42857905
Author: Alex Shi 
AuthorDate: Thu, 20 Jun 2013 10:18:57 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 27 Jun 2013 10:07:44 +0200

sched: Change get_rq_runnable_load() to static and inline

Based-on-patch-by: Fengguang Wu 
Signed-off-by: Alex Shi 
Tested-by: Vincent Guittot 
Signed-off-by: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1371694737-29336-14-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/proc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
index ce5cd48..16f5a30 100644
--- a/kernel/sched/proc.c
+++ b/kernel/sched/proc.c
@@ -502,12 +502,12 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
 }
 
 #ifdef CONFIG_SMP
-unsigned long get_rq_runnable_load(struct rq *rq)
+static inline unsigned long get_rq_runnable_load(struct rq *rq)
 {
return rq->cfs.runnable_load_avg;
 }
 #else
-unsigned long get_rq_runnable_load(struct rq *rq)
+static inline unsigned long get_rq_runnable_load(struct rq *rq)
 {
return rq->load.weight;
 }

[tip:sched/core] sched/tg: Use 'unsigned long' for load variable in task group

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  bf5b986ed4d20428eeec3df4a03dbfebb9b6538c
Gitweb: http://git.kernel.org/tip/bf5b986ed4d20428eeec3df4a03dbfebb9b6538c
Author: Alex Shi 
AuthorDate: Thu, 20 Jun 2013 10:18:54 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 27 Jun 2013 10:07:40 +0200

sched/tg: Use 'unsigned long' for load variable in task group

Since tg->load_avg is smaller than tg->load_weight, we don't need an
atomic64_t variable for load_avg on 32-bit machines.
The same reasoning applies to cfs_rq->tg_load_contrib.

The atomic_long_t/unsigned long variable types are more efficient and
convenient for them.

Signed-off-by: Alex Shi 
Tested-by: Vincent Guittot 
Signed-off-by: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1371694737-29336-11-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/debug.c |  6 +++---
 kernel/sched/fair.c  | 12 ++--
 kernel/sched/sched.h |  4 ++--
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 160afdc..d803989 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -215,9 +215,9 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
cfs_rq->runnable_load_avg);
SEQ_printf(m, "  .%-30s: %ld\n", "blocked_load_avg",
cfs_rq->blocked_load_avg);
-   SEQ_printf(m, "  .%-30s: %lld\n", "tg_load_avg",
-   (unsigned long long)atomic64_read(&cfs_rq->tg->load_avg));
-   SEQ_printf(m, "  .%-30s: %lld\n", "tg_load_contrib",
+   SEQ_printf(m, "  .%-30s: %ld\n", "tg_load_avg",
+   atomic_long_read(&cfs_rq->tg->load_avg));
+   SEQ_printf(m, "  .%-30s: %ld\n", "tg_load_contrib",
cfs_rq->tg_load_contrib);
SEQ_printf(m, "  .%-30s: %d\n", "tg_runnable_contrib",
cfs_rq->tg_runnable_contrib);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f19772d..30ccc37 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1075,7 +1075,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
 * to gain a more accurate current total weight. See
 * update_cfs_rq_load_contribution().
 */
-   tg_weight = atomic64_read(&tg->load_avg);
+   tg_weight = atomic_long_read(&tg->load_avg);
tg_weight -= cfs_rq->tg_load_contrib;
tg_weight += cfs_rq->load.weight;
 
@@ -1356,13 +1356,13 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
 int force_update)
 {
struct task_group *tg = cfs_rq->tg;
-   s64 tg_contrib;
+   long tg_contrib;
 
tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
tg_contrib -= cfs_rq->tg_load_contrib;
 
-   if (force_update || abs64(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
-   atomic64_add(tg_contrib, &tg->load_avg);
+   if (force_update || abs(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
+   atomic_long_add(tg_contrib, &tg->load_avg);
cfs_rq->tg_load_contrib += tg_contrib;
}
 }
@@ -1397,8 +1397,8 @@ static inline void __update_group_entity_contrib(struct sched_entity *se)
u64 contrib;
 
contrib = cfs_rq->tg_load_contrib * tg->shares;
-   se->avg.load_avg_contrib = div64_u64(contrib,
-   atomic64_read(&tg->load_avg) + 1);
+   se->avg.load_avg_contrib = div_u64(contrib,
+   atomic_long_read(&tg->load_avg) + 1);
 
/*
 * For group entities we need to compute a correction term in the case
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9eb12d9..5585eb2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -150,7 +150,7 @@ struct task_group {
 
atomic_t load_weight;
 #ifdef CONFIG_SMP
-   atomic64_t load_avg;
+   atomic_long_t load_avg;
atomic_t runnable_avg;
 #endif
 #endif
@@ -284,7 +284,7 @@ struct cfs_rq {
 #ifdef CONFIG_FAIR_GROUP_SCHED
/* Required to track per-cpu representation of a task_group */
u32 tg_runnable_contrib;
-   u64 tg_load_contrib;
+   unsigned long tg_load_contrib;
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 
/*

[tip:sched/core] sched: Change cfs_rq load avg to unsigned long

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  72a4cf20cb71a327c636c7042fdacc25abffc87c
Gitweb: http://git.kernel.org/tip/72a4cf20cb71a327c636c7042fdacc25abffc87c
Author: Alex Shi 
AuthorDate: Thu, 20 Jun 2013 10:18:53 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 27 Jun 2013 10:07:38 +0200

sched: Change cfs_rq load avg to unsigned long

Since the 'u64 runnable_load_avg, blocked_load_avg' members of the cfs_rq
struct are smaller than the 'unsigned long' cfs_rq->load.weight, we don't
need u64 variables to describe them; unsigned long is more efficient and
convenient.

Signed-off-by: Alex Shi 
Reviewed-by: Paul Turner 
Tested-by: Vincent Guittot 
Signed-off-by: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1371694737-29336-10-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/debug.c | 4 ++--
 kernel/sched/fair.c  | 7 ++-
 kernel/sched/sched.h | 2 +-
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 75024a6..160afdc 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -211,9 +211,9 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
SEQ_printf(m, "  .%-30s: %ld\n", "load", cfs_rq->load.weight);
 #ifdef CONFIG_FAIR_GROUP_SCHED
 #ifdef CONFIG_SMP
-   SEQ_printf(m, "  .%-30s: %lld\n", "runnable_load_avg",
+   SEQ_printf(m, "  .%-30s: %ld\n", "runnable_load_avg",
cfs_rq->runnable_load_avg);
-   SEQ_printf(m, "  .%-30s: %lld\n", "blocked_load_avg",
+   SEQ_printf(m, "  .%-30s: %ld\n", "blocked_load_avg",
cfs_rq->blocked_load_avg);
SEQ_printf(m, "  .%-30s: %lld\n", "tg_load_avg",
(unsigned long long)atomic64_read(&cfs_rq->tg->load_avg));
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7948bb8..f19772d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4181,12 +4181,9 @@ static int tg_load_down(struct task_group *tg, void *data)
if (!tg->parent) {
load = cpu_rq(cpu)->avg.load_avg_contrib;
} else {
-   unsigned long tmp_rla;
-   tmp_rla = tg->parent->cfs_rq[cpu]->runnable_load_avg + 1;
-
load = tg->parent->cfs_rq[cpu]->h_load;
-   load *= tg->se[cpu]->avg.load_avg_contrib;
-   load /= tmp_rla;
+   load = div64_ul(load * tg->se[cpu]->avg.load_avg_contrib,
+   tg->parent->cfs_rq[cpu]->runnable_load_avg + 1);
}
 
tg->cfs_rq[cpu]->h_load = load;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9c65d46..9eb12d9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -277,7 +277,7 @@ struct cfs_rq {
 * This allows for the description of both thread and group usage (in
 * the FAIR_GROUP_SCHED case).
 */
-   u64 runnable_load_avg, blocked_load_avg;
+   unsigned long runnable_load_avg, blocked_load_avg;
atomic64_t decay_counter, removed_load;
u64 last_decay;
 

[tip:sched/core] sched/tg: Remove tg.load_weight

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  a9cef46a10cc1b84bf2cdf4060766d858c0439d8
Gitweb: http://git.kernel.org/tip/a9cef46a10cc1b84bf2cdf4060766d858c0439d8
Author: Alex Shi 
AuthorDate: Thu, 20 Jun 2013 10:18:56 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 27 Jun 2013 10:07:43 +0200

sched/tg: Remove tg.load_weight

Since no one uses it.

Signed-off-by: Alex Shi 
Reviewed-by: Paul Turner 
Tested-by: Vincent Guittot 
Signed-off-by: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1371694737-29336-13-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/sched.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7059919..ef0a7b2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -148,7 +148,6 @@ struct task_group {
struct cfs_rq **cfs_rq;
unsigned long shares;
 
-   atomic_t load_weight;
 #ifdef CONFIG_SMP
atomic_long_t load_avg;
atomic_t runnable_avg;

[tip:sched/core] sched/cfs_rq: Change atomic64_t removed_load to atomic_long_t

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  2509940fd71c2e2915a05052bbdbf2d478364184
Gitweb: http://git.kernel.org/tip/2509940fd71c2e2915a05052bbdbf2d478364184
Author: Alex Shi 
AuthorDate: Thu, 20 Jun 2013 10:18:55 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 27 Jun 2013 10:07:41 +0200

sched/cfs_rq: Change atomic64_t removed_load to atomic_long_t

Similar to the runnable_load_avg and blocked_load_avg variables, the long
type is enough for removed_load on both 64-bit and 32-bit machines.

This avoids the expensive atomic64 operations on 32-bit machines.

Signed-off-by: Alex Shi 
Reviewed-by: Paul Turner 
Tested-by: Vincent Guittot 
Signed-off-by: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1371694737-29336-12-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/fair.c  | 10 ++
 kernel/sched/sched.h |  3 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 30ccc37..b43474a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1517,8 +1517,9 @@ static void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq, int force_update)
if (!decays && !force_update)
return;
 
-   if (atomic64_read(&cfs_rq->removed_load)) {
-   u64 removed_load = atomic64_xchg(&cfs_rq->removed_load, 0);
+   if (atomic_long_read(&cfs_rq->removed_load)) {
+   unsigned long removed_load;
+   removed_load = atomic_long_xchg(&cfs_rq->removed_load, 0);
subtract_blocked_load_contrib(cfs_rq, removed_load);
}
 
@@ -3480,7 +3481,8 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
 */
if (se->avg.decay_count) {
se->avg.decay_count = -__synchronize_entity_decay(se);
-   atomic64_add(se->avg.load_avg_contrib, &cfs_rq->removed_load);
+   atomic_long_add(se->avg.load_avg_contrib,
+   &cfs_rq->removed_load);
}
 }
 #endif /* CONFIG_SMP */
@@ -5942,7 +5944,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 #endif
 #ifdef CONFIG_SMP
atomic64_set(&cfs_rq->decay_counter, 1);
-   atomic64_set(&cfs_rq->removed_load, 0);
+   atomic_long_set(&cfs_rq->removed_load, 0);
 #endif
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 5585eb2..7059919 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -278,8 +278,9 @@ struct cfs_rq {
 * the FAIR_GROUP_SCHED case).
 */
unsigned long runnable_load_avg, blocked_load_avg;
-   atomic64_t decay_counter, removed_load;
+   atomic64_t decay_counter;
u64 last_decay;
+   atomic_long_t removed_load;
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
/* Required to track per-cpu representation of a task_group */

[tip:sched/core] sched: Consider runnable load average in move_tasks()

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  a003a25b227d59ded9197ced109517f037d01c27
Gitweb: http://git.kernel.org/tip/a003a25b227d59ded9197ced109517f037d01c27
Author: Alex Shi 
AuthorDate: Thu, 20 Jun 2013 10:18:51 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 27 Jun 2013 10:07:36 +0200

sched: Consider runnable load average in move_tasks()

Aside from using the runnable load average in the background,
move_tasks() is also the key function in load balance. We need to
consider the runnable load average in it in order to make an
apples-to-apples load comparison.

Morten caught a div u64 bug on ARM, thanks!

Thanks-to: Morten Rasmussen 
Signed-off-by: Alex Shi 
Signed-off-by: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1371694737-29336-8-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/fair.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e6d82ca..7948bb8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4179,11 +4179,14 @@ static int tg_load_down(struct task_group *tg, void *data)
long cpu = (long)data;
 
if (!tg->parent) {
-   load = cpu_rq(cpu)->load.weight;
+   load = cpu_rq(cpu)->avg.load_avg_contrib;
} else {
+   unsigned long tmp_rla;
+   tmp_rla = tg->parent->cfs_rq[cpu]->runnable_load_avg + 1;
+
load = tg->parent->cfs_rq[cpu]->h_load;
-   load *= tg->se[cpu]->load.weight;
-   load /= tg->parent->cfs_rq[cpu]->load.weight + 1;
+   load *= tg->se[cpu]->avg.load_avg_contrib;
+   load /= tmp_rla;
}
 
tg->cfs_rq[cpu]->h_load = load;
@@ -4209,12 +4212,9 @@ static void update_h_load(long cpu)
 static unsigned long task_h_load(struct task_struct *p)
 {
struct cfs_rq *cfs_rq = task_cfs_rq(p);
-   unsigned long load;
-
-   load = p->se.load.weight;
-   load = div_u64(load * cfs_rq->h_load, cfs_rq->load.weight + 1);
 
-   return load;
+   return div64_ul(p->se.avg.load_avg_contrib * cfs_rq->h_load,
+   cfs_rq->runnable_load_avg + 1);
 }
 #else
 static inline void update_blocked_averages(int cpu)
@@ -4227,7 +4227,7 @@ static inline void update_h_load(long cpu)
 
 static unsigned long task_h_load(struct task_struct *p)
 {
-   return p->se.load.weight;
+   return p->se.avg.load_avg_contrib;
 }
 #endif
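
To make the h_load arithmetic above easier to follow, here is a two-level
userspace sketch of the same formula used in the diff,
load = parent_h_load * contrib / (parent_runnable_load_avg + 1), with
made-up numbers and field names; it is only a model, not the scheduler code:

#include <stdio.h>

struct node {
    unsigned long contrib;     /* this entity's avg.load_avg_contrib */
    unsigned long runnable;    /* this cfs_rq's runnable_load_avg */
    unsigned long h_load;      /* hierarchical load, filled top-down */
};

static unsigned long scale(unsigned long parent_h_load,
                           unsigned long contrib,
                           unsigned long parent_runnable)
{
    /* the +1 mirrors the kernel's guard against dividing by zero */
    return parent_h_load * contrib / (parent_runnable + 1);
}

int main(void)
{
    struct node root  = { .contrib = 0,   .runnable = 2048, .h_load = 2048 };
    struct node group = { .contrib = 512, .runnable = 640 };
    unsigned long task_contrib = 320;
    unsigned long task_h_load;

    group.h_load = scale(root.h_load, group.contrib, root.runnable);
    task_h_load  = scale(group.h_load, task_contrib, group.runnable);

    printf("group h_load = %lu, task h_load = %lu\n",
           group.h_load, task_h_load);    /* 511 and 255 with these numbers */
    return 0;
}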
 

[tip:sched/core] sched: Fix sleep time double accounting in enqueue entity

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  282cf499f03ec1754b6c8c945c9674b02631fb0f
Gitweb: http://git.kernel.org/tip/282cf499f03ec1754b6c8c945c9674b02631fb0f
Author: Alex Shi 
AuthorDate: Thu, 20 Jun 2013 10:18:48 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 27 Jun 2013 10:07:32 +0200

sched: Fix sleep time double accounting in enqueue entity

A woken migrated task will call __synchronize_entity_decay(se) in
migrate_task_rq_fair(), then it needs to set
`se->avg.last_runnable_update -= (-se->avg.decay_count) << 20' before
update_entity_load_avg(), in order to avoid the sleep time being counted
twice for se.avg.load_avg_contrib, in both __synchronize_entity_decay()
and update_entity_load_avg().

However, if the sleeping task is woken up on the same cpu, it misses
the last_runnable_update adjustment before update_entity_load_avg(se, 0, 1),
so the sleep time is used twice in both functions. We need to remove
this double sleep time accounting.

Paul also contributed some code comments in this commit.

Signed-off-by: Alex Shi 
Reviewed-by: Paul Turner 
Signed-off-by: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1371694737-29336-5-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/fair.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e1602a0..9bbc303 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1571,7 +1571,13 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
}
wakeup = 0;
} else {
-   __synchronize_entity_decay(se);
+   /*
+* Task re-woke on same cpu (or else migrate_task_rq_fair()
+* would have made count negative); we must be careful to avoid
+* double-accounting blocked time after synchronizing decays.
+*/
+   se->avg.last_runnable_update += __synchronize_entity_decay(se)
+   << 20;
}
 
/* migrated tasks did not contribute to our blocked load */
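
The following userspace sketch models the bookkeeping described above (the
numbers, the 1024us period and the y^32 = 0.5 decay are the only
assumptions; it is a floating-point model of the idea, not the kernel's
fixed-point code). The blocked time may be folded into the average either by
the decay synchronization or by the normal clock-delta path, and advancing
last_runnable_update by the synchronized periods keeps the second path from
counting the same sleep again:

#include <stdio.h>
#include <stdint.h>
#include <math.h>

int main(void)
{
    const double y = pow(0.5, 1.0 / 32.0);    /* decay per 1024us period */
    uint64_t last_update = 0;                 /* ns of the last update */
    uint64_t now = (5ULL << 20) + (300ULL << 10);    /* slept ~5.3ms */
    double sum = 30000.0;
    uint64_t periods, remainder;

    /* decay for the full periods slept, as the synchronization path does
     * for a task that wakes up on the same cpu */
    periods = (now - last_update) >> 20;
    sum *= pow(y, (double)periods);

    /* the fix: account those periods in last_update too, so the regular
     * update path below only sees the not-yet-decayed remainder */
    last_update += periods << 20;

    remainder = now - last_update;
    printf("sum after sync: %.0f, remainder to account: %llu us\n",
           sum, (unsigned long long)(remainder >> 10));
    return 0;
}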

[tip:sched/core] sched: Compute runnable load avg in cpu_load and cpu_avg_load_per_task

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  b92486cbf2aa230d00f160664858495c81d2b37b
Gitweb: http://git.kernel.org/tip/b92486cbf2aa230d00f160664858495c81d2b37b
Author: Alex Shi 
AuthorDate: Thu, 20 Jun 2013 10:18:50 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 27 Jun 2013 10:07:35 +0200

sched: Compute runnable load avg in cpu_load and cpu_avg_load_per_task

These are the base values used in load balance; update them with the rq
runnable load average, and load balancing will then consider the runnable
load avg naturally.

We also tried to include blocked_load_avg as cpu load in balancing,
but that caused a 6% kbuild performance drop on every Intel machine, and
aim7/oltp drops on some 4-socket machines.
Even when only adding blocked_load_avg into get_rq_runnable_load(),
hackbench still dropped a little on NHM EX.

Signed-off-by: Alex Shi 
Reviewed-by: Gu Zheng 
Signed-off-by: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1371694737-29336-7-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/fair.c |  5 +++--
 kernel/sched/proc.c | 17 +++--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9bbc303..e6d82ca 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2963,7 +2963,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 /* Used instead of source_load when we know the type == 0 */
 static unsigned long weighted_cpuload(const int cpu)
 {
-   return cpu_rq(cpu)->load.weight;
+   return cpu_rq(cpu)->cfs.runnable_load_avg;
 }
 
 /*
@@ -3008,9 +3008,10 @@ static unsigned long cpu_avg_load_per_task(int cpu)
 {
struct rq *rq = cpu_rq(cpu);
unsigned long nr_running = ACCESS_ONCE(rq->nr_running);
+   unsigned long load_avg = rq->cfs.runnable_load_avg;
 
if (nr_running)
-   return rq->load.weight / nr_running;
+   return load_avg / nr_running;
 
return 0;
 }
diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
index bb3a6a0..ce5cd48 100644
--- a/kernel/sched/proc.c
+++ b/kernel/sched/proc.c
@@ -501,6 +501,18 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
sched_avg_update(this_rq);
 }
 
+#ifdef CONFIG_SMP
+unsigned long get_rq_runnable_load(struct rq *rq)
+{
+   return rq->cfs.runnable_load_avg;
+}
+#else
+unsigned long get_rq_runnable_load(struct rq *rq)
+{
+   return rq->load.weight;
+}
+#endif
+
 #ifdef CONFIG_NO_HZ_COMMON
 /*
  * There is no sane way to deal with nohz on smp when using jiffies because the
@@ -522,7 +534,7 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
 void update_idle_cpu_load(struct rq *this_rq)
 {
unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
-   unsigned long load = this_rq->load.weight;
+   unsigned long load = get_rq_runnable_load(this_rq);
unsigned long pending_updates;
 
/*
@@ -568,11 +580,12 @@ void update_cpu_load_nohz(void)
  */
 void update_cpu_load_active(struct rq *this_rq)
 {
+   unsigned long load = get_rq_runnable_load(this_rq);
/*
 * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
 */
this_rq->last_load_update_tick = jiffies;
-   __update_cpu_load(this_rq, this_rq->load.weight, 1);
+   __update_cpu_load(this_rq, load, 1);
 
calc_load_account_active(this_rq);
 }

[tip:sched/core] sched: Move a few runnable tg variables into CONFIG_SMP

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  fa6bddeb14d59d701f846b174b59c9982e926e66
Gitweb: http://git.kernel.org/tip/fa6bddeb14d59d701f846b174b59c9982e926e66
Author: Alex Shi 
AuthorDate: Thu, 20 Jun 2013 10:18:46 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 27 Jun 2013 10:07:29 +0200

sched: Move a few runnable tg variables into CONFIG_SMP

The following two variables are only used under CONFIG_SMP, so it's
better to move their definitions under CONFIG_SMP too.

atomic64_t load_avg;
atomic_t runnable_avg;

Signed-off-by: Alex Shi 
Signed-off-by: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1371694737-29336-3-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/sched.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 77ce668..31d25f8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -149,9 +149,11 @@ struct task_group {
unsigned long shares;
 
atomic_t load_weight;
+#ifdef CONFIG_SMP
atomic64_t load_avg;
atomic_t runnable_avg;
 #endif
+#endif
 
 #ifdef CONFIG_RT_GROUP_SCHED
struct sched_rt_entity **rt_se;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:sched/core] sched: Fix sleep time double accounting in enqueue entity

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  282cf499f03ec1754b6c8c945c9674b02631fb0f
Gitweb: http://git.kernel.org/tip/282cf499f03ec1754b6c8c945c9674b02631fb0f
Author: Alex Shi alex@intel.com
AuthorDate: Thu, 20 Jun 2013 10:18:48 +0800
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Thu, 27 Jun 2013 10:07:32 +0200

sched: Fix sleep time double accounting in enqueue entity

A woken migrated task will have run __synchronize_entity_decay(se) in
migrate_task_rq_fair(), and then needs to set
`se->avg.last_runnable_update -= (-se->avg.decay_count) << 20' before
update_entity_load_avg(), in order to avoid the sleep time being accounted
twice for se.avg.load_avg_contrib, in both __synchronize_entity_decay()
and update_entity_load_avg().

However, if the sleeping task is woken up on the same cpu, it misses that
last_runnable_update adjustment before update_entity_load_avg(se, 0, 1),
so the sleep time is used twice in both functions. Remove the double
sleep time accounting.

Paul also contributed some code comments in this commit.

Signed-off-by: Alex Shi alex@intel.com
Reviewed-by: Paul Turner p...@google.com
Signed-off-by: Peter Zijlstra pet...@infradead.org
Link: 
http://lkml.kernel.org/r/1371694737-29336-5-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 kernel/sched/fair.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e1602a0..9bbc303 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1571,7 +1571,13 @@ static inline void enqueue_entity_load_avg(struct cfs_rq 
*cfs_rq,
}
wakeup = 0;
} else {
-   __synchronize_entity_decay(se);
+   /*
+* Task re-woke on same cpu (or else migrate_task_rq_fair()
+* would have made count negative); we must be careful to avoid
+* double-accounting blocked time after synchronizing decays.
+*/
+   se->avg.last_runnable_update += __synchronize_entity_decay(se)
+   << 20;
}
 
/* migrated tasks did not contribute to our blocked load */
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
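
A standalone sketch (plain C, invented numbers, not the kernel code) of the
double-accounting problem the hunk above addresses: once
__synchronize_entity_decay() has charged the missed decay periods, the
timestamp has to be moved forward by the same amount (periods << 20, i.e.
period units back into ns), otherwise the regular update path decays the
same blocked time a second time:

/* Sketch only: a crude stand-in for the per-entity decay. */
#include <stdio.h>

#define PERIOD_SHIFT 20			/* one decay period ~ 1 << 20 ns */

static unsigned long long decay(unsigned long long load, unsigned int periods)
{
	while (periods >= 32) {		/* halve roughly every 32 periods */
		load >>= 1;
		periods -= 32;
	}
	return load;
}

int main(void)
{
	unsigned long long last_update = 0, load = 1024;
	unsigned long long now = 96ULL << PERIOD_SHIFT;	/* slept 96 periods */
	unsigned int decays = (unsigned int)((now - last_update) >> PERIOD_SHIFT);

	load = decay(load, decays);	/* what __synchronize_entity_decay() did */

	/* bug: timestamp untouched, the same 96 periods get decayed again */
	unsigned long long twice = decay(load, (unsigned int)((now - last_update) >> PERIOD_SHIFT));

	/* fix: account for the periods we already charged */
	last_update += (unsigned long long)decays << PERIOD_SHIFT;
	unsigned long long once = decay(load, (unsigned int)((now - last_update) >> PERIOD_SHIFT));

	printf("synced=%llu double=%llu fixed=%llu\n", load, twice, once);
	return 0;
}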


[tip:sched/core] sched: Consider runnable load average in move_tasks()

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  a003a25b227d59ded9197ced109517f037d01c27
Gitweb: http://git.kernel.org/tip/a003a25b227d59ded9197ced109517f037d01c27
Author: Alex Shi alex@intel.com
AuthorDate: Thu, 20 Jun 2013 10:18:51 +0800
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Thu, 27 Jun 2013 10:07:36 +0200

sched: Consider runnable load average in move_tasks()

Aside from using the runnable load average in the background, move_tasks()
is also the key function in load balancing. We need to consider the
runnable load average in it as well, in order to make the load comparison
apples to apples.

Morten caught a u64-division bug on ARM, thanks!

Thanks-to: Morten Rasmussen morten.rasmus...@arm.com
Signed-off-by: Alex Shi alex@intel.com
Signed-off-by: Peter Zijlstra pet...@infradead.org
Link: 
http://lkml.kernel.org/r/1371694737-29336-8-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 kernel/sched/fair.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e6d82ca..7948bb8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4179,11 +4179,14 @@ static int tg_load_down(struct task_group *tg, void 
*data)
long cpu = (long)data;
 
	if (!tg->parent) {
-		load = cpu_rq(cpu)->load.weight;
+		load = cpu_rq(cpu)->avg.load_avg_contrib;
	} else {
+		unsigned long tmp_rla;
+		tmp_rla = tg->parent->cfs_rq[cpu]->runnable_load_avg + 1;
+
		load = tg->parent->cfs_rq[cpu]->h_load;
-		load *= tg->se[cpu]->load.weight;
-		load /= tg->parent->cfs_rq[cpu]->load.weight + 1;
+		load *= tg->se[cpu]->avg.load_avg_contrib;
+		load /= tmp_rla;
	}
 
	tg->cfs_rq[cpu]->h_load = load;
@@ -4209,12 +4212,9 @@ static void update_h_load(long cpu)
 static unsigned long task_h_load(struct task_struct *p)
 {
struct cfs_rq *cfs_rq = task_cfs_rq(p);
-   unsigned long load;
-
-	load = p->se.load.weight;
-	load = div_u64(load * cfs_rq->h_load, cfs_rq->load.weight + 1);
 
-	return load;
+	return div64_ul(p->se.avg.load_avg_contrib * cfs_rq->h_load,
+			cfs_rq->runnable_load_avg + 1);
 }
 #else
 static inline void update_blocked_averages(int cpu)
@@ -4227,7 +4227,7 @@ static inline void update_h_load(long cpu)
 
 static unsigned long task_h_load(struct task_struct *p)
 {
-	return p->se.load.weight;
+	return p->se.avg.load_avg_contrib;
 }
 #endif
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
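
A standalone sketch (plain C, invented numbers) of the proportional
task_h_load() computation the patch above moves to: a task's share of the
hierarchical load is its own average contribution scaled by the group's
h_load and divided by the group's runnable load average (+1 guards against
dividing by zero):

#include <stdio.h>

static unsigned long task_h_load_sketch(unsigned long task_contrib,
					unsigned long grp_h_load,
					unsigned long grp_runnable_avg)
{
	return (unsigned long)(((unsigned long long)task_contrib * grp_h_load) /
			       (grp_runnable_avg + 1));
}

int main(void)
{
	unsigned long h_load = 2048, runnable_avg = 1536;

	printf("busy task:  %lu\n", task_h_load_sketch(1024, h_load, runnable_avg));
	printf("light task: %lu\n", task_h_load_sketch(128,  h_load, runnable_avg));
	return 0;
}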


[tip:sched/core] sched: Change cfs_rq load avg to unsigned long

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  72a4cf20cb71a327c636c7042fdacc25abffc87c
Gitweb: http://git.kernel.org/tip/72a4cf20cb71a327c636c7042fdacc25abffc87c
Author: Alex Shi alex@intel.com
AuthorDate: Thu, 20 Jun 2013 10:18:53 +0800
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Thu, 27 Jun 2013 10:07:38 +0200

sched: Change cfs_rq load avg to unsigned long

Since the 'u64 runnable_load_avg, blocked_load_avg' members of the cfs_rq
struct are never larger than the 'unsigned long' cfs_rq->load.weight, we
don't need u64 variables to describe them. unsigned long is more efficient
and more convenient.

Signed-off-by: Alex Shi alex@intel.com
Reviewed-by: Paul Turner p...@google.com
Tested-by: Vincent Guittot vincent.guit...@linaro.org
Signed-off-by: Peter Zijlstra pet...@infradead.org
Link: 
http://lkml.kernel.org/r/1371694737-29336-10-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 kernel/sched/debug.c | 4 ++--
 kernel/sched/fair.c  | 7 ++-
 kernel/sched/sched.h | 2 +-
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 75024a6..160afdc 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -211,9 +211,9 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct 
cfs_rq *cfs_rq)
	SEQ_printf(m, "  .%-30s: %ld\n", "load", cfs_rq->load.weight);
 #ifdef CONFIG_FAIR_GROUP_SCHED
 #ifdef CONFIG_SMP
-	SEQ_printf(m, "  .%-30s: %lld\n", "runnable_load_avg",
+	SEQ_printf(m, "  .%-30s: %ld\n", "runnable_load_avg",
			cfs_rq->runnable_load_avg);
-	SEQ_printf(m, "  .%-30s: %lld\n", "blocked_load_avg",
+	SEQ_printf(m, "  .%-30s: %ld\n", "blocked_load_avg",
			cfs_rq->blocked_load_avg);
	SEQ_printf(m, "  .%-30s: %lld\n", "tg_load_avg",
			(unsigned long long)atomic64_read(&cfs_rq->tg->load_avg));
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7948bb8..f19772d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4181,12 +4181,9 @@ static int tg_load_down(struct task_group *tg, void 
*data)
	if (!tg->parent) {
		load = cpu_rq(cpu)->avg.load_avg_contrib;
	} else {
-		unsigned long tmp_rla;
-		tmp_rla = tg->parent->cfs_rq[cpu]->runnable_load_avg + 1;
-
		load = tg->parent->cfs_rq[cpu]->h_load;
-		load *= tg->se[cpu]->avg.load_avg_contrib;
-		load /= tmp_rla;
+		load = div64_ul(load * tg->se[cpu]->avg.load_avg_contrib,
+			tg->parent->cfs_rq[cpu]->runnable_load_avg + 1);
	}
 
	tg->cfs_rq[cpu]->h_load = load;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9c65d46..9eb12d9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -277,7 +277,7 @@ struct cfs_rq {
 * This allows for the description of both thread and group usage (in
 * the FAIR_GROUP_SCHED case).
 */
-   u64 runnable_load_avg, blocked_load_avg;
+   unsigned long runnable_load_avg, blocked_load_avg;
atomic64_t decay_counter, removed_load;
u64 last_decay;
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
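
A standalone sketch (plain C, invented numbers) of why the change above is
safe: the stored averages fit comfortably in an unsigned long, and only the
intermediate product in the h_load calculation needs a 64-bit dividend,
which is what div64_ul() provides in the kernel:

#include <stdio.h>

int main(void)
{
	unsigned long parent_h_load = 200000;	/* fits unsigned long */
	unsigned long se_contrib    = 90000;	/* fits unsigned long */
	unsigned long runnable_avg  = 150000;

	/* the product may exceed 32 bits, so widen before dividing */
	unsigned long long product = (unsigned long long)parent_h_load * se_contrib;
	unsigned long h_load = (unsigned long)(product / (runnable_avg + 1));

	printf("product=%llu -> h_load=%lu\n", product, h_load);
	return 0;
}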


[tip:sched/core] sched/tg: Remove tg.load_weight

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  a9cef46a10cc1b84bf2cdf4060766d858c0439d8
Gitweb: http://git.kernel.org/tip/a9cef46a10cc1b84bf2cdf4060766d858c0439d8
Author: Alex Shi alex@intel.com
AuthorDate: Thu, 20 Jun 2013 10:18:56 +0800
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Thu, 27 Jun 2013 10:07:43 +0200

sched/tg: Remove tg.load_weight

Since no one uses it.

Signed-off-by: Alex Shi alex@intel.com
Reviewed-by: Paul Turner p...@google.com
Tested-by: Vincent Guittot vincent.guit...@linaro.org
Signed-off-by: Peter Zijlstra pet...@infradead.org
Link: 
http://lkml.kernel.org/r/1371694737-29336-13-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 kernel/sched/sched.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7059919..ef0a7b2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -148,7 +148,6 @@ struct task_group {
struct cfs_rq **cfs_rq;
unsigned long shares;
 
-   atomic_t load_weight;
 #ifdef CONFIG_SMP
atomic_long_t load_avg;
atomic_t runnable_avg;
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:sched/core] sched/cfs_rq: Change atomic64_t removed_load to atomic_long_t

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  2509940fd71c2e2915a05052bbdbf2d478364184
Gitweb: http://git.kernel.org/tip/2509940fd71c2e2915a05052bbdbf2d478364184
Author: Alex Shi alex@intel.com
AuthorDate: Thu, 20 Jun 2013 10:18:55 +0800
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Thu, 27 Jun 2013 10:07:41 +0200

sched/cfs_rq: Change atomic64_t removed_load to atomic_long_t

Similar to the runnable_load_avg and blocked_load_avg variables, the long
type is enough for removed_load on both 64-bit and 32-bit machines.

This way we avoid the expensive atomic64 operations on 32-bit machines.

Signed-off-by: Alex Shi alex@intel.com
Reviewed-by: Paul Turner p...@google.com
Tested-by: Vincent Guittot vincent.guit...@linaro.org
Signed-off-by: Peter Zijlstra pet...@infradead.org
Link: 
http://lkml.kernel.org/r/1371694737-29336-12-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 kernel/sched/fair.c  | 10 ++
 kernel/sched/sched.h |  3 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 30ccc37..b43474a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1517,8 +1517,9 @@ static void update_cfs_rq_blocked_load(struct cfs_rq 
*cfs_rq, int force_update)
if (!decays  !force_update)
return;
 
-	if (atomic64_read(&cfs_rq->removed_load)) {
-		u64 removed_load = atomic64_xchg(&cfs_rq->removed_load, 0);
+	if (atomic_long_read(&cfs_rq->removed_load)) {
+		unsigned long removed_load;
+		removed_load = atomic_long_xchg(&cfs_rq->removed_load, 0);
subtract_blocked_load_contrib(cfs_rq, removed_load);
}
 
@@ -3480,7 +3481,8 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
 */
	if (se->avg.decay_count) {
		se->avg.decay_count = -__synchronize_entity_decay(se);
-		atomic64_add(se->avg.load_avg_contrib, &cfs_rq->removed_load);
+		atomic_long_add(se->avg.load_avg_contrib,
+				&cfs_rq->removed_load);
}
 }
 #endif /* CONFIG_SMP */
@@ -5942,7 +5944,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 #endif
 #ifdef CONFIG_SMP
	atomic64_set(&cfs_rq->decay_counter, 1);
-	atomic64_set(&cfs_rq->removed_load, 0);
+	atomic_long_set(&cfs_rq->removed_load, 0);
 #endif
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 5585eb2..7059919 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -278,8 +278,9 @@ struct cfs_rq {
 * the FAIR_GROUP_SCHED case).
 */
unsigned long runnable_load_avg, blocked_load_avg;
-   atomic64_t decay_counter, removed_load;
+   atomic64_t decay_counter;
u64 last_decay;
+   atomic_long_t removed_load;
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
/* Required to track per-cpu representation of a task_group */
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
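
A standalone sketch of the drain-with-xchg pattern the hunk above keeps
(just on a narrower type): migrating tasks accumulate their removed load
with an atomic add, and the updater takes the whole amount while resetting
it to zero in a single exchange.  This uses C11 atomics as a stand-in for
the kernel's atomic_long_* API:

#include <stdatomic.h>
#include <stdio.h>

static atomic_long removed_load;

static void on_task_migrated_away(long contrib)
{
	atomic_fetch_add(&removed_load, contrib);	/* like atomic_long_add() */
}

static void update_blocked_load(long *blocked)
{
	long removed = atomic_exchange(&removed_load, 0);	/* like atomic_long_xchg() */

	if (removed)
		*blocked -= removed;
}

int main(void)
{
	long blocked = 5000;

	on_task_migrated_away(700);
	on_task_migrated_away(300);
	update_blocked_load(&blocked);
	printf("blocked load after drain: %ld\n", blocked);
	return 0;
}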


[tip:sched/core] sched/tg: Use 'unsigned long' for load variable in task group

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  bf5b986ed4d20428eeec3df4a03dbfebb9b6538c
Gitweb: http://git.kernel.org/tip/bf5b986ed4d20428eeec3df4a03dbfebb9b6538c
Author: Alex Shi alex@intel.com
AuthorDate: Thu, 20 Jun 2013 10:18:54 +0800
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Thu, 27 Jun 2013 10:07:40 +0200

sched/tg: Use 'unsigned long' for load variable in task group

Since tg->load_avg is smaller than tg->load_weight, we don't need an
atomic64_t variable for load_avg on 32-bit machines; the same reasoning
applies to cfs_rq->tg_load_contrib.

The atomic_long_t/unsigned long variable types are more efficient and
more convenient for them.

Signed-off-by: Alex Shi alex@intel.com
Tested-by: Vincent Guittot vincent.guit...@linaro.org
Signed-off-by: Peter Zijlstra pet...@infradead.org
Link: 
http://lkml.kernel.org/r/1371694737-29336-11-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 kernel/sched/debug.c |  6 +++---
 kernel/sched/fair.c  | 12 ++--
 kernel/sched/sched.h |  4 ++--
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 160afdc..d803989 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -215,9 +215,9 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct 
cfs_rq *cfs_rq)
			cfs_rq->runnable_load_avg);
	SEQ_printf(m, "  .%-30s: %ld\n", "blocked_load_avg",
			cfs_rq->blocked_load_avg);
-	SEQ_printf(m, "  .%-30s: %lld\n", "tg_load_avg",
-			(unsigned long long)atomic64_read(&cfs_rq->tg->load_avg));
-	SEQ_printf(m, "  .%-30s: %lld\n", "tg_load_contrib",
+	SEQ_printf(m, "  .%-30s: %ld\n", "tg_load_avg",
+			atomic_long_read(&cfs_rq->tg->load_avg));
+	SEQ_printf(m, "  .%-30s: %ld\n", "tg_load_contrib",
			cfs_rq->tg_load_contrib);
	SEQ_printf(m, "  .%-30s: %d\n", "tg_runnable_contrib",
			cfs_rq->tg_runnable_contrib);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f19772d..30ccc37 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1075,7 +1075,7 @@ static inline long calc_tg_weight(struct task_group *tg, 
struct cfs_rq *cfs_rq)
 * to gain a more accurate current total weight. See
 * update_cfs_rq_load_contribution().
 */
-	tg_weight = atomic64_read(&tg->load_avg);
+	tg_weight = atomic_long_read(&tg->load_avg);
	tg_weight -= cfs_rq->tg_load_contrib;
	tg_weight += cfs_rq->load.weight;
 
@@ -1356,13 +1356,13 @@ static inline void 
__update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
 int force_update)
 {
	struct task_group *tg = cfs_rq->tg;
-	s64 tg_contrib;
+	long tg_contrib;
 
	tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
	tg_contrib -= cfs_rq->tg_load_contrib;
 
-	if (force_update || abs64(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
-		atomic64_add(tg_contrib, &tg->load_avg);
+	if (force_update || abs(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
+		atomic_long_add(tg_contrib, &tg->load_avg);
		cfs_rq->tg_load_contrib += tg_contrib;
}
 }
@@ -1397,8 +1397,8 @@ static inline void __update_group_entity_contrib(struct 
sched_entity *se)
u64 contrib;
 
	contrib = cfs_rq->tg_load_contrib * tg->shares;
-	se->avg.load_avg_contrib = div64_u64(contrib,
-				     atomic64_read(&tg->load_avg) + 1);
+	se->avg.load_avg_contrib = div_u64(contrib,
+				     atomic_long_read(&tg->load_avg) + 1);
 
/*
 * For group entities we need to compute a correction term in the case
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9eb12d9..5585eb2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -150,7 +150,7 @@ struct task_group {
 
atomic_t load_weight;
 #ifdef CONFIG_SMP
-   atomic64_t load_avg;
+   atomic_long_t load_avg;
atomic_t runnable_avg;
 #endif
 #endif
@@ -284,7 +284,7 @@ struct cfs_rq {
 #ifdef CONFIG_FAIR_GROUP_SCHED
/* Required to track per-cpu representation of a task_group */
u32 tg_runnable_contrib;
-   u64 tg_load_contrib;
+   unsigned long tg_load_contrib;
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 
/*
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
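
A standalone sketch (plain C, invented numbers) of the batching rule in the
__update_cfs_rq_tg_load_contrib() hunk above: a delta is only pushed into
the shared, contended task-group sum once it has drifted by more than 1/8
of what this cfs_rq last reported, or when the update is forced:

#include <stdio.h>
#include <stdlib.h>

static long tg_load_avg;	/* shared sum (atomic_long_t in the kernel) */
static long tg_load_contrib;	/* what this cfs_rq last published */

static void maybe_publish(long current_load, int force)
{
	long delta = current_load - tg_load_contrib;

	if (force || labs(delta) > tg_load_contrib / 8) {
		tg_load_avg += delta;	/* atomic_long_add() in the kernel */
		tg_load_contrib += delta;
	}
}

int main(void)
{
	maybe_publish(1000, 1);	/* first report is forced */
	maybe_publish(1050, 0);	/* 5% drift: skipped, saves a contended add */
	maybe_publish(1300, 0);	/* 30% drift: published */
	printf("tg_load_avg=%ld tg_load_contrib=%ld\n", tg_load_avg, tg_load_contrib);
	return 0;
}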


[tip:sched/core] sched: Change get_rq_runnable_load() to static and inline

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  a9dc5d0e33c677619e4b97a38c23db1a42857905
Gitweb: http://git.kernel.org/tip/a9dc5d0e33c677619e4b97a38c23db1a42857905
Author: Alex Shi alex@intel.com
AuthorDate: Thu, 20 Jun 2013 10:18:57 +0800
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Thu, 27 Jun 2013 10:07:44 +0200

sched: Change get_rq_runnable_load() to static and inline

Based-on-patch-by: Fengguang Wu fengguang...@intel.com
Signed-off-by: Alex Shi alex@intel.com
Tested-by: Vincent Guittot vincent.guit...@linaro.org
Signed-off-by: Peter Zijlstra pet...@infradead.org
Link: 
http://lkml.kernel.org/r/1371694737-29336-14-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 kernel/sched/proc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
index ce5cd48..16f5a30 100644
--- a/kernel/sched/proc.c
+++ b/kernel/sched/proc.c
@@ -502,12 +502,12 @@ static void __update_cpu_load(struct rq *this_rq, 
unsigned long this_load,
 }
 
 #ifdef CONFIG_SMP
-unsigned long get_rq_runnable_load(struct rq *rq)
+static inline unsigned long get_rq_runnable_load(struct rq *rq)
 {
	return rq->cfs.runnable_load_avg;
 }
 #else
-unsigned long get_rq_runnable_load(struct rq *rq)
+static inline unsigned long get_rq_runnable_load(struct rq *rq)
 {
	return rq->load.weight;
 }
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:sched/core] sched: Update cpu load after task_tick

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  83dfd5235ebd66c284b97befe6eabff7132333e6
Gitweb: http://git.kernel.org/tip/83dfd5235ebd66c284b97befe6eabff7132333e6
Author: Alex Shi alex@intel.com
AuthorDate: Thu, 20 Jun 2013 10:18:49 +0800
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Thu, 27 Jun 2013 10:07:33 +0200

sched: Update cpu load after task_tick

To get the latest runnable info, we need to do this cpu load update
after task_tick().

Signed-off-by: Alex Shi alex@intel.com
Reviewed-by: Paul Turner p...@google.com
Signed-off-by: Peter Zijlstra pet...@infradead.org
Link: 
http://lkml.kernel.org/r/1371694737-29336-6-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 729e7fc..08746cc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2165,8 +2165,8 @@ void scheduler_tick(void)
 
	raw_spin_lock(&rq->lock);
	update_rq_clock(rq);
-	update_cpu_load_active(rq);
	curr->sched_class->task_tick(rq, curr, 0);
+	update_cpu_load_active(rq);
	raw_spin_unlock(&rq->lock);
 
perf_event_task_tick();
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
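
A toy sketch (plain C, invented numbers) of why the ordering above matters:
task_tick() is what ages the runnable averages, so sampling the cpu load
before it runs feeds one-tick-stale data into the load tracking:

#include <stdio.h>

static unsigned long runnable_avg = 1000;
static unsigned long sampled_cpu_load;

static void task_tick_sketch(void)       { runnable_avg = 1200; }
static void update_cpu_load_sketch(void) { sampled_cpu_load = runnable_avg; }

int main(void)
{
	update_cpu_load_sketch();	/* old order: sample, then tick */
	task_tick_sketch();
	printf("old order samples %lu\n", sampled_cpu_load);

	runnable_avg = 1000;
	task_tick_sketch();		/* new order: tick, then sample */
	update_cpu_load_sketch();
	printf("new order samples %lu\n", sampled_cpu_load);
	return 0;
}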


[tip:sched/core] sched: Set an initial value of runnable avg for new forked task

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  a75cdaa915e42ef0e6f38dc7f2a6a1deca91d648
Gitweb: http://git.kernel.org/tip/a75cdaa915e42ef0e6f38dc7f2a6a1deca91d648
Author: Alex Shi alex@intel.com
AuthorDate: Thu, 20 Jun 2013 10:18:47 +0800
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Thu, 27 Jun 2013 10:07:30 +0200

sched: Set an initial value of runnable avg for new forked task

We need to initialize se.avg.{decay_count, load_avg_contrib} for a
newly forked task. Otherwise random values in those variables cause a
mess when the new task is enqueued:

enqueue_task_fair
enqueue_entity
enqueue_entity_load_avg

and make fork balancing imbalanced due to an incorrect load_avg_contrib.

Furthermore, Morten Rasmussen noticed that some tasks were not launched
immediately after being created. So Paul and Peter suggested giving a new
task's runnable avg a start value equal to sched_slice().

PeterZ said:

 So the 'problem' is that our running avg is a 'floating' average; ie. it
 decays with time. Now we have to guess about the future of our newly
 spawned task -- something that is nigh impossible seeing these CPU
 vendors keep refusing to implement the crystal ball instruction.

 So there's two asymptotic cases we want to deal well with; 1) the case
 where the newly spawned program will be 'nearly' idle for its lifetime;
 and 2) the case where its cpu-bound.

 Since we have to guess, we'll go for worst case and assume its
 cpu-bound; now we don't want to make the avg so heavy adjusting to the
 near-idle case takes forever. We want to be able to quickly adjust and
 lower our running avg.

 Now we also don't want to make our avg too light, such that it gets
 decremented just for the new task not having had a chance to run yet --
 even if when it would run, it would be more cpu-bound than not.

 So what we do is we make the initial avg of the same duration as that we
 guess it takes to run each task on the system at least once -- aka
 sched_slice().

 Of course we can defeat this with wakeup/fork bombs, but in the 'normal'
 case it should be good enough.

Paul also contributed most of the code comments in this commit.

Signed-off-by: Alex Shi alex@intel.com
Reviewed-by: Gu Zheng guz.f...@cn.fujitsu.com
Reviewed-by: Paul Turner p...@google.com
[peterz; added explanation of sched_slice() usage]
Signed-off-by: Peter Zijlstra pet...@infradead.org
Link: 
http://lkml.kernel.org/r/1371694737-29336-4-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 kernel/sched/core.c  |  6 ++
 kernel/sched/fair.c  | 24 
 kernel/sched/sched.h |  2 ++
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0241b1b..729e7fc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1611,10 +1611,6 @@ static void __sched_fork(struct task_struct *p)
	p->se.vruntime			= 0;
	INIT_LIST_HEAD(&p->se.group_node);
 
-#ifdef CONFIG_SMP
-	p->se.avg.runnable_avg_period = 0;
-	p->se.avg.runnable_avg_sum = 0;
-#endif
 #ifdef CONFIG_SCHEDSTATS
	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
 #endif
@@ -1758,6 +1754,8 @@ void wake_up_new_task(struct task_struct *p)
set_task_cpu(p, select_task_rq(p, SD_BALANCE_FORK, 0));
 #endif
 
+   /* Initialize new task's runnable average */
+   init_task_runnable_average(p);
rq = __task_rq_lock(p);
activate_task(rq, p, 0);
	p->on_rq = 1;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 36eadaa..e1602a0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -680,6 +680,26 @@ static u64 sched_vslice(struct cfs_rq *cfs_rq, struct 
sched_entity *se)
return calc_delta_fair(sched_slice(cfs_rq, se), se);
 }
 
+#ifdef CONFIG_SMP
+static inline void __update_task_entity_contrib(struct sched_entity *se);
+
+/* Give new task start runnable values to heavy its load in infant time */
+void init_task_runnable_average(struct task_struct *p)
+{
+   u32 slice;
+
+	p->se.avg.decay_count = 0;
+	slice = sched_slice(task_cfs_rq(p), &p->se) >> 10;
+	p->se.avg.runnable_avg_sum = slice;
+	p->se.avg.runnable_avg_period = slice;
+	__update_task_entity_contrib(&p->se);
+}
+#else
+void init_task_runnable_average(struct task_struct *p)
+{
+}
+#endif
+
 /*
  * Update the current task's runtime statistics. Skip current tasks that
  * are not in our scheduling class.
@@ -1527,6 +1547,10 @@ static inline void enqueue_entity_load_avg(struct cfs_rq 
*cfs_rq,
 * We track migrations using entity decay_count = 0, on a wake-up
 * migration we use a negative decay count to track the remote decays
 * accumulated while sleeping.
+*
+* Newly forked tasks are enqueued with se->avg.decay_count == 0, they
+* are seen by enqueue_entity_load_avg() as a migration with an already
+* constructed load_avg_contrib.
 */
if (unlikely(se-avg.decay_count = 
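
A standalone sketch (plain C, invented numbers) of the fork-time
initialisation described above: giving the new task
runnable_avg_sum == runnable_avg_period, both set to one sched_slice()
worth of time, makes the initial load contribution equal to the task's
full weight, i.e. the scheduler assumes the child is cpu-bound until it
builds real history:

#include <stdio.h>

/* roughly the shape of __update_task_entity_contrib(): weight * sum / (period + 1) */
static unsigned long contrib(unsigned long weight, unsigned int sum, unsigned int period)
{
	return (unsigned long)(((unsigned long long)weight * sum) / (period + 1));
}

int main(void)
{
	unsigned long weight = 1024;	/* nice-0 weight */
	unsigned int slice = 6000;	/* "sched_slice() >> 10", made-up value */

	printf("forked task contrib: %lu (weight %lu)\n",
	       contrib(weight, slice, slice), weight);
	printf("zero-history contrib: %lu\n", contrib(weight, 0, 0));
	return 0;
}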

[tip:sched/core] Revert sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking

2013-06-27 Thread tip-bot for Alex Shi
Commit-ID:  141965c7494d984b2bf24efd361a3125278869c6
Gitweb: http://git.kernel.org/tip/141965c7494d984b2bf24efd361a3125278869c6
Author: Alex Shi alex@intel.com
AuthorDate: Wed, 26 Jun 2013 13:05:39 +0800
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Thu, 27 Jun 2013 10:07:22 +0200

Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for 
load-tracking"

Remove the CONFIG_FAIR_GROUP_SCHED guard that covers the runnable info,
so that we can use the runnable load variables.

Also remove two CONFIG_FAIR_GROUP_SCHED settings which are not in the
reverted patch (they were introduced in 9ee474f), but also need to be
reverted.

Signed-off-by: Alex Shi alex@intel.com
Signed-off-by: Peter Zijlstra pet...@infradead.org
Link: http://lkml.kernel.org/r/51ca76a3.3050...@intel.com
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 include/linux/sched.h |  7 +--
 kernel/sched/core.c   |  7 +--
 kernel/sched/fair.c   | 17 -
 kernel/sched/sched.h  | 19 ++-
 4 files changed, 8 insertions(+), 42 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 178a8d9..0019bef 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -994,12 +994,7 @@ struct sched_entity {
struct cfs_rq   *my_q;
 #endif
 
-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
/* Per-entity load-tracking */
struct sched_avgavg;
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ceeaf0f..0241b1b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1611,12 +1611,7 @@ static void __sched_fork(struct task_struct *p)
	p->se.vruntime			= 0;
	INIT_LIST_HEAD(&p->se.group_node);
 
-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
	p->se.avg.runnable_avg_period = 0;
	p->se.avg.runnable_avg_sum = 0;
 #endif
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c0ac2c3..36eadaa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1128,8 +1128,7 @@ static inline void update_cfs_shares(struct cfs_rq 
*cfs_rq)
 }
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 
-/* Only depends on SMP, FAIR_GROUP_SCHED may be removed when useful in lb */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
 /*
  * We choose a half-life close to 1 scheduling period.
  * Note: The tables below are dependent on this value.
@@ -3431,12 +3430,6 @@ unlock:
 }
 
 /*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#ifdef CONFIG_FAIR_GROUP_SCHED
-/*
  * Called immediately before a task is migrated to a new cpu; task_cpu(p) and
  * cfs_rq_of(p) references at time of call are still valid and identify the
previous cpu.  However, the caller only guarantees p->pi_lock is held; no
@@ -3459,7 +3452,6 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
	atomic64_add(se->avg.load_avg_contrib, &cfs_rq->removed_load);
}
 }
-#endif
 #endif /* CONFIG_SMP */
 
 static unsigned long
@@ -5861,7 +5853,7 @@ static void switched_from_fair(struct rq *rq, struct 
task_struct *p)
	se->vruntime -= cfs_rq->min_vruntime;
}
 
-#if defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_SMP)
+#ifdef CONFIG_SMP
/*
* Remove our load from contribution when we leave sched_fair
* and ensure we don't carry in an old decay_count if we
@@ -5920,7 +5912,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 #ifndef CONFIG_64BIT
	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
 #endif
-#if defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_SMP)
+#ifdef CONFIG_SMP
	atomic64_set(&cfs_rq->decay_counter, 1);
	atomic64_set(&cfs_rq->removed_load, 0);
 #endif
@@ -6162,9 +6154,8 @@ const struct sched_class fair_sched_class = {
 
 #ifdef CONFIG_SMP
.select_task_rq = select_task_rq_fair,
-#ifdef CONFIG_FAIR_GROUP_SCHED
.migrate_task_rq= migrate_task_rq_fair,
-#endif
+
.rq_online  = rq_online_fair,
.rq_offline = rq_offline_fair,
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 029601a..77ce668 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -269,12 +269,6 @@ struct cfs_rq {
 #endif
 
 #ifdef CONFIG_SMP
-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#ifdef CONFIG_FAIR_GROUP_SCHED
/*
 * 

[tip:core/locking] rwsem: Implement writer lock-stealing for better scalability

2013-02-22 Thread tip-bot for Alex Shi
Commit-ID:  ce6711f3d196f09ca0ed29a24dfad42d83912b20
Gitweb: http://git.kernel.org/tip/ce6711f3d196f09ca0ed29a24dfad42d83912b20
Author: Alex Shi 
AuthorDate: Tue, 5 Feb 2013 21:11:55 +0800
Committer:  Ingo Molnar 
CommitDate: Tue, 19 Feb 2013 08:42:43 +0100

rwsem: Implement writer lock-stealing for better scalability

Commit 5a505085f043 ("mm/rmap: Convert the struct anon_vma::mutex
to an rwsem") changed struct anon_vma::mutex to an rwsem, which
caused aim7 fork_test performance to drop by 50%.

Yuanhan Liu did the following excellent analysis:

https://lkml.org/lkml/2013/1/29/84

and found that the regression is caused by strict, serialized,
FIFO sequential write-ownership of rwsems. Ingo suggested
implementing opportunistic lock-stealing for the front writer
task in the waitqueue.

Yuanhan Liu implemented lock-stealing for spinlock-rwsems,
which indeed recovered much of the regression - confirming
the analysis that the main factor in the regression was the
FIFO writer-fairness of rwsems.

In this patch we allow lock-stealing to happen when the first
waiter is also writer. With that change in place the
aim7 fork_test performance is fully recovered on my
Intel NHM EP, NHM EX, SNB EP 2S and 4S test-machines.

Reported-by: l...@linux.intel.com
Reported-by: Yuanhan Liu 
Signed-off-by: Alex Shi 
Cc: David Howells 
Cc: Michel Lespinasse 
Cc: Linus Torvalds 
Cc: Andrew Morton 
Cc: Peter Zijlstra 
Cc: Anton Blanchard 
Cc: Arjan van de Ven 
Cc: paul.gortma...@windriver.com
Link: https://lkml.org/lkml/2013/1/29/84
Link: 
http://lkml.kernel.org/r/1360069915-31619-1-git-send-email-alex@intel.com
[ Small stylistic fixes, updated changelog. ]
Signed-off-by: Ingo Molnar 
---
 lib/rwsem.c | 75 +
 1 file changed, 46 insertions(+), 29 deletions(-)

diff --git a/lib/rwsem.c b/lib/rwsem.c
index 8337e1b..ad5e0df 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -2,6 +2,8 @@
  *
  * Written by David Howells (dhowe...@redhat.com).
  * Derived from arch/i386/kernel/semaphore.c
+ *
+ * Writer lock-stealing by Alex Shi 
  */
 #include 
 #include 
@@ -60,7 +62,7 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wake_type)
struct rwsem_waiter *waiter;
struct task_struct *tsk;
struct list_head *next;
-   signed long oldcount, woken, loop, adjustment;
+   signed long woken, loop, adjustment;
 
waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
if (!(waiter->flags & RWSEM_WAITING_FOR_WRITE))
@@ -72,30 +74,8 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wake_type)
 */
goto out;
 
-   /* There's a writer at the front of the queue - try to grant it the
-* write lock.  However, we only wake this writer if we can transition
-* the active part of the count from 0 -> 1
-*/
-   adjustment = RWSEM_ACTIVE_WRITE_BIAS;
-   if (waiter->list.next == &sem->wait_list)
-   adjustment -= RWSEM_WAITING_BIAS;
-
- try_again_write:
-   oldcount = rwsem_atomic_update(adjustment, sem) - adjustment;
-   if (oldcount & RWSEM_ACTIVE_MASK)
-   /* Someone grabbed the sem already */
-   goto undo_write;
-
-   /* We must be careful not to touch 'waiter' after we set ->task = NULL.
-* It is an allocated on the waiter's stack and may become invalid at
-* any time after that point (due to a wakeup from another source).
-*/
-   list_del(&waiter->list);
-   tsk = waiter->task;
-   smp_mb();
-   waiter->task = NULL;
-   wake_up_process(tsk);
-   put_task_struct(tsk);
+   /* Wake up the writing waiter and let the task grab the sem: */
+   wake_up_process(waiter->task);
goto out;
 
  readers_only:
@@ -157,12 +137,40 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wake_type)
 
  out:
return sem;
+}
+
+/* Try to get write sem, caller holds sem->wait_lock: */
+static int try_get_writer_sem(struct rw_semaphore *sem,
+   struct rwsem_waiter *waiter)
+{
+   struct rwsem_waiter *fwaiter;
+   long oldcount, adjustment;
 
-   /* undo the change to the active count, but check for a transition
-* 1->0 */
- undo_write:
+   /* only steal when first waiter is writing */
+   fwaiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
+   if (!(fwaiter->flags & RWSEM_WAITING_FOR_WRITE))
+   return 0;
+
+   adjustment = RWSEM_ACTIVE_WRITE_BIAS;
+   /* Only one waiter in the queue: */
+   if (fwaiter == waiter && waiter->list.next == &sem->wait_list)
+   adjustment -= RWSEM_WAITING_BIAS;
+
+try_again_write:
+   oldcount = rwsem_atomic_update(adjustment, sem) - adjustment;
+   if (!(oldcount & RWSEM_ACTIVE_MASK)) {
+   /* No active lock: */
+   struct task_struct *tsk = waiter->task;
+
+   list_del(&waiter->list);
+   
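
A standalone sketch of the lock-stealing step described above: a waiting
writer bumps the count and only wins if the active part of the old count
was zero, otherwise it undoes the bump and keeps waiting.  This models the
idea with C11 atomics and invented bias/mask values, not the kernel's
actual rwsem layout:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define ACTIVE_MASK	0x0000ffffL
#define WRITE_BIAS	0x00010001L	/* one active unit + write flag (invented) */

static atomic_long count;

static bool try_steal_write(void)
{
	long old = atomic_fetch_add(&count, WRITE_BIAS);

	if (!(old & ACTIVE_MASK))
		return true;			/* no active holders: we own it */

	atomic_fetch_sub(&count, WRITE_BIAS);	/* someone was active: undo */
	return false;
}

int main(void)
{
	printf("steal on idle sem: %s\n", try_steal_write() ? "got it" : "missed");
	printf("steal on busy sem: %s\n", try_steal_write() ? "got it" : "missed");
	return 0;
}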

[tip:core/urgent] rwsem: Implement writer lock-stealing for better scalability

2013-02-06 Thread tip-bot for Alex Shi
Commit-ID:  3a15e0e0cdda5b401d0a36dd7e83406cd1ce0724
Gitweb: http://git.kernel.org/tip/3a15e0e0cdda5b401d0a36dd7e83406cd1ce0724
Author: Alex Shi 
AuthorDate: Tue, 5 Feb 2013 21:11:55 +0800
Committer:  Ingo Molnar 
CommitDate: Wed, 6 Feb 2013 12:41:43 +0100

rwsem: Implement writer lock-stealing for better scalability

Commit 5a505085f043 ("mm/rmap: Convert the struct anon_vma::mutex
to an rwsem") changed struct anon_vma::mutex to an rwsem, which
caused aim7 fork_test performance to drop by 50%.

Yuanhan Liu did the following excellent analysis:

https://lkml.org/lkml/2013/1/29/84

and found that the regression is caused by strict, serialized,
FIFO sequential write-ownership of rwsems. Ingo suggested
implementing opportunistic lock-stealing for the front writer
task in the waitqueue.

Yuanhan Liu implemented lock-stealing for spinlock-rwsems,
which indeed recovered much of the regression - confirming
the analysis that the main factor in the regression was the
FIFO writer-fairness of rwsems.

In this patch we allow lock-stealing to happen when the first
waiter is also writer. With that change in place the
aim7 fork_test performance is fully recovered on my
Intel NHM EP, NHM EX, SNB EP 2S and 4S test-machines.

Reported-by: l...@linux.intel.com
Reported-by: Yuanhan Liu 
Signed-off-by: Alex Shi 
Cc: David Howells 
Cc: Michel Lespinasse 
Cc: Linus Torvalds 
Cc: Andrew Morton 
Cc: Peter Zijlstra 
Cc: Anton Blanchard 
Cc: Arjan van de Ven 
Cc: paul.gortma...@windriver.com
Link: https://lkml.org/lkml/2013/1/29/84
Link: 
http://lkml.kernel.org/r/1360069915-31619-1-git-send-email-alex@intel.com
[ Small stylistic fixes, updated changelog. ]
Signed-off-by: Ingo Molnar 
---
 lib/rwsem.c | 75 +
 1 file changed, 46 insertions(+), 29 deletions(-)

diff --git a/lib/rwsem.c b/lib/rwsem.c
index 8337e1b..ad5e0df 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -2,6 +2,8 @@
  *
  * Written by David Howells (dhowe...@redhat.com).
  * Derived from arch/i386/kernel/semaphore.c
+ *
+ * Writer lock-stealing by Alex Shi 
  */
 #include 
 #include 
@@ -60,7 +62,7 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wake_type)
struct rwsem_waiter *waiter;
struct task_struct *tsk;
struct list_head *next;
-   signed long oldcount, woken, loop, adjustment;
+   signed long woken, loop, adjustment;
 
waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
if (!(waiter->flags & RWSEM_WAITING_FOR_WRITE))
@@ -72,30 +74,8 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wake_type)
 */
goto out;
 
-   /* There's a writer at the front of the queue - try to grant it the
-* write lock.  However, we only wake this writer if we can transition
-* the active part of the count from 0 -> 1
-*/
-   adjustment = RWSEM_ACTIVE_WRITE_BIAS;
-   if (waiter->list.next == &sem->wait_list)
-   adjustment -= RWSEM_WAITING_BIAS;
-
- try_again_write:
-   oldcount = rwsem_atomic_update(adjustment, sem) - adjustment;
-   if (oldcount & RWSEM_ACTIVE_MASK)
-   /* Someone grabbed the sem already */
-   goto undo_write;
-
-   /* We must be careful not to touch 'waiter' after we set ->task = NULL.
-* It is an allocated on the waiter's stack and may become invalid at
-* any time after that point (due to a wakeup from another source).
-*/
-   list_del(&waiter->list);
-   tsk = waiter->task;
-   smp_mb();
-   waiter->task = NULL;
-   wake_up_process(tsk);
-   put_task_struct(tsk);
+   /* Wake up the writing waiter and let the task grab the sem: */
+   wake_up_process(waiter->task);
goto out;
 
  readers_only:
@@ -157,12 +137,40 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wake_type)
 
  out:
return sem;
+}
+
+/* Try to get write sem, caller holds sem->wait_lock: */
+static int try_get_writer_sem(struct rw_semaphore *sem,
+   struct rwsem_waiter *waiter)
+{
+   struct rwsem_waiter *fwaiter;
+   long oldcount, adjustment;
 
-   /* undo the change to the active count, but check for a transition
-* 1->0 */
- undo_write:
+   /* only steal when first waiter is writing */
+   fwaiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
+   if (!(fwaiter->flags & RWSEM_WAITING_FOR_WRITE))
+   return 0;
+
+   adjustment = RWSEM_ACTIVE_WRITE_BIAS;
+   /* Only one waiter in the queue: */
+   if (fwaiter == waiter && waiter->list.next == &sem->wait_list)
+   adjustment -= RWSEM_WAITING_BIAS;
+
+try_again_write:
+   oldcount = rwsem_atomic_update(adjustment, sem) - adjustment;
+   if (!(oldcount & RWSEM_ACTIVE_MASK)) {
+   /* No active lock: */
+   struct task_struct *tsk = waiter->task;
+
+   list_del(&waiter->list);
+   smp_mb();

[tip:sched/core] sched/nohz: Clean up select_nohz_load_balancer()

2012-09-14 Thread tip-bot for Alex Shi
Commit-ID:  c1cc017c59c44d9ede7003631c43adc0cfdce2f9
Gitweb: http://git.kernel.org/tip/c1cc017c59c44d9ede7003631c43adc0cfdce2f9
Author: Alex Shi 
AuthorDate: Mon, 10 Sep 2012 15:10:58 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 13 Sep 2012 16:52:05 +0200

sched/nohz: Clean up select_nohz_load_balancer()

There is no load_balancer to be selected now. It just sets the
state of the nohz tick to stop.

So rename the function, pass the 'cpu' as a parameter and then
remove the useless call from tick_nohz_restart_sched_tick().

[ s/set_nohz_tick_stopped/nohz_balance_enter_idle/g
  s/clear_nohz_tick_stopped/nohz_balance_exit_idle/g ]
Signed-off-by: Alex Shi 
Acked-by: Suresh Siddha 
Cc: Venkatesh Pallipadi 
Signed-off-by: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1347261059-24747-1-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar 
---
 include/linux/sched.h|4 ++--
 kernel/sched/fair.c  |   25 ++---
 kernel/time/tick-sched.c |3 +--
 3 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 60e5e38..8c38df0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -273,11 +273,11 @@ extern void init_idle_bootup_task(struct task_struct 
*idle);
 extern int runqueue_is_locked(int cpu);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
-extern void select_nohz_load_balancer(int stop_tick);
+extern void nohz_balance_enter_idle(int cpu);
 extern void set_cpu_sd_state_idle(void);
 extern int get_nohz_timer_target(void);
 #else
-static inline void select_nohz_load_balancer(int stop_tick) { }
+static inline void nohz_balance_enter_idle(int cpu) { }
 static inline void set_cpu_sd_state_idle(void) { }
 #endif
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9ae3a5b..de596a2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4603,7 +4603,7 @@ static void nohz_balancer_kick(int cpu)
return;
 }
 
-static inline void clear_nohz_tick_stopped(int cpu)
+static inline void nohz_balance_exit_idle(int cpu)
 {
	if (unlikely(test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))) {
cpumask_clear_cpu(cpu, nohz.idle_cpus_mask);
@@ -4643,28 +4643,23 @@ void set_cpu_sd_state_idle(void)
 }
 
 /*
- * This routine will record that this cpu is going idle with tick stopped.
+ * This routine will record that the cpu is going idle with tick stopped.
  * This info will be used in performing idle load balancing in the future.
  */
-void select_nohz_load_balancer(int stop_tick)
+void nohz_balance_enter_idle(int cpu)
 {
-   int cpu = smp_processor_id();
-
/*
 * If this cpu is going down, then nothing needs to be done.
 */
if (!cpu_active(cpu))
return;
 
-   if (stop_tick) {
-   if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
-   return;
+   if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
+   return;
 
-   cpumask_set_cpu(cpu, nohz.idle_cpus_mask);
-   atomic_inc(&nohz.nr_cpus);
-   set_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu));
-   }
-   return;
+   cpumask_set_cpu(cpu, nohz.idle_cpus_mask);
+   atomic_inc(&nohz.nr_cpus);
+   set_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu));
 }
 
 static int __cpuinit sched_ilb_notifier(struct notifier_block *nfb,
@@ -4672,7 +4667,7 @@ static int __cpuinit sched_ilb_notifier(struct 
notifier_block *nfb,
 {
switch (action & ~CPU_TASKS_FROZEN) {
case CPU_DYING:
-   clear_nohz_tick_stopped(smp_processor_id());
+   nohz_balance_exit_idle(smp_processor_id());
return NOTIFY_OK;
default:
return NOTIFY_DONE;
@@ -4833,7 +4828,7 @@ static inline int nohz_kick_needed(struct rq *rq, int cpu)
* busy tick after returning from idle, we will update the busy stats.
*/
set_cpu_sd_state_busy();
-   clear_nohz_tick_stopped(cpu);
+   nohz_balance_exit_idle(cpu);
 
/*
 * None are in tickless mode and hence no need for NOHZ idle load
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 3a9e5d5..1a5ee90 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -372,7 +372,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched 
*ts,
 * the scheduler tick in nohz_restart_sched_tick.
 */
if (!ts->tick_stopped) {
-   select_nohz_load_balancer(1);
+   nohz_balance_enter_idle(cpu);
calc_load_enter_idle();
 
ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
@@ -569,7 +569,6 @@ static void tick_nohz_restart(struct tick_sched *ts, 
ktime_t now)
 static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 {
/* Update jiffies first */
-   select_nohz_load_balancer(0);


[tip:sched/core] tile: Remove SD_PREFER_LOCAL leftover

2012-08-15 Thread tip-bot for Alex Shi
Commit-ID:  c7660994ed6b44d17dad0aac0d156da1e0a2f003
Gitweb: http://git.kernel.org/tip/c7660994ed6b44d17dad0aac0d156da1e0a2f003
Author: Alex Shi 
AuthorDate: Wed, 15 Aug 2012 08:14:36 +0800
Committer:  Thomas Gleixner 
CommitDate: Wed, 15 Aug 2012 13:22:55 +0200

tile: Remove SD_PREFER_LOCAL leftover

Commit ("sched: recover SD_WAKE_AFFINE in select_task_rq_fair and code
clean up") removed SD_PREFER_LOCAL, but left an SD_PREFER_LOCAL usage in
the arch/tile code. That breaks the arch/tile build.

Reported-by: Fengguang Wu 
Signed-off-by: Alex Shi 
Acked-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/502af3e6.3050...@intel.com
Signed-off-by: Thomas Gleixner 
---
 arch/tile/include/asm/topology.h |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/tile/include/asm/topology.h b/arch/tile/include/asm/topology.h
index 7a7ce39..d5e86c9 100644
--- a/arch/tile/include/asm/topology.h
+++ b/arch/tile/include/asm/topology.h
@@ -69,7 +69,6 @@ static inline const struct cpumask *cpumask_of_node(int node)
| 1*SD_BALANCE_FORK \
| 0*SD_BALANCE_WAKE \
| 0*SD_WAKE_AFFINE  \
-   | 0*SD_PREFER_LOCAL \
| 0*SD_SHARE_CPUPOWER   \
| 0*SD_SHARE_PKG_RESOURCES  \
| 0*SD_SERIALIZE\
--


[tip:sched/core] sched: recover SD_WAKE_AFFINE in select_task_rq_fair and code clean up

2012-08-13 Thread tip-bot for Alex Shi
Commit-ID:  f03542a7019c600163ac4441d8a826c92c1bd510
Gitweb: http://git.kernel.org/tip/f03542a7019c600163ac4441d8a826c92c1bd510
Author: Alex Shi 
AuthorDate: Thu, 26 Jul 2012 08:55:34 +0800
Committer:  Thomas Gleixner 
CommitDate: Mon, 13 Aug 2012 19:02:05 +0200

sched: recover SD_WAKE_AFFINE in select_task_rq_fair and code clean up

Since the power saving code was removed from the scheduler, the
implementation in this function is dead and even pollutes the remaining
logic: 'want_sd' never gets a chance to be set to '0', which defeats the
effect of SD_WAKE_AFFINE here.

So, clean up the obsolete code, including SD_PREFER_LOCAL.
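As a schematic illustration of the kind of dead logic being dropped (made-up
names, not the actual select_task_rq_fair() code): once a flag can only ever
hold its initial value, every test on it collapses and the flag can be
removed together with the code that pretended to compute it.

#include <stdio.h>

/* Before: 'want_sd' starts at 1 and nothing ever clears it, so the test
 * below is always true and only obscures the remaining condition. */
static int pick_before(int affine_ok)
{
	int want_sd = 1;

	return (want_sd && affine_ok) ? 1 : 0;
}

/* After: the flag and its dead test are gone, behaviour is unchanged. */
static int pick_after(int affine_ok)
{
	return affine_ok ? 1 : 0;
}

int main(void)
{
	printf("%d %d\n", pick_before(1), pick_after(1));	/* 1 1 */
	return 0;
}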

Signed-off-by: Alex Shi 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/5028f431.6000...@intel.com
Signed-off-by: Thomas Gleixner 
---
 include/linux/sched.h|1 -
 include/linux/topology.h |2 --
 kernel/sched/core.c  |1 -
 kernel/sched/fair.c  |   34 +++---
 4 files changed, 3 insertions(+), 35 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b8c8664..f3eebc1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -860,7 +860,6 @@ enum cpu_idle_type {
 #define SD_BALANCE_FORK        0x0008  /* Balance on fork, clone */
 #define SD_BALANCE_WAKE        0x0010  /* Balance on wakeup */
 #define SD_WAKE_AFFINE         0x0020  /* Wake task to waking CPU */
-#define SD_PREFER_LOCAL        0x0040  /* Prefer to keep tasks local 
to this domain */
 #define SD_SHARE_CPUPOWER  0x0080  /* Domain members share cpu power */
 #define SD_SHARE_PKG_RESOURCES 0x0200  /* Domain members share cpu pkg 
resources */
 #define SD_SERIALIZE   0x0400  /* Only a single load balancing 
instance */
diff --git a/include/linux/topology.h b/include/linux/topology.h
index fec12d6..d3cf0d6 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -129,7 +129,6 @@ int arch_update_cpu_topology(void);
| 1*SD_BALANCE_FORK \
| 0*SD_BALANCE_WAKE \
| 1*SD_WAKE_AFFINE  \
-   | 0*SD_PREFER_LOCAL \
| 0*SD_SHARE_CPUPOWER   \
| 1*SD_SHARE_PKG_RESOURCES  \
| 0*SD_SERIALIZE\
@@ -160,7 +159,6 @@ int arch_update_cpu_topology(void);
| 1*SD_BALANCE_FORK \
| 0*SD_BALANCE_WAKE \
| 1*SD_WAKE_AFFINE  \
-   | 0*SD_PREFER_LOCAL \
| 0*SD_SHARE_CPUPOWER   \
| 0*SD_SHARE_PKG_RESOURCES  \
| 0*SD_SERIALIZE\
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c9a3655..4376c9f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6622,7 +6622,6 @@ sd_numa_init(struct sched_domain_topology_level *tl, int 
cpu)
| 0*SD_BALANCE_FORK
| 0*SD_BALANCE_WAKE
| 0*SD_WAKE_AFFINE
-   | 0*SD_PREFER_LOCAL
| 0*SD_SHARE_CPUPOWER
| 0*SD_SHARE_PKG_RESOURCES
| 1*SD_SERIALIZE
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 287bfac..01d3eda 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2686,7 +2686,6 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, 
int wake_flags)
int prev_cpu = task_cpu(p);
int new_cpu = cpu;
int want_affine = 0;
-   int want_sd = 1;
int sync = wake_flags & WF_SYNC;
 
if (p->nr_cpus_allowed == 1)
@@ -2704,48 +2703,21 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, 
int wake_flags)
continue;
 
/*
-* If power savings logic is enabled for a domain, see if we
-* are not overloaded, if so, don't balance wider.
-*/
-   if (tmp->flags & (SD_PREFER_LOCAL)) {
-   unsigned long power = 0;
-   unsigned long nr_running = 0;
-   unsigned long capacity;
-   int i;
-
-   for_each_cpu(i, sched_domain_span(tmp)) {
-   power += power_of(i);
-   nr_running += cpu_rq(i)->cfs.nr_running;
-   }
-
-   capacity = DIV_ROUND_CLOSEST(power, 

[tip:sched/urgent] sched/numa: Add SD_PERFER_SIBLING to CPU domain

2012-07-26 Thread tip-bot for Alex Shi
Commit-ID:  6956dc568f34107f1d02b24f87efe7250803fc87
Gitweb: http://git.kernel.org/tip/6956dc568f34107f1d02b24f87efe7250803fc87
Author: Alex Shi 
AuthorDate: Fri, 20 Jul 2012 14:19:50 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 26 Jul 2012 11:46:58 +0200

sched/numa: Add SD_PERFER_SIBLING to CPU domain

Commit 8e7fbcbc22c ("sched: Remove stale power aware scheduling remnants
and dysfunctional knobs") removed SD_PREFER_SIBLING from the CPU domain.

On NUMA machines this causes load_balance() to no longer prefer LCPUs in
the same physical CPU package.

It causes some actual performance regressions on our NUMA machines from
Core2 to NHM and SNB.

Adding this domain flag again recovers the performance drop.

This change doesn't have any bad impact on any of my benchmarks
(specjbb, kbuild, fio, hackbench, etc.) on any of my machines.

Signed-off-by: Alex Shi 
Signed-off-by: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1342765190-21540-1-git-send-email-alex@intel.com
Signed-off-by: Ingo Molnar 
---
 include/linux/topology.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index e91cd43..fec12d6 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -164,6 +164,7 @@ int arch_update_cpu_topology(void);
| 0*SD_SHARE_CPUPOWER   \
| 0*SD_SHARE_PKG_RESOURCES  \
| 0*SD_SERIALIZE\
+   | 1*SD_PREFER_SIBLING   \
,   \
.last_balance   = jiffies,  \
.balance_interval   = 1,\
--


[tip:x86/mm] x86/tlb: Fix build warning and crash when building for !SMP

2012-07-20 Thread tip-bot for Alex Shi
Commit-ID:  7efa1c87963d23cc57ba40c07316d3e28cc75a3a
Gitweb: http://git.kernel.org/tip/7efa1c87963d23cc57ba40c07316d3e28cc75a3a
Author: Alex Shi 
AuthorDate: Fri, 20 Jul 2012 09:18:23 +0800
Committer:  H. Peter Anvin 
CommitDate: Fri, 20 Jul 2012 15:01:48 -0700

x86/tlb: Fix build warning and crash when building for !SMP

The incompatible parameter of flush_tlb_mm_range() causes a build warning.
Fix it by using the correct parameter.

Ingo Molnar found that this could also cause a user space crash.
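For reference, a minimal sketch of the failure mode, written as a standalone
illustration rather than the kernel headers (CONFIG_SMP and the struct
definition are stand-ins): the !SMP inline stub has to take the same
'struct mm_struct *' first parameter as the SMP prototype, otherwise callers
written against that prototype break on UP builds.

#include <stdio.h>

struct mm_struct { int id; };

#ifdef CONFIG_SMP
void flush_tlb_mm_range(struct mm_struct *mm,
			unsigned long start, unsigned long end,
			unsigned long vmflag);
#else
/* UP stub: must keep the struct mm_struct * parameter of the SMP version. */
static inline void flush_tlb_mm_range(struct mm_struct *mm,
				      unsigned long start, unsigned long end,
				      unsigned long vmflag)
{
	(void)vmflag;	/* unused in this stub */
	printf("flush mm %d: [%lu, %lu)\n", mm->id, start, end);
}
#endif

int main(void)
{
	struct mm_struct mm = { .id = 1 };

	/* Callers pass an mm, as in the corrected prototype in the diff. */
	flush_tlb_mm_range(&mm, 0, 4096, 0);
	return 0;
}

With the old 'struct vm_area_struct *' stub, a caller like the one above
would get an incompatible-pointer warning and the stub could dereference the
wrong structure at run time, which is the warning/crash pair described
above.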

Reported-by: Tetsuo Handa 
Reported-by: Ingo Molnar 
Signed-off-by: Alex Shi 
Link: 
http://lkml.kernel.org/r/1342747103-19765-1-git-send-email-alex@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/tlbflush.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index b5a27bd..74a4433 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -105,10 +105,10 @@ static inline void flush_tlb_range(struct vm_area_struct 
*vma,
__flush_tlb();
 }
 
-static inline void flush_tlb_mm_range(struct vm_area_struct *vma,
+static inline void flush_tlb_mm_range(struct mm_struct *mm,
   unsigned long start, unsigned long end, unsigned long vmflag)
 {
-   if (vma->vm_mm == current->active_mm)
+   if (mm == current->active_mm)
__flush_tlb();
 }
 
--

