Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-10 Thread Gautham R Shenoy
On Dec 10, 2007 4:58 PM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> * Gautham R Shenoy <[EMAIL PROTECTED]> wrote:
>
> > > say we've got 100 CPUs, so we've got 100 watchdog tasks running -
> > > one for each CPU. Checking for hung tasks is a global operation not
> > > a per-CPU operation (we iterate over the global tasklist), hence
> > > only one CPU should really be calling this function. That
> > > online-cpus logic achieves this by picking a single CPU. Perhaps it
> > > would be better to keep a hung_task_checker_cpu variable that is
> > > driven from a CPU-hotplug-down notifier? That way if a CPU is
> > > brought down we can update hung_task_checker_cpu to another,
> > > still-online CPU. (this would also be faster, because event-driven)
> >
> > Do you mean something like this?
>
> yeah, thanks - queued it up.

Stupid me! I forgot to remove the local variable check_cpu in static
int watchdog(void * __bind_cpu). Could you please correct it before
applying?

>
> one question:
>
> > +static int check_cpu = -1;
>
> >   case CPU_ONLINE:
> >   case CPU_ONLINE_FROZEN:
> > + check_cpu = any_online_cpu(cpu_online_map);
> >   wake_up_process(per_cpu(watchdog_task, hotcpu));
> >   break;
>
> do we bring the boot CPU online too - i.e. will check_cpu be properly
> initialized on UP too?

Yes, it does.
>
> Ingo
>

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-10 Thread Ingo Molnar

* Gautham R Shenoy <[EMAIL PROTECTED]> wrote:

> > say we've got 100 CPUs, so we've got 100 watchdog tasks running - 
> > one for each CPU. Checking for hung tasks is a global operation not 
> > a per-CPU operation (we iterate over the global tasklist), hence 
> > only one CPU should really be calling this function. That 
> > online-cpus logic achieves this by picking a single CPU. Perhaps it 
> > would be better to keep a hung_task_checker_cpu variable that is 
> > driven from a CPU-hotplug-down notifier? That way if a CPU is 
> > brought down we can update hung_task_checker_cpu to another, 
> > still-online CPU. (this would also be faster, because event-driven)
> 
> Do you mean something like this?

yeah, thanks - queued it up.

one question:

> +static int check_cpu = -1;

>   case CPU_ONLINE:
>   case CPU_ONLINE_FROZEN:
> + check_cpu = any_online_cpu(cpu_online_map);
>   wake_up_process(per_cpu(watchdog_task, hotcpu));
>   break;

do we bring the boot CPU online too - i.e. will check_cpu be properly 
initialized on UP too?

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-10 Thread Gautham R Shenoy
On Mon, Dec 10, 2007 at 11:21:57AM +0100, Ingo Molnar wrote:
> 
> * Gautham R Shenoy <[EMAIL PROTECTED]> wrote:
> 
> > > i'm wondering, what's the proper CPU-hotplug safe sequence here 
> > > then? I'm picking a CPU number from cpu_online_map, and that CPU 
> > > could go away while i'm still using it, right? What's saving us 
> > > here?
> > 
> > In this particular case, we are trying to see if any task on a 
> > particular cpu has not been scheduled for a really long time. If we do 
> > this check on a cpu which has gone offline, then a) If the tasks have 
> > not been migrated on to another cpu yet, we will still perform that 
> > check and yell if something has been holding any task for a 
> > sufficiently long time. b) If the tasks have been migrated off, then 
> > we have nothing to check.
> 
> say we've got 100 CPUs, so we've got 100 watchdog tasks running - one 
> for each CPU. Checking for hung tasks is a global operation not a 
> per-CPU operation (we iterate over the global tasklist), hence only one 
> CPU should really be calling this function. That online-cpus logic 
> achieves this by picking a single CPU. Perhaps it would be better to 
> keep a hung_task_checker_cpu variable that is driven from a 
> CPU-hotplug-down notifier? That way if a CPU is brought down we can 
> update hung_task_checker_cpu to another, still-online CPU. (this would 
> also be faster, because event-driven)

Do you mean something like this?

From: Gautham R Shenoy <[EMAIL PROTECTED]>

softlockup: update check_cpu during cpu-hotplug

Update the check_cpu value during a cpu-hotplug operation
so that we don't check for hung tasks on a cpu which is about 
to go offline.

Signed-off-by: Gautham R Shenoy <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Thomas Gleixner <[EMAIL PROTECTED]>
Cc: Jiri Slaby <[EMAIL PROTECTED]>
---

 kernel/softlockup.c |   14 --
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index 576eb9c..b1a8c7c 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -194,6 +194,9 @@ static void check_hung_uninterruptible_tasks(int this_cpu)
read_unlock(&tasklist_lock);
 }
 
+
+static int check_cpu = -1;
+
 /*
  * The watchdog thread - runs every second and touches the timestamp.
  */
@@ -219,8 +222,6 @@ static int watchdog(void *__bind_cpu)
/*
 * Only do the hung-tasks check on one CPU:
 */
-   check_cpu = any_online_cpu(cpu_online_map);
-
if (this_cpu != check_cpu)
continue;
 
@@ -255,6 +256,7 @@ cpu_callback(struct notifier_block *nfb, unsigned long 
action, void *hcpu)
break;
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
+   check_cpu = any_online_cpu(cpu_online_map);
wake_up_process(per_cpu(watchdog_task, hotcpu));
break;
 #ifdef CONFIG_HOTPLUG_CPU
@@ -265,6 +267,14 @@ cpu_callback(struct notifier_block *nfb, unsigned long 
action, void *hcpu)
/* Unbind so it can run.  Fall thru. */
kthread_bind(per_cpu(watchdog_task, hotcpu),
 any_online_cpu(cpu_online_map));
+   case CPU_DOWN_PREPARE:
+   case CPU_DOWN_PREPARE_FROZEN:
+   if (hotcpu == check_cpu) {
+   cpumask_t temp_cpu_online_map = cpu_online_map; 
+   cpu_clear(hotcpu, temp_cpu_online_map);
+   check_cpu = any_online_cpu(temp_cpu_online_map);
+   }
+   break;
case CPU_DEAD:
case CPU_DEAD_FROZEN:
p = per_cpu(watchdog_task, hotcpu);



> 
>   Ingo

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-10 Thread Ingo Molnar

* Gautham R Shenoy <[EMAIL PROTECTED]> wrote:

> > i'm wondering, what's the proper CPU-hotplug safe sequence here 
> > then? I'm picking a CPU number from cpu_online_map, and that CPU 
> > could go away while i'm still using it, right? What's saving us 
> > here?
> 
> In this particular case, we are trying to see if any task on a 
> particular cpu has not been scheduled for a really long time. If we do 
> this check on a cpu which has gone offline, then a) If the tasks have 
> not been migrated on to another cpu yet, we will still perform that 
> check and yell if something has been holding any task for a 
> sufficiently long time. b) If the tasks have been migrated off, then 
> we have nothing to check.

say we've got 100 CPUs, so we've got 100 watchdog tasks running - one 
for each CPU. Checking for hung tasks is a global operation not a 
per-CPU operation (we iterate over the global tasklist), hence only one 
CPU should really be calling this function. That online-cpus logic 
achieves this by picking a single CPU. Perhaps it would be better to 
keep a hung_task_checker_cpu variable that is driven from a 
CPU-hotplug-down notifier? That way if a CPU is brought down we can 
update hung_task_checker_cpu to another, still-online CPU. (this would 
also be faster, because event-driven)

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-10 Thread Gautham R Shenoy
On Mon, Dec 10, 2007 at 10:10:52AM +0100, Ingo Molnar wrote:
> 
> > > softlockup: remove get_online_cpus() which doesn't help here.
> > > 
> > > The get_online_cpus() protection seems to be bogus in 
> > > kernel/softlockup.c as cpu cached in check_cpu can go offline once 
> > > we do a put_online_cpus().
> > > 
> > > This can also cause deadlock during a cpu offline as follows:
> 
> i'm wondering, what's the proper CPU-hotplug safe sequence here then? 
> I'm picking a CPU number from cpu_online_map, and that CPU could go away 
> while i'm still using it, right? What's saving us here?

In this particular case, we are trying to see if any task on a particular
cpu has not been scheduled for a really long time. If we do this check
on a cpu which has gone offline, then
a) If the tasks have not been migrated on to another cpu yet, we will
still perform that check and yell if something has been holding any task
for a sufficiently long time.
b) If the tasks have been migrated off, then we have nothing to check.

However, if we still want that particular cpu to not go offline during
the check, then could we use the following patch

commit a49736844717e00f7a37c96368cea8fec7eb31cf
Author: Gautham R Shenoy <[EMAIL PROTECTED]>
Date:   Mon Dec 10 15:43:32 2007 +0530

CPU-Hotplug: Add try_get_online_cpus()

Add the fastpath code, try_get_online_cpus() which will
return 1 once it has managed to hold the reference to the cpu_online_map
if there are no threads trying to perform a cpu-hotplug.

Use the primitive in the softlockup code.

Signed-off-by: Gautham R Shenoy <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Thomas Gleixner <[EMAIL PROTECTED]>
Cc: Jiri Slaby <[EMAIL PROTECTED]>

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index e0132cb..d236e21 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -107,6 +107,7 @@ static inline void cpuhotplug_mutex_unlock(struct mutex 
*cpu_hp_mutex)
 }
 
 extern void get_online_cpus(void);
+extern int  try_get_online_cpus(void);
 extern void put_online_cpus(void);
 #define hotcpu_notifier(fn, pri) { \
static struct notifier_block fn##_nb =  \
@@ -127,6 +128,9 @@ static inline void cpuhotplug_mutex_unlock(struct mutex 
*cpu_hp_mutex)
 
 #define get_online_cpus()  do { } while (0)
 #define put_online_cpus()  do { } while (0)
+static inline int try_get_online_cpus(void) 
+{ return 1;}
+
 #define hotcpu_notifier(fn, pri)   do { (void)(fn); } while (0)
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)   ({ (void)(nb); 0; })
diff --git a/kernel/cpu.c b/kernel/cpu.c
index e0d3a4f..38537c9 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -48,11 +48,35 @@ void __init cpu_hotplug_init(void)
 
 #ifdef CONFIG_HOTPLUG_CPU
 
+/*
+ * try_get_online_cpus(): Tries to hold a reference 
+ * to the cpu_online_map if no one is trying to perform 
+ * a cpu-hotplug operation. This is the fastpath code for
+ * get_online_cpus.
+ *
+ * Returns 1 if there is no cpu-hotplug operation
+ * currently in progress.
+ */
+int try_get_online_cpus(void)
+{
+   if(!cpu_hotplug.active_writer) {
+   cpu_hotplug.refcount++;
+   return 1;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(try_get_online_cpus);
+
 void get_online_cpus(void)
 {
might_sleep();
if (cpu_hotplug.active_writer == current)
return;
+   if (try_get_online_cpus())
+   return;
+
+   /* The writer exists, hence the slowpath */
mutex_lock(&cpu_hotplug.lock);
cpu_hotplug.refcount++;
mutex_unlock(&cpu_hotplug.lock);
@@ -120,6 +144,11 @@ static void cpu_hotplug_begin(void)
mutex_lock(&cpu_hotplug.lock);
 
cpu_hotplug.active_writer = current;
+   synchronize_sched();
+   /* New users of get_online_cpus() will see a non-NULL value
+* for cpu_hotplug.active_writer here and will take the slowpath
+*/
+
add_wait_queue_exclusive(&cpu_hotplug.writer_queue, &wait);
while (cpu_hotplug.refcount) {
set_current_state(TASK_UNINTERRUPTIBLE);
diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index 576eb9c..cb0616d 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -150,8 +150,8 @@ static void check_hung_task(struct task_struct *t, unsigned 
long now)
sysctl_hung_task_warnings--;
 
/*
-* Ok, the task did not get scheduled for more than 2 minutes,
-* complain:
+* Ok, the task did not get scheduled for more than
+* sysctl_hung_task_timeout_secs, complain:
 */
printk(KERN_ERR "INFO: task %s:%d blocked for more than "
"%ld seconds.\n", t->comm, t->pid,
@@ -216,16 +216,21 @@ static int watchdog(void *__bind_cpu)
touch_softlockup_watchdog();
msleep_interruptible(1);
 
+   /* 
+* If a cpu-hotplug operati

Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-10 Thread Gautham R Shenoy
On Sun, Dec 09, 2007 at 08:46:47AM +0100, Ingo Molnar wrote:
> 
> * Jiri Slaby <[EMAIL PROTECTED]> wrote:
> 
> > On 12/08/2007 04:24 PM, Ingo Molnar wrote:
> > > i'm wondering why it had no effect now
> > 
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index e0d3a4f..a46c252 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -47,15 +47,21 @@ void __init cpu_hotplug_init(void)
> >  }
> > 
> >  #ifdef CONFIG_HOTPLUG_CPU
> > -
> > +#include 
> >  void get_online_cpus(void)
> >  {
> > +   outb(0x20, 0x80);
> > might_sleep();
> > +   outb(0x21, 0x80);
> 
> ah. If you comment out get_online_cpus()/put_online_cpus() from 
> kernel/softlockup.c, does it start working?
> 
> Gautham, any ideas?
Hi Ingo, 

>From the code I fail to see how get_online_cpus() can help us. 

+   /*
+* Only do the hung-tasks check on one CPU:
+*/
+   get_online_cpus();
+   check_cpu = any_online_cpu(cpu_online_map);
+   put_online_cpus();

check_cpu can go offline here, no?

+
+   if (this_cpu != check_cpu)
+   continue;
+
+   if (sysctl_hung_task_timeout_secs)
+   check_hung_uninterruptible_tasks(this_cpu);

Further more this can cause a deadlock since we're calling 
get_online_cpus() from the watchdog thread's context, 
which is going to be kthread_stop'ed from a cpu-hotplug context.
This is what I think was happening in the case reported by Jiri.

Please find the patch below.

Thanks and Regards
gautham.

commit 15bfb662b35c609490185fba2fd4713d230b9374
Author: Gautham R Shenoy <[EMAIL PROTECTED]>
Date:   Mon Dec 10 13:41:45 2007 +0530

softlockup: remove get_online_cpus() which doesn't help here.

The get_online_cpus() protection seems to be bogus
in kernel/softlockup.c as cpu cached in check_cpu can go offline
once we do a put_online_cpus().

This can also cause deadlock during a cpu offline as follows:

WATCHDOG_THREAD:OFFLINE_CPU:
mutex_down(&cpu_hotplug.lock);
/* All subsequent get_online_cpus
 * will be blocked till we're
 * done with this cpu-hotplug
 * operation.
 */

get_online_cpus();
/* watchdog is blocked
   Thus we cannot
   go further until
   the cpu-hotplug
   operation completes
 */
CPU_DEAD:
kthread_stop(watchdog_thread);

/* we're trying to stop a
 * thread which is blocked
 * waiting for us to finish.
 *
 * Since we cannot finish until
 * the thread stops, we deadlock here!
 */

Signed-off-by: Gautham R Shenoy <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Thomas Gleixner <[EMAIL PROTECTED]>
Cc: Arjan van de Van <[EMAIL PROTECTED]>
Cc: Jiri Slaby <[EMAIL PROTECTED]>

diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index e50b44a..576eb9c 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -219,9 +219,7 @@ static int watchdog(void *__bind_cpu)
/*
 * Only do the hung-tasks check on one CPU:
 */
-   get_online_cpus();
check_cpu = any_online_cpu(cpu_online_map);
-   put_online_cpus();
 
if (this_cpu != check_cpu)
continue;

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-10 Thread Ingo Molnar

* Gautham R Shenoy <[EMAIL PROTECTED]> wrote:

> Further more this can cause a deadlock since we're calling 
> get_online_cpus() from the watchdog thread's context, which is going 
> to be kthread_stop'ed from a cpu-hotplug context. This is what I think 
> was happening in the case reported by Jiri.
> 
> Please find the patch below.

thanks, and i've updated sched-devel.git.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-10 Thread Ingo Molnar

> > softlockup: remove get_online_cpus() which doesn't help here.
> > 
> > The get_online_cpus() protection seems to be bogus in 
> > kernel/softlockup.c as cpu cached in check_cpu can go offline once 
> > we do a put_online_cpus().
> > 
> > This can also cause deadlock during a cpu offline as follows:

i'm wondering, what's the proper CPU-hotplug safe sequence here then? 
I'm picking a CPU number from cpu_online_map, and that CPU could go away 
while i'm still using it, right? What's saving us here?

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-10 Thread Jiri Slaby
On 12/10/2007 09:19 AM, Gautham R Shenoy wrote:
> commit 15bfb662b35c609490185fba2fd4713d230b9374
> Author: Gautham R Shenoy <[EMAIL PROTECTED]>
> Date:   Mon Dec 10 13:41:45 2007 +0530
> 
> softlockup: remove get_online_cpus() which doesn't help here.
> 
> The get_online_cpus() protection seems to be bogus
> in kernel/softlockup.c as cpu cached in check_cpu can go offline
> once we do a put_online_cpus().
> 
> This can also cause deadlock during a cpu offline as follows:
> 
> WATCHDOG_THREAD:  OFFLINE_CPU:
>   mutex_down(&cpu_hotplug.lock);
>   /* All subsequent get_online_cpus
>* will be blocked till we're
>* done with this cpu-hotplug
>* operation.
>*/
> 
> get_online_cpus();
> /* watchdog is blocked
>Thus we cannot
>go further until
>the cpu-hotplug
>operation completes
>  */
>   CPU_DEAD:
>   kthread_stop(watchdog_thread);
> 
>   /* we're trying to stop a
>* thread which is blocked
>* waiting for us to finish.
>*
>* Since we cannot finish until
>* the thread stops, we deadlock here!
>*/
> 
> Signed-off-by: Gautham R Shenoy <[EMAIL PROTECTED]>
> Cc: Ingo Molnar <[EMAIL PROTECTED]>
> Cc: Thomas Gleixner <[EMAIL PROTECTED]>
> Cc: Arjan van de Van <[EMAIL PROTECTED]>
> Cc: Jiri Slaby <[EMAIL PROTECTED]>

Tested-by: Jiri Slaby <[EMAIL PROTECTED]>

> diff --git a/kernel/softlockup.c b/kernel/softlockup.c
> index e50b44a..576eb9c 100644
> --- a/kernel/softlockup.c
> +++ b/kernel/softlockup.c
> @@ -219,9 +219,7 @@ static int watchdog(void *__bind_cpu)
>   /*
>* Only do the hung-tasks check on one CPU:
>*/
> - get_online_cpus();
>   check_cpu = any_online_cpu(cpu_online_map);
> - put_online_cpus();
>  
>   if (this_cpu != check_cpu)
>   continue;
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-09 Thread Jiri Slaby
On 12/09/2007 08:46 AM, Ingo Molnar wrote:
> ah. If you comment out get_online_cpus()/put_online_cpus() from 
> kernel/softlockup.c, does it start working?

Yes indeed. (the process_64 part of the previous patch applied, but I think it
has no effect to this issue?)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-09 Thread Ingo Molnar

* Jiri Slaby <[EMAIL PROTECTED]> wrote:

> On 12/08/2007 04:24 PM, Ingo Molnar wrote:
> > i'm wondering why it had no effect now - the new code is in essence a 
> > NOP over what we had.
> 
> Maybe a dumb question. Why those changes in process_32.c in the patch 
> and not in process_64.c?

not a dumb question at all - i forgot about it. Find the updated patch 
below.

( sidenote: this shows the x86 unification concept in action. You
  noticed the missing _64.c probably because you saw a _32.c file 
  modified in the patch and the rule is that if we modify a _32.c file 
  then the matching _64.c file needs to be updated too. If this had been
  an old-style pre-unification arch/i386/kernel/process.c file you'd not
  have been able to tell this from just looking at the patch file - and
  we'd possibly have missed to include a fix on the 64-bit side. With
  the unification of files we are realizing it how many times this
  happened in the past (and went unnoticed). )

Ingo

->
Subject: x86: idle wakeup event in the HLT loop
From: Ingo Molnar <[EMAIL PROTECTED]>

do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC
in HLT too, not just when going through the ACPI methods.

(the ACPI idle code already does this.)

[ update the 64-bit side too, as noticed by Jiri Slaby. ]

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/x86/kernel/process_32.c |   15 ---
 arch/x86/kernel/process_64.c |   13 ++---
 2 files changed, 22 insertions(+), 6 deletions(-)

Index: linux-x86.q/arch/x86/kernel/process_32.c
===
--- linux-x86.q.orig/arch/x86/kernel/process_32.c
+++ linux-x86.q/arch/x86/kernel/process_32.c
@@ -113,10 +113,19 @@ void default_idle(void)
smp_mb();
 
local_irq_disable();
-   if (!need_resched())
+   if (!need_resched()) {
+   ktime_t t0, t1;
+   u64 t0n, t1n;
+
+   t0 = ktime_get();
+   t0n = ktime_to_ns(t0);
safe_halt();/* enables interrupts racelessly */
-   else
-   local_irq_enable();
+   local_irq_disable();
+   t1 = ktime_get();
+   t1n = ktime_to_ns(t1);
+   sched_clock_idle_wakeup_event(t1n - t0n);
+   }
+   local_irq_enable();
current_thread_info()->status |= TS_POLLING;
} else {
/* loop is done by the caller */
Index: linux-x86.q/arch/x86/kernel/process_64.c
===
--- linux-x86.q.orig/arch/x86/kernel/process_64.c
+++ linux-x86.q/arch/x86/kernel/process_64.c
@@ -116,9 +116,16 @@ static void default_idle(void)
smp_mb();
local_irq_disable();
if (!need_resched()) {
-   /* Enables interrupts one instruction before HLT.
-  x86 special cases this so there is no race. */
-   safe_halt();
+   ktime_t t0, t1;
+   u64 t0n, t1n;
+
+   t0 = ktime_get();
+   t0n = ktime_to_ns(t0);
+   safe_halt();/* enables interrupts racelessly */
+   local_irq_disable();
+   t1 = ktime_get();
+   t1n = ktime_to_ns(t1);
+   sched_clock_idle_wakeup_event(t1n - t0n);
} else
local_irq_enable();
current_thread_info()->status |= TS_POLLING;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-08 Thread Ingo Molnar

* Jiri Slaby <[EMAIL PROTECTED]> wrote:

> On 12/08/2007 04:24 PM, Ingo Molnar wrote:
> > i'm wondering why it had no effect now
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index e0d3a4f..a46c252 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -47,15 +47,21 @@ void __init cpu_hotplug_init(void)
>  }
> 
>  #ifdef CONFIG_HOTPLUG_CPU
> -
> +#include 
>  void get_online_cpus(void)
>  {
> +   outb(0x20, 0x80);
> might_sleep();
> +   outb(0x21, 0x80);

ah. If you comment out get_online_cpus()/put_online_cpus() from 
kernel/softlockup.c, does it start working?

Gautham, any ideas?

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-08 Thread Jiri Slaby
On 12/08/2007 04:24 PM, Ingo Molnar wrote:
> i'm wondering why it had no effect now

diff --git a/kernel/cpu.c b/kernel/cpu.c
index e0d3a4f..a46c252 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -47,15 +47,21 @@ void __init cpu_hotplug_init(void)
 }

 #ifdef CONFIG_HOTPLUG_CPU
-
+#include 
 void get_online_cpus(void)
 {
+   outb(0x20, 0x80);
might_sleep();
+   outb(0x21, 0x80);
if (cpu_hotplug.active_writer == current)
return;
+   outb(0x22, 0x80);
mutex_lock(&cpu_hotplug.lock);
+   outb(0x23, 0x80);
cpu_hotplug.refcount++;
+   outb(0x24, 0x80);
mutex_unlock(&cpu_hotplug.lock);
+   outb(0x25, 0x80);

 }
 EXPORT_SYMBOL_GPL(get_online_cpus);

produces 0x22 on the LCD (called from watchdog kthread).

Going to catch some sleep,
--js
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-08 Thread Jiri Slaby
On 12/08/2007 04:24 PM, Ingo Molnar wrote:
> i'm wondering why it had no effect now - the new code is in essence a 
> NOP over what we had.

Maybe a dumb question. Why those changes in process_32.c in the patch and not in
process_64.c?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-08 Thread Jiri Slaby
On 12/08/2007 04:24 PM, Ingo Molnar wrote:
> i'm wondering why it had no effect now - the new code is in essence a 
> NOP over what we had. Could you send me your current (modified) 
> kernel/softlockup.c code?

Only these changes:
diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index e50b44a..7011549 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -25,7 +25,7 @@ static DEFINE_PER_CPU(unsigned long, print_timestamp);
 static DEFINE_PER_CPU(struct task_struct *, watchdog_task);

 static int did_panic;
-int softlockup_thresh = 60;
+int softlockup_thresh = 10;

 static int
 softlock_panic(struct notifier_block *this, unsigned long event, void *ptr)
@@ -101,7 +101,11 @@ void softlockup_tick(void)

now = get_timestamp(this_cpu);

-   /* Warn about unreasonable delays: */
+   /* Wake up the high-prio watchdog task every second: */
+   if (now > (touch_timestamp + 1))
+   wake_up_process(per_cpu(watchdog_task, this_cpu));
+
+   /* Warn about unreasonable 10+ seconds delays: */
if (now <= (touch_timestamp + softlockup_thresh))
return;

@@ -213,8 +217,9 @@ static int watchdog(void *__bind_cpu)
 * debug-printout triggers in softlockup_tick().
 */
while (!kthread_should_stop()) {
+   set_current_state(TASK_INTERRUPTIBLE);
touch_softlockup_watchdog();
-   msleep_interruptible(1);
+   schedule();

/*
 * Only do the hung-tasks check on one CPU:


Whole file:
http://www.fi.muni.cz/~xslaby/sklad/softlockup.c
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-08 Thread Ingo Molnar

* Jiri Slaby <[EMAIL PROTECTED]> wrote:

> On 12/08/2007 09:39 AM, Ingo Molnar wrote:
> > * Jiri Slaby <[EMAIL PROTECTED]> wrote:
> > 
> >> Unfortunately no change here.
> > 
> > could you try to revert this change:
> > 
> > -int softlockup_thresh = 10;
> > +int softlockup_thresh = 60;
> > 
> > i.e. change the value of softlockup_thresh back to 10. You should be 
> > able to tweak this runtime as well, without patching the kernel:
> > 
> >   echo 10 > /proc/sys/kernel/softlockup_thresh
> 
> What should have this changed? I can't see any difference.

it changes the wakeup frequency of the softlockup thread.

i'm wondering why it had no effect now - the new code is in essence a 
NOP over what we had. Could you send me your current (modified) 
kernel/softlockup.c code?

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-08 Thread Jiri Slaby
On 12/08/2007 09:39 AM, Ingo Molnar wrote:
> * Jiri Slaby <[EMAIL PROTECTED]> wrote:
> 
>> Unfortunately no change here.
> 
> could you try to revert this change:
> 
> -int softlockup_thresh = 10;
> +int softlockup_thresh = 60;
> 
> i.e. change the value of softlockup_thresh back to 10. You should be 
> able to tweak this runtime as well, without patching the kernel:
> 
>   echo 10 > /proc/sys/kernel/softlockup_thresh

What should have this changed? I can't see any difference.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-08 Thread Ingo Molnar

* Jiri Slaby <[EMAIL PROTECTED]> wrote:

> Unfortunately no change here.

could you try to revert this change:

-int softlockup_thresh = 10;
+int softlockup_thresh = 60;

i.e. change the value of softlockup_thresh back to 10. You should be 
able to tweak this runtime as well, without patching the kernel:

  echo 10 > /proc/sys/kernel/softlockup_thresh

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-08 Thread Jiri Slaby
On 12/07/2007 06:51 PM, Ingo Molnar wrote:
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> 
>> thanks for tracking it down. Does the patch below help?
> 
> oops, that should be the patch below. Otherwise the watchdog kernel 
> threads will just loop around.
> 
>   Ingo
> 
> ---
>  kernel/softlockup.c |9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> Index: linux/kernel/softlockup.c
> ===
> --- linux.orig/kernel/softlockup.c
> +++ linux/kernel/softlockup.c
> @@ -101,7 +101,11 @@ void softlockup_tick(void)
>  
>   now = get_timestamp(this_cpu);
>  
> - /* Warn about unreasonable delays: */
> + /* Wake up the high-prio watchdog task every second: */
> + if (now > (touch_timestamp + 1))
> + wake_up_process(per_cpu(watchdog_task, this_cpu));
> +
> + /* Warn about unreasonable 10+ seconds delays: */
>   if (now <= (touch_timestamp + softlockup_thresh))
>   return;
>  
> @@ -213,8 +217,9 @@ static int watchdog(void *__bind_cpu)
>* debug-printout triggers in softlockup_tick().
>*/
>   while (!kthread_should_stop()) {
> + set_current_state(TASK_INTERRUPTIBLE);
>   touch_softlockup_watchdog();
> - msleep_interruptible(1);
> + schedule();
>  
>   /*
>* Only do the hung-tasks check on one CPU:

Unfortunately no change here.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-07 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> thanks for tracking it down. Does the patch below help?

oops, that should be the patch below. Otherwise the watchdog kernel 
threads will just loop around.

Ingo

---
 kernel/softlockup.c |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: linux/kernel/softlockup.c
===
--- linux.orig/kernel/softlockup.c
+++ linux/kernel/softlockup.c
@@ -101,7 +101,11 @@ void softlockup_tick(void)
 
now = get_timestamp(this_cpu);
 
-   /* Warn about unreasonable delays: */
+   /* Wake up the high-prio watchdog task every second: */
+   if (now > (touch_timestamp + 1))
+   wake_up_process(per_cpu(watchdog_task, this_cpu));
+
+   /* Warn about unreasonable 10+ seconds delays: */
if (now <= (touch_timestamp + softlockup_thresh))
return;
 
@@ -213,8 +217,9 @@ static int watchdog(void *__bind_cpu)
 * debug-printout triggers in softlockup_tick().
 */
while (!kthread_should_stop()) {
+   set_current_state(TASK_INTERRUPTIBLE);
touch_softlockup_watchdog();
-   msleep_interruptible(1);
+   schedule();
 
/*
 * Only do the hung-tasks check on one CPU:
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-07 Thread Ingo Molnar

* Jiri Slaby <[EMAIL PROTECTED]> wrote:

> On 12/05/2007 06:17 AM, Andrew Morton wrote:
> >   
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/
> 
> >  git-sched.patch
> 
> breaks suspend here since -rc3-mm2. More precisely, this one:
> softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
> 
> 2.6.24-rc4-mm1 minus this one works just fine. Otherwise disks stop, graphics
> stops and then it hangs not powering down.
> 
> Core 2 Duo, SMP kernel, voluntary preempt, 250 HZ, SLUB, 64 bit.
> 
> Ideas?

thanks for tracking it down. Does the patch below help?

Ingo

---
 kernel/softlockup.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Index: linux/kernel/softlockup.c
===
--- linux.orig/kernel/softlockup.c
+++ linux/kernel/softlockup.c
@@ -101,7 +101,11 @@ void softlockup_tick(void)
 
now = get_timestamp(this_cpu);
 
-   /* Warn about unreasonable delays: */
+   /* Wake up the high-prio watchdog task every second: */
+   if (now > (touch_timestamp + 1))
+   wake_up_process(per_cpu(watchdog_task, this_cpu));
+
+   /* Warn about unreasonable 10+ seconds delays: */
if (now <= (touch_timestamp + softlockup_thresh))
return;
 
@@ -214,7 +218,7 @@ static int watchdog(void *__bind_cpu)
 */
while (!kthread_should_stop()) {
touch_softlockup_watchdog();
-   msleep_interruptible(1);
+   schedule();
 
/*
 * Only do the hung-tasks check on one CPU:
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

2007-12-07 Thread Jiri Slaby
On 12/05/2007 06:17 AM, Andrew Morton wrote:
>   
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/

>  git-sched.patch

breaks suspend here since -rc3-mm2. More precisely, this one:
softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks

2.6.24-rc4-mm1 minus this one works just fine. Otherwise disks stop, graphics
stops and then it hangs not powering down.

Core 2 Duo, SMP kernel, voluntary preempt, 250 HZ, SLUB, 64 bit.

Ideas?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/