RE: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
>-----Original Message-----
>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gautham R Shenoy
>Sent: Wednesday, December 06, 2006 11:07 PM
>To: Pallipadi, Venkatesh
>Cc: [EMAIL PROTECTED]; Ingo Molnar; [EMAIL PROTECTED]; linux-kernel@vger.kernel.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
>Subject: Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
>
>Hi Venki,
>On Wed, Dec 06, 2006 at 10:27:01AM -0800, Pallipadi, Venkatesh wrote:
>
>> But, if we make cpufreq more affected_cpus aware and have a per_cpu
>> target() call by moving set_cpus_allowed() from driver into cpufreq
>> core and define the target function to be atomic/non-sleeping type,
>> then we really don't need a hotplug lock for the driver any more.
>> Driver can have get_cpu/put_cpu pair to disable preemption and then
>> change the frequency.
>
>Well, we would still need to keep the affected_cpus map in sync with the
>cpu_online map. That would still require hotplug protection, right?

Not really. Cpufreq can look at all the CPUs in affected_cpus and call
new_target() only for the CPUs that are online. Or it can call it for every
CPU, and the actual implementation in the driver can do something like:

	set_cpus_allowed(requested_processor_mask);
	if (get_cpu() != requested_cpu) {
		/* CPU is offline and nothing can be done */
		put_cpu();
		return 0;
	}

This is what I did for the new cpufreq interface I added for getavg(). It
was easy to ensure an atomic driver call there, as only one driver is
using it today :-)

>Besides, I would love to see a way of implementing the target function
>as an atomic/non-sleeping type. But as of now, the target functions call
>cpufreq_notify_transition, which might sleep.

That is the reason I don't have a patch for this now :-) There are more
than 10 or so drivers that would need to change for the new interface, and
I haven't checked whether it is possible to do this without sleeping in
all those drivers.

>That's not the last of my worries. The ondemand-workqueue interaction
>in the cpu_hotplug callback path can cause a deadlock if we go for
>per-subsystem hotcpu mutexes. Can you think of a way by which we can
>avoid destroying the kondemand workqueue from the cpu-hotplug callback
>path? Please see http://lkml.org/lkml/2006/11/30/9 for the
>culprit-callpath.

Yes, I was thinking about it. One possible solution is to move
create_workqueue()/destroy_workqueue() as in the attached patch. Not fully
tested at the moment.

Thanks,
Venki

Remove workqueue creation/destruction from governor start and stop, and
move those calls to module init and exit time.

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

Index: linux-2.6.19-rc-mm/drivers/cpufreq/cpufreq_ondemand.c
===================================================================
--- linux-2.6.19-rc-mm.orig/drivers/cpufreq/cpufreq_ondemand.c
+++ linux-2.6.19-rc-mm/drivers/cpufreq/cpufreq_ondemand.c
@@ -514,7 +514,6 @@ static inline void dbs_timer_exit(struct
 {
 	dbs_info->enable = 0;
 	cancel_delayed_work(&dbs_info->work);
-	flush_workqueue(kondemand_wq);
 }
 
 static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
@@ -543,16 +542,6 @@ static int cpufreq_governor_dbs(struct c
 		mutex_lock(&dbs_mutex);
 		dbs_enable++;
-		if (dbs_enable == 1) {
-			kondemand_wq = create_workqueue("kondemand");
-			if (!kondemand_wq) {
-				printk(KERN_ERR
-				       "Creation of kondemand failed\n");
-				dbs_enable--;
-				mutex_unlock(&dbs_mutex);
-				return -ENOSPC;
-			}
-		}
 
 		rc = sysfs_create_group(&policy->kobj, &dbs_attr_group);
 		if (rc) {
@@ -601,8 +590,6 @@ static int cpufreq_governor_dbs(struct c
 		dbs_timer_exit(this_dbs_info);
 		sysfs_remove_group(&policy->kobj, &dbs_attr_group);
 		dbs_enable--;
-		if (dbs_enable == 0)
-			destroy_workqueue(kondemand_wq);
 		mutex_unlock(&dbs_mutex);
 
@@ -632,12 +619,18 @@ static struct cpufreq_governor cpufreq_g
 static int __init cpufreq_gov_dbs_init(void)
 {
+	kondemand_wq = create_workqueue("kondemand");
+	if (!kondemand_wq) {
+		printk(KERN_ERR "Creation of kondemand failed\n");
+		return -EFAULT;
+	}
 	return cpufreq_register_governor(&cpufreq_gov_dbs);
 }
 
 static void __exit cpufreq_gov_dbs_exit(void)
 {
 	cpufreq_unregister_governor(&cpufreq_gov_dbs);
+	destroy_workqueue(kondemand_wq);
 }

ondemand_recursive_locking_fix.patch
Description: ondemand_recursive_locking_fix.patch
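The shape of the patch above — create the long-lived workqueue once at module init, destroy it once at module exit, and leave only refcounting in the governor start/stop path that can run inside a hotplug callback — can be sketched in userspace C. All names here (`gov_module_init`, `governor_start`, the toy `struct workqueue`) are illustrative, not the kernel API:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Toy stand-in for the kondemand workqueue. */
struct workqueue { int alive; };

static struct workqueue *kondemand_wq;  /* single long-lived queue */
static pthread_mutex_t dbs_mutex = PTHREAD_MUTEX_INITIALIZER;
static int dbs_enable;                  /* count of active policies */

/* Module init: the only place the queue is created. */
int gov_module_init(void)
{
	kondemand_wq = calloc(1, sizeof(*kondemand_wq));
	if (!kondemand_wq)
		return -1;
	kondemand_wq->alive = 1;
	return 0;
}

/* Module exit: the only place the queue is destroyed. */
void gov_module_exit(void)
{
	free(kondemand_wq);
	kondemand_wq = NULL;
}

/* GOV_START/GOV_STOP path: pure refcounting under dbs_mutex.  Nothing
 * here creates or destroys the queue, so nothing here can deadlock
 * against the cpu-hotplug callback path. */
int governor_start(void)
{
	pthread_mutex_lock(&dbs_mutex);
	dbs_enable++;
	pthread_mutex_unlock(&dbs_mutex);
	return 0;
}

void governor_stop(void)
{
	pthread_mutex_lock(&dbs_mutex);
	dbs_enable--;
	pthread_mutex_unlock(&dbs_mutex);
}

int governor_users(void) { return dbs_enable; }
int queue_alive(void) { return kondemand_wq && kondemand_wq->alive; }
```

The design point is that start/stop cycles no longer touch the queue's lifetime at all, which is exactly what removes the destroy-from-callback deadlock.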
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
Hi Venki,
On Wed, Dec 06, 2006 at 10:27:01AM -0800, Pallipadi, Venkatesh wrote:
> But, if we make cpufreq more affected_cpus aware and have a per_cpu
> target() call by moving set_cpus_allowed() from driver into cpufreq core
> and define the target function to be atomic/non-sleeping type, then we
> really don't need a hotplug lock for the driver any more. Driver can have
> get_cpu/put_cpu pair to disable preemption and then change the frequency.

Well, we would still need to keep the affected_cpus map in sync with the
cpu_online map. That would still require hotplug protection, right?

Besides, I would love to see a way of implementing the target function as
an atomic/non-sleeping type. But as of now, the target functions call
cpufreq_notify_transition, which might sleep.

That's not the last of my worries. The ondemand-workqueue interaction in
the cpu_hotplug callback path can cause a deadlock if we go for
per-subsystem hotcpu mutexes. Can you think of a way by which we can avoid
destroying the kondemand workqueue from the cpu-hotplug callback path?
Please see http://lkml.org/lkml/2006/11/30/9 for the culprit-callpath.

Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
RE: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
>-----Original Message-----
>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gautham R Shenoy
>Sent: Thursday, November 30, 2006 3:44 AM
>To: Ingo Molnar
>Cc: Gautham R Shenoy; [EMAIL PROTECTED]; linux-kernel@vger.kernel.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
>Subject: Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
>
>On Thu, Nov 30, 2006 at 12:03:15PM +0100, Ingo Molnar wrote:
>>
>> * Gautham R Shenoy <[EMAIL PROTECTED]> wrote:
>>
>> > a) cpufreq maintains its own cpumask in the variable
>> > policy->affected_cpus and says: if a frequency change is issued to
>> > any one of the cpus in the affected_cpus mask, you change frequency
>> > on all cpus in the mask. So this needs to be consistent with the
>> > cpu_online map and hence cpu-hotplug aware. Furthermore, we don't want
>> > cpus in this mask to go down when we are trying to change frequencies
>> > on them. The function which drives the frequency change in the
>> > cpufreq core is cpufreq_driver_target, and it needs cpu-hotplug
>> > protection.
>>
>> couldnt this complexity be radically simplified by having new kernel
>> infrastructure that does something like:
>>
>> " 'gather' all CPUs mentioned in <mask> via scheduling a separate
>>   helper-kthread on every CPU that <mask> specifies, disable all
>>   interrupts, and execute function <fn> once all CPUs have been
>>   'gathered' - and release all CPUs once <fn> has executed on each of
>>   them."
>>
>> ?
>
>This is what is currently being done by cpufreq:
>
>a) get_some_cpu_hotplug_protection() [use either some global mechanism
>   or a per-subsystem mutex]
>
>b) actual_freq_change_driver_function(mask)
>   /* You can check out cpufreq_p4_target() in
>    * arch/i386/kernel/cpu/cpufreq/p4-clockmod.c
>    */
>   {
>	for_each_cpu_mask(i, mask) {
>		cpumask_t this_cpu = cpumask_of_cpu(i);
>		set_cpus_allowed(current, this_cpu);
>		function_to_change_frequency();
>	}
>   }

As there are many options being discussed here, let me propose one more
option that can eliminate the need for the hotplug lock in the
cpufreq_driver_target() path.

As Gautham clearly explained before, today cpufreq calls
cpufreq_driver_target() in each driver to change the CPU frequency, and
the driver internally uses set_cpus_allowed() to reschedule onto the
different affected_cpus and change the frequency. That is the main reason
why we need to disable hotplug in this path.

But, if we make cpufreq more affected_cpus aware and have a per-CPU
target() call by moving set_cpus_allowed() from the driver into the
cpufreq core, and define the target function to be atomic/non-sleeping,
then we really don't need a hotplug lock for the driver any more. The
driver can use a get_cpu()/put_cpu() pair to disable preemption and then
change the frequency.

This means a lot of changes, as we need new interface changes to cpufreq
and a rewrite of a bunch of drivers. But this looks to me like the least
complicated solution.

Thanks,
Venki
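The proposed flow — the cpufreq core walks affected_cpus and invokes a per-CPU, non-sleeping driver hook only for CPUs that are still online — can be sketched in plain C. Everything here (`NCPUS`, `new_target`, `set_policy_freq`, the `online` array) is made up for illustration; in the kernel the hook would run between get_cpu()/put_cpu() with preemption disabled:

```c
#include <assert.h>

#define NCPUS 4

static int online[NCPUS] = { 1, 1, 1, 1 };  /* stand-in for cpu_online map */
static int freq[NCPUS];

/* Driver-side hook: atomic and per-CPU.  If the CPU went offline, there
 * is simply nothing to do -- no hotplug lock needed. */
static int new_target(int cpu, int khz)
{
	if (!online[cpu])
		return 0;       /* CPU offline and nothing can be done */
	freq[cpu] = khz;
	return 1;
}

/* Core-side loop over the policy's affected_cpus. */
int set_policy_freq(const int *affected, int n, int khz)
{
	int i, changed = 0;

	for (i = 0; i < n; i++)
		changed += new_target(affected[i], khz);
	return changed;
}

int freq_of(int cpu) { return freq[cpu]; }
```

The point of the restructuring is that an offlined CPU degrades to a harmless no-op in the driver instead of a race that must be excluded by a lock.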
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy <[EMAIL PROTECTED]> wrote:

> Consider a case where we have three locks A, B and C.
> We have very clear locking rule inside the kernel that lock A *should*
> be acquired before acquiring either lock B or lock C.
>
> At runtime lockdep detects the two dependency chains,
>
> A --> B --> C
>
> and
>
> A --> C --> B.
>
> Does lockdep issue a circular dependency warning for this?
> It's quite clear from the locking rule that we cannot have a
> circular deadlock, since A acts as a mutex for the B->C / C->B callpath.
>
> Just curious :-) [ Well, I might encounter such a scenario in an attempt
> to make cpufreq cpu-hotplug safe! ]

yes, lockdep will warn about this.

Will you /ever/ have a B->C or C->B dependency without A being taken
first? if not then my suggestion would be to just lump 'B' and 'C' into a
single lock - or to get rid of them altogether. There's little reason to
keep them separate.

/If/ you want to keep them separate (because they protect different data
structures) then it's the cleanest design to have an ordering between
them. The taking of 'A' might go away anytime - such a construct is
really, really fragile and is only asking for trouble.

	Ingo
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 09:29:34AM +0100, Ingo Molnar wrote:
> what lockdep does is it observes actual locking dependencies as they
> happen individually in various contexts, and then 'completes' the
> dependency graph by combining all the possible scenarios how contexts
> might preempt each other. So if lockdep sees independent dependencies
> and concludes that they are circular, there's nothing that saves us from
> the deadlock.

Ingo,

Consider a case where we have three locks A, B and C.
We have very clear locking rule inside the kernel that lock A *should*
be acquired before acquiring either lock B or lock C.

At runtime lockdep detects the two dependency chains,

A --> B --> C

and

A --> C --> B.

Does lockdep issue a circular dependency warning for this?
It's quite clear from the locking rule that we cannot have a circular
deadlock, since A acts as a mutex for the B->C / C->B callpath.

Just curious :-) [ Well, I might encounter such a scenario in an attempt
to make cpufreq cpu-hotplug safe! ]

> Ingo

Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
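Why lockdep warns here can be shown with a toy model of its bookkeeping: lockdep records pairwise "Y taken while X held" edges and does not model the fact that A gates both paths, so observing A->B->C and A->C->B yields both a B->C and a C->B edge, i.e. an apparent cycle. This is purely illustrative C, not the real lockdep implementation:

```c
#include <assert.h>

enum { A, B, C, NLOCKS };

/* edge[x][y] set means: lock y was acquired while lock x was held */
static int edge[NLOCKS][NLOCKS];

/* Record every pairwise ordering implied by one observed lock chain. */
static void observe_chain(const int *locks, int n)
{
	int i, j;

	for (i = 0; i < n; i++)
		for (j = i + 1; j < n; j++)
			edge[locks[i]][locks[j]] = 1;
}

/* A two-lock cycle: both orderings have been observed. */
int circular(int x, int y)
{
	return edge[x][y] && edge[y][x];
}

void record_both_paths(void)
{
	int path1[] = { A, B, C };  /* A --> B --> C */
	int path2[] = { A, C, B };  /* A --> C --> B */

	observe_chain(path1, 3);
	observe_chain(path2, 3);
}
```

The model makes Ingo's point concrete: from lockdep's edge list alone, B and C look circular, and only the out-of-band rule "A is always outermost" rules the deadlock out.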
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Andrew Morton <[EMAIL PROTECTED]> wrote:

> > Even with complex inter-subsystem interactions, hotplugging could be
> > effectively and scalably controlled via a self-recursive per-CPU
> > mutex, and a pointer to it embedded in task_struct:
>
> So what I would propose is that rather than going ahead and piling
> more complexity on top of the existing poo-pile in an attempt to make
> it seem to work, we should simply rip all the cpu-hotplug locking out
> of cpufreq (there's a davej patch for that in -mm) and then just redo
> it all in an organised fashion.

actually, that's precisely what i'm suggesting too, i wrote it to Gautham
in one of the previous mails:

|| that would flatten the whole locking. Only one kind of lock taken,
|| recursive and scalable.
||
|| Then the mechanism that changes CPU frequency should take all these
|| hotplug locks on all (online) CPUs, and then first stop all
|| processing on all CPUs, and then do the frequency change, atomically.
|| This is with interrupts disabled everywhere /first/, and /without any
|| additional locking/. That would prevent any sort of interaction from
|| other CPUs - they'd all be sitting still with interrupts disabled.

no other locking, only the CPU hotplug lock and the (existing) ability to
'do stuff' with nothing else running on any other CPU.

	Ingo
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, 30 Nov 2006 12:46:17 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:

> * Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> > > This would be done totally serialized and while holding the hotplug
> > > lock, so no CPU could go away or arrive while this operation is
> > > going on.
> >
> > You said "the hotplug lock". That is the problem.
>
> maybe i'm too dense today but i still dont see the fundamental problem.
>
> Even with complex inter-subsystem interactions, hotplugging could be
> effectively and scalably controlled via a self-recursive per-CPU mutex,
> and a pointer to it embedded in task_struct:

Yes, I suppose we could come up with some new lock type which fixes the
problem.

But first, let us review the problem. Someone went through cpufreq
sprinkling lock_cpu_hotplug() everywhere as if it had magical properties.
For the past year or more, others have been picking through the resulting
bugs, trying to make things better and treating that initial
magic-pixie-dust as if it were inviolate and nobody had a chance of
understanding it.

IOW, cpufreq is screwed up, and we keep on trying to come up with more
complexity to unscrew it.

So what I would propose is that rather than going ahead and piling more
complexity on top of the existing poo-pile in an attempt to make it seem
to work, we should simply rip all the cpu-hotplug locking out of cpufreq
(there's a davej patch for that in -mm) and then just redo it all in an
organised fashion.

If, as a result of that exercise, we decide that the existing cpu-hotplug
serialisation mechanisms (ie: per-subsystem notification callbacks and
preempt_disable()) are insufficient then sure, that is the time to think
about more sophisticated locking primitives.

But please, let's stop asking "how do we make cpufreq's hotplug locking
work?". Let's instead ask "how do we make cpufreq safe wrt hotplug?"

To do the latter is a matter of

- identify the per-cpu resources which need locking

- decide what mechanism is to be used to protect each such resource

- design the locking and its hierarchy

- implement, test.

In all this time, Gautham's emails from yesterday were the first occasion
on which anybody has taken the time to sit down and get us started on this
quite ordinary design & devel process.

Let me re-re-re-re-reiterate: this is a cpufreq problem. It has not yet
been demonstrated (AFAICS) that it is a general problem. I am unaware of
any other subsystems having got themselves into such a mess over hotplug
locking. Now, maybe that's because cpufreq really is especially hard. Or
maybe it's because it's just messed up. I don't believe we know which of
those is true yet.
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy <[EMAIL PROTECTED]> wrote:

> In process context preemptible code,
> Lets say we are currently running on processor i.
>
> cpu_hotplug_lock();   /* does mutex_lock(&per_cpu(hotplug_lock, i)) */
>
> /* do some operation, which might sleep */
> /* migrates to cpu j */
>
> cpu_hotplug_unlock(); /* does mutex_unlock(&per_cpu(hotplug_lock, i))
>			  while running on cpu j */
>
> This would cause cacheline ping pong, no?

that would be attached to a very cache-inefficient thing: migrating a task
from one CPU to another. This is not the kind of ping-pong we are normally
worried about. (nor does it happen all that often)

	Ingo
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 12:46:17PM +0100, Ingo Molnar wrote:
>
> struct task_struct {
>	...
>	int hotplug_depth;
>	struct mutex *hotplug_lock;
> };
> ...
>
> DEFINE_PER_CPU(struct mutex, hotplug_lock);
>
> void cpu_hotplug_lock(void)
> {
>	int cpu = raw_smp_processor_id();
>	/*
>	 * Interrupts/softirqs are hotplug-safe:
>	 */
>	if (in_interrupt())
>		return;
>	if (current->hotplug_depth++)
>		return;
>	current->hotplug_lock = &per_cpu(hotplug_lock, cpu);
>	mutex_lock(current->hotplug_lock);
> }
>
> void cpu_hotplug_unlock(void)
> {
>	if (in_interrupt())
>		return;
>	if (DEBUG_LOCKS_WARN_ON(!current->hotplug_depth))
>		return;
>	if (--current->hotplug_depth)
>		return;
>
>	mutex_unlock(current->hotplug_lock);
>	current->hotplug_lock = NULL;
> }

In process context preemptible code,
Lets say we are currently running on processor i.

cpu_hotplug_lock();   /* does mutex_lock(&per_cpu(hotplug_lock, i)) */

/* do some operation, which might sleep */
/* migrates to cpu j */

cpu_hotplug_unlock(); /* does mutex_unlock(&per_cpu(hotplug_lock, i))
			 while running on cpu j */

This would cause cacheline ping pong, no?

> Ingo

regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 12:53:27PM +0100, Ingo Molnar wrote:
>
> * Gautham R Shenoy <[EMAIL PROTECTED]> wrote:
>
> > This is what is currently being done by cpufreq:
>
> ok!
>
> > a) get_some_cpu_hotplug_protection() [use either some global mechanism
> >    or a per-subsystem mutex]
>
> this bit is wrong i think. Any reason why it's not a per-CPU (but
> otherwise global) array of mutexes that controls CPU hotplug - as per my
> previous mail?
>
> that would flatten the whole locking. Only one kind of lock taken,
> recursive and scalable.

I had posted one such recursive scalable version, which can be found here:
http://lkml.org/lkml/2006/10/26/73
I remember cc'ing you.

Yeah, it looks complicated and big, but then I did not want to add another
field to the task struct, as one such attempt had already been frowned
upon (I think long back Ashok posted it). So I ended up writing the whole
read/write lock/unlock code myself.

It's an RCU based lock, extremely light on the read side, but costly for
the writers since it does a synchronize_sched. And yeah, it's partial
towards the readers, but with an additional field in the task struct we
can have a fair implementation. Besides, an unfair cpu_hotplug_lock won't
work, since a process doing a sched_getaffinity in a forever_while loop
can prevent any hotplug from happening.

> Then the mechanism that changes CPU frequency should take all these
> hotplug locks on all (online) CPUs, and then first stop all processing
> on all CPUs, and then do the frequency change, atomically. This is with
> interrupts disabled everywhere /first/, and /without any additional
> locking/. That would prevent any sort of interaction from other CPUs -
> they'd all be sitting still with interrupts disabled.

Yup.

> Ingo

regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy <[EMAIL PROTECTED]> wrote:

> This is what is currently being done by cpufreq:

ok!

> a) get_some_cpu_hotplug_protection() [use either some global mechanism
>    or a per-subsystem mutex]

this bit is wrong i think. Any reason why it's not a per-CPU (but
otherwise global) array of mutexes that controls CPU hotplug - as per my
previous mail?

that would flatten the whole locking. Only one kind of lock taken,
recursive and scalable.

Then the mechanism that changes CPU frequency should take all these
hotplug locks on all (online) CPUs, and then first stop all processing on
all CPUs, and then do the frequency change, atomically. This is with
interrupts disabled everywhere /first/, and /without any additional
locking/. That would prevent any sort of interaction from other CPUs -
they'd all be sitting still with interrupts disabled.

	Ingo
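The "take all the per-CPU hotplug locks, then do the change atomically" idea can be sketched in userspace with a pthread mutex per CPU. Acquiring the mutexes in a single fixed (ascending) order means two concurrent frequency changers cannot deadlock against each other; the `NCPUS` constant and function names are illustrative only, and the kernel version would additionally disable interrupts everywhere before touching the hardware:

```c
#include <assert.h>
#include <pthread.h>

#define NCPUS 4

static pthread_mutex_t cpu_lock[NCPUS] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};
static int cpu_freq[NCPUS];

void change_freq_all(int khz)
{
	int i;

	/* Lock every CPU's hotplug mutex, lowest index first: the fixed
	 * ordering is what makes concurrent callers deadlock-free. */
	for (i = 0; i < NCPUS; i++)
		pthread_mutex_lock(&cpu_lock[i]);

	/* With every CPU "held", the change is effectively atomic;
	 * in the kernel this is where all processing would be stopped
	 * with interrupts disabled. */
	for (i = 0; i < NCPUS; i++)
		cpu_freq[i] = khz;

	/* Release in reverse order. */
	for (i = NCPUS - 1; i >= 0; i--)
		pthread_mutex_unlock(&cpu_lock[i]);
}

int freq_of(int cpu) { return cpu_freq[cpu]; }
```

A hotplug operation in this scheme only ever takes its own CPU's mutex, so it serializes against the frequency change without any global lock.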
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Andrew Morton <[EMAIL PROTECTED]> wrote:

> > This would be done totally serialized and while holding the hotplug
> > lock, so no CPU could go away or arrive while this operation is
> > going on.
>
> You said "the hotplug lock". That is the problem.

maybe i'm too dense today but i still dont see the fundamental problem.
Even with complex inter-subsystem interactions, hotplugging could be
effectively and scalably controlled via a self-recursive per-CPU mutex,
and a pointer to it embedded in task_struct:

 struct task_struct {
	...
	int hotplug_depth;
	struct mutex *hotplug_lock;
 };

 ...

 DEFINE_PER_CPU(struct mutex, hotplug_lock);

 void cpu_hotplug_lock(void)
 {
	int cpu = raw_smp_processor_id();

	/*
	 * Interrupts/softirqs are hotplug-safe:
	 */
	if (in_interrupt())
		return;
	if (current->hotplug_depth++)
		return;
	current->hotplug_lock = &per_cpu(hotplug_lock, cpu);
	mutex_lock(current->hotplug_lock);
 }

 void cpu_hotplug_unlock(void)
 {
	if (in_interrupt())
		return;
	if (DEBUG_LOCKS_WARN_ON(!current->hotplug_depth))
		return;
	if (--current->hotplug_depth)
		return;
	mutex_unlock(current->hotplug_lock);
	current->hotplug_lock = NULL;
 }

 ...

 void do_exit(void)
 {
	...
	DEBUG_LOCKS_WARN_ON(current->hotplug_depth);
	...
 }

 ...

 copy_process(void)
 {
	...
	p->hotplug_depth = 0;
	p->hotplug_lock = NULL;
	...
 }

50 lines of code at most. The only rule is to not use cpu_hotplug_lock()
in process-context non-preemptible code [interrupt contexts are
automatically ignored]. What am i missing?

	Ingo
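The core trick in Ingo's sketch - a depth counter in per-task state so
that nested acquisitions never touch the mutex again - can be mimicked
in userspace with pthreads. This is only an analogue, not kernel code:
thread-local storage stands in for task_struct fields, an array index
stands in for raw_smp_processor_id(), and all names are made up:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Per-thread state standing in for the task_struct fields. */
static __thread int hotplug_depth;
static __thread pthread_mutex_t *hotplug_lock;

/* One mutex per "CPU"; a fixed array stands in for DEFINE_PER_CPU. */
#define NCPUS 4
static pthread_mutex_t cpu_lock[NCPUS] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

/* The caller's current "CPU"; stands in for raw_smp_processor_id(). */
static int cur_cpu;

void cpu_hotplug_lock_demo(void)
{
	if (hotplug_depth++)	/* recursive acquisition: just count */
		return;
	hotplug_lock = &cpu_lock[cur_cpu];
	pthread_mutex_lock(hotplug_lock);
}

void cpu_hotplug_unlock_demo(void)
{
	assert(hotplug_depth > 0);
	if (--hotplug_depth)
		return;
	pthread_mutex_unlock(hotplug_lock);
	hotplug_lock = NULL;
}

/* Nested lock/unlock must not self-deadlock and must balance out. */
int nesting_works(void)
{
	cpu_hotplug_lock_demo();
	cpu_hotplug_lock_demo();
	int ok = (hotplug_depth == 2);
	cpu_hotplug_unlock_demo();
	cpu_hotplug_unlock_demo();
	return ok && hotplug_depth == 0 && hotplug_lock == NULL;
}
```

The remembered `hotplug_lock` pointer is what makes the unlock correct
even if the caller has meanwhile migrated to another "CPU" - which is
precisely the cacheline concern raised later in the thread.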
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 12:03:15PM +0100, Ingo Molnar wrote:
> * Gautham R Shenoy <[EMAIL PROTECTED]> wrote:
>
> > a) cpufreq maintains its own cpumask in the variable
> > policy->affected_cpus and says: if a frequency change is issued to
> > any one of the cpus in the affected_cpus mask, you change frequency
> > on all cpus in the mask. So this needs to be consistent with the
> > cpu_online map and hence cpu-hotplug aware. Furthermore, we don't want
> > cpus in this mask to go down when we are trying to change frequencies
> > on them. The function which drives the frequency change in the
> > cpufreq core is cpufreq_driver_target and it needs cpu-hotplug
> > protection.
>
> couldnt this complexity be radically simplified by having new kernel
> infrastructure that does something like:
>
>  "'gather' all CPUs mentioned in mask via scheduling a separate
>   helper-kthread on every CPU that mask specifies, disable all
>   interrupts, and execute function fn once all CPUs have been
>   'gathered' - and release all CPUs once fn has executed on each of
>   them."
>
> ?

This is what is currently being done by cpufreq:

a) get_some_cpu_hotplug_protection() [use either some global mechanism
   or a per-subsystem mutex]

b) actual_freq_change_driver_function(mask)
   /* You can check out cpufreq_p4_target() in
    * arch/i386/kernel/cpu/cpufreq/p4-clockmod.c
    */
   {
	for_each_cpu_mask(i, mask) {
		cpumask_t this_cpu = cpumask_of_cpu(i);

		set_cpus_allowed(current, this_cpu);
		function_to_change_frequency();
	}
   }

c) release_whatever_cpu_hotplug_protection()

> This would be done totally serialized and while holding the hotplug
> lock, so no CPU could go away or arrive while this operation is going
> on.

Isn't the above the same as what you are suggesting? Or have I missed
out anything?

> 	Ingo

regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
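The driver loop in (b) is, at bottom, a walk over the set bits of a
cpumask, pinning the current task to each CPU in turn. A minimal
userspace sketch of that walk (the macro name is borrowed from the
kernel; everything else here is hypothetical, and the pinning call is
reduced to a log entry):

```c
#include <assert.h>

/* A toy cpumask: one bit per CPU, walked lowest bit first, mirroring
 * the for_each_cpu_mask() iteration in the driver loop above. */
typedef unsigned long cpumask_t;

#define for_each_cpu_mask(i, mask) \
	for ((i) = 0; (i) < 8 * (int)sizeof(cpumask_t); (i)++) \
		if ((mask) & (1UL << (i)))

/* Pretend frequency change: record which CPUs we "visited" in order.
 * In the kernel this is where set_cpus_allowed(current, ...) and the
 * actual frequency-change call would go. */
int change_freq_on_mask(cpumask_t mask, int *visited, int cap)
{
	int i, n = 0;

	for_each_cpu_mask(i, mask) {
		if (n < cap)
			visited[n] = i;
		n++;
	}
	return n;	/* number of CPUs the change was applied to */
}
```

The hotplug problem in the thread is precisely that this walk is only
safe if the mask cannot go stale between (a) and (c).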
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, 30 Nov 2006 12:03:15 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:
> * Gautham R Shenoy <[EMAIL PROTECTED]> wrote:
>
> > a) cpufreq maintains its own cpumask in the variable
> > policy->affected_cpus and says: if a frequency change is issued to
> > any one of the cpus in the affected_cpus mask, you change frequency
> > on all cpus in the mask. So this needs to be consistent with the
> > cpu_online map and hence cpu-hotplug aware. Furthermore, we don't want
> > cpus in this mask to go down when we are trying to change frequencies
> > on them. The function which drives the frequency change in the
> > cpufreq core is cpufreq_driver_target and it needs cpu-hotplug
> > protection.
>
> couldnt this complexity be radically simplified by having new kernel
> infrastructure that does something like:
>
>  "'gather' all CPUs mentioned in mask via scheduling a separate
>   helper-kthread on every CPU that mask specifies, disable all
>   interrupts, and execute function fn once all CPUs have been
>   'gathered' - and release all CPUs once fn has executed on each of
>   them."
>
> ?

How does this differ from stop_machine_run(), which hot-unplug presently
uses?

> This would be done totally serialized and while holding the hotplug
> lock, so no CPU could go away or arrive while this operation is going
> on.

You said "the hotplug lock". That is the problem.
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy <[EMAIL PROTECTED]> wrote:

> a) cpufreq maintains its own cpumask in the variable
> policy->affected_cpus and says: if a frequency change is issued to
> any one of the cpus in the affected_cpus mask, you change frequency
> on all cpus in the mask. So this needs to be consistent with the
> cpu_online map and hence cpu-hotplug aware. Furthermore, we don't want
> cpus in this mask to go down when we are trying to change frequencies
> on them. The function which drives the frequency change in the
> cpufreq core is cpufreq_driver_target and it needs cpu-hotplug
> protection.

couldnt this complexity be radically simplified by having new kernel
infrastructure that does something like:

 "'gather' all CPUs mentioned in mask via scheduling a separate
  helper-kthread on every CPU that mask specifies, disable all
  interrupts, and execute function fn once all CPUs have been
  'gathered' - and release all CPUs once fn has executed on each of
  them."

?

This would be done totally serialized and while holding the hotplug
lock, so no CPU could go away or arrive while this operation is going
on.

	Ingo
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 09:31:44AM +0100, Ingo Molnar wrote:
> * Gautham R Shenoy <[EMAIL PROTECTED]> wrote:
>
> > So do we
> > - Rethink the strategy of per-subsystem hotcpu-locks ?
> >
> > OR
> >
> > - Think of a way to straighten out the super-convoluted cpufreq code ?
>
> i'm still wondering what the conceptual source of this fundamental
> locking complexity in cpufreq (and hotplug) is - it is not intuitive to
> me at all. Could you try to explain that?

I can try :-) IIRC, cpufreq was written before cpu-hotplug and thus was
never hotplug aware when it was first written. Let me try and simplify
this as far as I can.

a) cpufreq maintains its own cpumask in the variable
policy->affected_cpus and says: if a frequency change is issued to any
one of the cpus in the affected_cpus mask, you change frequency on all
cpus in the mask. So this needs to be consistent with the cpu_online map
and hence cpu-hotplug aware. Furthermore, we don't want cpus in this
mask to go down when we are trying to change frequencies on them. The
function which drives the frequency change in the cpufreq core is
cpufreq_driver_target and it needs cpu-hotplug protection.

b) This mask is changed when the cpu-hotplug core sends a
CPU_DEAD/CPU_ONLINE notification through the function
cpufreq_cpu_callback. So it's a read-write situation, where
cpufreq_driver_target() is the read side and cpufreq_cpu_callback() is
the write side, and we want a) and b) to be mutually exclusive.

c) The problem is that just before a cpu is offlined, we want to put it
at the lowest possible frequency. So we send out a CPU_DOWN_PREPARE
event, handled by cpufreq_cpu_callback, which calls
cpufreq_driver_target. Classic case of calling down_read from a
down_write context!

To solve this, many changes have taken place since 2.6.18-rc-something:

i) Linus split the cpu_control mutex in cpu.c into cpu_add_remove_lock
and cpu_bitmask_lock. While the former serializes cpu-hotplug attempts,
the latter actually provides cpu-hotplug protection.

ii) Arjan cleaned up cpufreq to eliminate all intra-cpufreq recursive
lock_cpu_hotplug calls.

iii) Andrew replaced lock_cpu_hotplug in the workqueue code with
workqueue_mutex, which solved the recursive hotplug problem between the
ondemand governor and the workqueue.

This is how it exists in 2.6.19. Is it safe? Not really. The split
locking can still hit a BUG_ON as described in the post
http://lkml.org/lkml/2006/8/24/89 because the split locking opens up a
window which allows cpufreq to try changing frequency on stale cpus.
Besides, having so many subsystems depend on a global lock may not be
such a good idea from the scalability perspective. I did post an
rcu-based scalable version of lock_cpu_hotplug but didn't get much
response.

Which is why we tried having per-subsystem hotplug locks, where each
subsystem can define its own mutex. The read side will be something
like:

	mutex_lock(&my_hotcpu_lock);
	/* No hotplug can happen here */
	mutex_unlock(&my_hotcpu_lock);

Define a callback function my_cpu_callback and handle the events
CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE, which are sent out before
beginning a cpu hotplug and after the completion of a cpu hotplug
respectively:

	my_cpu_callback(int event)
	{
		switch (event) {
		case CPU_LOCK_ACQUIRE:
			mutex_lock(&my_hotcpu_mutex);
			break;
		/*
		 * Cpu hotplug happens here.
		 * Handle any other pre/post hotplug events.
		 */
		case CPU_LOCK_RELEASE:
			mutex_unlock(&my_hotcpu_mutex);
			break;
		}
	}

This would work fine if the interactions between the cpu-hotplug aware
code across subsystems were well ordered. But the ondemand
governor-workqueue interactions are causing the circular dependency
problem, and in another case a recursive locking problem. Sigh!

Hope this explanation helped :-)

> 	Ingo

regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
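The per-subsystem callback pattern above can be sketched in userspace:
a "hotplug core" brackets the operation with ACQUIRE/RELEASE events, and
the subsystem's callback takes and drops its own mutex in response. The
event names echo the thread; everything else (pthreads, the dummy event,
the counter) is invented for illustration:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical event codes mirroring CPU_LOCK_ACQUIRE/RELEASE. */
enum { CPU_LOCK_ACQUIRE, CPU_DOWN_DUMMY, CPU_LOCK_RELEASE };

static pthread_mutex_t my_hotcpu_mutex = PTHREAD_MUTEX_INITIALIZER;
static int events_seen;

/* The subsystem's callback: takes its own mutex on ACQUIRE, drops it
 * on RELEASE; any other event is handled while the mutex is held. */
static void my_cpu_callback(int event)
{
	switch (event) {
	case CPU_LOCK_ACQUIRE:
		pthread_mutex_lock(&my_hotcpu_mutex);
		break;
	case CPU_LOCK_RELEASE:
		pthread_mutex_unlock(&my_hotcpu_mutex);
		break;
	default:
		events_seen++;	/* pre/post hotplug handling goes here */
		break;
	}
}

/* The "hotplug core": sends ACQUIRE before and RELEASE after the
 * hotplug operation, so the callback's mutex covers it. */
int simulate_hotplug(void)
{
	my_cpu_callback(CPU_LOCK_ACQUIRE);
	my_cpu_callback(CPU_DOWN_DUMMY);	/* hotplug happens here */
	my_cpu_callback(CPU_LOCK_RELEASE);
	return events_seen;
}
```

The hazard described in the thread appears as soon as two such callbacks
in different subsystems take their mutexes in inconsistent orders.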
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 09:29:34AM +0100, Ingo Molnar wrote:
> * Gautham R Shenoy <[EMAIL PROTECTED]> wrote:
>
> > Ok, I see that we are already doing it :(. So we can end up in a
> > deadlock.
> >
> > Here's the culprit callpath:
>
> in general lockdep is 100% correct when it comes to "individual locks".
> The overwhelming majority of lockdep false-positives is not due to
> lockdep not getting the dependencies right, but due to the "lock class"
> not being correctly identified. That's not an issue here i think.

You're right. That's not the issue.

> what lockdep does is it observes actual locking dependencies as they
> happen individually in various contexts, and then 'completes' the
> dependency graph by combining all the possible scenarios how contexts
> might preempt each other. So if lockdep sees independent dependencies
> and concludes that they are circular, there's nothing that saves us
> from the deadlock.

Ah! I get it now. I had taken neither preemption nor the SMP scenario
into account before concluding that the warning might be a false
positive. All I need to do is to run my test cases on a preemptible
kernel or in parallel on an smp box. It'll definitely deadlock there!

> The only way for those dependencies to /never/ trigger simultaneously
> on different CPUs would be via the use of a further 'outer' exclusion
> mechanism (i.e. a lock) - but all explicit kernel-API exclusion
> mechanisms are tracked by lockdep => Q.E.D. (Open-coded exclusion might
> escape the attention of lockdep, but those are extremely rare and are
> also easily found.)

Thanks for making it clear :-)

> 	Ingo

regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy <[EMAIL PROTECTED]> wrote:

> So do we
> - Rethink the strategy of per-subsystem hotcpu-locks ?
>
> OR
>
> - Think of a way to straighten out the super-convoluted cpufreq code ?

i'm still wondering what the conceptual source of this fundamental
locking complexity in cpufreq (and hotplug) is - it is not intuitive to
me at all. Could you try to explain that?

	Ingo
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy <[EMAIL PROTECTED]> wrote:

> Ok, I see that we are already doing it :(. So we can end up in a
> deadlock.
>
> Here's the culprit callpath:

in general lockdep is 100% correct when it comes to "individual locks".
The overwhelming majority of lockdep false-positives is not due to
lockdep not getting the dependencies right, but due to the "lock class"
not being correctly identified. That's not an issue here i think.

what lockdep does is it observes actual locking dependencies as they
happen individually in various contexts, and then 'completes' the
dependency graph by combining all the possible scenarios how contexts
might preempt each other. So if lockdep sees independent dependencies
and concludes that they are circular, there's nothing that saves us from
the deadlock.

The only way for those dependencies to /never/ trigger simultaneously on
different CPUs would be via the use of a further 'outer' exclusion
mechanism (i.e. a lock) - but all explicit kernel-API exclusion
mechanisms are tracked by lockdep => Q.E.D. (Open-coded exclusion might
escape the attention of lockdep, but those are extremely rare and are
also easily found.)

	Ingo
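The graph 'completion' Ingo describes can be illustrated with a tiny
model, not lockdep itself: record each observed "lock B acquired while
holding lock A" edge, then flag an acquisition that would close a cycle
in the combined graph. Lock classes are just small integers here, and
every name is invented:

```c
#include <assert.h>
#include <string.h>

/* Tiny model of lockdep's core idea: collect dependency edges from
 * all contexts, then report a deadlock when a new acquisition would
 * create a cycle in the combined graph. */
#define MAXLOCKS 8
static int edge[MAXLOCKS][MAXLOCKS];	/* edge[a][b]: b taken under a */

void record_dependency(int held, int acquired)
{
	edge[held][acquired] = 1;
}

/* Depth-first search: is 'to' reachable from 'from' via edges? */
static int reachable(int from, int to, int *seen)
{
	int i;

	if (from == to)
		return 1;
	if (seen[from]++)
		return 0;
	for (i = 0; i < MAXLOCKS; i++)
		if (edge[from][i] && reachable(i, to, seen))
			return 1;
	return 0;
}

/* Would taking 'acquired' while holding 'held' close a cycle? */
int would_deadlock(int held, int acquired)
{
	int seen[MAXLOCKS];

	memset(seen, 0, sizeof(seen));
	return reachable(acquired, held, seen);
}
```

With lock 0 as the hotplug lock and lock 1 as a per-subsystem mutex,
one context recording 0-then-1 and another attempting 1-then-0 is
exactly the circular dependency reported in this thread: the two orders
never deadlock in a single run, but the combined graph is cyclic.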
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy [EMAIL PROTECTED] wrote: So do we - Rethink the strategy of per-subsystem hotcpu-locks ? OR - Think of a way to straighten out the super-convoluted cpufreq code ? i'm still wondering what the conceptual source of this fundamental locking complexity in cpufreq (and hotplug) is - it is not intuitive to me at all. Could you try to explain that? Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy [EMAIL PROTECTED] wrote: Ok, I see that we are already doing it :(. So we can end up in a deadlock. Here's the culprit callpath: in general lockdep is 100% correct when it comes to individual locks. The overwhelming majority of lockdep false-positives is not due to lockdep not getting the dependencies right, but due to the lock class not being correctly identified. That's not an issue here i think. what lockdep does is it observes actual locking dependencies as they happen individually in various contexts, and then 'completes' the dependency graph by combining all the possible scenarios how contexts might preempt each other. So if lockdep sees independent dependencies and concludes that they are circular, there's nothing that saves us from the deadlock. The only way for those dependencies to /never/ trigger simultaneously on different CPUs would be via the use of a further 'outer' exclusion mechanism (i.e. a lock) - but all explicit kernel-API exclusion mechanisms are tracked by lockdep = Q.E.D. (Open-coded exclusion might escape the attention of lockdep, but those are extremely rare and are also easily found.) Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 09:29:34AM +0100, Ingo Molnar wrote: * Gautham R Shenoy [EMAIL PROTECTED] wrote: Ok, I see that we are already doing it :(. So we can end up in a deadlock. Here's the culprit callpath: in general lockdep is 100% correct when it comes to individual locks. The overwhelming majority of lockdep false-positives is not due to lockdep not getting the dependencies right, but due to the lock class not being correctly identified. That's not an issue here i think. You're right. That's not the issue. what lockdep does is it observes actual locking dependencies as they happen individually in various contexts, and then 'completes' the dependency graph by combining all the possible scenarios how contexts might preempt each other. So if lockdep sees independent dependencies and concludes that they are circular, there's nothing that saves us from the deadlock. Ah! I get it now. I had taken neither preemption nor the SMP scenario into account before concluding that the warning might be a false positive. All I need to do is to run my test cases on a preemptible kernel or in parallel on a smp box. It'll definitely deadlock there! The only way for those dependencies to /never/ trigger simultaneously on different CPUs would be via the use of a further 'outer' exclusion mechanism (i.e. a lock) - but all explicit kernel-API exclusion mechanisms are tracked by lockdep = Q.E.D. (Open-coded exclusion might escape the attention of lockdep, but those are extremely rare and are also easily found.) Thanks for making it clear :-) Ingo regards gautham. -- Gautham R Shenoy Linux Technology Center IBM India. Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless! - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 09:31:44AM +0100, Ingo Molnar wrote: * Gautham R Shenoy [EMAIL PROTECTED] wrote: So do we - Rethink the strategy of per-subsystem hotcpu-locks ? OR - Think of a way to straighten out the super-convoluted cpufreq code ? i'm still wondering what the conceptual source of this fundamental locking complexity in cpufreq (and hotplug) is - it is not intuitive to me at all. Could you try to explain that? I can try :-) IIRC, cpufreq was written before cpu-hotplug and thus, was never hotplug aware when it was first written. Let me try and simplify this as far as I can. a) cpufreq maintain's it's own cpumask in the variable policy-affected_cpus and says : If a frequency change is issued to any one of the cpu's in the affected_cpus mask, you change frequency on all cpus in the mask. So this needs to be consistent with cpu_online map and hence cpu hotplug aware. Furthermore, we don't want cpus in this mask to go down when we are trying to change frequencies on them. The function which drives the frequency change in cpufreq-core is cpufreq_driver_target and it needs cpu-hotplug protection. b) This mask is changed when the cpu_hotplug core sends a CPU_DEAD/CPU_ONLINE notification through the function cpufreq_cpu_callback. So it's a Read-Write situation, where cpufreq_driver_target() is the read side and cpufreq_cpu_callback() is the write side. So we would want a) and b) to be mutually exclusive. c) Problem is when a just before a cpu is offlined, we want to put it at the lowest possible frequency. So we send out a CPU_DOWN_PREPARE event handled by cpufreq_cpu_callback which calls cpufreq_driver_target. Classic case of calling down_read from a down_write context! To solve this so many changes have taken place since 2.6.18-rcsomething. i) Linus split cpu_control mutex in cpu.c to cpu_add_remove_lock and cpu_bitmask_lock. While the former serializes cpu_hotplug attempts, the latter actually provides cpu-hotplug protection. 
ii) Arjan cleaned up cpufreq to eliminate all intra-cpufreq recursive lock_cpu_hotplug calls. iii) Andrew replaced lock_cpu_hotplug in workqueue code with workqueue_mutex that solved the recursive hotplug problem between ondemand governor and workqueue. This is how it exists in 2.6.19. Is it safe? Not really. The split locking still can hit a BUG_ON as described in the post http://lkml.org/lkml/2006/8/24/89 because the split locking opens up a window which allows cpufreq to try changing frequency on stale cpus. Besides having so many subsystems depend on a global lock may not be such a good idea from the scalability perspective. I did post a rcu-based scalable version of lock_cpu_hotplug but didn't get much response. Which is why we tried having per-subsystem hotplug locks where each subsystem can define it's own mutex. The readside will be something like mutex_lock(my_hotcpu_lock); /* No hotplug can happen here */ mutex_unlock(my_hotcpu_lock); Define a callback function my_cpu_callback and handle events CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE which would be sent out before beginning a cpu hotplug and after the completion of a cpu hotplug respectively. my_cpu_callback( int event) { switch(event) { case CPU_LOCK_ACQUIRE: mutex_lock(my_hotcpu_mutex); break; /* * Cpu hotplug happens here. * Handle any other pre/post hotplug events */ case CPU_LOCK_RELEASE: mutex_unlock(my_hotcpu_mutex); } break; } This would work fine, if the interactions between the cpu-hotplug aware code across subsystem is well ordered. But the ondemand governor-workqueue interactions are causing the circular dependency problem and in another case a recursive locking problem. Sigh! Hope this explaination helped :-) Ingo regards gautham. -- Gautham R Shenoy Linux Technology Center IBM India. Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless! 
- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy [EMAIL PROTECTED] wrote: a) cpufreq maintain's it's own cpumask in the variable policy-affected_cpus and says : If a frequency change is issued to any one of the cpu's in the affected_cpus mask, you change frequency on all cpus in the mask. So this needs to be consistent with cpu_online map and hence cpu hotplug aware. Furthermore, we don't want cpus in this mask to go down when we are trying to change frequencies on them. The function which drives the frequency change in cpufreq-core is cpufreq_driver_target and it needs cpu-hotplug protection. couldnt this complexity be radically simplified by having new kernel infrastructure that does something like: 'gather' all CPUs mentioned in mask via scheduling a separate helper-kthread on every CPU that mask specifies, disable all interrupts, and execute function fn once all CPUs have been 'gathered' - and release all CPUs once fn has executed on each of them. ? This would be done totally serialized and while holding the hotplug lock, so no CPU could go away or arrive while this operation is going on. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, 30 Nov 2006 12:03:15 +0100 Ingo Molnar [EMAIL PROTECTED] wrote: * Gautham R Shenoy [EMAIL PROTECTED] wrote: a) cpufreq maintain's it's own cpumask in the variable policy-affected_cpus and says : If a frequency change is issued to any one of the cpu's in the affected_cpus mask, you change frequency on all cpus in the mask. So this needs to be consistent with cpu_online map and hence cpu hotplug aware. Furthermore, we don't want cpus in this mask to go down when we are trying to change frequencies on them. The function which drives the frequency change in cpufreq-core is cpufreq_driver_target and it needs cpu-hotplug protection. couldnt this complexity be radically simplified by having new kernel infrastructure that does something like: 'gather' all CPUs mentioned in mask via scheduling a separate helper-kthread on every CPU that mask specifies, disable all interrupts, and execute function fn once all CPUs have been 'gathered' - and release all CPUs once fn has executed on each of them. ? How does this differ from stop_machine_run(), which hot-unplug presently uses? This would be done totally serialized and while holding the hotplug lock, so no CPU could go away or arrive while this operation is going on. You said the hotplug lock. That is the problem. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 12:03:15PM +0100, Ingo Molnar wrote: * Gautham R Shenoy [EMAIL PROTECTED] wrote: a) cpufreq maintain's it's own cpumask in the variable policy-affected_cpus and says : If a frequency change is issued to any one of the cpu's in the affected_cpus mask, you change frequency on all cpus in the mask. So this needs to be consistent with cpu_online map and hence cpu hotplug aware. Furthermore, we don't want cpus in this mask to go down when we are trying to change frequencies on them. The function which drives the frequency change in cpufreq-core is cpufreq_driver_target and it needs cpu-hotplug protection. couldnt this complexity be radically simplified by having new kernel infrastructure that does something like: 'gather' all CPUs mentioned in mask via scheduling a separate helper-kthread on every CPU that mask specifies, disable all interrupts, and execute function fn once all CPUs have been 'gathered' - and release all CPUs once fn has executed on each of them. ? This is what is currently being done by cpufreq: a) get_some_cpu_hotplug_protection() [use either some global mechanism or a persubsystem mutex] b) actual_freq_change_driver_function(mask) /* You can check out cpufreq_p4_target() in * arch/i386/kernel/cpu/cpufreq/p4-clockmod.c */ { for_each_cpu_mask(i, mask) { cpumask_t this_cpu = cpumask_of_cpu(i); set_cpus_allowed(current, this_cpu); function_to_change_frequency(); } } c) release_whatever_cpu_hotplug_protection() This would be done totally serialized and while holding the hotplug lock, so no CPU could go away or arrive while this operation is going on. Isn't the above same as what you are suggesting? Or have I missed out anything? Ingo regards gautham. -- Gautham R Shenoy Linux Technology Center IBM India. Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless! 
- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Andrew Morton [EMAIL PROTECTED] wrote: This would be done totally serialized and while holding the hotplug lock, so no CPU could go away or arrive while this operation is going on. You said the hotplug lock. That is the problem. maybe i'm too dense today but i still dont see the fundamental problem. Even with complex inter-subsystem interactions, hotplugging could be effectively and scalably controlled via a self-recursive per-CPU mutex, and a pointer to it embedded in task_struct: struct task_struct { ... int hotplug_depth; struct mutex *hotplug_lock; } ... DEFINE_PER_CPU(struct mutex, hotplug_lock); void cpu_hotplug_lock(void) { int cpu = raw_smp_processor_id(); /* * Interrupts/softirqs are hotplug-safe: */ if (in_interrupt()) return; if (current-hotplug_depth++) return; current-hotplug_lock = per_cpu(hotplug_lock, cpu); mutex_lock(current-hotplug_lock); } void cpu_hotplug_unlock(void) { int cpu; if (in_interrupt()) return; if (DEBUG_LOCKS_WARN_ON(!current-hotplug_depth)) return; if (--current-hotplug_depth) return; mutex_unlock(current-hotplug_lock); current-hotplug_lock = NULL; } ... void do_exit(void) { ... DEBUG_LOCKS_WARN_ON(current-hotplug_depth); ... } ... copy_process(void) { ... p-hotplug_depth = 0; p-hotplug_lock = NULL; ... } 50 lines of code at most. The only rule is to not use cpu_hotplug_lock() in process-context non-preemptible code [interrupt contexts are automatically ignored]. What am i missing? Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy [EMAIL PROTECTED] wrote: This is what is currently being done by cpufreq: ok! a) get_some_cpu_hotplug_protection() [use either some global mechanism or a persubsystem mutex] this bit is wrong i think. Any reason why it's not a per-CPU (but otherwise global) array of mutexes that controls CPU hotplug - as per my previous mail? that would flatten the whole locking. Only one kind of lock taken, recursive and scalable. Then the mechanism that changes CPU frequency should take all these hotplug locks on all (online) CPUs, and then first stop all processing on all CPUs, and then do the frequency change, atomically. This is with interrupts disabled everywhere /first/, and /without any additional locking/. That would prevent any sort of interaction from other CPUs - they'd all be sitting still with interrupts disabled. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 12:53:27PM +0100, Ingo Molnar wrote: * Gautham R Shenoy [EMAIL PROTECTED] wrote: This is what is currently being done by cpufreq: ok! a) get_some_cpu_hotplug_protection() [use either some global mechanism or a persubsystem mutex] this bit is wrong i think. Any reason why it's not a per-CPU (but otherwise global) array of mutexes that controls CPU hotplug - as per my previous mail? that would flatten the whole locking. Only one kind of lock taken, recursive and scalable. I had posted one such recursive scalable version which can be found here http://lkml.org/lkml/2006/10/26/73 I remember cc'ing you. Yeah, it looks complicated and big, but then I did not want to add another field to the task struct as one such attempt had already been frowned upon ( I think long back Ashok posted it) So I ended up writing the whole read/write lock/unlock code myself. It's a RCU based lock, extremely light on the read side, but costly for the writers since it does a synchronize_sched. And yeah, it's partial towards the readers but with an additional field in the task struct we can have a fair implementation. Besides, an unfair cpu_hotplug_lock won't work since a process doing a sched_getaffinity in a forever_while loop can prevent any hotplug from happening. Then the mechanism that changes CPU frequency should take all these hotplug locks on all (online) CPUs, and then first stop all processing on all CPUs, and then do the frequency change, atomically. This is with interrupts disabled everywhere /first/, and /without any additional locking/. That would prevent any sort of interaction from other CPUs - they'd all be sitting still with interrupts disabled. Yup. Ingo regards gautham. -- Gautham R Shenoy Linux Technology Center IBM India. Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless! 
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 12:46:17PM +0100, Ingo Molnar wrote:
> struct task_struct {
> 	...
> 	int hotplug_depth;
> 	struct mutex *hotplug_lock;
> };
> ...
> DEFINE_PER_CPU(struct mutex, hotplug_lock);
>
> void cpu_hotplug_lock(void)
> {
> 	int cpu = raw_smp_processor_id();
>
> 	/*
> 	 * Interrupts/softirqs are hotplug-safe:
> 	 */
> 	if (in_interrupt())
> 		return;
> 	if (current->hotplug_depth++)
> 		return;
> 	current->hotplug_lock = &per_cpu(hotplug_lock, cpu);
> 	mutex_lock(current->hotplug_lock);
> }
>
> void cpu_hotplug_unlock(void)
> {
> 	int cpu;
>
> 	if (in_interrupt())
> 		return;
> 	if (DEBUG_LOCKS_WARN_ON(!current->hotplug_depth))
> 		return;
> 	if (--current->hotplug_depth)
> 		return;
>
> 	mutex_unlock(current->hotplug_lock);
> 	current->hotplug_lock = NULL;
> }

In process-context, preemptible code, let's say we are currently
running on processor i:

cpu_hotplug_lock();   /* does mutex_lock(&per_cpu(hotplug_lock, i)) */

/* do some operation, which might sleep */
/* ... task migrates to cpu j ... */

cpu_hotplug_unlock(); /* does mutex_unlock(&per_cpu(hotplug_lock, i))
                         while running on cpu j */

This would cause cacheline ping-pong, no?

> 	Ingo

regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy <[EMAIL PROTECTED]> wrote:

> In process-context, preemptible code, let's say we are currently
> running on processor i:
>
> cpu_hotplug_lock();   /* does mutex_lock(&per_cpu(hotplug_lock, i)) */
> /* do some operation, which might sleep */
> /* ... task migrates to cpu j ... */
> cpu_hotplug_unlock(); /* does mutex_unlock(&per_cpu(hotplug_lock, i))
>                          while running on cpu j */
>
> This would cause cacheline ping-pong, no?

that would be attached to a very cache-inefficient thing: migrating a
task from one CPU to another. This is not the kind of ping-pong we are
normally worried about. (nor does it happen all that often)

	Ingo
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, 30 Nov 2006 12:46:17 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:
> * Andrew Morton <[EMAIL PROTECTED]> wrote:
> > > This would be done totally serialized and while holding the
> > > hotplug lock, so no CPU could go away or arrive while this
> > > operation is going on.
> > You said "the hotplug lock". That is the problem.
> maybe i'm too dense today but i still dont see the fundamental
> problem. Even with complex inter-subsystem interactions, hotplugging
> could be effectively and scalably controlled via a self-recursive
> per-CPU mutex, and a pointer to it embedded in task_struct:

Yes, I suppose we could come up with some new lock type which fixes the
problem.

But first, let us review the problem. Someone went through cpufreq
sprinkling lock_cpu_hotplug() everywhere as if it had magical
properties. For the past year or more, others have been picking through
the resulting bugs, trying to make things better and treating that
initial magic-pixie-dust as if it were inviolate and nobody had a chance
of understanding it.

IOW, cpufreq is screwed up, and we keep on trying to come up with more
complexity to unscrew it.

So what I would propose is that rather than going ahead and piling more
complexity on top of the existing poo-pile in an attempt to make it seem
to work, we should simply rip all the cpu-hotplug locking out of cpufreq
(there's a davej patch for that in -mm) and then just redo it all in an
organised fashion.

If, as a result of that exercise, we decide that the existing
cpu-hotplug serialisation mechanisms (ie: per-subsystem notification
callbacks and preempt_disable()) are insufficient, then sure, that is
the time to think about more sophisticated locking primitives.

But please, let's stop asking "how do we make cpufreq's hotplug locking
work?". Let's instead ask "how do we make cpufreq safe wrt hotplug?".

To do the latter is a matter of:

- identify the per-cpu resources which need locking
- decide what mechanism is to be used to protect each such resource
- design the locking and its hierarchy
- implement, test

In all this time, Gautham's emails from yesterday were the first
occasion on which anybody has taken the time to sit down and get us
started on this quite ordinary design/devel process.

Let me re-re-re-re-reiterate: this is a cpufreq problem. It has not yet
been demonstrated (AFAICS) that it is a general problem. I am unaware of
any other subsystems having got themselves into such a mess over hotplug
locking. Now, maybe that's because cpufreq really is especially hard. Or
maybe it's because it's just messed up. I don't believe we know which of
those is true yet.
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Andrew Morton <[EMAIL PROTECTED]> wrote:

> > Even with complex inter-subsystem interactions, hotplugging could
> > be effectively and scalably controlled via a self-recursive per-CPU
> > mutex, and a pointer to it embedded in task_struct:
> So what I would propose is that rather than going ahead and piling
> more complexity on top of the existing poo-pile in an attempt to make
> it seem to work, we should simply rip all the cpu-hotplug locking out
> of cpufreq (there's a davej patch for that in -mm) and then just redo
> it all in an organised fashion.

actually, that's precisely what i'm suggesting too, i wrote it to
Gautham in one of the previous mails:

|| that would flatten the whole locking. Only one kind of lock taken,
|| recursive and scalable.
||
|| Then the mechanism that changes CPU frequency should take all these
|| hotplug locks on all (online) CPUs, and then first stop all
|| processing on all CPUs, and then do the frequency change, atomically.
|| This is with interrupts disabled everywhere /first/, and /without any
|| additional locking/. That would prevent any sort of interaction from
|| other CPUs - they'd all be sitting still with interrupts disabled.

no other locking, only the CPU hotplug lock and the (existing) ability
to 'do stuff' with nothing else running on any other CPU.

	Ingo
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 09:29:34AM +0100, Ingo Molnar wrote:
> what lockdep does is it observes actual locking dependencies as they
> happen individually in various contexts, and then 'completes' the
> dependency graph by combining all the possible scenarios how contexts
> might preempt each other. So if lockdep sees independent dependencies
> and concludes that they are circular, there's nothing that saves us
> from the deadlock.

Ingo, consider a case where we have three locks A, B and C. We have a
very clear locking rule inside the kernel that lock A *should* be
acquired before acquiring either lock B or lock C.

At runtime, lockdep detects the two dependency chains

	A --> B --> C    and    A --> C --> B.

Does lockdep issue a circular dependency warning for this? It's quite
clear from the locking rule that we cannot have a circular deadlock,
since A acts as a mutex for the B->C / C->B callpaths.

Just curious :-)

[ Well, I might encounter such a scenario in an attempt to make cpufreq
  cpu-hotplug safe! ]

> 	Ingo

Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 09:58:07AM +0530, Gautham R Shenoy wrote:
>
> So can we ignore this circular-dep warning as a false positive?
> Or is there a way to exploit this circular dependency ?
>
> At the moment, I cannot think of a way to exploit this circular
> dependency unless we do something like try destroying the created
> workqueue when the cpu is dead, i.e. make the cpufreq governors
> cpu-hotplug-aware. (eeks! that doesn't look good)

Ok, I see that we are already doing it :(. So we can end up in a
deadlock. Here's the culprit callpath:

_cpu_down()
!
!-> raw_notifier_call_chain(CPU_LOCK_ACQUIRE)
!   !
!   !-> workqueue_cpu_callback(CPU_LOCK_ACQUIRE)       [*]
!
!-> raw_notifier_call_chain(CPU_DEAD)
    !
    !-> cpufreq_cpu_callback(CPU_DEAD)
        !
        !-> cpufreq_remove_dev
            !
            !-> __cpufreq_governor(data, GOVERNOR_STOP)
                !
                !-> policy->governor->governor()
                    !
                    !-> cpufreq_governor_dbs(GOVERNOR_STOP)
                        !
                        !-> destroy_workqueue()        [*]

[*] indicates the function takes workqueue_mutex. So a deadlock!

I wasn't able to observe this because I'm running a Xeon SMP box on
which you cannot offline cpu0. And cpufreq data is created only for
cpu0, while all other cpus' cpufreq_data just point to cpu0's
cpufreq_data. So the mentioned callpath within cpufreq_remove_dev is
never reached during the normal cpu offline cycle.

However, if there are architectures which allow the first-booted-cpu
(or, to be precise, the cpu for which cpufreq_data is *actually*
created) to be offlined, and we are running the Ondemand governor during
the offline, we will see this deadlock.

regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Wed, Nov 29, 2006 at 01:05:56PM -0800, Andrew Morton wrote:
> On Wed, 29 Nov 2006 20:54:04 +0530
> Gautham R Shenoy <[EMAIL PROTECTED]> wrote:
>
> > Ok, so to cut the long story short,
> > - While changing governor from anything to ondemand, locks are
> >   taken in the following order
> >
> >   policy->lock ===> dbs_mutex ===> workqueue_mutex.
> >
> > - While offlining a cpu, locks are taken in the following order
> >
> >   cpu_add_remove_lock ==> sched_hotcpu_mutex ==> workqueue_mutex ==>
> >   cache_chain_mutex ==> policy->lock.
>
> What functions are taking all these locks?  (ie: the callpath?)

While changing the cpufreq governor to ondemand, the locks taken are:

---------------------------------------------------------------------------
lock             function                file
---------------------------------------------------------------------------
policy->lock     store_scaling_governor  drivers/cpufreq/cpufreq.c
dbs_mutex        cpufreq_governor_dbs    drivers/cpufreq/cpufreq_ondemand.c
workqueue_mutex  __create_workqueue      kernel/workqueue.c
---------------------------------------------------------------------------

The complete callpath would be

store_scaling_governor                                  [*]
  |
  __cpufreq_set_policy
    |
    __cpufreq_governor(data, CPUFREQ_GOV_START)
      |
      policy->governor->governor
        => cpufreq_governor_dbs(data, CPUFREQ_GOV_START) [*]
          |
          create_workqueue  /* #defined as __create_workqueue [*] */

where [*] = locks taken.

While offlining a cpu, locks are taken in the following order:

---------------------------------------------------------------------------
lock                 function                file
---------------------------------------------------------------------------
cpu_add_remove_lock  cpu_down                kernel/cpu.c
sched_hotcpu_mutex   migration_call          kernel/sched.c
workqueue_mutex      workqueue_cpu_callback  kernel/workqueue.c
cache_chain_mutex    cpuup_callback          mm/slab.c
policy->lock         cpufreq_driver_target   drivers/cpufreq/cpufreq.c
---------------------------------------------------------------------------

Please note that in the above,
- sched_hotcpu_mutex, workqueue_mutex and cache_chain_mutex are taken
  while handling CPU_LOCK_ACQUIRE events in the respective subsystems'
  cpu_callback functions.
- policy->lock is taken while handling CPU_DOWN_PREPARE in
  cpufreq_cpu_callback, which calls cpufreq_driver_target.

It's perfectly clear that in the cpu offline callpath, cpufreq does not
have to do anything with the workqueue.

So can we ignore this circular-dep warning as a false positive?
Or is there a way to exploit this circular dependency?

At the moment, I cannot think of a way to exploit this circular
dependency unless we do something like try destroying the created
workqueue when the cpu is dead, i.e. make the cpufreq governors
cpu-hotplug-aware. (eeks! that doesn't look good)

I'm working on fixing this. Let me see if I can come up with something.

Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Wed, 29 Nov 2006 20:54:04 +0530
Gautham R Shenoy <[EMAIL PROTECTED]> wrote:

> Ok, so to cut the long story short,
> - While changing governor from anything to ondemand, locks are taken
>   in the following order
>
>   policy->lock ===> dbs_mutex ===> workqueue_mutex.
>
> - While offlining a cpu, locks are taken in the following order
>
>   cpu_add_remove_lock ==> sched_hotcpu_mutex ==> workqueue_mutex ==>
>   cache_chain_mutex ==> policy->lock.

What functions are taking all these locks?  (ie: the callpath?)
CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
Hi all,

Looks like lockdep has resumed yelling about the cpufreq-cpu-hotplug
interactions! Again, it's the Ondemand governor that's the culprit.
On linux-2.6.19-rc6-mm2, this is what I got yesterday evening:

[EMAIL PROTECTED] tests]# echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
[EMAIL PROTECTED] tests]# echo 0 > /sys/devices/system/cpu/cpu1/online

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.19-rc6-mm2 #14
-------------------------------------------------------
bash/4601 is trying to acquire lock:
 (&policy->lock){--..}, at: [<c045793d>] mutex_lock+0x12/0x15

but task is already holding lock:
 (cache_chain_mutex){--..}, at: [<c045793d>] mutex_lock+0x12/0x15

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (cache_chain_mutex){--..}:
       [<c013bddc>] __lock_acquire+0x8ef/0x9f3
       [<c013c1ec>] lock_acquire+0x68/0x82
       [<c04577da>] __mutex_lock_slowpath+0xd3/0x224
       [<c045793d>] mutex_lock+0x12/0x15
       [<c01642f1>] cpuup_callback+0x1a3/0x2d6
       [<c045a952>] notifier_call_chain+0x2b/0x5b
       [<c012efb6>] __raw_notifier_call_chain+0x18/0x1d
       [<c012efd5>] raw_notifier_call_chain+0x1a/0x1c
       [<c013f798>] _cpu_down+0x56/0x1ef
       [<c013fa5c>] cpu_down+0x26/0x3a
       [<c029dde2>] store_online+0x27/0x5a
       [<c029adec>] sysdev_store+0x20/0x25
       [<c0197284>] sysfs_write_file+0xb6/0xde
       [<c01674d4>] vfs_write+0x90/0x144
       [<c0167c74>] sys_write+0x3d/0x61
       [<c0103d36>] sysenter_past_esp+0x5f/0x99
       [<ffffffff>] 0xffffffff

-> #2 (workqueue_mutex){--..}:
       [<c013bddc>] __lock_acquire+0x8ef/0x9f3
       [<c013c1ec>] lock_acquire+0x68/0x82
       [<c04577da>] __mutex_lock_slowpath+0xd3/0x224
       [<c045793d>] mutex_lock+0x12/0x15
       [<c0131e91>] __create_workqueue+0x5b/0x11c
       [<c03c4a9a>] cpufreq_governor_dbs+0xa0/0x2e8
       [<c03c2cde>] __cpufreq_governor+0x64/0xac
       [<c03c30d5>] __cpufreq_set_policy+0x187/0x20b
       [<c03c33a1>] store_scaling_governor+0x132/0x16a
       [<c03c2af9>] store+0x35/0x46
       [<c0197284>] sysfs_write_file+0xb6/0xde
       [<c01674d4>] vfs_write+0x90/0x144
       [<c0167c74>] sys_write+0x3d/0x61
       [<c0103d36>] sysenter_past_esp+0x5f/0x99
       [<ffffffff>] 0xffffffff

-> #1 (dbs_mutex){--..}:
       [<c013bddc>] __lock_acquire+0x8ef/0x9f3
       [<c013c1ec>] lock_acquire+0x68/0x82
       [<c04577da>] __mutex_lock_slowpath+0xd3/0x224
       [<c045793d>] mutex_lock+0x12/0x15
       [<c03c4a7e>] cpufreq_governor_dbs+0x84/0x2e8
       [<c03c2cde>] __cpufreq_governor+0x64/0xac
       [<c03c30d5>] __cpufreq_set_policy+0x187/0x20b
       [<c03c33a1>] store_scaling_governor+0x132/0x16a
       [<c03c2af9>] store+0x35/0x46
       [<c0197284>] sysfs_write_file+0xb6/0xde
       [<c01674d4>] vfs_write+0x90/0x144
       [<c0167c74>] sys_write+0x3d/0x61
       [<c0103d36>] sysenter_past_esp+0x5f/0x99
       [<ffffffff>] 0xffffffff

-> #0 (&policy->lock){--..}:
       [<c013bce0>] __lock_acquire+0x7f3/0x9f3
       [<c013c1ec>] lock_acquire+0x68/0x82
       [<c04577da>] __mutex_lock_slowpath+0xd3/0x224
       [<c045793d>] mutex_lock+0x12/0x15
       [<c03c2c1f>] cpufreq_driver_target+0x2b/0x51
       [<c03c38db>] cpufreq_cpu_callback+0x42/0x52
       [<c045a952>] notifier_call_chain+0x2b/0x5b
       [<c012efb6>] __raw_notifier_call_chain+0x18/0x1d
       [<c012efd5>] raw_notifier_call_chain+0x1a/0x1c
       [<c013f798>] _cpu_down+0x56/0x1ef
       [<c013fa5c>] cpu_down+0x26/0x3a
       [<c029dde2>] store_online+0x27/0x5a
       [<c029adec>] sysdev_store+0x20/0x25
       [<c0197284>] sysfs_write_file+0xb6/0xde
       [<c01674d4>] vfs_write+0x90/0x144
       [<c0167c74>] sys_write+0x3d/0x61
       [<c0103d36>] sysenter_past_esp+0x5f/0x99
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

4 locks held by bash/4601:
 #0:  (cpu_add_remove_lock){--..}, at: [<c045793d>] mutex_lock+0x12/0x15
 #1:  (sched_hotcpu_mutex){--..}, at: [<c045793d>] mutex_lock+0x12/0x15
 #2:  (workqueue_mutex){--..}, at: [<c045793d>] mutex_lock+0x12/0x15
 #3:  (cache_chain_mutex){--..}, at: [<c045793d>] mutex_lock+0x12/0x15

stack backtrace:
 [<c0104c8f>] show_trace_log_lvl+0x19/0x2e
 [<c0104d8f>] show_trace+0x12/0x14
 [<c0104da5>] dump_stack+0x14/0x16
 [<c013a157>] print_circular_bug_tail+0x7c/0x85
 [<c013bce0>] __lock_acquire+0x7f3/0x9f3
 [<c013c1ec>] lock_acquire+0x68/0x82
 [<c04577da>] __mutex_lock_slowpath+0xd3/0x224
 [<c045793d>] mutex_lock+0x12/0x15
 [<c03c2c1f>] cpufreq_driver_target+0x2b/0x51
 [<c03c38db>] cpufreq_cpu_callback+0x42/0x52
 [<c045a952>] notifier_call_chain+0x2b/0x5b
 [<c012efb6>] __raw_notifier_call_chain+0x18/0x1d
 [<c012efd5>] raw_notifier_call_chain+0x1a/0x1c
 [<c013f798>] _cpu_down+0x56/0x1ef
 [<c013fa5c>] cpu_down+0x26/0x3a
 [<c029dde2>] store_online+0x27/0x5a
 [<c029adec>] sysdev_store+0x20/0x25
 [<c0197284>] sysfs_write_file+0xb6/0xde
 [<c01674d4>] vfs_write+0x90/0x144
 [<c0167c74>] sys_write+0x3d/0x61
 [<c0103d36>] sysenter_past_esp+0x5f/0x99
=======================================================
Breaking affinity for irq 24
CPU 1 is now offline

Ok, so to cut the long story short,
- While changing governor from anything to ondemand, locks are taken in
  the following order

  policy->lock ===> dbs_mutex ===> workqueue_mutex.

- While offlining a cpu, locks are taken in the following order

  cpu_add_remove_lock ==> sched_hotcpu_mutex ==> workqueue_mutex ==>
  cache_chain_mutex ==> policy->lock.

The dependency graph built out of this info has a circular dependency,
as pointed out by lockdep. However, I am not quite sure how seriously
this circular dependency warning should be taken.

One way to avoid these warnings is to take the policy->lock before the
rest of the locks while offlining the cpu. For a moment I even thought
of taking/releasing policy->lock under