RE: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
>-----Original Message-----
>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gautham R Shenoy
>Sent: Wednesday, December 06, 2006 11:07 PM
>To: Pallipadi, Venkatesh
>Cc: [EMAIL PROTECTED]; Ingo Molnar; [EMAIL PROTECTED]; linux-kernel@vger.kernel.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
>Subject: Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
>
>Hi Venki,
>On Wed, Dec 06, 2006 at 10:27:01AM -0800, Pallipadi, Venkatesh wrote:
>
>> But, if we make cpufreq more affected_cpus aware and have a per_cpu
>> target() call by moving set_cpus_allowed() from driver into cpufreq
>> core and define the target function to be atomic/non-sleeping type,
>> then we really don't need a hotplug lock for the driver any more.
>> Driver can have get_cpu/put_cpu pair to disable preemption and then
>> change the frequency.
>
>Well, we would still need to keep the affected_cpus map in sync with the
>cpu_online map. That would still require hotplug protection, right?

Not really. Cpufreq can look at all the CPUs in affected_cpus and call
new_target() only for the CPUs that are online. Or it can call it for every
CPU, and the actual implementation in the driver can do something like:

	set_cpus_allowed(requested_processor_mask);
	if (get_cpu() != requested_cpu) {
		/* CPU is offline and nothing can be done */
		put_cpu();
		return 0;
	}

This is what I did for the new cpufreq interface I added for getavg(). It
was easy to ensure an atomic driver call there, as only one driver is
using it today :-)

>Besides, I would love to see a way of implementing the target function
>as an atomic/non-sleeping type. But as of now, the target functions call
>cpufreq_notify_transition, which might sleep.

That is the reason I don't have a patch for this now :-) There are more
than 10 or so drivers that would need to change for the new interface, and
I haven't checked whether it is possible to do this without sleeping in
all those drivers.

>That's not the last of my worries. The ondemand-workqueue interaction
>in the cpu_hotplug callback path can cause a deadlock if we go for
>per-subsystem hotcpu mutexes. Can you think of a way by which we can
>avoid destroying the kondemand workqueue from the cpu-hotplug callback
>path? Please see http://lkml.org/lkml/2006/11/30/9 for the
>culprit-callpath.

Yes, I was thinking about it. One possible solution is to move
create_workqueue()/destroy_workqueue() as in the attached patch. Not fully
tested at the moment.

Thanks,
Venki

Remove workqueue creation/destruction from governor start and stop, and
move those calls to module init and exit time.

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

Index: linux-2.6.19-rc-mm/drivers/cpufreq/cpufreq_ondemand.c
===================================================================
--- linux-2.6.19-rc-mm.orig/drivers/cpufreq/cpufreq_ondemand.c
+++ linux-2.6.19-rc-mm/drivers/cpufreq/cpufreq_ondemand.c
@@ -514,7 +514,6 @@ static inline void dbs_timer_exit(struct
 {
 	dbs_info->enable = 0;
 	cancel_delayed_work(&dbs_info->work);
-	flush_workqueue(kondemand_wq);
 }
 
 static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
@@ -543,16 +542,6 @@ static int cpufreq_governor_dbs(struct c
 		mutex_lock(&dbs_mutex);
 		dbs_enable++;
-		if (dbs_enable == 1) {
-			kondemand_wq = create_workqueue("kondemand");
-			if (!kondemand_wq) {
-				printk(KERN_ERR
-				       "Creation of kondemand failed\n");
-				dbs_enable--;
-				mutex_unlock(&dbs_mutex);
-				return -ENOSPC;
-			}
-		}
 
 		rc = sysfs_create_group(&policy->kobj, &dbs_attr_group);
 		if (rc) {
@@ -601,8 +590,6 @@ static int cpufreq_governor_dbs(struct c
 		dbs_timer_exit(this_dbs_info);
 		sysfs_remove_group(&policy->kobj, &dbs_attr_group);
 		dbs_enable--;
-		if (dbs_enable == 0)
-			destroy_workqueue(kondemand_wq);
 		mutex_unlock(&dbs_mutex);
 
@@ -632,12 +619,18 @@ static struct cpufreq_governor cpufreq_g
 static int __init cpufreq_gov_dbs_init(void)
 {
+	kondemand_wq = create_workqueue("kondemand");
+	if (!kondemand_wq) {
+		printk(KERN_ERR "Creation of kondemand failed\n");
+		return -EFAULT;
+	}
 	return cpufreq_register_governor(&cpufreq_gov_dbs);
 }
 
 static void __exit cpufreq_gov_dbs_exit(void)
 {
 	cpufreq_unregister_governor(&cpufreq_gov_dbs);
+	destroy_workqueue(kondemand_wq);
 }

ondemand_recursive_locking_fix.patch
Description: ondemand_recursive_locking_fix.patch
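The shape of the patch above — create the long-lived workqueue once at module init, destroy it once at module exit, and leave only refcounting in the governor start/stop path that can run inside a hotplug callback — can be sketched in userspace C. All names here (`gov_module_init`, `governor_start`, the toy `struct workqueue`) are illustrative, not the kernel API:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Toy stand-in for the kondemand workqueue. */
struct workqueue { int alive; };

static struct workqueue *kondemand_wq;  /* single long-lived queue */
static pthread_mutex_t dbs_mutex = PTHREAD_MUTEX_INITIALIZER;
static int dbs_enable;                  /* count of active policies */

/* Module init: the only place the queue is created. */
int gov_module_init(void)
{
	kondemand_wq = calloc(1, sizeof(*kondemand_wq));
	if (!kondemand_wq)
		return -1;
	kondemand_wq->alive = 1;
	return 0;
}

/* Module exit: the only place the queue is destroyed. */
void gov_module_exit(void)
{
	free(kondemand_wq);
	kondemand_wq = NULL;
}

/* GOV_START/GOV_STOP path: pure refcounting under dbs_mutex.  Nothing
 * here creates or destroys the queue, so nothing here can deadlock
 * against the cpu-hotplug callback path. */
int governor_start(void)
{
	pthread_mutex_lock(&dbs_mutex);
	dbs_enable++;
	pthread_mutex_unlock(&dbs_mutex);
	return 0;
}

void governor_stop(void)
{
	pthread_mutex_lock(&dbs_mutex);
	dbs_enable--;
	pthread_mutex_unlock(&dbs_mutex);
}

int governor_users(void) { return dbs_enable; }
int queue_alive(void) { return kondemand_wq && kondemand_wq->alive; }
```

The design point is that start/stop cycles no longer touch the queue's lifetime at all, which is exactly what removes the destroy-from-callback deadlock.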
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
Hi Venki,
On Wed, Dec 06, 2006 at 10:27:01AM -0800, Pallipadi, Venkatesh wrote:
> But, if we make cpufreq more affected_cpus aware and have a per_cpu
> target() call by moving set_cpus_allowed() from driver into cpufreq core
> and define the target function to be atomic/non-sleeping type, then we
> really don't need a hotplug lock for the driver any more. Driver can have
> get_cpu/put_cpu pair to disable preemption and then change the frequency.

Well, we would still need to keep the affected_cpus map in sync with the
cpu_online map. That would still require hotplug protection, right?

Besides, I would love to see a way of implementing the target function as
an atomic/non-sleeping type. But as of now, the target functions call
cpufreq_notify_transition, which might sleep.

That's not the last of my worries. The ondemand-workqueue interaction in
the cpu_hotplug callback path can cause a deadlock if we go for
per-subsystem hotcpu mutexes. Can you think of a way by which we can avoid
destroying the kondemand workqueue from the cpu-hotplug callback path?
Please see http://lkml.org/lkml/2006/11/30/9 for the culprit-callpath.

Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
RE: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
>-----Original Message-----
>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gautham R Shenoy
>Sent: Thursday, November 30, 2006 3:44 AM
>To: Ingo Molnar
>Cc: Gautham R Shenoy; [EMAIL PROTECTED]; linux-kernel@vger.kernel.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
>Subject: Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
>
>On Thu, Nov 30, 2006 at 12:03:15PM +0100, Ingo Molnar wrote:
>>
>> * Gautham R Shenoy <[EMAIL PROTECTED]> wrote:
>>
>> > a) cpufreq maintains its own cpumask in the variable
>> > policy->affected_cpus and says: if a frequency change is issued to
>> > any one of the cpus in the affected_cpus mask, you change frequency
>> > on all cpus in the mask. So this needs to be consistent with the
>> > cpu_online map and hence cpu-hotplug aware. Furthermore, we don't want
>> > cpus in this mask to go down when we are trying to change frequencies
>> > on them. The function which drives the frequency change in the
>> > cpufreq core is cpufreq_driver_target, and it needs cpu-hotplug
>> > protection.
>>
>> couldnt this complexity be radically simplified by having new kernel
>> infrastructure that does something like:
>>
>> " 'gather' all CPUs mentioned in <mask> via scheduling a separate
>>   helper-kthread on every CPU that <mask> specifies, disable all
>>   interrupts, and execute function <fn> once all CPUs have been
>>   'gathered' - and release all CPUs once <fn> has executed on each of
>>   them."
>>
>> ?
>
>This is what is currently being done by cpufreq:
>
>a) get_some_cpu_hotplug_protection() [use either some global mechanism
>   or a per-subsystem mutex]
>
>b) actual_freq_change_driver_function(mask)
>   /* You can check out cpufreq_p4_target() in
>    * arch/i386/kernel/cpu/cpufreq/p4-clockmod.c
>    */
>   {
>	for_each_cpu_mask(i, mask) {
>		cpumask_t this_cpu = cpumask_of_cpu(i);
>		set_cpus_allowed(current, this_cpu);
>		function_to_change_frequency();
>	}
>   }

As there are many options being discussed here, let me propose one more
option that can eliminate the need for the hotplug lock in the
cpufreq_driver_target() path.

As Gautham clearly explained before, today cpufreq calls
cpufreq_driver_target() in each driver to change the CPU frequency, and
the driver internally uses set_cpus_allowed() to reschedule onto the
different affected_cpus and change the frequency. That is the main reason
why we need to disable hotplug in this path.

But, if we make cpufreq more affected_cpus aware and have a per-CPU
target() call by moving set_cpus_allowed() from the driver into the
cpufreq core, and define the target function to be atomic/non-sleeping,
then we really don't need a hotplug lock for the driver any more. The
driver can use a get_cpu()/put_cpu() pair to disable preemption and then
change the frequency.

This means a lot of changes, as we need new interface changes to cpufreq
and a rewrite of a bunch of drivers. But this looks to me like the least
complicated solution.

Thanks,
Venki
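The proposed flow — the cpufreq core walks affected_cpus and invokes a per-CPU, non-sleeping driver hook only for CPUs that are still online — can be sketched in plain C. Everything here (`NCPUS`, `new_target`, `set_policy_freq`, the `online` array) is made up for illustration; in the kernel the hook would run between get_cpu()/put_cpu() with preemption disabled:

```c
#include <assert.h>

#define NCPUS 4

static int online[NCPUS] = { 1, 1, 1, 1 };  /* stand-in for cpu_online map */
static int freq[NCPUS];

/* Driver-side hook: atomic and per-CPU.  If the CPU went offline, there
 * is simply nothing to do -- no hotplug lock needed. */
static int new_target(int cpu, int khz)
{
	if (!online[cpu])
		return 0;       /* CPU offline and nothing can be done */
	freq[cpu] = khz;
	return 1;
}

/* Core-side loop over the policy's affected_cpus. */
int set_policy_freq(const int *affected, int n, int khz)
{
	int i, changed = 0;

	for (i = 0; i < n; i++)
		changed += new_target(affected[i], khz);
	return changed;
}

int freq_of(int cpu) { return freq[cpu]; }
```

The point of the restructuring is that an offlined CPU degrades to a harmless no-op in the driver instead of a race that must be excluded by a lock.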
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy <[EMAIL PROTECTED]> wrote:

> Consider a case where we have three locks A, B and C.
> We have very clear locking rule inside the kernel that lock A *should*
> be acquired before acquiring either lock B or lock C.
>
> At runtime lockdep detects the two dependency chains,
>
> A --> B --> C
>
> and
>
> A --> C --> B.
>
> Does lockdep issue a circular dependency warning for this?
> It's quite clear from the locking rule that we cannot have a
> circular deadlock, since A acts as a mutex for the B->C / C->B callpath.
>
> Just curious :-) [ Well, I might encounter such a scenario in an attempt
> to make cpufreq cpu-hotplug safe! ]

yes, lockdep will warn about this.

Will you /ever/ have a B->C or C->B dependency without A being taken
first? if not then my suggestion would be to just lump 'B' and 'C' into a
single lock - or to get rid of them altogether. There's little reason to
keep them separate.

/If/ you want to keep them separate (because they protect different data
structures) then it's the cleanest design to have an ordering between
them. The taking of 'A' might go away anytime - such a construct is
really, really fragile and is only asking for trouble.

	Ingo
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 09:29:34AM +0100, Ingo Molnar wrote:
> what lockdep does is it observes actual locking dependencies as they
> happen individually in various contexts, and then 'completes' the
> dependency graph by combining all the possible scenarios how contexts
> might preempt each other. So if lockdep sees independent dependencies
> and concludes that they are circular, there's nothing that saves us from
> the deadlock.

Ingo,

Consider a case where we have three locks A, B and C.
We have very clear locking rule inside the kernel that lock A *should*
be acquired before acquiring either lock B or lock C.

At runtime lockdep detects the two dependency chains,

A --> B --> C

and

A --> C --> B.

Does lockdep issue a circular dependency warning for this?
It's quite clear from the locking rule that we cannot have a circular
deadlock, since A acts as a mutex for the B->C / C->B callpath.

Just curious :-) [ Well, I might encounter such a scenario in an attempt
to make cpufreq cpu-hotplug safe! ]

> Ingo

Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
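Why lockdep warns here can be shown with a toy model of its bookkeeping: lockdep records pairwise "Y taken while X held" edges and does not model the fact that A gates both paths, so observing A->B->C and A->C->B yields both a B->C and a C->B edge, i.e. an apparent cycle. This is purely illustrative C, not the real lockdep implementation:

```c
#include <assert.h>

enum { A, B, C, NLOCKS };

/* edge[x][y] set means: lock y was acquired while lock x was held */
static int edge[NLOCKS][NLOCKS];

/* Record every pairwise ordering implied by one observed lock chain. */
static void observe_chain(const int *locks, int n)
{
	int i, j;

	for (i = 0; i < n; i++)
		for (j = i + 1; j < n; j++)
			edge[locks[i]][locks[j]] = 1;
}

/* A two-lock cycle: both orderings have been observed. */
int circular(int x, int y)
{
	return edge[x][y] && edge[y][x];
}

void record_both_paths(void)
{
	int path1[] = { A, B, C };  /* A --> B --> C */
	int path2[] = { A, C, B };  /* A --> C --> B */

	observe_chain(path1, 3);
	observe_chain(path2, 3);
}
```

The model makes Ingo's point concrete: from lockdep's edge list alone, B and C look circular, and only the out-of-band rule "A is always outermost" rules the deadlock out.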
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Andrew Morton <[EMAIL PROTECTED]> wrote:

> > Even with complex inter-subsystem interactions, hotplugging could be
> > effectively and scalably controlled via a self-recursive per-CPU
> > mutex, and a pointer to it embedded in task_struct:
>
> So what I would propose is that rather than going ahead and piling
> more complexity on top of the existing poo-pile in an attempt to make
> it seem to work, we should simply rip all the cpu-hotplug locking out
> of cpufreq (there's a davej patch for that in -mm) and then just redo
> it all in an organised fashion.

actually, that's precisely what i'm suggesting too, i wrote it to Gautham
in one of the previous mails:

|| that would flatten the whole locking. Only one kind of lock taken,
|| recursive and scalable.
||
|| Then the mechanism that changes CPU frequency should take all these
|| hotplug locks on all (online) CPUs, and then first stop all
|| processing on all CPUs, and then do the frequency change, atomically.
|| This is with interrupts disabled everywhere /first/, and /without any
|| additional locking/. That would prevent any sort of interaction from
|| other CPUs - they'd all be sitting still with interrupts disabled.

no other locking, only the CPU hotplug lock and the (existing) ability to
'do stuff' with nothing else running on any other CPU.

	Ingo
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, 30 Nov 2006 12:46:17 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:

> * Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> > > This would be done totally serialized and while holding the hotplug
> > > lock, so no CPU could go away or arrive while this operation is
> > > going on.
> >
> > You said "the hotplug lock". That is the problem.
>
> maybe i'm too dense today but i still dont see the fundamental problem.
>
> Even with complex inter-subsystem interactions, hotplugging could be
> effectively and scalably controlled via a self-recursive per-CPU mutex,
> and a pointer to it embedded in task_struct:

Yes, I suppose we could come up with some new lock type which fixes the
problem.

But first, let us review the problem. Someone went through cpufreq
sprinkling lock_cpu_hotplug() everywhere as if it had magical properties.
For the past year or more, others have been picking through the resulting
bugs, trying to make things better and treating that initial
magic-pixie-dust as if it were inviolate and nobody had a chance of
understanding it.

IOW, cpufreq is screwed up, and we keep on trying to come up with more
complexity to unscrew it.

So what I would propose is that rather than going ahead and piling more
complexity on top of the existing poo-pile in an attempt to make it seem
to work, we should simply rip all the cpu-hotplug locking out of cpufreq
(there's a davej patch for that in -mm) and then just redo it all in an
organised fashion.

If, as a result of that exercise, we decide that the existing cpu-hotplug
serialisation mechanisms (ie: per-subsystem notification callbacks and
preempt_disable()) are insufficient then sure, that is the time to think
about more sophisticated locking primitives.

But please, let's stop asking "how do we make cpufreq's hotplug locking
work?". Let's instead ask "how do we make cpufreq safe wrt hotplug?"

To do the latter is a matter of

- identify the per-cpu resources which need locking

- decide what mechanism is to be used to protect each such resource

- design the locking and its hierarchy

- implement, test.

In all this time, Gautham's emails from yesterday were the first occasion
on which anybody has taken the time to sit down and get us started on this
quite ordinary design & devel process.

Let me re-re-re-re-reiterate: this is a cpufreq problem. It has not yet
been demonstrated (AFAICS) that it is a general problem. I am unaware of
any other subsystems having got themselves into such a mess over hotplug
locking. Now, maybe that's because cpufreq really is especially hard. Or
maybe it's because it's just messed up. I don't believe we know which of
those is true yet.
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy <[EMAIL PROTECTED]> wrote:

> In process context preemptible code,
> Lets say we are currently running on processor i.
>
> cpu_hotplug_lock();   /* does mutex_lock(&per_cpu(hotplug_lock, i)) */
>
> /* do some operation, which might sleep */
> /* migrates to cpu j */
>
> cpu_hotplug_unlock(); /* does mutex_unlock(&per_cpu(hotplug_lock, i))
>			  while running on cpu j */
>
> This would cause cacheline ping pong, no?

that would be attached to a very cache-inefficient thing: migrating a task
from one CPU to another. This is not the kind of ping-pong we are normally
worried about. (nor does it happen all that often)

	Ingo
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 12:46:17PM +0100, Ingo Molnar wrote:
>
> struct task_struct {
>	...
>	int hotplug_depth;
>	struct mutex *hotplug_lock;
> };
> ...
>
> DEFINE_PER_CPU(struct mutex, hotplug_lock);
>
> void cpu_hotplug_lock(void)
> {
>	int cpu = raw_smp_processor_id();
>	/*
>	 * Interrupts/softirqs are hotplug-safe:
>	 */
>	if (in_interrupt())
>		return;
>	if (current->hotplug_depth++)
>		return;
>	current->hotplug_lock = &per_cpu(hotplug_lock, cpu);
>	mutex_lock(current->hotplug_lock);
> }
>
> void cpu_hotplug_unlock(void)
> {
>	if (in_interrupt())
>		return;
>	if (DEBUG_LOCKS_WARN_ON(!current->hotplug_depth))
>		return;
>	if (--current->hotplug_depth)
>		return;
>
>	mutex_unlock(current->hotplug_lock);
>	current->hotplug_lock = NULL;
> }

In process context preemptible code,
Lets say we are currently running on processor i.

cpu_hotplug_lock();   /* does mutex_lock(&per_cpu(hotplug_lock, i)) */

/* do some operation, which might sleep */
/* migrates to cpu j */

cpu_hotplug_unlock(); /* does mutex_unlock(&per_cpu(hotplug_lock, i))
			 while running on cpu j */

This would cause cacheline ping pong, no?

> Ingo

regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 12:53:27PM +0100, Ingo Molnar wrote:
>
> * Gautham R Shenoy <[EMAIL PROTECTED]> wrote:
>
> > This is what is currently being done by cpufreq:
>
> ok!
>
> > a) get_some_cpu_hotplug_protection() [use either some global mechanism
> >    or a per-subsystem mutex]
>
> this bit is wrong i think. Any reason why it's not a per-CPU (but
> otherwise global) array of mutexes that controls CPU hotplug - as per my
> previous mail?
>
> that would flatten the whole locking. Only one kind of lock taken,
> recursive and scalable.

I had posted one such recursive scalable version, which can be found here:
http://lkml.org/lkml/2006/10/26/73
I remember cc'ing you.

Yeah, it looks complicated and big, but then I did not want to add another
field to the task struct, as one such attempt had already been frowned
upon (I think long back Ashok posted it). So I ended up writing the whole
read/write lock/unlock code myself.

It's an RCU based lock, extremely light on the read side, but costly for
the writers since it does a synchronize_sched. And yeah, it's partial
towards the readers, but with an additional field in the task struct we
can have a fair implementation. Besides, an unfair cpu_hotplug_lock won't
work, since a process doing a sched_getaffinity in a forever_while loop
can prevent any hotplug from happening.

> Then the mechanism that changes CPU frequency should take all these
> hotplug locks on all (online) CPUs, and then first stop all processing
> on all CPUs, and then do the frequency change, atomically. This is with
> interrupts disabled everywhere /first/, and /without any additional
> locking/. That would prevent any sort of interaction from other CPUs -
> they'd all be sitting still with interrupts disabled.

Yup.

> Ingo

regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy <[EMAIL PROTECTED]> wrote:

> This is what is currently being done by cpufreq:

ok!

> a) get_some_cpu_hotplug_protection() [use either some global mechanism
>    or a per-subsystem mutex]

this bit is wrong i think. Any reason why it's not a per-CPU (but
otherwise global) array of mutexes that controls CPU hotplug - as per my
previous mail?

that would flatten the whole locking. Only one kind of lock taken,
recursive and scalable.

Then the mechanism that changes CPU frequency should take all these
hotplug locks on all (online) CPUs, and then first stop all processing on
all CPUs, and then do the frequency change, atomically. This is with
interrupts disabled everywhere /first/, and /without any additional
locking/. That would prevent any sort of interaction from other CPUs -
they'd all be sitting still with interrupts disabled.

	Ingo
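The "take all the per-CPU hotplug locks, then do the change atomically" idea can be sketched in userspace with a pthread mutex per CPU. Acquiring the mutexes in a single fixed (ascending) order means two concurrent frequency changers cannot deadlock against each other; the `NCPUS` constant and function names are illustrative only, and the kernel version would additionally disable interrupts everywhere before touching the hardware:

```c
#include <assert.h>
#include <pthread.h>

#define NCPUS 4

static pthread_mutex_t cpu_lock[NCPUS] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};
static int cpu_freq[NCPUS];

void change_freq_all(int khz)
{
	int i;

	/* Lock every CPU's hotplug mutex, lowest index first: the fixed
	 * ordering is what makes concurrent callers deadlock-free. */
	for (i = 0; i < NCPUS; i++)
		pthread_mutex_lock(&cpu_lock[i]);

	/* With every CPU "held", the change is effectively atomic;
	 * in the kernel this is where all processing would be stopped
	 * with interrupts disabled. */
	for (i = 0; i < NCPUS; i++)
		cpu_freq[i] = khz;

	/* Release in reverse order. */
	for (i = NCPUS - 1; i >= 0; i--)
		pthread_mutex_unlock(&cpu_lock[i]);
}

int freq_of(int cpu) { return cpu_freq[cpu]; }
```

A hotplug operation in this scheme only ever takes its own CPU's mutex, so it serializes against the frequency change without any global lock.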
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Andrew Morton <[EMAIL PROTECTED]> wrote:

> > This would be done totally serialized and while holding the hotplug
> > lock, so no CPU could go away or arrive while this operation is
> > going on.
>
> You said "the hotplug lock". That is the problem.

maybe i'm too dense today but i still dont see the fundamental problem.
Even with complex inter-subsystem interactions, hotplugging could be
effectively and scalably controlled via a self-recursive per-CPU mutex,
and a pointer to it embedded in task_struct:

 struct task_struct {
	...
	int hotplug_depth;
	struct mutex *hotplug_lock;
 };

 ...

 DEFINE_PER_CPU(struct mutex, hotplug_lock);

 void cpu_hotplug_lock(void)
 {
	int cpu = raw_smp_processor_id();

	/*
	 * Interrupts/softirqs are hotplug-safe:
	 */
	if (in_interrupt())
		return;
	if (current->hotplug_depth++)
		return;
	current->hotplug_lock = &per_cpu(hotplug_lock, cpu);
	mutex_lock(current->hotplug_lock);
 }

 void cpu_hotplug_unlock(void)
 {
	if (in_interrupt())
		return;
	if (DEBUG_LOCKS_WARN_ON(!current->hotplug_depth))
		return;
	if (--current->hotplug_depth)
		return;
	mutex_unlock(current->hotplug_lock);
	current->hotplug_lock = NULL;
 }

 ...

 void do_exit(void)
 {
	...
	DEBUG_LOCKS_WARN_ON(current->hotplug_depth);
	...
 }

 ...

 copy_process(void)
 {
	...
	p->hotplug_depth = 0;
	p->hotplug_lock = NULL;
	...
 }

50 lines of code at most. The only rule is to not use cpu_hotplug_lock()
in process-context non-preemptible code [interrupt contexts are
automatically ignored]. What am i missing?

	Ingo
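The core trick in Ingo's sketch - a depth counter in per-task state so
that nested acquisitions never touch the mutex again - can be mimicked
in userspace with pthreads. This is only an analogue, not kernel code:
thread-local storage stands in for task_struct fields, an array index
stands in for raw_smp_processor_id(), and all names are made up:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Per-thread state standing in for the task_struct fields. */
static __thread int hotplug_depth;
static __thread pthread_mutex_t *hotplug_lock;

/* One mutex per "CPU"; a fixed array stands in for DEFINE_PER_CPU. */
#define NCPUS 4
static pthread_mutex_t cpu_lock[NCPUS] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

/* The caller's current "CPU"; stands in for raw_smp_processor_id(). */
static int cur_cpu;

void cpu_hotplug_lock_demo(void)
{
	if (hotplug_depth++)	/* recursive acquisition: just count */
		return;
	hotplug_lock = &cpu_lock[cur_cpu];
	pthread_mutex_lock(hotplug_lock);
}

void cpu_hotplug_unlock_demo(void)
{
	assert(hotplug_depth > 0);
	if (--hotplug_depth)
		return;
	pthread_mutex_unlock(hotplug_lock);
	hotplug_lock = NULL;
}

/* Nested lock/unlock must not self-deadlock and must balance out. */
int nesting_works(void)
{
	cpu_hotplug_lock_demo();
	cpu_hotplug_lock_demo();
	int ok = (hotplug_depth == 2);
	cpu_hotplug_unlock_demo();
	cpu_hotplug_unlock_demo();
	return ok && hotplug_depth == 0 && hotplug_lock == NULL;
}
```

The remembered `hotplug_lock` pointer is what makes the unlock correct
even if the caller has meanwhile migrated to another "CPU" - which is
precisely the cacheline concern raised later in the thread.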
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 12:03:15PM +0100, Ingo Molnar wrote:
> * Gautham R Shenoy <[EMAIL PROTECTED]> wrote:
>
> > a) cpufreq maintains its own cpumask in the variable
> > policy->affected_cpus and says: if a frequency change is issued to
> > any one of the cpus in the affected_cpus mask, you change frequency
> > on all cpus in the mask. So this needs to be consistent with the
> > cpu_online map and hence cpu-hotplug aware. Furthermore, we don't want
> > cpus in this mask to go down when we are trying to change frequencies
> > on them. The function which drives the frequency change in the
> > cpufreq core is cpufreq_driver_target and it needs cpu-hotplug
> > protection.
>
> couldnt this complexity be radically simplified by having new kernel
> infrastructure that does something like:
>
>  "'gather' all CPUs mentioned in mask via scheduling a separate
>   helper-kthread on every CPU that mask specifies, disable all
>   interrupts, and execute function fn once all CPUs have been
>   'gathered' - and release all CPUs once fn has executed on each of
>   them."
>
> ?

This is what is currently being done by cpufreq:

a) get_some_cpu_hotplug_protection() [use either some global mechanism
   or a per-subsystem mutex]

b) actual_freq_change_driver_function(mask)
   /* You can check out cpufreq_p4_target() in
    * arch/i386/kernel/cpu/cpufreq/p4-clockmod.c
    */
   {
	for_each_cpu_mask(i, mask) {
		cpumask_t this_cpu = cpumask_of_cpu(i);

		set_cpus_allowed(current, this_cpu);
		function_to_change_frequency();
	}
   }

c) release_whatever_cpu_hotplug_protection()

> This would be done totally serialized and while holding the hotplug
> lock, so no CPU could go away or arrive while this operation is going
> on.

Isn't the above the same as what you are suggesting? Or have I missed
out anything?

> 	Ingo

regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
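The driver loop in (b) is, at bottom, a walk over the set bits of a
cpumask, pinning the current task to each CPU in turn. A minimal
userspace sketch of that walk (the macro name is borrowed from the
kernel; everything else here is hypothetical, and the pinning call is
reduced to a log entry):

```c
#include <assert.h>

/* A toy cpumask: one bit per CPU, walked lowest bit first, mirroring
 * the for_each_cpu_mask() iteration in the driver loop above. */
typedef unsigned long cpumask_t;

#define for_each_cpu_mask(i, mask) \
	for ((i) = 0; (i) < 8 * (int)sizeof(cpumask_t); (i)++) \
		if ((mask) & (1UL << (i)))

/* Pretend frequency change: record which CPUs we "visited" in order.
 * In the kernel this is where set_cpus_allowed(current, ...) and the
 * actual frequency-change call would go. */
int change_freq_on_mask(cpumask_t mask, int *visited, int cap)
{
	int i, n = 0;

	for_each_cpu_mask(i, mask) {
		if (n < cap)
			visited[n] = i;
		n++;
	}
	return n;	/* number of CPUs the change was applied to */
}
```

The hotplug problem in the thread is precisely that this walk is only
safe if the mask cannot go stale between (a) and (c).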
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, 30 Nov 2006 12:03:15 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:
> * Gautham R Shenoy <[EMAIL PROTECTED]> wrote:
>
> > a) cpufreq maintains its own cpumask in the variable
> > policy->affected_cpus and says: if a frequency change is issued to
> > any one of the cpus in the affected_cpus mask, you change frequency
> > on all cpus in the mask. So this needs to be consistent with the
> > cpu_online map and hence cpu-hotplug aware. Furthermore, we don't want
> > cpus in this mask to go down when we are trying to change frequencies
> > on them. The function which drives the frequency change in the
> > cpufreq core is cpufreq_driver_target and it needs cpu-hotplug
> > protection.
>
> couldnt this complexity be radically simplified by having new kernel
> infrastructure that does something like:
>
>  "'gather' all CPUs mentioned in mask via scheduling a separate
>   helper-kthread on every CPU that mask specifies, disable all
>   interrupts, and execute function fn once all CPUs have been
>   'gathered' - and release all CPUs once fn has executed on each of
>   them."
>
> ?

How does this differ from stop_machine_run(), which hot-unplug presently
uses?

> This would be done totally serialized and while holding the hotplug
> lock, so no CPU could go away or arrive while this operation is going
> on.

You said "the hotplug lock". That is the problem.
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy <[EMAIL PROTECTED]> wrote:

> a) cpufreq maintains its own cpumask in the variable
> policy->affected_cpus and says: if a frequency change is issued to
> any one of the cpus in the affected_cpus mask, you change frequency
> on all cpus in the mask. So this needs to be consistent with the
> cpu_online map and hence cpu-hotplug aware. Furthermore, we don't want
> cpus in this mask to go down when we are trying to change frequencies
> on them. The function which drives the frequency change in the
> cpufreq core is cpufreq_driver_target and it needs cpu-hotplug
> protection.

couldnt this complexity be radically simplified by having new kernel
infrastructure that does something like:

 "'gather' all CPUs mentioned in mask via scheduling a separate
  helper-kthread on every CPU that mask specifies, disable all
  interrupts, and execute function fn once all CPUs have been
  'gathered' - and release all CPUs once fn has executed on each of
  them."

?

This would be done totally serialized and while holding the hotplug
lock, so no CPU could go away or arrive while this operation is going
on.

	Ingo
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 09:31:44AM +0100, Ingo Molnar wrote:
> * Gautham R Shenoy <[EMAIL PROTECTED]> wrote:
>
> > So do we
> > - Rethink the strategy of per-subsystem hotcpu-locks ?
> >
> > OR
> >
> > - Think of a way to straighten out the super-convoluted cpufreq code ?
>
> i'm still wondering what the conceptual source of this fundamental
> locking complexity in cpufreq (and hotplug) is - it is not intuitive to
> me at all. Could you try to explain that?

I can try :-) IIRC, cpufreq was written before cpu-hotplug and thus was
never hotplug aware when it was first written. Let me try and simplify
this as far as I can.

a) cpufreq maintains its own cpumask in the variable
policy->affected_cpus and says: if a frequency change is issued to any
one of the cpus in the affected_cpus mask, you change frequency on all
cpus in the mask. So this needs to be consistent with the cpu_online map
and hence cpu-hotplug aware. Furthermore, we don't want cpus in this
mask to go down when we are trying to change frequencies on them. The
function which drives the frequency change in the cpufreq core is
cpufreq_driver_target and it needs cpu-hotplug protection.

b) This mask is changed when the cpu-hotplug core sends a
CPU_DEAD/CPU_ONLINE notification through the function
cpufreq_cpu_callback. So it's a read-write situation, where
cpufreq_driver_target() is the read side and cpufreq_cpu_callback() is
the write side, and we want a) and b) to be mutually exclusive.

c) The problem is that just before a cpu is offlined, we want to put it
at the lowest possible frequency. So we send out a CPU_DOWN_PREPARE
event, handled by cpufreq_cpu_callback, which calls
cpufreq_driver_target. Classic case of calling down_read from a
down_write context!

To solve this, many changes have taken place since 2.6.18-rc-something:

i) Linus split the cpu_control mutex in cpu.c into cpu_add_remove_lock
and cpu_bitmask_lock. While the former serializes cpu-hotplug attempts,
the latter actually provides cpu-hotplug protection.

ii) Arjan cleaned up cpufreq to eliminate all intra-cpufreq recursive
lock_cpu_hotplug calls.

iii) Andrew replaced lock_cpu_hotplug in the workqueue code with
workqueue_mutex, which solved the recursive hotplug problem between the
ondemand governor and the workqueue.

This is how it exists in 2.6.19. Is it safe? Not really. The split
locking can still hit a BUG_ON as described in the post
http://lkml.org/lkml/2006/8/24/89 because the split locking opens up a
window which allows cpufreq to try changing frequency on stale cpus.
Besides, having so many subsystems depend on a global lock may not be
such a good idea from the scalability perspective. I did post an
rcu-based scalable version of lock_cpu_hotplug but didn't get much
response.

Which is why we tried having per-subsystem hotplug locks, where each
subsystem can define its own mutex. The read side will be something
like:

	mutex_lock(&my_hotcpu_lock);
	/* No hotplug can happen here */
	mutex_unlock(&my_hotcpu_lock);

Define a callback function my_cpu_callback and handle the events
CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE, which are sent out before
beginning a cpu hotplug and after the completion of a cpu hotplug
respectively:

	my_cpu_callback(int event)
	{
		switch (event) {
		case CPU_LOCK_ACQUIRE:
			mutex_lock(&my_hotcpu_mutex);
			break;
		/*
		 * Cpu hotplug happens here.
		 * Handle any other pre/post hotplug events.
		 */
		case CPU_LOCK_RELEASE:
			mutex_unlock(&my_hotcpu_mutex);
			break;
		}
	}

This would work fine if the interactions between the cpu-hotplug aware
code across subsystems were well ordered. But the ondemand
governor-workqueue interactions are causing the circular dependency
problem, and in another case a recursive locking problem. Sigh!

Hope this explanation helped :-)

> 	Ingo

regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
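The per-subsystem callback pattern above can be sketched in userspace:
a "hotplug core" brackets the operation with ACQUIRE/RELEASE events, and
the subsystem's callback takes and drops its own mutex in response. The
event names echo the thread; everything else (pthreads, the dummy event,
the counter) is invented for illustration:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical event codes mirroring CPU_LOCK_ACQUIRE/RELEASE. */
enum { CPU_LOCK_ACQUIRE, CPU_DOWN_DUMMY, CPU_LOCK_RELEASE };

static pthread_mutex_t my_hotcpu_mutex = PTHREAD_MUTEX_INITIALIZER;
static int events_seen;

/* The subsystem's callback: takes its own mutex on ACQUIRE, drops it
 * on RELEASE; any other event is handled while the mutex is held. */
static void my_cpu_callback(int event)
{
	switch (event) {
	case CPU_LOCK_ACQUIRE:
		pthread_mutex_lock(&my_hotcpu_mutex);
		break;
	case CPU_LOCK_RELEASE:
		pthread_mutex_unlock(&my_hotcpu_mutex);
		break;
	default:
		events_seen++;	/* pre/post hotplug handling goes here */
		break;
	}
}

/* The "hotplug core": sends ACQUIRE before and RELEASE after the
 * hotplug operation, so the callback's mutex covers it. */
int simulate_hotplug(void)
{
	my_cpu_callback(CPU_LOCK_ACQUIRE);
	my_cpu_callback(CPU_DOWN_DUMMY);	/* hotplug happens here */
	my_cpu_callback(CPU_LOCK_RELEASE);
	return events_seen;
}
```

The hazard described in the thread appears as soon as two such callbacks
in different subsystems take their mutexes in inconsistent orders.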
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 09:29:34AM +0100, Ingo Molnar wrote:
> * Gautham R Shenoy <[EMAIL PROTECTED]> wrote:
>
> > Ok, I see that we are already doing it :(. So we can end up in a
> > deadlock.
> >
> > Here's the culprit callpath:
>
> in general lockdep is 100% correct when it comes to "individual locks".
> The overwhelming majority of lockdep false-positives is not due to
> lockdep not getting the dependencies right, but due to the "lock class"
> not being correctly identified. That's not an issue here i think.

You're right. That's not the issue.

> what lockdep does is it observes actual locking dependencies as they
> happen individually in various contexts, and then 'completes' the
> dependency graph by combining all the possible scenarios how contexts
> might preempt each other. So if lockdep sees independent dependencies
> and concludes that they are circular, there's nothing that saves us
> from the deadlock.

Ah! I get it now. I had taken neither preemption nor the SMP scenario
into account before concluding that the warning might be a false
positive. All I need to do is to run my test cases on a preemptible
kernel or in parallel on an smp box. It'll definitely deadlock there!

> The only way for those dependencies to /never/ trigger simultaneously
> on different CPUs would be via the use of a further 'outer' exclusion
> mechanism (i.e. a lock) - but all explicit kernel-API exclusion
> mechanisms are tracked by lockdep => Q.E.D. (Open-coded exclusion might
> escape the attention of lockdep, but those are extremely rare and are
> also easily found.)

Thanks for making it clear :-)

> 	Ingo

regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy <[EMAIL PROTECTED]> wrote:

> So do we
> - Rethink the strategy of per-subsystem hotcpu-locks ?
>
> OR
>
> - Think of a way to straighten out the super-convoluted cpufreq code ?

i'm still wondering what the conceptual source of this fundamental
locking complexity in cpufreq (and hotplug) is - it is not intuitive to
me at all. Could you try to explain that?

	Ingo
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy <[EMAIL PROTECTED]> wrote:

> Ok, I see that we are already doing it :(. So we can end up in a
> deadlock.
>
> Here's the culprit callpath:

in general lockdep is 100% correct when it comes to "individual locks".
The overwhelming majority of lockdep false-positives is not due to
lockdep not getting the dependencies right, but due to the "lock class"
not being correctly identified. That's not an issue here i think.

what lockdep does is it observes actual locking dependencies as they
happen individually in various contexts, and then 'completes' the
dependency graph by combining all the possible scenarios how contexts
might preempt each other. So if lockdep sees independent dependencies
and concludes that they are circular, there's nothing that saves us from
the deadlock.

The only way for those dependencies to /never/ trigger simultaneously on
different CPUs would be via the use of a further 'outer' exclusion
mechanism (i.e. a lock) - but all explicit kernel-API exclusion
mechanisms are tracked by lockdep => Q.E.D. (Open-coded exclusion might
escape the attention of lockdep, but those are extremely rare and are
also easily found.)

	Ingo
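The graph 'completion' Ingo describes can be illustrated with a tiny
model, not lockdep itself: record each observed "lock B acquired while
holding lock A" edge, then flag an acquisition that would close a cycle
in the combined graph. Lock classes are just small integers here, and
every name is invented:

```c
#include <assert.h>
#include <string.h>

/* Tiny model of lockdep's core idea: collect dependency edges from
 * all contexts, then report a deadlock when a new acquisition would
 * create a cycle in the combined graph. */
#define MAXLOCKS 8
static int edge[MAXLOCKS][MAXLOCKS];	/* edge[a][b]: b taken under a */

void record_dependency(int held, int acquired)
{
	edge[held][acquired] = 1;
}

/* Depth-first search: is 'to' reachable from 'from' via edges? */
static int reachable(int from, int to, int *seen)
{
	int i;

	if (from == to)
		return 1;
	if (seen[from]++)
		return 0;
	for (i = 0; i < MAXLOCKS; i++)
		if (edge[from][i] && reachable(i, to, seen))
			return 1;
	return 0;
}

/* Would taking 'acquired' while holding 'held' close a cycle? */
int would_deadlock(int held, int acquired)
{
	int seen[MAXLOCKS];

	memset(seen, 0, sizeof(seen));
	return reachable(acquired, held, seen);
}
```

With lock 0 as the hotplug lock and lock 1 as a per-subsystem mutex,
one context recording 0-then-1 and another attempting 1-then-0 is
exactly the circular dependency reported in this thread: the two orders
never deadlock in a single run, but the combined graph is cyclic.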
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy [EMAIL PROTECTED] wrote: So do we - Rethink the strategy of per-subsystem hotcpu-locks ? OR - Think of a way to straighten out the super-convoluted cpufreq code ? i'm still wondering what the conceptual source of this fundamental locking complexity in cpufreq (and hotplug) is - it is not intuitive to me at all. Could you try to explain that? Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy [EMAIL PROTECTED] wrote: Ok, I see that we are already doing it :(. So we can end up in a deadlock. Here's the culprit callpath: in general lockdep is 100% correct when it comes to individual locks. The overwhelming majority of lockdep false-positives is not due to lockdep not getting the dependencies right, but due to the lock class not being correctly identified. That's not an issue here i think. what lockdep does is it observes actual locking dependencies as they happen individually in various contexts, and then 'completes' the dependency graph by combining all the possible scenarios how contexts might preempt each other. So if lockdep sees independent dependencies and concludes that they are circular, there's nothing that saves us from the deadlock. The only way for those dependencies to /never/ trigger simultaneously on different CPUs would be via the use of a further 'outer' exclusion mechanism (i.e. a lock) - but all explicit kernel-API exclusion mechanisms are tracked by lockdep = Q.E.D. (Open-coded exclusion might escape the attention of lockdep, but those are extremely rare and are also easily found.) Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 09:29:34AM +0100, Ingo Molnar wrote: * Gautham R Shenoy [EMAIL PROTECTED] wrote: Ok, I see that we are already doing it :(. So we can end up in a deadlock. Here's the culprit callpath: in general lockdep is 100% correct when it comes to individual locks. The overwhelming majority of lockdep false-positives is not due to lockdep not getting the dependencies right, but due to the lock class not being correctly identified. That's not an issue here i think. You're right. That's not the issue. what lockdep does is it observes actual locking dependencies as they happen individually in various contexts, and then 'completes' the dependency graph by combining all the possible scenarios how contexts might preempt each other. So if lockdep sees independent dependencies and concludes that they are circular, there's nothing that saves us from the deadlock. Ah! I get it now. I had taken neither preemption nor the SMP scenario into account before concluding that the warning might be a false positive. All I need to do is to run my test cases on a preemptible kernel or in parallel on a smp box. It'll definitely deadlock there! The only way for those dependencies to /never/ trigger simultaneously on different CPUs would be via the use of a further 'outer' exclusion mechanism (i.e. a lock) - but all explicit kernel-API exclusion mechanisms are tracked by lockdep = Q.E.D. (Open-coded exclusion might escape the attention of lockdep, but those are extremely rare and are also easily found.) Thanks for making it clear :-) Ingo regards gautham. -- Gautham R Shenoy Linux Technology Center IBM India. Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless! - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 09:31:44AM +0100, Ingo Molnar wrote: * Gautham R Shenoy [EMAIL PROTECTED] wrote: So do we - Rethink the strategy of per-subsystem hotcpu-locks ? OR - Think of a way to straighten out the super-convoluted cpufreq code ? i'm still wondering what the conceptual source of this fundamental locking complexity in cpufreq (and hotplug) is - it is not intuitive to me at all. Could you try to explain that? I can try :-) IIRC, cpufreq was written before cpu-hotplug and thus, was never hotplug aware when it was first written. Let me try and simplify this as far as I can. a) cpufreq maintain's it's own cpumask in the variable policy-affected_cpus and says : If a frequency change is issued to any one of the cpu's in the affected_cpus mask, you change frequency on all cpus in the mask. So this needs to be consistent with cpu_online map and hence cpu hotplug aware. Furthermore, we don't want cpus in this mask to go down when we are trying to change frequencies on them. The function which drives the frequency change in cpufreq-core is cpufreq_driver_target and it needs cpu-hotplug protection. b) This mask is changed when the cpu_hotplug core sends a CPU_DEAD/CPU_ONLINE notification through the function cpufreq_cpu_callback. So it's a Read-Write situation, where cpufreq_driver_target() is the read side and cpufreq_cpu_callback() is the write side. So we would want a) and b) to be mutually exclusive. c) Problem is when a just before a cpu is offlined, we want to put it at the lowest possible frequency. So we send out a CPU_DOWN_PREPARE event handled by cpufreq_cpu_callback which calls cpufreq_driver_target. Classic case of calling down_read from a down_write context! To solve this so many changes have taken place since 2.6.18-rcsomething. i) Linus split cpu_control mutex in cpu.c to cpu_add_remove_lock and cpu_bitmask_lock. While the former serializes cpu_hotplug attempts, the latter actually provides cpu-hotplug protection. 
ii) Arjan cleaned up cpufreq to eliminate all intra-cpufreq recursive lock_cpu_hotplug calls. iii) Andrew replaced lock_cpu_hotplug in workqueue code with workqueue_mutex that solved the recursive hotplug problem between ondemand governor and workqueue. This is how it exists in 2.6.19. Is it safe? Not really. The split locking still can hit a BUG_ON as described in the post http://lkml.org/lkml/2006/8/24/89 because the split locking opens up a window which allows cpufreq to try changing frequency on stale cpus. Besides having so many subsystems depend on a global lock may not be such a good idea from the scalability perspective. I did post a rcu-based scalable version of lock_cpu_hotplug but didn't get much response. Which is why we tried having per-subsystem hotplug locks where each subsystem can define it's own mutex. The readside will be something like mutex_lock(my_hotcpu_lock); /* No hotplug can happen here */ mutex_unlock(my_hotcpu_lock); Define a callback function my_cpu_callback and handle events CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE which would be sent out before beginning a cpu hotplug and after the completion of a cpu hotplug respectively. my_cpu_callback( int event) { switch(event) { case CPU_LOCK_ACQUIRE: mutex_lock(my_hotcpu_mutex); break; /* * Cpu hotplug happens here. * Handle any other pre/post hotplug events */ case CPU_LOCK_RELEASE: mutex_unlock(my_hotcpu_mutex); } break; } This would work fine, if the interactions between the cpu-hotplug aware code across subsystem is well ordered. But the ondemand governor-workqueue interactions are causing the circular dependency problem and in another case a recursive locking problem. Sigh! Hope this explaination helped :-) Ingo regards gautham. -- Gautham R Shenoy Linux Technology Center IBM India. Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless! 
- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy [EMAIL PROTECTED] wrote: a) cpufreq maintain's it's own cpumask in the variable policy-affected_cpus and says : If a frequency change is issued to any one of the cpu's in the affected_cpus mask, you change frequency on all cpus in the mask. So this needs to be consistent with cpu_online map and hence cpu hotplug aware. Furthermore, we don't want cpus in this mask to go down when we are trying to change frequencies on them. The function which drives the frequency change in cpufreq-core is cpufreq_driver_target and it needs cpu-hotplug protection. couldnt this complexity be radically simplified by having new kernel infrastructure that does something like: 'gather' all CPUs mentioned in mask via scheduling a separate helper-kthread on every CPU that mask specifies, disable all interrupts, and execute function fn once all CPUs have been 'gathered' - and release all CPUs once fn has executed on each of them. ? This would be done totally serialized and while holding the hotplug lock, so no CPU could go away or arrive while this operation is going on. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, 30 Nov 2006 12:03:15 +0100 Ingo Molnar [EMAIL PROTECTED] wrote: * Gautham R Shenoy [EMAIL PROTECTED] wrote: a) cpufreq maintain's it's own cpumask in the variable policy-affected_cpus and says : If a frequency change is issued to any one of the cpu's in the affected_cpus mask, you change frequency on all cpus in the mask. So this needs to be consistent with cpu_online map and hence cpu hotplug aware. Furthermore, we don't want cpus in this mask to go down when we are trying to change frequencies on them. The function which drives the frequency change in cpufreq-core is cpufreq_driver_target and it needs cpu-hotplug protection. couldnt this complexity be radically simplified by having new kernel infrastructure that does something like: 'gather' all CPUs mentioned in mask via scheduling a separate helper-kthread on every CPU that mask specifies, disable all interrupts, and execute function fn once all CPUs have been 'gathered' - and release all CPUs once fn has executed on each of them. ? How does this differ from stop_machine_run(), which hot-unplug presently uses? This would be done totally serialized and while holding the hotplug lock, so no CPU could go away or arrive while this operation is going on. You said the hotplug lock. That is the problem. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 12:03:15PM +0100, Ingo Molnar wrote: * Gautham R Shenoy [EMAIL PROTECTED] wrote: a) cpufreq maintain's it's own cpumask in the variable policy-affected_cpus and says : If a frequency change is issued to any one of the cpu's in the affected_cpus mask, you change frequency on all cpus in the mask. So this needs to be consistent with cpu_online map and hence cpu hotplug aware. Furthermore, we don't want cpus in this mask to go down when we are trying to change frequencies on them. The function which drives the frequency change in cpufreq-core is cpufreq_driver_target and it needs cpu-hotplug protection. couldnt this complexity be radically simplified by having new kernel infrastructure that does something like: 'gather' all CPUs mentioned in mask via scheduling a separate helper-kthread on every CPU that mask specifies, disable all interrupts, and execute function fn once all CPUs have been 'gathered' - and release all CPUs once fn has executed on each of them. ? This is what is currently being done by cpufreq: a) get_some_cpu_hotplug_protection() [use either some global mechanism or a persubsystem mutex] b) actual_freq_change_driver_function(mask) /* You can check out cpufreq_p4_target() in * arch/i386/kernel/cpu/cpufreq/p4-clockmod.c */ { for_each_cpu_mask(i, mask) { cpumask_t this_cpu = cpumask_of_cpu(i); set_cpus_allowed(current, this_cpu); function_to_change_frequency(); } } c) release_whatever_cpu_hotplug_protection() This would be done totally serialized and while holding the hotplug lock, so no CPU could go away or arrive while this operation is going on. Isn't the above same as what you are suggesting? Or have I missed out anything? Ingo regards gautham. -- Gautham R Shenoy Linux Technology Center IBM India. Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless! 
- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Andrew Morton [EMAIL PROTECTED] wrote: This would be done totally serialized and while holding the hotplug lock, so no CPU could go away or arrive while this operation is going on. You said the hotplug lock. That is the problem. maybe i'm too dense today but i still dont see the fundamental problem. Even with complex inter-subsystem interactions, hotplugging could be effectively and scalably controlled via a self-recursive per-CPU mutex, and a pointer to it embedded in task_struct: struct task_struct { ... int hotplug_depth; struct mutex *hotplug_lock; } ... DEFINE_PER_CPU(struct mutex, hotplug_lock); void cpu_hotplug_lock(void) { int cpu = raw_smp_processor_id(); /* * Interrupts/softirqs are hotplug-safe: */ if (in_interrupt()) return; if (current-hotplug_depth++) return; current-hotplug_lock = per_cpu(hotplug_lock, cpu); mutex_lock(current-hotplug_lock); } void cpu_hotplug_unlock(void) { int cpu; if (in_interrupt()) return; if (DEBUG_LOCKS_WARN_ON(!current-hotplug_depth)) return; if (--current-hotplug_depth) return; mutex_unlock(current-hotplug_lock); current-hotplug_lock = NULL; } ... void do_exit(void) { ... DEBUG_LOCKS_WARN_ON(current-hotplug_depth); ... } ... copy_process(void) { ... p-hotplug_depth = 0; p-hotplug_lock = NULL; ... } 50 lines of code at most. The only rule is to not use cpu_hotplug_lock() in process-context non-preemptible code [interrupt contexts are automatically ignored]. What am i missing? Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy [EMAIL PROTECTED] wrote: This is what is currently being done by cpufreq: ok! a) get_some_cpu_hotplug_protection() [use either some global mechanism or a persubsystem mutex] this bit is wrong i think. Any reason why it's not a per-CPU (but otherwise global) array of mutexes that controls CPU hotplug - as per my previous mail? that would flatten the whole locking. Only one kind of lock taken, recursive and scalable. Then the mechanism that changes CPU frequency should take all these hotplug locks on all (online) CPUs, and then first stop all processing on all CPUs, and then do the frequency change, atomically. This is with interrupts disabled everywhere /first/, and /without any additional locking/. That would prevent any sort of interaction from other CPUs - they'd all be sitting still with interrupts disabled. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 12:53:27PM +0100, Ingo Molnar wrote: * Gautham R Shenoy [EMAIL PROTECTED] wrote: This is what is currently being done by cpufreq: ok! a) get_some_cpu_hotplug_protection() [use either some global mechanism or a persubsystem mutex] this bit is wrong i think. Any reason why it's not a per-CPU (but otherwise global) array of mutexes that controls CPU hotplug - as per my previous mail? that would flatten the whole locking. Only one kind of lock taken, recursive and scalable. I had posted one such recursive scalable version which can be found here http://lkml.org/lkml/2006/10/26/73 I remember cc'ing you. Yeah, it looks complicated and big, but then I did not want to add another field to the task struct as one such attempt had already been frowned upon ( I think long back Ashok posted it) So I ended up writing the whole read/write lock/unlock code myself. It's a RCU based lock, extremely light on the read side, but costly for the writers since it does a synchronize_sched. And yeah, it's partial towards the readers but with an additional field in the task struct we can have a fair implementation. Besides, an unfair cpu_hotplug_lock won't work since a process doing a sched_getaffinity in a forever_while loop can prevent any hotplug from happening. Then the mechanism that changes CPU frequency should take all these hotplug locks on all (online) CPUs, and then first stop all processing on all CPUs, and then do the frequency change, atomically. This is with interrupts disabled everywhere /first/, and /without any additional locking/. That would prevent any sort of interaction from other CPUs - they'd all be sitting still with interrupts disabled. Yup. Ingo regards gautham. -- Gautham R Shenoy Linux Technology Center IBM India. Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless! 
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 12:46:17PM +0100, Ingo Molnar wrote:
> struct task_struct {
> 	...
> 	int hotplug_depth;
> 	struct mutex *hotplug_lock;
> };
> ...
> DEFINE_PER_CPU(struct mutex, hotplug_lock);
>
> void cpu_hotplug_lock(void)
> {
> 	int cpu = raw_smp_processor_id();
>
> 	/*
> 	 * Interrupts/softirqs are hotplug-safe:
> 	 */
> 	if (in_interrupt())
> 		return;
> 	if (current->hotplug_depth++)
> 		return;
> 	current->hotplug_lock = &per_cpu(hotplug_lock, cpu);
> 	mutex_lock(current->hotplug_lock);
> }
>
> void cpu_hotplug_unlock(void)
> {
> 	int cpu;
>
> 	if (in_interrupt())
> 		return;
> 	if (DEBUG_LOCKS_WARN_ON(!current->hotplug_depth))
> 		return;
> 	if (--current->hotplug_depth)
> 		return;
>
> 	mutex_unlock(current->hotplug_lock);
> 	current->hotplug_lock = NULL;
> }

In process-context, preemptible code, let's say we are currently
running on processor i:

cpu_hotplug_lock();   /* does mutex_lock(&per_cpu(hotplug_lock, i)) */

/* do some operation, which might sleep */
/* ... task migrates to cpu j ... */

cpu_hotplug_unlock(); /* does mutex_unlock(&per_cpu(hotplug_lock, i))
                         while running on cpu j */

This would cause cacheline ping-pong, no?

> 	Ingo

regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Gautham R Shenoy <[EMAIL PROTECTED]> wrote:

> In process-context, preemptible code, let's say we are currently
> running on processor i:
>
> cpu_hotplug_lock();   /* does mutex_lock(&per_cpu(hotplug_lock, i)) */
> /* do some operation, which might sleep */
> /* ... task migrates to cpu j ... */
> cpu_hotplug_unlock(); /* does mutex_unlock(&per_cpu(hotplug_lock, i))
>                          while running on cpu j */
>
> This would cause cacheline ping-pong, no?

that would be attached to a very cache-inefficient thing: migrating a
task from one CPU to another. This is not the kind of ping-pong we are
normally worried about. (nor does it happen all that often)

	Ingo
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, 30 Nov 2006 12:46:17 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:
> * Andrew Morton <[EMAIL PROTECTED]> wrote:
> > > This would be done totally serialized and while holding the
> > > hotplug lock, so no CPU could go away or arrive while this
> > > operation is going on.
> > You said "the hotplug lock". That is the problem.
> maybe i'm too dense today but i still dont see the fundamental
> problem. Even with complex inter-subsystem interactions, hotplugging
> could be effectively and scalably controlled via a self-recursive
> per-CPU mutex, and a pointer to it embedded in task_struct:

Yes, I suppose we could come up with some new lock type which fixes the
problem.

But first, let us review the problem. Someone went through cpufreq
sprinkling lock_cpu_hotplug() everywhere as if it had magical
properties. For the past year or more, others have been picking through
the resulting bugs, trying to make things better and treating that
initial magic-pixie-dust as if it were inviolate and nobody had a chance
of understanding it.

IOW, cpufreq is screwed up, and we keep on trying to come up with more
complexity to unscrew it.

So what I would propose is that rather than going ahead and piling more
complexity on top of the existing poo-pile in an attempt to make it seem
to work, we should simply rip all the cpu-hotplug locking out of cpufreq
(there's a davej patch for that in -mm) and then just redo it all in an
organised fashion.

If, as a result of that exercise, we decide that the existing
cpu-hotplug serialisation mechanisms (ie: per-subsystem notification
callbacks and preempt_disable()) are insufficient, then sure, that is
the time to think about more sophisticated locking primitives.

But please, let's stop asking "how do we make cpufreq's hotplug locking
work?". Let's instead ask "how do we make cpufreq safe wrt hotplug?".

To do the latter is a matter of:

- identify the per-cpu resources which need locking
- decide what mechanism is to be used to protect each such resource
- design the locking and its hierarchy
- implement, test

In all this time, Gautham's emails from yesterday were the first
occasion on which anybody has taken the time to sit down and get us
started on this quite ordinary design/devel process.

Let me re-re-re-re-reiterate: this is a cpufreq problem. It has not yet
been demonstrated (AFAICS) that it is a general problem. I am unaware of
any other subsystems having got themselves into such a mess over hotplug
locking. Now, maybe that's because cpufreq really is especially hard. Or
maybe it's because it's just messed up. I don't believe we know which of
those is true yet.
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
* Andrew Morton <[EMAIL PROTECTED]> wrote:

> > Even with complex inter-subsystem interactions, hotplugging could
> > be effectively and scalably controlled via a self-recursive per-CPU
> > mutex, and a pointer to it embedded in task_struct:
> So what I would propose is that rather than going ahead and piling
> more complexity on top of the existing poo-pile in an attempt to make
> it seem to work, we should simply rip all the cpu-hotplug locking out
> of cpufreq (there's a davej patch for that in -mm) and then just redo
> it all in an organised fashion.

actually, that's precisely what i'm suggesting too, i wrote it to
Gautham in one of the previous mails:

|| that would flatten the whole locking. Only one kind of lock taken,
|| recursive and scalable.
||
|| Then the mechanism that changes CPU frequency should take all these
|| hotplug locks on all (online) CPUs, and then first stop all
|| processing on all CPUs, and then do the frequency change, atomically.
|| This is with interrupts disabled everywhere /first/, and /without any
|| additional locking/. That would prevent any sort of interaction from
|| other CPUs - they'd all be sitting still with interrupts disabled.

no other locking, only the CPU hotplug lock and the (existing) ability
to 'do stuff' with nothing else running on any other CPU.

	Ingo
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 09:29:34AM +0100, Ingo Molnar wrote:
> what lockdep does is it observes actual locking dependencies as they
> happen individually in various contexts, and then 'completes' the
> dependency graph by combining all the possible scenarios how contexts
> might preempt each other. So if lockdep sees independent dependencies
> and concludes that they are circular, there's nothing that saves us
> from the deadlock.

Ingo, consider a case where we have three locks A, B and C. We have a
very clear locking rule inside the kernel that lock A *should* be
acquired before acquiring either lock B or lock C.

At runtime, lockdep detects the two dependency chains

	A --> B --> C    and    A --> C --> B.

Does lockdep issue a circular dependency warning for this? It's quite
clear from the locking rule that we cannot have a circular deadlock,
since A acts as a mutex for the B->C / C->B callpaths.

Just curious :-)

[ Well, I might encounter such a scenario in an attempt to make cpufreq
  cpu-hotplug safe! ]

> 	Ingo

Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 09:58:07AM +0530, Gautham R Shenoy wrote:
>
> So can we ignore this circular-dep warning as a false positive?
> Or is there a way to exploit this circular dependency ?
>
> At the moment, I cannot think of a way to exploit this circular
> dependency unless we do something like try destroying the created
> workqueue when the cpu is dead, i.e. make the cpufreq governors
> cpu-hotplug-aware. (eeks! that doesn't look good)

Ok, I see that we are already doing it :(. So we can end up in a
deadlock. Here's the culprit callpath:

_cpu_down()
!
!-> raw_notifier_call_chain(CPU_LOCK_ACQUIRE)
!   !
!   !-> workqueue_cpu_callback(CPU_LOCK_ACQUIRE)       [*]
!
!-> raw_notifier_call_chain(CPU_DEAD)
    !
    !-> cpufreq_cpu_callback(CPU_DEAD)
        !
        !-> cpufreq_remove_dev
            !
            !-> __cpufreq_governor(data, GOVERNOR_STOP)
                !
                !-> policy->governor->governor()
                    !
                    !-> cpufreq_governor_dbs(GOVERNOR_STOP)
                        !
                        !-> destroy_workqueue()        [*]

[*] indicates the function takes workqueue_mutex. So a deadlock!

I wasn't able to observe this because I'm running a Xeon SMP box on
which you cannot offline cpu0. And cpufreq data is created only for
cpu0, while all other cpus' cpufreq_data just point to cpu0's
cpufreq_data. So the mentioned callpath within cpufreq_remove_dev is
never reached during the normal cpu offline cycle.

However, if there are architectures which allow the first-booted-cpu
(or, to be precise, the cpu for which cpufreq_data is *actually*
created) to be offlined, and we are running the Ondemand governor during
the offline, we will see this deadlock.

regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Wed, Nov 29, 2006 at 01:05:56PM -0800, Andrew Morton wrote:
> On Wed, 29 Nov 2006 20:54:04 +0530
> Gautham R Shenoy <[EMAIL PROTECTED]> wrote:
>
> > Ok, so to cut the long story short,
> > - While changing governor from anything to ondemand, locks are
> >   taken in the following order
> >
> >   policy->lock ===> dbs_mutex ===> workqueue_mutex.
> >
> > - While offlining a cpu, locks are taken in the following order
> >
> >   cpu_add_remove_lock ==> sched_hotcpu_mutex ==> workqueue_mutex ==>
> >   cache_chain_mutex ==> policy->lock.
>
> What functions are taking all these locks?  (ie: the callpath?)

While changing the cpufreq governor to ondemand, the locks taken are:

---------------------------------------------------------------------------
lock             function                file
---------------------------------------------------------------------------
policy->lock     store_scaling_governor  drivers/cpufreq/cpufreq.c
dbs_mutex        cpufreq_governor_dbs    drivers/cpufreq/cpufreq_ondemand.c
workqueue_mutex  __create_workqueue      kernel/workqueue.c
---------------------------------------------------------------------------

The complete callpath would be

store_scaling_governor                                  [*]
  |
  __cpufreq_set_policy
    |
    __cpufreq_governor(data, CPUFREQ_GOV_START)
      |
      policy->governor->governor
        => cpufreq_governor_dbs(data, CPUFREQ_GOV_START) [*]
          |
          create_workqueue  /* #defined as __create_workqueue [*] */

where [*] = locks taken.

While offlining a cpu, locks are taken in the following order:

---------------------------------------------------------------------------
lock                 function                file
---------------------------------------------------------------------------
cpu_add_remove_lock  cpu_down                kernel/cpu.c
sched_hotcpu_mutex   migration_call          kernel/sched.c
workqueue_mutex      workqueue_cpu_callback  kernel/workqueue.c
cache_chain_mutex    cpuup_callback          mm/slab.c
policy->lock         cpufreq_driver_target   drivers/cpufreq/cpufreq.c
---------------------------------------------------------------------------

Please note that in the above,
- sched_hotcpu_mutex, workqueue_mutex and cache_chain_mutex are taken
  while handling CPU_LOCK_ACQUIRE events in the respective subsystems'
  cpu_callback functions.
- policy->lock is taken while handling CPU_DOWN_PREPARE in
  cpufreq_cpu_callback, which calls cpufreq_driver_target.

It's perfectly clear that in the cpu offline callpath, cpufreq does not
have to do anything with the workqueue.

So can we ignore this circular-dep warning as a false positive?
Or is there a way to exploit this circular dependency?

At the moment, I cannot think of a way to exploit this circular
dependency unless we do something like try destroying the created
workqueue when the cpu is dead, i.e. make the cpufreq governors
cpu-hotplug-aware. (eeks! that doesn't look good)

I'm working on fixing this. Let me see if I can come up with something.

Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a
bargain, because Freedom is priceless!"
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Wed, 29 Nov 2006 20:54:04 +0530
Gautham R Shenoy <[EMAIL PROTECTED]> wrote:

> Ok, so to cut the long story short,
> - While changing governor from anything to ondemand, locks are taken
>   in the following order
>
>   policy->lock ===> dbs_mutex ===> workqueue_mutex.
>
> - While offlining a cpu, locks are taken in the following order
>
>   cpu_add_remove_lock ==> sched_hotcpu_mutex ==> workqueue_mutex ==>
>   cache_chain_mutex ==> policy->lock.

What functions are taking all these locks?  (ie: the callpath?)
CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
Hi all,

Looks like lockdep has resumed yelling about the cpufreq-cpu-hotplug
interactions! Again, it's the Ondemand governor that's the culprit.
On linux-2.6.19-rc6-mm2, this is what I got yesterday evening:

[EMAIL PROTECTED] tests]# echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
[EMAIL PROTECTED] tests]# echo 0 > /sys/devices/system/cpu/cpu1/online

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.19-rc6-mm2 #14
-------------------------------------------------------
bash/4601 is trying to acquire lock:
 (&policy->lock){--..}, at: [<c045793d>] mutex_lock+0x12/0x15

but task is already holding lock:
 (cache_chain_mutex){--..}, at: [<c045793d>] mutex_lock+0x12/0x15

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (cache_chain_mutex){--..}:
       [<c013bddc>] __lock_acquire+0x8ef/0x9f3
       [<c013c1ec>] lock_acquire+0x68/0x82
       [<c04577da>] __mutex_lock_slowpath+0xd3/0x224
       [<c045793d>] mutex_lock+0x12/0x15
       [<c01642f1>] cpuup_callback+0x1a3/0x2d6
       [<c045a952>] notifier_call_chain+0x2b/0x5b
       [<c012efb6>] __raw_notifier_call_chain+0x18/0x1d
       [<c012efd5>] raw_notifier_call_chain+0x1a/0x1c
       [<c013f798>] _cpu_down+0x56/0x1ef
       [<c013fa5c>] cpu_down+0x26/0x3a
       [<c029dde2>] store_online+0x27/0x5a
       [<c029adec>] sysdev_store+0x20/0x25
       [<c0197284>] sysfs_write_file+0xb6/0xde
       [<c01674d4>] vfs_write+0x90/0x144
       [<c0167c74>] sys_write+0x3d/0x61
       [<c0103d36>] sysenter_past_esp+0x5f/0x99
       [<ffffffff>] 0xffffffff

-> #2 (workqueue_mutex){--..}:
       [<c013bddc>] __lock_acquire+0x8ef/0x9f3
       [<c013c1ec>] lock_acquire+0x68/0x82
       [<c04577da>] __mutex_lock_slowpath+0xd3/0x224
       [<c045793d>] mutex_lock+0x12/0x15
       [<c0131e91>] __create_workqueue+0x5b/0x11c
       [<c03c4a9a>] cpufreq_governor_dbs+0xa0/0x2e8
       [<c03c2cde>] __cpufreq_governor+0x64/0xac
       [<c03c30d5>] __cpufreq_set_policy+0x187/0x20b
       [<c03c33a1>] store_scaling_governor+0x132/0x16a
       [<c03c2af9>] store+0x35/0x46
       [<c0197284>] sysfs_write_file+0xb6/0xde
       [<c01674d4>] vfs_write+0x90/0x144
       [<c0167c74>] sys_write+0x3d/0x61
       [<c0103d36>] sysenter_past_esp+0x5f/0x99
       [<ffffffff>] 0xffffffff

-> #1 (dbs_mutex){--..}:
       [<c013bddc>] __lock_acquire+0x8ef/0x9f3
       [<c013c1ec>] lock_acquire+0x68/0x82
       [<c04577da>] __mutex_lock_slowpath+0xd3/0x224
       [<c045793d>] mutex_lock+0x12/0x15
       [<c03c4a7e>] cpufreq_governor_dbs+0x84/0x2e8
       [<c03c2cde>] __cpufreq_governor+0x64/0xac
       [<c03c30d5>] __cpufreq_set_policy+0x187/0x20b
       [<c03c33a1>] store_scaling_governor+0x132/0x16a
       [<c03c2af9>] store+0x35/0x46
       [<c0197284>] sysfs_write_file+0xb6/0xde
       [<c01674d4>] vfs_write+0x90/0x144
       [<c0167c74>] sys_write+0x3d/0x61
       [<c0103d36>] sysenter_past_esp+0x5f/0x99
       [<ffffffff>] 0xffffffff

-> #0 (&policy->lock){--..}:
       [<c013bce0>] __lock_acquire+0x7f3/0x9f3
       [<c013c1ec>] lock_acquire+0x68/0x82
       [<c04577da>] __mutex_lock_slowpath+0xd3/0x224
       [<c045793d>] mutex_lock+0x12/0x15
       [<c03c2c1f>] cpufreq_driver_target+0x2b/0x51
       [<c03c38db>] cpufreq_cpu_callback+0x42/0x52
       [<c045a952>] notifier_call_chain+0x2b/0x5b
       [<c012efb6>] __raw_notifier_call_chain+0x18/0x1d
       [<c012efd5>] raw_notifier_call_chain+0x1a/0x1c
       [<c013f798>] _cpu_down+0x56/0x1ef
       [<c013fa5c>] cpu_down+0x26/0x3a
       [<c029dde2>] store_online+0x27/0x5a
       [<c029adec>] sysdev_store+0x20/0x25
       [<c0197284>] sysfs_write_file+0xb6/0xde
       [<c01674d4>] vfs_write+0x90/0x144
       [<c0167c74>] sys_write+0x3d/0x61
       [<c0103d36>] sysenter_past_esp+0x5f/0x99
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

4 locks held by bash/4601:
 #0:  (cpu_add_remove_lock){--..}, at: [<c045793d>] mutex_lock+0x12/0x15
 #1:  (sched_hotcpu_mutex){--..}, at: [<c045793d>] mutex_lock+0x12/0x15
 #2:  (workqueue_mutex){--..}, at: [<c045793d>] mutex_lock+0x12/0x15
 #3:  (cache_chain_mutex){--..}, at: [<c045793d>] mutex_lock+0x12/0x15

stack backtrace:
 [<c0104c8f>] show_trace_log_lvl+0x19/0x2e
 [<c0104d8f>] show_trace+0x12/0x14
 [<c0104da5>] dump_stack+0x14/0x16
 [<c013a157>] print_circular_bug_tail+0x7c/0x85
 [<c013bce0>] __lock_acquire+0x7f3/0x9f3
 [<c013c1ec>] lock_acquire+0x68/0x82
 [<c04577da>] __mutex_lock_slowpath+0xd3/0x224
 [<c045793d>] mutex_lock+0x12/0x15
 [<c03c2c1f>] cpufreq_driver_target+0x2b/0x51
 [<c03c38db>] cpufreq_cpu_callback+0x42/0x52
 [<c045a952>] notifier_call_chain+0x2b/0x5b
 [<c012efb6>] __raw_notifier_call_chain+0x18/0x1d
 [<c012efd5>] raw_notifier_call_chain+0x1a/0x1c
 [<c013f798>] _cpu_down+0x56/0x1ef
 [<c013fa5c>] cpu_down+0x26/0x3a
 [<c029dde2>] store_online+0x27/0x5a
 [<c029adec>] sysdev_store+0x20/0x25
 [<c0197284>] sysfs_write_file+0xb6/0xde
 [<c01674d4>] vfs_write+0x90/0x144
 [<c0167c74>] sys_write+0x3d/0x61
 [<c0103d36>] sysenter_past_esp+0x5f/0x99
=======================================================
Breaking affinity for irq 24
CPU 1 is now offline

Ok, so to cut the long story short,
- While changing governor from anything to ondemand, locks are taken in
  the following order

  policy->lock ===> dbs_mutex ===> workqueue_mutex.

- While offlining a cpu, locks are taken in the following order

  cpu_add_remove_lock ==> sched_hotcpu_mutex ==> workqueue_mutex ==>
  cache_chain_mutex ==> policy->lock.

The dependency graph built out of this info has a circular dependency,
as pointed out by lockdep. However, I am not quite sure how seriously
this circular dependency warning should be taken.

One way to avoid these warnings is to take the policy->lock before the
rest of the locks while offlining the cpu. For a moment I even thought
of taking/releasing policy->lock under