Re: Locking issues with cpufreq and sysfs
On 17 October 2014 17:45, Prarit Bhargava wrote: > Hmmm > > This is what I'm doing: > > echo ondemand > scaling_governor > cat ondemand/* > echo conservative > scaling_governor > > OOC what are you doing to test? Exactly same and even tried Roberts script too :) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Locking issues with cpufreq and sysfs
On 10/17/2014 07:38 AM, Viresh Kumar wrote: > On 13 October 2014 18:41, Prarit Bhargava wrote: >> There are several issues with the current locking design of cpufreq. Most >> notably is the panic reported here: >> >> http://marc.info/?l=linux-kernel&m=140622451625236&w=2 >> >> which was introduced by commit 955ef4833574636819cd269cfbae12f79cbde63a, >> cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT, which introduces > > Don't know whats going wrong but I am just not able to reproduce the > lockdep again :( > I have tried this on two boards and am making sure that all things are > correctly > configured. I am trying this on two of Exynos boards, One a dual-A15 and > another > big little with 8 cores.. > > > @@ -2195,9 +2195,7 @@ static int cpufreq_set_policy(struct > cpufreq_policy *policy, > /* end old governor */ > if (old_gov) { > __cpufreq_governor(policy, CPUFREQ_GOV_STOP); > - up_write(&policy->rwsem); > __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); > - down_write(&policy->rwsem); > } > > /* start new governor */ > @@ -2206,9 +2204,7 @@ static int cpufreq_set_policy(struct > cpufreq_policy *policy, > if (!__cpufreq_governor(policy, CPUFREQ_GOV_START)) > goto out; > > - up_write(&policy->rwsem); > __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); > - down_write(&policy->rwsem); > } > > diff --git a/drivers/cpufreq/cpufreq_governor.c > b/drivers/cpufreq/cpufreq_governor.c > index 1b44496..1a6972a 100644 > --- a/drivers/cpufreq/cpufreq_governor.c > +++ b/drivers/cpufreq/cpufreq_governor.c > @@ -323,6 +323,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy, > return 0; > case CPUFREQ_GOV_POLICY_EXIT: > if (!--dbs_data->usage_count) { > + pr_info("%s\n", __func__); > sysfs_remove_group(get_governor_parent_kobj(policy), > get_sysfs_attr(dbs_data)); > > Hmmm This is what I'm doing: echo ondemand > scaling_governor cat ondemand/* echo conservative > scaling_governor OOC what are you doing to test? P. > > > .config attached too, please let me know what am I missing. > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Locking issues with cpufreq and sysfs
On 10/16/2014 07:23 AM, Viresh Kumar wrote: > On 14 October 2014 23:54, Prarit Bhargava wrote: >> Here's what I think we should do. Taking a step back, the purpose of the >> cpufreq sysfs files is to allow userspace to read current cpu frequency >> settings, and to allow userspce to modify the governor and set the max & min >> ranges for cpu frequencies. This can be done per device or for all cpus >> depending on the driver. > > Okay. > >> We have to guarantee that bothing reading and writing will always work and >> that >> write operations will always be atomic relative to userspace. The current > > Ok. > >> implementation of cpufreq does this through the following locks: >> >> cpufreq_driver_lock: protects the cpufreq_cpu_data array and >> cpufreq_driver->boost >> cpufreq_governor_lock: protects the current governor > > Its just for serialization.. > >> cpufreq_governor_mutex: protects the cpufreq_governor_list >> cpufreq_rwsem: protects the driver from being unloaded >> global_kobj_lock: protects the "cpufreq" kobject >> each policy has a rwsem (policy->rwsem): protects the cpufreq_policy struct >> each policy has a transition_lock (policy->transition): synchronizes >> frequency transitions >> >> While examining this code I was wondering exactly why we allow multiple >> readers >> and writers in cpufreq. I could understand if we felt that this data was >> critical; but it really isn't. A short delay here isn't that big of a deal >> IMO >> (if someone can produce a case where a delay would cause a serious problem >> I'd >> like to hear it). I don't even think it is safe in most cases to allow >> readers >> while cpufreq values are changing; if we're changing the governor userspace >> cannot rely on the value of (for example) cpuinfo_max_freq. > > I don't know how reader writer lock will fail and a normal lock will not. > There is only benefit of rwlock, that readers can read things while > there is nobody > writing.. > >> So I'm proposing that we move to a single threaded read/write using, if > > Okay, but how will that benefit us ? It will greatly simplify the code. The locking isn't working in this code at all right now and is causing various reported panics ... you yourself are pushing a lock patch that serializes operations -- which is causing other problems during testing. > >> possible, a single policy lock for now. We might transition this back to a >> rwsem later on, however, for the first attempt at cleaning this up I think we >> should just stick with a simple lock. In doing that, IMO we remove >> >> cpufreq_rwsem: protects the driver from being unloaded >> cpufreq_governor_lock: protects the current governor >> each policy has a rwsem (policy->rwsem): protects the cpufreq_policy struct >> >> and potentially >> >> cpufreq_driver_lock: protects the cpufreq_cpu_data array and >> cpufreq_driver->boost > > Not really sure, but yeah we might be able to club few of them.. > >> After looking at the way the code would be structured, I'm wondering if >> >> cpufreq_governor_mutex: protects the cpufreq_governor_list >> >> is overkill. The loading of a module should be atomic relative to the >> cpufreq >> code, so this lock may not be required. (Admittedly I haven't tested >> that...) >> >> That would leave: >> >> global_kobj_lock: protects the "cpufreq" kobject >> each policy has a transition_lock (policy->transition): synchronizes >> frequency transitions >> >> and a new lock, perhaps called policy->lock, to serialize all events. >> >> Pros: We clean all this up to a simpler single threaded model. Bugs and >> races >> here would be much easier to handle. We're currently putting band-aid on >> band-aids in this code ATM and it looks like we're seeing old races expanded >> or new races exposed. >> >> Cons: We lose the ability to do simultaneous reads and writes ... although >> I remain unconvinced that this would ever be safe to do. ie) If I change the >> governor while at the same time reading, for example, the current cpu >> frequency I cannot rely on that value to be valid. >> >> After that we can add some reference counting to the sysfs file accesses >> so that we can block after the sysfs removal when we change cpufreq >> governors. I think that would be trivial and that it would resolve any races >> when adding and removing governor's sysfs files. > > Not really sure, but if you solve few things with getting these bugs resolved > then we might apply your patches without any issues. > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Locking issues with cpufreq and sysfs
On 13 October 2014 18:41, Prarit Bhargava wrote: > There are several issues with the current locking design of cpufreq. Most > notably is the panic reported here: > > http://marc.info/?l=linux-kernel&m=140622451625236&w=2 > > which was introduced by commit 955ef4833574636819cd269cfbae12f79cbde63a, > cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT, which introduces Don't know whats going wrong but I am just not able to reproduce the lockdep again :( I have tried this on two boards and am making sure that all things are correctly configured. I am trying this on two of Exynos boards, One a dual-A15 and another big little with 8 cores.. @@ -2195,9 +2195,7 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, /* end old governor */ if (old_gov) { __cpufreq_governor(policy, CPUFREQ_GOV_STOP); - up_write(&policy->rwsem); __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); - down_write(&policy->rwsem); } /* start new governor */ @@ -2206,9 +2204,7 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, if (!__cpufreq_governor(policy, CPUFREQ_GOV_START)) goto out; - up_write(&policy->rwsem); __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); - down_write(&policy->rwsem); } diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 1b44496..1a6972a 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -323,6 +323,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy, return 0; case CPUFREQ_GOV_POLICY_EXIT: if (!--dbs_data->usage_count) { + pr_info("%s\n", __func__); sysfs_remove_group(get_governor_parent_kobj(policy), get_sysfs_attr(dbs_data)); .config attached too, please let me know what am I missing. .config Description: Binary data
Re: Locking issues with cpufreq and sysfs
On 14 October 2014 23:54, Prarit Bhargava wrote: > Here's what I think we should do. Taking a step back, the purpose of the > cpufreq sysfs files is to allow userspace to read current cpu frequency > settings, and to allow userspce to modify the governor and set the max & min > ranges for cpu frequencies. This can be done per device or for all cpus > depending on the driver. Okay. > We have to guarantee that bothing reading and writing will always work and > that > write operations will always be atomic relative to userspace. The current Ok. > implementation of cpufreq does this through the following locks: > > cpufreq_driver_lock: protects the cpufreq_cpu_data array and > cpufreq_driver->boost > cpufreq_governor_lock: protects the current governor Its just for serialization.. > cpufreq_governor_mutex: protects the cpufreq_governor_list > cpufreq_rwsem: protects the driver from being unloaded > global_kobj_lock: protects the "cpufreq" kobject > each policy has a rwsem (policy->rwsem): protects the cpufreq_policy struct > each policy has a transition_lock (policy->transition): synchronizes > frequency transitions > > While examining this code I was wondering exactly why we allow multiple > readers > and writers in cpufreq. I could understand if we felt that this data was > critical; but it really isn't. A short delay here isn't that big of a deal > IMO > (if someone can produce a case where a delay would cause a serious problem I'd > like to hear it). I don't even think it is safe in most cases to allow > readers > while cpufreq values are changing; if we're changing the governor userspace > cannot rely on the value of (for example) cpuinfo_max_freq. I don't know how reader writer lock will fail and a normal lock will not. There is only benefit of rwlock, that readers can read things while there is nobody writing.. > So I'm proposing that we move to a single threaded read/write using, if Okay, but how will that benefit us ? > possible, a single policy lock for now. We might transition this back to a > rwsem later on, however, for the first attempt at cleaning this up I think we > should just stick with a simple lock. In doing that, IMO we remove > > cpufreq_rwsem: protects the driver from being unloaded > cpufreq_governor_lock: protects the current governor > each policy has a rwsem (policy->rwsem): protects the cpufreq_policy struct > > and potentially > > cpufreq_driver_lock: protects the cpufreq_cpu_data array and > cpufreq_driver->boost Not really sure, but yeah we might be able to club few of them.. > After looking at the way the code would be structured, I'm wondering if > > cpufreq_governor_mutex: protects the cpufreq_governor_list > > is overkill. The loading of a module should be atomic relative to the cpufreq > code, so this lock may not be required. (Admittedly I haven't tested that...) > > That would leave: > > global_kobj_lock: protects the "cpufreq" kobject > each policy has a transition_lock (policy->transition): synchronizes > frequency transitions > > and a new lock, perhaps called policy->lock, to serialize all events. > > Pros: We clean all this up to a simpler single threaded model. Bugs and races > here would be much easier to handle. We're currently putting band-aid on > band-aids in this code ATM and it looks like we're seeing old races expanded > or new races exposed. > > Cons: We lose the ability to do simultaneous reads and writes ... although > I remain unconvinced that this would ever be safe to do. ie) If I change the > governor while at the same time reading, for example, the current cpu > frequency I cannot rely on that value to be valid. > > After that we can add some reference counting to the sysfs file accesses > so that we can block after the sysfs removal when we change cpufreq > governors. I think that would be trivial and that it would resolve any races > when adding and removing governor's sysfs files. Not really sure, but if you solve few things with getting these bugs resolved then we might apply your patches without any issues. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: Locking issues with cpufreq and sysfs
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel- > ow...@vger.kernel.org] On Behalf Of Prarit Bhargava > Sent: Tuesday, 14 October, 2014 1:24 PM > To: Viresh Kumar > Cc: Saravana Kannan; Rafael J. Wysocki; linux...@vger.kernel.org; Linux > Kernel; Robert Schöne > Subject: Re: Locking issues with cpufreq and sysfs > > On 10/14/2014 03:10 AM, Viresh Kumar wrote: > > On 13 October 2014 18:41, Prarit Bhargava wrote: > >> > >> The locking is insufficient here, Viresh. I no longer believe that fixes > >> to this locking scheme are the right way to move forward here. I'm > wondering > >> if we can look at other alternatives such as maintaining a refcount or > >> perhaps using a queuing mechanism for governor and policy related changes. > >> ... > So I'm proposing that we move to a single threaded read/write using, if > possible, a single policy lock for now. We might transition this back to a > rwsem later on, however, for the first attempt at cleaning this up I think we > should just stick with a simple lock. In doing that, IMO we remove > cpufreq_rwsem: protects the driver from being unloaded > cpufreq_governor_lock: protects the current governor > each policy has a rwsem (policy->rwsem): protects the cpufreq_policy struct > > and potentially > cpufreq_driver_lock: protects the cpufreq_cpu_data array and cpufreq_driver- > >boost > > After looking at the way the code would be structured, I'm wondering if > cpufreq_governor_mutex: protects the cpufreq_governor_list > is overkill. The loading of a module should be atomic relative to the > cpufreq code, so this lock may not be required. (Admittedly I haven't > tested that...) > > That would leave: > global_kobj_lock: protects the "cpufreq" kobject > each policy has a transition_lock (policy->transition): synchronizes > frequency transitions > and a new lock, perhaps called policy->lock, to serialize all events. > Please keep performance in mind too. cpufreq_governor_lock contention is a bit of an issue with heavy IO workloads as described in: http://marc.info/?l=linux-pm&m=140924051503827&w=2 --- Rob ElliottHP Server Storage N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a��� 0��h���i
Re: Locking issues with cpufreq and sysfs
On 10/14/2014 03:10 AM, Viresh Kumar wrote: > On 13 October 2014 18:41, Prarit Bhargava wrote: >> >> The locking is insufficient here, Viresh. I no longer believe that fixes >> to this locking scheme are the right way to move forward here. I'm wondering >> if we can look at other alternatives such as maintaining a refcount or >> perhaps using a queuing mechanism for governor and policy related changes. >> Here's what I think we should do. Taking a step back, the purpose of the cpufreq sysfs files is to allow userspace to read current cpu frequency settings, and to allow userspce to modify the governor and set the max & min ranges for cpu frequencies. This can be done per device or for all cpus depending on the driver. We have to guarantee that bothing reading and writing will always work and that write operations will always be atomic relative to userspace. The current implementation of cpufreq does this through the following locks: cpufreq_driver_lock: protects the cpufreq_cpu_data array and cpufreq_driver->boost cpufreq_governor_lock: protects the current governor cpufreq_governor_mutex: protects the cpufreq_governor_list cpufreq_rwsem: protects the driver from being unloaded global_kobj_lock: protects the "cpufreq" kobject each policy has a rwsem (policy->rwsem): protects the cpufreq_policy struct each policy has a transition_lock (policy->transition): synchronizes frequency transitions While examining this code I was wondering exactly why we allow multiple readers and writers in cpufreq. I could understand if we felt that this data was critical; but it really isn't. A short delay here isn't that big of a deal IMO (if someone can produce a case where a delay would cause a serious problem I'd like to hear it). I don't even think it is safe in most cases to allow readers while cpufreq values are changing; if we're changing the governor userspace cannot rely on the value of (for example) cpuinfo_max_freq. So I'm proposing that we move to a single threaded read/write using, if possible, a single policy lock for now. We might transition this back to a rwsem later on, however, for the first attempt at cleaning this up I think we should just stick with a simple lock. In doing that, IMO we remove cpufreq_rwsem: protects the driver from being unloaded cpufreq_governor_lock: protects the current governor each policy has a rwsem (policy->rwsem): protects the cpufreq_policy struct and potentially cpufreq_driver_lock: protects the cpufreq_cpu_data array and cpufreq_driver->boost After looking at the way the code would be structured, I'm wondering if cpufreq_governor_mutex: protects the cpufreq_governor_list is overkill. The loading of a module should be atomic relative to the cpufreq code, so this lock may not be required. (Admittedly I haven't tested that...) That would leave: global_kobj_lock: protects the "cpufreq" kobject each policy has a transition_lock (policy->transition): synchronizes frequency transitions and a new lock, perhaps called policy->lock, to serialize all events. Pros: We clean all this up to a simpler single threaded model. Bugs and races here would be much easier to handle. We're currently putting band-aid on band-aids in this code ATM and it looks like we're seeing old races expanded or new races exposed. Cons: We lose the ability to do simultaneous reads and writes ... although I remain unconvinced that this would ever be safe to do. ie) If I change the governor while at the same time reading, for example, the current cpu frequency I cannot rely on that value to be valid. After that we can add some reference counting to the sysfs file accesses so that we can block after the sysfs removal when we change cpufreq governors. I think that would be trivial and that it would resolve any races when adding and removing governor's sysfs files. P. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Locking issues with cpufreq and sysfs
On 13 October 2014 18:41, Prarit Bhargava wrote: > There are several issues with the current locking design of cpufreq. Most Sadly yes :( > [Question: was the original reported deadlock "real"? Did it really happen or > did lockdep only report it (I may have asked this question previously and > forgot the answer)? The reason I ask is that this situation is very similar > to > USB's device removal in which the sysfs attributes are removed for a device > but > not the device it was called for. I actually think that's part of the problem > here.] I am still not sure about those lockdep warnings :( > The above, obviously, is a complete hack of the code but in a sense does > mimic a proper locking fix. However, even with this fix we are still left > with a race in accessing the sysfs files. Consider the following example, > > CPU 1: accesses scaling_setspeed to set cpu speed > > simultaneously, > > CPU 2: accesses scaling_governor to set governor to ondemand > > CPU 1 & 2 race ... and this can result in different critical situations. > The first is that CPU 1 holds the scalling_setspeed open while CPU attempts > to change the governor. This results in a syfs warning about creating a > file with an existing file name which in some cases can lead to additional > corruption and a panic. The second case is that CPU 1's setting of the speed > is now done on the new governor -- which may or may not be correct. In any > case an argument could be made that the userspace program doing this type > of action should be "smart" enough to confirm simultaneous changes... but > in any case the kernel should not panic or corrupt data. > > The locking is insufficient here, Viresh. I no longer believe that fixes > to this locking scheme are the right way to move forward here. I'm wondering > if we can look at other alternatives such as maintaining a refcount or > perhaps using a queuing mechanism for governor and policy related changes. Probably this is similar to what I have been trying, i.e. to give access to only a single thread to call __cpufreq_governor(). Can you please try the cpufreq/governor-fixes-v4 branch once ? On 13 October 2014 18:41, Prarit Bhargava wrote: > There are several issues with the current locking design of cpufreq. Most > notably is the panic reported here: > > http://marc.info/?l=linux-kernel&m=140622451625236&w=2 > > which was introduced by commit 955ef4833574636819cd269cfbae12f79cbde63a, > cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT, which introduces > a race in the changing of the cpufreq policy. This change was introduced > because of a lockdep deadlock warning that can be reproduced (on x86 with > the acpi_cpufreq driver) via the following debug patch: > > iff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c > index b0c18ed..366cfb7 100644 > --- a/drivers/cpufreq/acpi-cpufreq.c > +++ b/drivers/cpufreq/acpi-cpufreq.c > @@ -885,6 +885,7 @@ static struct freq_attr *acpi_cpufreq_attr[] = { > > static struct cpufreq_driver acpi_cpufreq_driver = { > .verify = cpufreq_generic_frequency_table_verify, > + .flags = CPUFREQ_HAVE_GOVERNOR_PER_POLICY, > .target_index = acpi_cpufreq_target, > .bios_limit = acpi_processor_get_bios_limit, > .init = acpi_cpufreq_cpu_init, > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index 61190f6..4cb488a 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -2195,9 +2195,7 @@ static int cpufreq_set_policy(struct cpufreq_policy > *polic > /* end old governor */ > if (old_gov) { > __cpufreq_governor(policy, CPUFREQ_GOV_STOP); > - up_write(&policy->rwsem); > __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); > - down_write(&policy->rwsem); > } > > /* start new governor */ > @@ -2206,9 +2204,7 @@ static int cpufreq_set_policy(struct cpufreq_policy > *polic > if (!__cpufreq_governor(policy, CPUFREQ_GOV_START)) > goto out; > > - up_write(&policy->rwsem); > __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); > - down_write(&policy->rwsem); > } > > /* new governor failed, so re-start old one */ > > (which causes the acpi-cpufreq driver to emulate the behaviour of the arm > cpufreq driver), and by doing > > echo ondemand > /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor > cat /sys/devices/system/cpu/cpu5/cpufreq/ondemand/* > echo conservative > /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor > exit 0 > > [Question: was the original reported deadlock "real"? Did it really happen or > did lockdep only report it (I may have asked this question previously and > forgot the answer)? The reason I ask is that this situation is very similar > to > USB's device removal in which the sysfs attributes are removed f
Re: Locking issues with cpufreq and sysfs
On Monday, October 13, 2014 09:22:49 AM Prarit Bhargava wrote: > > On 10/13/2014 09:11 AM, Prarit Bhargava wrote: > > There are several issues with the current locking design of cpufreq. Most > > notably is the panic reported here: > > > > http://marc.info/?l=linux-kernel&m=140622451625236&w=2 > > > > which was introduced by commit 955ef4833574636819cd269cfbae12f79cbde63a, > > cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT, which introduces > > a race in the changing of the cpufreq policy. This change was introduced > > because of a lockdep deadlock warning that can be reproduced (on x86 with > > the acpi_cpufreq driver) via the following debug patch: > > > > iff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c > > index b0c18ed..366cfb7 100644 > > --- a/drivers/cpufreq/acpi-cpufreq.c > > +++ b/drivers/cpufreq/acpi-cpufreq.c > > @@ -885,6 +885,7 @@ static struct freq_attr *acpi_cpufreq_attr[] = { > > > > static struct cpufreq_driver acpi_cpufreq_driver = { > > .verify = cpufreq_generic_frequency_table_verify, > > + .flags = CPUFREQ_HAVE_GOVERNOR_PER_POLICY, > > .target_index = acpi_cpufreq_target, > > .bios_limit = acpi_processor_get_bios_limit, > > .init = acpi_cpufreq_cpu_init, > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > > index 61190f6..4cb488a 100644 > > --- a/drivers/cpufreq/cpufreq.c > > +++ b/drivers/cpufreq/cpufreq.c > > @@ -2195,9 +2195,7 @@ static int cpufreq_set_policy(struct cpufreq_policy > > *polic > > /* end old governor */ > > if (old_gov) { > > __cpufreq_governor(policy, CPUFREQ_GOV_STOP); > > - up_write(&policy->rwsem); > > __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); > > - down_write(&policy->rwsem); > > } > > > > /* start new governor */ > > @@ -2206,9 +2204,7 @@ static int cpufreq_set_policy(struct cpufreq_policy > > *polic > > if (!__cpufreq_governor(policy, CPUFREQ_GOV_START)) > > goto out; > > > > - up_write(&policy->rwsem); > > __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); > > - down_write(&policy->rwsem); > > } > > > > /* new governor failed, so re-start old one */ > > > > (which causes the acpi-cpufreq driver to emulate the behaviour of the arm > > cpufreq driver), and by doing > > > > echo ondemand > /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor > > cat /sys/devices/system/cpu/cpu5/cpufreq/ondemand/* > > echo conservative > /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor > > exit 0 > > > > [Question: was the original reported deadlock "real"? Did it really happen > > or > > did lockdep only report it (I may have asked this question previously and > > forgot the answer)? The reason I ask is that this situation is very > > similar to > > USB's device removal in which the sysfs attributes are removed for a device > > but > > not the device it was called for. I actually think that's part of the > > problem > > here.] > > > > The above, obviously, is a complete hack of the code but in a sense does > > mimic a proper locking fix. However, even with this fix we are still left > > with a race in accessing the sysfs files. Consider the following example, > > > > CPU 1: accesses scaling_setspeed to set cpu speed > > > > simultaneously, > > > > CPU 2: accesses scaling_governor to set governor to ondemand > > > > CPU 1 & 2 race ... and this can result in different critical situations. > > The first is that CPU 1 holds the scalling_setspeed open while CPU attempts > > to change the governor. This results in a syfs warning about creating a > > file with an existing file name which in some cases can lead to additional > > corruption and a panic. The second case is that CPU 1's setting of the > > speed > > is now done on the new governor -- which may or may not be correct. In any > > case an argument could be made that the userspace program doing this type > > of action should be "smart" enough to confirm simultaneous changes... but > > in any case the kernel should not panic or corrupt data. > > > > The locking is insufficient here, Viresh. I no longer believe that fixes > > to this locking scheme are the right way to move forward here. I'm > > wondering > > if we can look at other alternatives such as maintaining a refcount or > > perhaps using a queuing mechanism for governor and policy related changes. > > > > Uh ... I meant this as "I'm willing to modify the code to do this but I'd like > to know what everyone else thinks before I do anything" ;) OK, that's constructive. :-) Can we discuss the target design first, please? You certainly have something in mind, so can you describe it? -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. -- To unsubscribe from this list:
Re: Locking issues with cpufreq and sysfs
On 10/13/2014 09:11 AM, Prarit Bhargava wrote: > There are several issues with the current locking design of cpufreq. Most > notably is the panic reported here: > > http://marc.info/?l=linux-kernel&m=140622451625236&w=2 > > which was introduced by commit 955ef4833574636819cd269cfbae12f79cbde63a, > cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT, which introduces > a race in the changing of the cpufreq policy. This change was introduced > because of a lockdep deadlock warning that can be reproduced (on x86 with > the acpi_cpufreq driver) via the following debug patch: > > iff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c > index b0c18ed..366cfb7 100644 > --- a/drivers/cpufreq/acpi-cpufreq.c > +++ b/drivers/cpufreq/acpi-cpufreq.c > @@ -885,6 +885,7 @@ static struct freq_attr *acpi_cpufreq_attr[] = { > > static struct cpufreq_driver acpi_cpufreq_driver = { > .verify = cpufreq_generic_frequency_table_verify, > + .flags = CPUFREQ_HAVE_GOVERNOR_PER_POLICY, > .target_index = acpi_cpufreq_target, > .bios_limit = acpi_processor_get_bios_limit, > .init = acpi_cpufreq_cpu_init, > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index 61190f6..4cb488a 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -2195,9 +2195,7 @@ static int cpufreq_set_policy(struct cpufreq_policy > *polic > /* end old governor */ > if (old_gov) { > __cpufreq_governor(policy, CPUFREQ_GOV_STOP); > - up_write(&policy->rwsem); > __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); > - down_write(&policy->rwsem); > } > > /* start new governor */ > @@ -2206,9 +2204,7 @@ static int cpufreq_set_policy(struct cpufreq_policy > *polic > if (!__cpufreq_governor(policy, CPUFREQ_GOV_START)) > goto out; > > - up_write(&policy->rwsem); > __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); > - down_write(&policy->rwsem); > } > > /* new governor failed, so re-start old one */ > > (which causes the acpi-cpufreq driver to emulate the behaviour of the arm > cpufreq driver), and by doing > > echo ondemand > /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor > cat /sys/devices/system/cpu/cpu5/cpufreq/ondemand/* > echo conservative > /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor > exit 0 > > [Question: was the original reported deadlock "real"? Did it really happen or > did lockdep only report it (I may have asked this question previously and > forgot the answer)? The reason I ask is that this situation is very similar > to > USB's device removal in which the sysfs attributes are removed for a device > but > not the device it was called for. I actually think that's part of the problem > here.] > > The above, obviously, is a complete hack of the code but in a sense does > mimic a proper locking fix. However, even with this fix we are still left > with a race in accessing the sysfs files. Consider the following example, > > CPU 1: accesses scaling_setspeed to set cpu speed > > simultaneously, > > CPU 2: accesses scaling_governor to set governor to ondemand > > CPU 1 & 2 race ... and this can result in different critical situations. > The first is that CPU 1 holds the scalling_setspeed open while CPU attempts > to change the governor. This results in a syfs warning about creating a > file with an existing file name which in some cases can lead to additional > corruption and a panic. The second case is that CPU 1's setting of the speed > is now done on the new governor -- which may or may not be correct. In any > case an argument could be made that the userspace program doing this type > of action should be "smart" enough to confirm simultaneous changes... but > in any case the kernel should not panic or corrupt data. > > The locking is insufficient here, Viresh. I no longer believe that fixes > to this locking scheme are the right way to move forward here. I'm wondering > if we can look at other alternatives such as maintaining a refcount or > perhaps using a queuing mechanism for governor and policy related changes. > Uh ... I meant this as "I'm willing to modify the code to do this but I'd like to know what everyone else thinks before I do anything" ;) P. > P. > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Locking issues with cpufreq and sysfs
There are several issues with the current locking design of cpufreq. Most notably is the panic reported here: http://marc.info/?l=linux-kernel&m=140622451625236&w=2 which was introduced by commit 955ef4833574636819cd269cfbae12f79cbde63a, cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT, which introduces a race in the changing of the cpufreq policy. This change was introduced because of a lockdep deadlock warning that can be reproduced (on x86 with the acpi_cpufreq driver) via the following debug patch: iff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c index b0c18ed..366cfb7 100644 --- a/drivers/cpufreq/acpi-cpufreq.c +++ b/drivers/cpufreq/acpi-cpufreq.c @@ -885,6 +885,7 @@ static struct freq_attr *acpi_cpufreq_attr[] = { static struct cpufreq_driver acpi_cpufreq_driver = { .verify = cpufreq_generic_frequency_table_verify, + .flags = CPUFREQ_HAVE_GOVERNOR_PER_POLICY, .target_index = acpi_cpufreq_target, .bios_limit = acpi_processor_get_bios_limit, .init = acpi_cpufreq_cpu_init, diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 61190f6..4cb488a 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2195,9 +2195,7 @@ static int cpufreq_set_policy(struct cpufreq_policy *polic /* end old governor */ if (old_gov) { __cpufreq_governor(policy, CPUFREQ_GOV_STOP); - up_write(&policy->rwsem); __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); - down_write(&policy->rwsem); } /* start new governor */ @@ -2206,9 +2204,7 @@ static int cpufreq_set_policy(struct cpufreq_policy *polic if (!__cpufreq_governor(policy, CPUFREQ_GOV_START)) goto out; - up_write(&policy->rwsem); __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT); - down_write(&policy->rwsem); } /* new governor failed, so re-start old one */ (which causes the acpi-cpufreq driver to emulate the behaviour of the arm cpufreq driver), and by doing echo ondemand > /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor cat /sys/devices/system/cpu/cpu5/cpufreq/ondemand/* echo conservative > /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor exit 0 [Question: was the original reported deadlock "real"? Did it really happen or did lockdep only report it (I may have asked this question previously and forgot the answer)? The reason I ask is that this situation is very similar to USB's device removal in which the sysfs attributes are removed for a device but not the device it was called for. I actually think that's part of the problem here.] The above, obviously, is a complete hack of the code but in a sense does mimic a proper locking fix. However, even with this fix we are still left with a race in accessing the sysfs files. Consider the following example, CPU 1: accesses scaling_setspeed to set cpu speed simultaneously, CPU 2: accesses scaling_governor to set governor to ondemand CPU 1 & 2 race ... and this can result in different critical situations. The first is that CPU 1 holds the scalling_setspeed open while CPU attempts to change the governor. This results in a syfs warning about creating a file with an existing file name which in some cases can lead to additional corruption and a panic. The second case is that CPU 1's setting of the speed is now done on the new governor -- which may or may not be correct. In any case an argument could be made that the userspace program doing this type of action should be "smart" enough to confirm simultaneous changes... but in any case the kernel should not panic or corrupt data. The locking is insufficient here, Viresh. I no longer believe that fixes to this locking scheme are the right way to move forward here. I'm wondering if we can look at other alternatives such as maintaining a refcount or perhaps using a queuing mechanism for governor and policy related changes. P. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/