Re: Locking issues with cpufreq and sysfs

2014-10-17 Thread Viresh Kumar
On 17 October 2014 17:45, Prarit Bhargava  wrote:
> Hmmm
>
> This is what I'm doing:
>
> echo ondemand > scaling_governor
> cat ondemand/*
> echo conservative > scaling_governor
>
> OOC what are you doing to test?

Exactly same and even tried Roberts script too :)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Locking issues with cpufreq and sysfs

2014-10-17 Thread Prarit Bhargava


On 10/17/2014 07:38 AM, Viresh Kumar wrote:
> On 13 October 2014 18:41, Prarit Bhargava  wrote:
>> There are several issues with the current locking design of cpufreq.  Most
>> notably is the panic reported here:
>>
>> http://marc.info/?l=linux-kernel=140622451625236=2
>>
>> which was introduced by commit 955ef4833574636819cd269cfbae12f79cbde63a,
>> cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT, which introduces
> 
> Don't know whats going wrong but I am just not able to reproduce the
> lockdep again :(
> I have tried this on two boards and am making sure that all things are 
> correctly
> configured. I am trying this on two of Exynos boards, One a dual-A15 and 
> another
> big little with 8 cores..
> 
> 
> @@ -2195,9 +2195,7 @@ static int cpufreq_set_policy(struct
> cpufreq_policy *policy,
> /* end old governor */
> if (old_gov) {
> __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
> -   up_write(>rwsem);
> __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
> -   down_write(>rwsem);
> }
> 
> /* start new governor */
> @@ -2206,9 +2204,7 @@ static int cpufreq_set_policy(struct
> cpufreq_policy *policy,
> if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
> goto out;
> 
> -   up_write(>rwsem);
> __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
> -   down_write(>rwsem);
> }
> 
> diff --git a/drivers/cpufreq/cpufreq_governor.c
> b/drivers/cpufreq/cpufreq_governor.c
> index 1b44496..1a6972a 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -323,6 +323,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
> return 0;
> case CPUFREQ_GOV_POLICY_EXIT:
> if (!--dbs_data->usage_count) {
> +   pr_info("%s\n", __func__);
> sysfs_remove_group(get_governor_parent_kobj(policy),
> get_sysfs_attr(dbs_data));
> 
> 

Hmmm

This is what I'm doing:

echo ondemand > scaling_governor
cat ondemand/*
echo conservative > scaling_governor

OOC what are you doing to test?

P.

> 
> 
> .config attached too, please let me know what am I missing.
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Locking issues with cpufreq and sysfs

2014-10-17 Thread Prarit Bhargava


On 10/16/2014 07:23 AM, Viresh Kumar wrote:
> On 14 October 2014 23:54, Prarit Bhargava  wrote:
>> Here's what I think we should do.  Taking a step back, the purpose of the
>> cpufreq sysfs files is to allow userspace to read current cpu frequency
>> settings, and to allow userspce to modify the governor and set the max & min
>> ranges for cpu frequencies.  This can be done per device or for all cpus
>> depending on the driver.
> 
> Okay.
> 
>> We have to guarantee that bothing reading and writing will always work and 
>> that
>> write operations will always be atomic relative to userspace.  The current
> 
> Ok.
> 
>> implementation of cpufreq does this through the following locks:
>>
>> cpufreq_driver_lock: protects the cpufreq_cpu_data array and 
>> cpufreq_driver->boost
>> cpufreq_governor_lock: protects the current governor
> 
> Its just for serialization..
> 
>> cpufreq_governor_mutex: protects the cpufreq_governor_list
>> cpufreq_rwsem: protects the driver from being unloaded
>> global_kobj_lock: protects the "cpufreq" kobject
>> each policy has a rwsem (policy->rwsem): protects the cpufreq_policy struct
>> each policy has a transition_lock (policy->transition): synchronizes
>> frequency transitions
>>
>> While examining this code I was wondering exactly why we allow multiple 
>> readers
>> and writers in cpufreq.  I could understand if we felt that this data was
>> critical; but it really isn't.  A short delay here isn't that big of a deal 
>> IMO
>> (if someone can produce a case where a delay would cause a serious problem 
>> I'd
>>  like to hear it).  I don't even think it is safe in most cases to allow 
>> readers
>> while cpufreq values are changing; if we're changing the governor userspace
>> cannot rely on the value of (for example) cpuinfo_max_freq.
> 
> I don't know how reader writer lock will fail and a normal lock will not.
> There is only benefit of rwlock, that readers can read things while
> there is nobody
> writing..
> 
>> So I'm proposing that we move to a single threaded read/write using, if
> 
> Okay, but how will that benefit us ?

It will greatly simplify the code.  The locking isn't working in this code at
all right now and is causing various reported panics ... you yourself are
pushing a lock patch that serializes operations -- which is causing other
problems during testing.

> 
>> possible, a single policy lock for now.  We might transition this back to a
>> rwsem later on, however, for the first attempt at cleaning this up I think we
>> should just stick with a simple lock.  In doing that, IMO we remove
>>
>> cpufreq_rwsem: protects the driver from being unloaded
>> cpufreq_governor_lock: protects the current governor
>> each policy has a rwsem (policy->rwsem): protects the cpufreq_policy struct
>>
>> and potentially
>>
>> cpufreq_driver_lock: protects the cpufreq_cpu_data array and 
>> cpufreq_driver->boost
> 
> Not really sure, but yeah we might be able to club few of them..
> 
>> After looking at the way the code would be structured, I'm wondering if
>>
>> cpufreq_governor_mutex: protects the cpufreq_governor_list
>>
>> is overkill.  The loading of a module should be atomic relative to the 
>> cpufreq
>> code, so this lock may not be required.  (Admittedly I haven't tested 
>> that...)
>>
>> That would leave:
>>
>> global_kobj_lock: protects the "cpufreq" kobject
>> each policy has a transition_lock (policy->transition): synchronizes
>> frequency transitions
>>
>> and a new lock, perhaps called policy->lock, to serialize all events.
>>
>> Pros: We clean all this up to a simpler single threaded model.  Bugs and 
>> races
>> here would be much easier to handle.  We're currently putting band-aid on
>> band-aids in this code ATM and it looks like we're seeing old races expanded
>> or new races exposed.
>>
>> Cons: We lose the ability to do simultaneous reads and writes ... although
>> I remain unconvinced that this would ever be safe to do.  ie) If I change the
>> governor while at the same time reading, for example, the current cpu
>> frequency I cannot rely on that value to be valid.
>>
>> After that we can add some reference counting to the sysfs file accesses
>> so that we can block after the sysfs removal when we change cpufreq
>> governors.  I think that would be trivial and that it would resolve any races
>> when adding and removing governor's sysfs files.
> 
> Not really sure, but if you solve few things with getting these bugs resolved
> then we might apply your patches without any issues.
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Locking issues with cpufreq and sysfs

2014-10-17 Thread Viresh Kumar
On 13 October 2014 18:41, Prarit Bhargava  wrote:
> There are several issues with the current locking design of cpufreq.  Most
> notably is the panic reported here:
>
> http://marc.info/?l=linux-kernel=140622451625236=2
>
> which was introduced by commit 955ef4833574636819cd269cfbae12f79cbde63a,
> cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT, which introduces

Don't know whats going wrong but I am just not able to reproduce the
lockdep again :(
I have tried this on two boards and am making sure that all things are correctly
configured. I am trying this on two of Exynos boards, One a dual-A15 and another
big little with 8 cores..


@@ -2195,9 +2195,7 @@ static int cpufreq_set_policy(struct
cpufreq_policy *policy,
/* end old governor */
if (old_gov) {
__cpufreq_governor(policy, CPUFREQ_GOV_STOP);
-   up_write(>rwsem);
__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
-   down_write(>rwsem);
}

/* start new governor */
@@ -2206,9 +2204,7 @@ static int cpufreq_set_policy(struct
cpufreq_policy *policy,
if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
goto out;

-   up_write(>rwsem);
__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
-   down_write(>rwsem);
}

diff --git a/drivers/cpufreq/cpufreq_governor.c
b/drivers/cpufreq/cpufreq_governor.c
index 1b44496..1a6972a 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -323,6 +323,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
return 0;
case CPUFREQ_GOV_POLICY_EXIT:
if (!--dbs_data->usage_count) {
+   pr_info("%s\n", __func__);
sysfs_remove_group(get_governor_parent_kobj(policy),
get_sysfs_attr(dbs_data));




.config attached too, please let me know what am I missing.


.config
Description: Binary data


Re: Locking issues with cpufreq and sysfs

2014-10-17 Thread Viresh Kumar
On 13 October 2014 18:41, Prarit Bhargava pra...@redhat.com wrote:
 There are several issues with the current locking design of cpufreq.  Most
 notably is the panic reported here:

 http://marc.info/?l=linux-kernelm=140622451625236w=2

 which was introduced by commit 955ef4833574636819cd269cfbae12f79cbde63a,
 cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT, which introduces

Don't know whats going wrong but I am just not able to reproduce the
lockdep again :(
I have tried this on two boards and am making sure that all things are correctly
configured. I am trying this on two of Exynos boards, One a dual-A15 and another
big little with 8 cores..


@@ -2195,9 +2195,7 @@ static int cpufreq_set_policy(struct
cpufreq_policy *policy,
/* end old governor */
if (old_gov) {
__cpufreq_governor(policy, CPUFREQ_GOV_STOP);
-   up_write(policy-rwsem);
__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
-   down_write(policy-rwsem);
}

/* start new governor */
@@ -2206,9 +2204,7 @@ static int cpufreq_set_policy(struct
cpufreq_policy *policy,
if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
goto out;

-   up_write(policy-rwsem);
__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
-   down_write(policy-rwsem);
}

diff --git a/drivers/cpufreq/cpufreq_governor.c
b/drivers/cpufreq/cpufreq_governor.c
index 1b44496..1a6972a 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -323,6 +323,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
return 0;
case CPUFREQ_GOV_POLICY_EXIT:
if (!--dbs_data-usage_count) {
+   pr_info(%s\n, __func__);
sysfs_remove_group(get_governor_parent_kobj(policy),
get_sysfs_attr(dbs_data));




.config attached too, please let me know what am I missing.


.config
Description: Binary data


Re: Locking issues with cpufreq and sysfs

2014-10-17 Thread Prarit Bhargava


On 10/16/2014 07:23 AM, Viresh Kumar wrote:
 On 14 October 2014 23:54, Prarit Bhargava pra...@redhat.com wrote:
 Here's what I think we should do.  Taking a step back, the purpose of the
 cpufreq sysfs files is to allow userspace to read current cpu frequency
 settings, and to allow userspce to modify the governor and set the max  min
 ranges for cpu frequencies.  This can be done per device or for all cpus
 depending on the driver.
 
 Okay.
 
 We have to guarantee that bothing reading and writing will always work and 
 that
 write operations will always be atomic relative to userspace.  The current
 
 Ok.
 
 implementation of cpufreq does this through the following locks:

 cpufreq_driver_lock: protects the cpufreq_cpu_data array and 
 cpufreq_driver-boost
 cpufreq_governor_lock: protects the current governor
 
 Its just for serialization..
 
 cpufreq_governor_mutex: protects the cpufreq_governor_list
 cpufreq_rwsem: protects the driver from being unloaded
 global_kobj_lock: protects the cpufreq kobject
 each policy has a rwsem (policy-rwsem): protects the cpufreq_policy struct
 each policy has a transition_lock (policy-transition): synchronizes
 frequency transitions

 While examining this code I was wondering exactly why we allow multiple 
 readers
 and writers in cpufreq.  I could understand if we felt that this data was
 critical; but it really isn't.  A short delay here isn't that big of a deal 
 IMO
 (if someone can produce a case where a delay would cause a serious problem 
 I'd
  like to hear it).  I don't even think it is safe in most cases to allow 
 readers
 while cpufreq values are changing; if we're changing the governor userspace
 cannot rely on the value of (for example) cpuinfo_max_freq.
 
 I don't know how reader writer lock will fail and a normal lock will not.
 There is only benefit of rwlock, that readers can read things while
 there is nobody
 writing..
 
 So I'm proposing that we move to a single threaded read/write using, if
 
 Okay, but how will that benefit us ?

It will greatly simplify the code.  The locking isn't working in this code at
all right now and is causing various reported panics ... you yourself are
pushing a lock patch that serializes operations -- which is causing other
problems during testing.

 
 possible, a single policy lock for now.  We might transition this back to a
 rwsem later on, however, for the first attempt at cleaning this up I think we
 should just stick with a simple lock.  In doing that, IMO we remove

 cpufreq_rwsem: protects the driver from being unloaded
 cpufreq_governor_lock: protects the current governor
 each policy has a rwsem (policy-rwsem): protects the cpufreq_policy struct

 and potentially

 cpufreq_driver_lock: protects the cpufreq_cpu_data array and 
 cpufreq_driver-boost
 
 Not really sure, but yeah we might be able to club few of them..
 
 After looking at the way the code would be structured, I'm wondering if

 cpufreq_governor_mutex: protects the cpufreq_governor_list

 is overkill.  The loading of a module should be atomic relative to the 
 cpufreq
 code, so this lock may not be required.  (Admittedly I haven't tested 
 that...)

 That would leave:

 global_kobj_lock: protects the cpufreq kobject
 each policy has a transition_lock (policy-transition): synchronizes
 frequency transitions

 and a new lock, perhaps called policy-lock, to serialize all events.

 Pros: We clean all this up to a simpler single threaded model.  Bugs and 
 races
 here would be much easier to handle.  We're currently putting band-aid on
 band-aids in this code ATM and it looks like we're seeing old races expanded
 or new races exposed.

 Cons: We lose the ability to do simultaneous reads and writes ... although
 I remain unconvinced that this would ever be safe to do.  ie) If I change the
 governor while at the same time reading, for example, the current cpu
 frequency I cannot rely on that value to be valid.

 After that we can add some reference counting to the sysfs file accesses
 so that we can block after the sysfs removal when we change cpufreq
 governors.  I think that would be trivial and that it would resolve any races
 when adding and removing governor's sysfs files.
 
 Not really sure, but if you solve few things with getting these bugs resolved
 then we might apply your patches without any issues.
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Locking issues with cpufreq and sysfs

2014-10-17 Thread Prarit Bhargava


On 10/17/2014 07:38 AM, Viresh Kumar wrote:
 On 13 October 2014 18:41, Prarit Bhargava pra...@redhat.com wrote:
 There are several issues with the current locking design of cpufreq.  Most
 notably is the panic reported here:

 http://marc.info/?l=linux-kernelm=140622451625236w=2

 which was introduced by commit 955ef4833574636819cd269cfbae12f79cbde63a,
 cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT, which introduces
 
 Don't know whats going wrong but I am just not able to reproduce the
 lockdep again :(
 I have tried this on two boards and am making sure that all things are 
 correctly
 configured. I am trying this on two of Exynos boards, One a dual-A15 and 
 another
 big little with 8 cores..
 
 
 @@ -2195,9 +2195,7 @@ static int cpufreq_set_policy(struct
 cpufreq_policy *policy,
 /* end old governor */
 if (old_gov) {
 __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
 -   up_write(policy-rwsem);
 __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
 -   down_write(policy-rwsem);
 }
 
 /* start new governor */
 @@ -2206,9 +2204,7 @@ static int cpufreq_set_policy(struct
 cpufreq_policy *policy,
 if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
 goto out;
 
 -   up_write(policy-rwsem);
 __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
 -   down_write(policy-rwsem);
 }
 
 diff --git a/drivers/cpufreq/cpufreq_governor.c
 b/drivers/cpufreq/cpufreq_governor.c
 index 1b44496..1a6972a 100644
 --- a/drivers/cpufreq/cpufreq_governor.c
 +++ b/drivers/cpufreq/cpufreq_governor.c
 @@ -323,6 +323,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 return 0;
 case CPUFREQ_GOV_POLICY_EXIT:
 if (!--dbs_data-usage_count) {
 +   pr_info(%s\n, __func__);
 sysfs_remove_group(get_governor_parent_kobj(policy),
 get_sysfs_attr(dbs_data));
 
 

Hmmm

This is what I'm doing:

echo ondemand  scaling_governor
cat ondemand/*
echo conservative  scaling_governor

OOC what are you doing to test?

P.

 
 
 .config attached too, please let me know what am I missing.
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Locking issues with cpufreq and sysfs

2014-10-17 Thread Viresh Kumar
On 17 October 2014 17:45, Prarit Bhargava pra...@redhat.com wrote:
 Hmmm

 This is what I'm doing:

 echo ondemand  scaling_governor
 cat ondemand/*
 echo conservative  scaling_governor

 OOC what are you doing to test?

Exactly same and even tried Roberts script too :)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Locking issues with cpufreq and sysfs

2014-10-16 Thread Viresh Kumar
On 14 October 2014 23:54, Prarit Bhargava  wrote:
> Here's what I think we should do.  Taking a step back, the purpose of the
> cpufreq sysfs files is to allow userspace to read current cpu frequency
> settings, and to allow userspce to modify the governor and set the max & min
> ranges for cpu frequencies.  This can be done per device or for all cpus
> depending on the driver.

Okay.

> We have to guarantee that bothing reading and writing will always work and 
> that
> write operations will always be atomic relative to userspace.  The current

Ok.

> implementation of cpufreq does this through the following locks:
>
> cpufreq_driver_lock: protects the cpufreq_cpu_data array and 
> cpufreq_driver->boost
> cpufreq_governor_lock: protects the current governor

Its just for serialization..

> cpufreq_governor_mutex: protects the cpufreq_governor_list
> cpufreq_rwsem: protects the driver from being unloaded
> global_kobj_lock: protects the "cpufreq" kobject
> each policy has a rwsem (policy->rwsem): protects the cpufreq_policy struct
> each policy has a transition_lock (policy->transition): synchronizes
> frequency transitions
>
> While examining this code I was wondering exactly why we allow multiple 
> readers
> and writers in cpufreq.  I could understand if we felt that this data was
> critical; but it really isn't.  A short delay here isn't that big of a deal 
> IMO
> (if someone can produce a case where a delay would cause a serious problem I'd
>  like to hear it).  I don't even think it is safe in most cases to allow 
> readers
> while cpufreq values are changing; if we're changing the governor userspace
> cannot rely on the value of (for example) cpuinfo_max_freq.

I don't know how reader writer lock will fail and a normal lock will not.
There is only benefit of rwlock, that readers can read things while
there is nobody
writing..

> So I'm proposing that we move to a single threaded read/write using, if

Okay, but how will that benefit us ?

> possible, a single policy lock for now.  We might transition this back to a
> rwsem later on, however, for the first attempt at cleaning this up I think we
> should just stick with a simple lock.  In doing that, IMO we remove
>
> cpufreq_rwsem: protects the driver from being unloaded
> cpufreq_governor_lock: protects the current governor
> each policy has a rwsem (policy->rwsem): protects the cpufreq_policy struct
>
> and potentially
>
> cpufreq_driver_lock: protects the cpufreq_cpu_data array and 
> cpufreq_driver->boost

Not really sure, but yeah we might be able to club few of them..

> After looking at the way the code would be structured, I'm wondering if
>
> cpufreq_governor_mutex: protects the cpufreq_governor_list
>
> is overkill.  The loading of a module should be atomic relative to the cpufreq
> code, so this lock may not be required.  (Admittedly I haven't tested that...)
>
> That would leave:
>
> global_kobj_lock: protects the "cpufreq" kobject
> each policy has a transition_lock (policy->transition): synchronizes
> frequency transitions
>
> and a new lock, perhaps called policy->lock, to serialize all events.
>
> Pros: We clean all this up to a simpler single threaded model.  Bugs and races
> here would be much easier to handle.  We're currently putting band-aid on
> band-aids in this code ATM and it looks like we're seeing old races expanded
> or new races exposed.
>
> Cons: We lose the ability to do simultaneous reads and writes ... although
> I remain unconvinced that this would ever be safe to do.  ie) If I change the
> governor while at the same time reading, for example, the current cpu
> frequency I cannot rely on that value to be valid.
>
> After that we can add some reference counting to the sysfs file accesses
> so that we can block after the sysfs removal when we change cpufreq
> governors.  I think that would be trivial and that it would resolve any races
> when adding and removing governor's sysfs files.

Not really sure, but if you solve few things with getting these bugs resolved
then we might apply your patches without any issues.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Locking issues with cpufreq and sysfs

2014-10-16 Thread Viresh Kumar
On 14 October 2014 23:54, Prarit Bhargava pra...@redhat.com wrote:
 Here's what I think we should do.  Taking a step back, the purpose of the
 cpufreq sysfs files is to allow userspace to read current cpu frequency
 settings, and to allow userspce to modify the governor and set the max  min
 ranges for cpu frequencies.  This can be done per device or for all cpus
 depending on the driver.

Okay.

 We have to guarantee that bothing reading and writing will always work and 
 that
 write operations will always be atomic relative to userspace.  The current

Ok.

 implementation of cpufreq does this through the following locks:

 cpufreq_driver_lock: protects the cpufreq_cpu_data array and 
 cpufreq_driver-boost
 cpufreq_governor_lock: protects the current governor

Its just for serialization..

 cpufreq_governor_mutex: protects the cpufreq_governor_list
 cpufreq_rwsem: protects the driver from being unloaded
 global_kobj_lock: protects the cpufreq kobject
 each policy has a rwsem (policy-rwsem): protects the cpufreq_policy struct
 each policy has a transition_lock (policy-transition): synchronizes
 frequency transitions

 While examining this code I was wondering exactly why we allow multiple 
 readers
 and writers in cpufreq.  I could understand if we felt that this data was
 critical; but it really isn't.  A short delay here isn't that big of a deal 
 IMO
 (if someone can produce a case where a delay would cause a serious problem I'd
  like to hear it).  I don't even think it is safe in most cases to allow 
 readers
 while cpufreq values are changing; if we're changing the governor userspace
 cannot rely on the value of (for example) cpuinfo_max_freq.

I don't know how reader writer lock will fail and a normal lock will not.
There is only benefit of rwlock, that readers can read things while
there is nobody
writing..

 So I'm proposing that we move to a single threaded read/write using, if

Okay, but how will that benefit us ?

 possible, a single policy lock for now.  We might transition this back to a
 rwsem later on, however, for the first attempt at cleaning this up I think we
 should just stick with a simple lock.  In doing that, IMO we remove

 cpufreq_rwsem: protects the driver from being unloaded
 cpufreq_governor_lock: protects the current governor
 each policy has a rwsem (policy-rwsem): protects the cpufreq_policy struct

 and potentially

 cpufreq_driver_lock: protects the cpufreq_cpu_data array and 
 cpufreq_driver-boost

Not really sure, but yeah we might be able to club few of them..

 After looking at the way the code would be structured, I'm wondering if

 cpufreq_governor_mutex: protects the cpufreq_governor_list

 is overkill.  The loading of a module should be atomic relative to the cpufreq
 code, so this lock may not be required.  (Admittedly I haven't tested that...)

 That would leave:

 global_kobj_lock: protects the cpufreq kobject
 each policy has a transition_lock (policy-transition): synchronizes
 frequency transitions

 and a new lock, perhaps called policy-lock, to serialize all events.

 Pros: We clean all this up to a simpler single threaded model.  Bugs and races
 here would be much easier to handle.  We're currently putting band-aid on
 band-aids in this code ATM and it looks like we're seeing old races expanded
 or new races exposed.

 Cons: We lose the ability to do simultaneous reads and writes ... although
 I remain unconvinced that this would ever be safe to do.  ie) If I change the
 governor while at the same time reading, for example, the current cpu
 frequency I cannot rely on that value to be valid.

 After that we can add some reference counting to the sysfs file accesses
 so that we can block after the sysfs removal when we change cpufreq
 governors.  I think that would be trivial and that it would resolve any races
 when adding and removing governor's sysfs files.

Not really sure, but if you solve few things with getting these bugs resolved
then we might apply your patches without any issues.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Locking issues with cpufreq and sysfs

2014-10-14 Thread Elliott, Robert (Server Storage)


> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> ow...@vger.kernel.org] On Behalf Of Prarit Bhargava
> Sent: Tuesday, 14 October, 2014 1:24 PM
> To: Viresh Kumar
> Cc: Saravana Kannan; Rafael J. Wysocki; linux...@vger.kernel.org; Linux
> Kernel; Robert Schöne
> Subject: Re: Locking issues with cpufreq and sysfs
> 
> On 10/14/2014 03:10 AM, Viresh Kumar wrote:
> > On 13 October 2014 18:41, Prarit Bhargava  wrote:
> >>
> >> The locking is insufficient here, Viresh.  I no longer believe that fixes
> >> to this locking scheme are the right way to move forward here.  I'm
> wondering
> >> if we can look at other alternatives such as maintaining a refcount or
> >> perhaps using a queuing mechanism for governor and policy related changes.
> >>
...
> So I'm proposing that we move to a single threaded read/write using, if
> possible, a single policy lock for now.  We might transition this back to a
> rwsem later on, however, for the first attempt at cleaning this up I think we
> should just stick with a simple lock.  In doing that, IMO we remove
> cpufreq_rwsem: protects the driver from being unloaded
> cpufreq_governor_lock: protects the current governor
> each policy has a rwsem (policy->rwsem): protects the cpufreq_policy struct
> 
> and potentially
> cpufreq_driver_lock: protects the cpufreq_cpu_data array and cpufreq_driver-
> >boost
> 
> After looking at the way the code would be structured, I'm wondering if
> cpufreq_governor_mutex: protects the cpufreq_governor_list
> is overkill.  The loading of a module should be atomic relative to the
> cpufreq code, so this lock may not be required.  (Admittedly I haven't
> tested that...)
> 
> That would leave:
> global_kobj_lock: protects the "cpufreq" kobject
> each policy has a transition_lock (policy->transition): synchronizes
> frequency transitions
> and a new lock, perhaps called policy->lock, to serialize all events.
> 

Please keep performance in mind too.  cpufreq_governor_lock
contention is a bit of an issue with heavy IO workloads
as described in:
http://marc.info/?l=linux-pm=140924051503827=2


---
Rob ElliottHP Server Storage



N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: Locking issues with cpufreq and sysfs

2014-10-14 Thread Prarit Bhargava


On 10/14/2014 03:10 AM, Viresh Kumar wrote:
> On 13 October 2014 18:41, Prarit Bhargava  wrote:
>>
>> The locking is insufficient here, Viresh.  I no longer believe that fixes
>> to this locking scheme are the right way to move forward here.  I'm wondering
>> if we can look at other alternatives such as maintaining a refcount or
>> perhaps using a queuing mechanism for governor and policy related changes.
>>

Here's what I think we should do.  Taking a step back, the purpose of the
cpufreq sysfs files is to allow userspace to read current cpu frequency
settings, and to allow userspce to modify the governor and set the max & min
ranges for cpu frequencies.  This can be done per device or for all cpus
depending on the driver.

We have to guarantee that bothing reading and writing will always work and that
write operations will always be atomic relative to userspace.  The current
implementation of cpufreq does this through the following locks:

cpufreq_driver_lock: protects the cpufreq_cpu_data array and 
cpufreq_driver->boost
cpufreq_governor_lock: protects the current governor
cpufreq_governor_mutex: protects the cpufreq_governor_list
cpufreq_rwsem: protects the driver from being unloaded
global_kobj_lock: protects the "cpufreq" kobject
each policy has a rwsem (policy->rwsem): protects the cpufreq_policy struct
each policy has a transition_lock (policy->transition): synchronizes
frequency transitions

While examining this code I was wondering exactly why we allow multiple readers
and writers in cpufreq.  I could understand if we felt that this data was
critical; but it really isn't.  A short delay here isn't that big of a deal IMO
(if someone can produce a case where a delay would cause a serious problem I'd
 like to hear it).  I don't even think it is safe in most cases to allow readers
while cpufreq values are changing; if we're changing the governor userspace
cannot rely on the value of (for example) cpuinfo_max_freq.

So I'm proposing that we move to a single threaded read/write using, if
possible, a single policy lock for now.  We might transition this back to a
rwsem later on, however, for the first attempt at cleaning this up I think we
should just stick with a simple lock.  In doing that, IMO we remove

cpufreq_rwsem: protects the driver from being unloaded
cpufreq_governor_lock: protects the current governor
each policy has a rwsem (policy->rwsem): protects the cpufreq_policy struct

and potentially

cpufreq_driver_lock: protects the cpufreq_cpu_data array and 
cpufreq_driver->boost

After looking at the way the code would be structured, I'm wondering if

cpufreq_governor_mutex: protects the cpufreq_governor_list

is overkill.  The loading of a module should be atomic relative to the cpufreq
code, so this lock may not be required.  (Admittedly I haven't tested that...)

That would leave:

global_kobj_lock: protects the "cpufreq" kobject
each policy has a transition_lock (policy->transition): synchronizes
frequency transitions

and a new lock, perhaps called policy->lock, to serialize all events.

Pros: We clean all this up to a simpler single threaded model.  Bugs and races
here would be much easier to handle.  We're currently putting band-aid on
band-aids in this code ATM and it looks like we're seeing old races expanded
or new races exposed.

Cons: We lose the ability to do simultaneous reads and writes ... although
I remain unconvinced that this would ever be safe to do.  ie) If I change the
governor while at the same time reading, for example, the current cpu
frequency I cannot rely on that value to be valid.

After that we can add some reference counting to the sysfs file accesses
so that we can block after the sysfs removal when we change cpufreq
governors.  I think that would be trivial and that it would resolve any races
when adding and removing governor's sysfs files.

P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Locking issues with cpufreq and sysfs

2014-10-14 Thread Viresh Kumar
On 13 October 2014 18:41, Prarit Bhargava  wrote:
> There are several issues with the current locking design of cpufreq.  Most

Sadly yes :(

> [Question: was the original reported deadlock "real"?  Did it really happen or
> did lockdep only report it (I may have asked this question previously and
> forgot the answer)?  The reason I ask is that this situation is very similar 
> to
> USB's device removal in which the sysfs attributes are removed for a device 
> but
> not the device it was called for.  I actually think that's part of the problem
> here.]

I am still not sure about those lockdep warnings :(

> The above, obviously, is a complete hack of the code but in a sense does
> mimic a proper locking fix.  However, even with this fix we are still left
> with a race in accessing the sysfs files.  Consider the following example,
>
> CPU 1: accesses scaling_setspeed to set cpu speed
>
> simultaneously,
>
> CPU 2: accesses scaling_governor to set governor to ondemand
>
> CPU 1 & 2 race ... and this can result in different critical situations.
> The first is that CPU 1 holds the scalling_setspeed open while CPU attempts
> to change the governor.  This results in a syfs warning about creating a
> file with an existing file name which in some cases can lead to additional
> corruption and a panic.  The second case is that CPU 1's setting of the speed
> is now done on the new governor -- which may or may not be correct.  In any
> case an argument could be made that the userspace program doing this type
> of action should be "smart" enough to confirm simultaneous changes... but
> in any case the kernel should not panic or corrupt data.
>
> The locking is insufficient here, Viresh.  I no longer believe that fixes
> to this locking scheme are the right way to move forward here.  I'm wondering
> if we can look at other alternatives such as maintaining a refcount or
> perhaps using a queuing mechanism for governor and policy related changes.

Probably this is similar to what I have been trying, i.e. to give access to only
a single thread to call __cpufreq_governor().

Can you please try the cpufreq/governor-fixes-v4 branch once ?

On 13 October 2014 18:41, Prarit Bhargava  wrote:
> There are several issues with the current locking design of cpufreq.  Most
> notably is the panic reported here:
>
> http://marc.info/?l=linux-kernel=140622451625236=2
>
> which was introduced by commit 955ef4833574636819cd269cfbae12f79cbde63a,
> cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT, which introduces
> a race in the changing of the cpufreq policy.  This change was introduced
> because of a lockdep deadlock warning that can be reproduced (on x86 with
> the acpi_cpufreq driver) via the following debug patch:
>
> iff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
> index b0c18ed..366cfb7 100644
> --- a/drivers/cpufreq/acpi-cpufreq.c
> +++ b/drivers/cpufreq/acpi-cpufreq.c
> @@ -885,6 +885,7 @@ static struct freq_attr *acpi_cpufreq_attr[] = {
>
>  static struct cpufreq_driver acpi_cpufreq_driver = {
> .verify = cpufreq_generic_frequency_table_verify,
> +   .flags  = CPUFREQ_HAVE_GOVERNOR_PER_POLICY,
> .target_index   = acpi_cpufreq_target,
> .bios_limit = acpi_processor_get_bios_limit,
> .init   = acpi_cpufreq_cpu_init,
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 61190f6..4cb488a 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -2195,9 +2195,7 @@ static int cpufreq_set_policy(struct cpufreq_policy 
> *polic
> /* end old governor */
> if (old_gov) {
> __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
> -   up_write(>rwsem);
> __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
> -   down_write(>rwsem);
> }
>
> /* start new governor */
> @@ -2206,9 +2204,7 @@ static int cpufreq_set_policy(struct cpufreq_policy 
> *polic
> if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
> goto out;
>
> -   up_write(>rwsem);
> __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
> -   down_write(>rwsem);
> }
>
> /* new governor failed, so re-start old one */
>
> (which causes the acpi-cpufreq driver to emulate the behaviour of the arm
> cpufreq driver), and by doing
>
> echo ondemand > /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
> cat /sys/devices/system/cpu/cpu5/cpufreq/ondemand/*
> echo conservative > /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
> exit 0
>
> [Question: was the original reported deadlock "real"?  Did it really happen or
> did lockdep only report it (I may have asked this question previously and
> forgot the answer)?  The reason I ask is that this situation is very similar 
> to
> USB's device removal in which the sysfs attributes are removed for a device 
> but
> not the device 

Re: Locking issues with cpufreq and sysfs

2014-10-14 Thread Viresh Kumar
On 13 October 2014 18:41, Prarit Bhargava pra...@redhat.com wrote:
 There are several issues with the current locking design of cpufreq.  Most

Sadly yes :(

 [Question: was the original reported deadlock real?  Did it really happen or
 did lockdep only report it (I may have asked this question previously and
 forgot the answer)?  The reason I ask is that this situation is very similar 
 to
 USB's device removal in which the sysfs attributes are removed for a device 
 but
 not the device it was called for.  I actually think that's part of the problem
 here.]

I am still not sure about those lockdep warnings :(

 The above, obviously, is a complete hack of the code but in a sense does
 mimic a proper locking fix.  However, even with this fix we are still left
 with a race in accessing the sysfs files.  Consider the following example,

 CPU 1: accesses scaling_setspeed to set cpu speed

 simultaneously,

 CPU 2: accesses scaling_governor to set governor to ondemand

 CPU 1  2 race ... and this can result in different critical situations.
 The first is that CPU 1 holds the scalling_setspeed open while CPU attempts
 to change the governor.  This results in a syfs warning about creating a
 file with an existing file name which in some cases can lead to additional
 corruption and a panic.  The second case is that CPU 1's setting of the speed
 is now done on the new governor -- which may or may not be correct.  In any
 case an argument could be made that the userspace program doing this type
 of action should be smart enough to confirm simultaneous changes... but
 in any case the kernel should not panic or corrupt data.

 The locking is insufficient here, Viresh.  I no longer believe that fixes
 to this locking scheme are the right way to move forward here.  I'm wondering
 if we can look at other alternatives such as maintaining a refcount or
 perhaps using a queuing mechanism for governor and policy related changes.

Probably this is similar to what I have been trying, i.e. to give access to only
a single thread to call __cpufreq_governor().

Can you please try the cpufreq/governor-fixes-v4 branch once ?

On 13 October 2014 18:41, Prarit Bhargava pra...@redhat.com wrote:
 There are several issues with the current locking design of cpufreq.  Most
 notably is the panic reported here:

 http://marc.info/?l=linux-kernelm=140622451625236w=2

 which was introduced by commit 955ef4833574636819cd269cfbae12f79cbde63a,
 cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT, which introduces
 a race in the changing of the cpufreq policy.  This change was introduced
 because of a lockdep deadlock warning that can be reproduced (on x86 with
 the acpi_cpufreq driver) via the following debug patch:

 iff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
 index b0c18ed..366cfb7 100644
 --- a/drivers/cpufreq/acpi-cpufreq.c
 +++ b/drivers/cpufreq/acpi-cpufreq.c
 @@ -885,6 +885,7 @@ static struct freq_attr *acpi_cpufreq_attr[] = {

  static struct cpufreq_driver acpi_cpufreq_driver = {
 .verify = cpufreq_generic_frequency_table_verify,
 +   .flags  = CPUFREQ_HAVE_GOVERNOR_PER_POLICY,
 .target_index   = acpi_cpufreq_target,
 .bios_limit = acpi_processor_get_bios_limit,
 .init   = acpi_cpufreq_cpu_init,
 diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
 index 61190f6..4cb488a 100644
 --- a/drivers/cpufreq/cpufreq.c
 +++ b/drivers/cpufreq/cpufreq.c
 @@ -2195,9 +2195,7 @@ static int cpufreq_set_policy(struct cpufreq_policy 
 *polic
 /* end old governor */
 if (old_gov) {
 __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
 -   up_write(policy-rwsem);
 __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
 -   down_write(policy-rwsem);
 }

 /* start new governor */
 @@ -2206,9 +2204,7 @@ static int cpufreq_set_policy(struct cpufreq_policy 
 *polic
 if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
 goto out;

 -   up_write(policy-rwsem);
 __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
 -   down_write(policy-rwsem);
 }

 /* new governor failed, so re-start old one */

 (which causes the acpi-cpufreq driver to emulate the behaviour of the arm
 cpufreq driver), and by doing

 echo ondemand  /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
 cat /sys/devices/system/cpu/cpu5/cpufreq/ondemand/*
 echo conservative  /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
 exit 0

 [Question: was the original reported deadlock real?  Did it really happen or
 did lockdep only report it (I may have asked this question previously and
 forgot the answer)?  The reason I ask is that this situation is very similar 
 to
 USB's device removal in which the sysfs attributes are removed for a device 
 but
 not the device it was called for.  I actually think that's 

Re: Locking issues with cpufreq and sysfs

2014-10-14 Thread Prarit Bhargava


On 10/14/2014 03:10 AM, Viresh Kumar wrote:
 On 13 October 2014 18:41, Prarit Bhargava pra...@redhat.com wrote:

 The locking is insufficient here, Viresh.  I no longer believe that fixes
 to this locking scheme are the right way to move forward here.  I'm wondering
 if we can look at other alternatives such as maintaining a refcount or
 perhaps using a queuing mechanism for governor and policy related changes.


Here's what I think we should do.  Taking a step back, the purpose of the
cpufreq sysfs files is to allow userspace to read current cpu frequency
settings, and to allow userspce to modify the governor and set the max  min
ranges for cpu frequencies.  This can be done per device or for all cpus
depending on the driver.

We have to guarantee that bothing reading and writing will always work and that
write operations will always be atomic relative to userspace.  The current
implementation of cpufreq does this through the following locks:

cpufreq_driver_lock: protects the cpufreq_cpu_data array and 
cpufreq_driver-boost
cpufreq_governor_lock: protects the current governor
cpufreq_governor_mutex: protects the cpufreq_governor_list
cpufreq_rwsem: protects the driver from being unloaded
global_kobj_lock: protects the cpufreq kobject
each policy has a rwsem (policy-rwsem): protects the cpufreq_policy struct
each policy has a transition_lock (policy-transition): synchronizes
frequency transitions

While examining this code I was wondering exactly why we allow multiple readers
and writers in cpufreq.  I could understand if we felt that this data was
critical; but it really isn't.  A short delay here isn't that big of a deal IMO
(if someone can produce a case where a delay would cause a serious problem I'd
 like to hear it).  I don't even think it is safe in most cases to allow readers
while cpufreq values are changing; if we're changing the governor userspace
cannot rely on the value of (for example) cpuinfo_max_freq.

So I'm proposing that we move to a single threaded read/write using, if
possible, a single policy lock for now.  We might transition this back to a
rwsem later on, however, for the first attempt at cleaning this up I think we
should just stick with a simple lock.  In doing that, IMO we remove

cpufreq_rwsem: protects the driver from being unloaded
cpufreq_governor_lock: protects the current governor
each policy has a rwsem (policy-rwsem): protects the cpufreq_policy struct

and potentially

cpufreq_driver_lock: protects the cpufreq_cpu_data array and 
cpufreq_driver-boost

After looking at the way the code would be structured, I'm wondering if

cpufreq_governor_mutex: protects the cpufreq_governor_list

is overkill.  The loading of a module should be atomic relative to the cpufreq
code, so this lock may not be required.  (Admittedly I haven't tested that...)

That would leave:

global_kobj_lock: protects the cpufreq kobject
each policy has a transition_lock (policy-transition): synchronizes
frequency transitions

and a new lock, perhaps called policy-lock, to serialize all events.

Pros: We clean all this up to a simpler single threaded model.  Bugs and races
here would be much easier to handle.  We're currently putting band-aid on
band-aids in this code ATM and it looks like we're seeing old races expanded
or new races exposed.

Cons: We lose the ability to do simultaneous reads and writes ... although
I remain unconvinced that this would ever be safe to do.  ie) If I change the
governor while at the same time reading, for example, the current cpu
frequency I cannot rely on that value to be valid.

After that we can add some reference counting to the sysfs file accesses
so that we can block after the sysfs removal when we change cpufreq
governors.  I think that would be trivial and that it would resolve any races
when adding and removing governor's sysfs files.

P.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Locking issues with cpufreq and sysfs

2014-10-14 Thread Elliott, Robert (Server Storage)


 -Original Message-
 From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
 ow...@vger.kernel.org] On Behalf Of Prarit Bhargava
 Sent: Tuesday, 14 October, 2014 1:24 PM
 To: Viresh Kumar
 Cc: Saravana Kannan; Rafael J. Wysocki; linux...@vger.kernel.org; Linux
 Kernel; Robert Schöne
 Subject: Re: Locking issues with cpufreq and sysfs
 
 On 10/14/2014 03:10 AM, Viresh Kumar wrote:
  On 13 October 2014 18:41, Prarit Bhargava pra...@redhat.com wrote:
 
  The locking is insufficient here, Viresh.  I no longer believe that fixes
  to this locking scheme are the right way to move forward here.  I'm
 wondering
  if we can look at other alternatives such as maintaining a refcount or
  perhaps using a queuing mechanism for governor and policy related changes.
 
...
 So I'm proposing that we move to a single threaded read/write using, if
 possible, a single policy lock for now.  We might transition this back to a
 rwsem later on, however, for the first attempt at cleaning this up I think we
 should just stick with a simple lock.  In doing that, IMO we remove
 cpufreq_rwsem: protects the driver from being unloaded
 cpufreq_governor_lock: protects the current governor
 each policy has a rwsem (policy-rwsem): protects the cpufreq_policy struct
 
 and potentially
 cpufreq_driver_lock: protects the cpufreq_cpu_data array and cpufreq_driver-
 boost
 
 After looking at the way the code would be structured, I'm wondering if
 cpufreq_governor_mutex: protects the cpufreq_governor_list
 is overkill.  The loading of a module should be atomic relative to the
 cpufreq code, so this lock may not be required.  (Admittedly I haven't
 tested that...)
 
 That would leave:
 global_kobj_lock: protects the cpufreq kobject
 each policy has a transition_lock (policy-transition): synchronizes
 frequency transitions
 and a new lock, perhaps called policy-lock, to serialize all events.
 

Please keep performance in mind too.  cpufreq_governor_lock
contention is a bit of an issue with heavy IO workloads
as described in:
http://marc.info/?l=linux-pmm=140924051503827w=2


---
Rob ElliottHP Server Storage



N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

Re: Locking issues with cpufreq and sysfs

2014-10-13 Thread Rafael J. Wysocki
On Monday, October 13, 2014 09:22:49 AM Prarit Bhargava wrote:
> 
> On 10/13/2014 09:11 AM, Prarit Bhargava wrote:
> > There are several issues with the current locking design of cpufreq.  Most
> > notably is the panic reported here:
> > 
> > http://marc.info/?l=linux-kernel=140622451625236=2
> > 
> > which was introduced by commit 955ef4833574636819cd269cfbae12f79cbde63a,
> > cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT, which introduces
> > a race in the changing of the cpufreq policy.  This change was introduced
> > because of a lockdep deadlock warning that can be reproduced (on x86 with
> > the acpi_cpufreq driver) via the following debug patch:
> > 
> > iff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
> > index b0c18ed..366cfb7 100644
> > --- a/drivers/cpufreq/acpi-cpufreq.c
> > +++ b/drivers/cpufreq/acpi-cpufreq.c
> > @@ -885,6 +885,7 @@ static struct freq_attr *acpi_cpufreq_attr[] = {
> > 
> >  static struct cpufreq_driver acpi_cpufreq_driver = {
> > .verify = cpufreq_generic_frequency_table_verify,
> > +   .flags  = CPUFREQ_HAVE_GOVERNOR_PER_POLICY,
> > .target_index   = acpi_cpufreq_target,
> > .bios_limit = acpi_processor_get_bios_limit,
> > .init   = acpi_cpufreq_cpu_init,
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 61190f6..4cb488a 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -2195,9 +2195,7 @@ static int cpufreq_set_policy(struct cpufreq_policy 
> > *polic
> > /* end old governor */
> > if (old_gov) {
> > __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
> > -   up_write(>rwsem);
> > __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
> > -   down_write(>rwsem);
> > }
> > 
> > /* start new governor */
> > @@ -2206,9 +2204,7 @@ static int cpufreq_set_policy(struct cpufreq_policy 
> > *polic
> > if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
> > goto out;
> > 
> > -   up_write(>rwsem);
> > __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
> > -   down_write(>rwsem);
> > }
> > 
> > /* new governor failed, so re-start old one */
> > 
> > (which causes the acpi-cpufreq driver to emulate the behaviour of the arm
> > cpufreq driver), and by doing
> > 
> > echo ondemand > /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
> > cat /sys/devices/system/cpu/cpu5/cpufreq/ondemand/*
> > echo conservative > /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
> > exit 0
> > 
> > [Question: was the original reported deadlock "real"?  Did it really happen 
> > or
> > did lockdep only report it (I may have asked this question previously and
> > forgot the answer)?  The reason I ask is that this situation is very 
> > similar to
> > USB's device removal in which the sysfs attributes are removed for a device 
> > but
> > not the device it was called for.  I actually think that's part of the 
> > problem
> > here.]
> > 
> > The above, obviously, is a complete hack of the code but in a sense does
> > mimic a proper locking fix.  However, even with this fix we are still left
> > with a race in accessing the sysfs files.  Consider the following example,
> > 
> > CPU 1: accesses scaling_setspeed to set cpu speed
> > 
> > simultaneously,
> > 
> > CPU 2: accesses scaling_governor to set governor to ondemand
> > 
> > CPU 1 & 2 race ... and this can result in different critical situations.
> > The first is that CPU 1 holds the scalling_setspeed open while CPU attempts
> > to change the governor.  This results in a syfs warning about creating a
> > file with an existing file name which in some cases can lead to additional
> > corruption and a panic.  The second case is that CPU 1's setting of the 
> > speed
> > is now done on the new governor -- which may or may not be correct.  In any
> > case an argument could be made that the userspace program doing this type
> > of action should be "smart" enough to confirm simultaneous changes... but
> > in any case the kernel should not panic or corrupt data.
> > 
> > The locking is insufficient here, Viresh.  I no longer believe that fixes
> > to this locking scheme are the right way to move forward here.  I'm 
> > wondering
> > if we can look at other alternatives such as maintaining a refcount or
> > perhaps using a queuing mechanism for governor and policy related changes.
> > 
> 
> Uh ... I meant this as "I'm willing to modify the code to do this but I'd like
> to know what everyone else thinks before I do anything" ;)

OK, that's constructive. :-)

Can we discuss the target design first, please?  You certainly have something
in mind, so can you describe it?

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe 

Re: Locking issues with cpufreq and sysfs

2014-10-13 Thread Prarit Bhargava


On 10/13/2014 09:11 AM, Prarit Bhargava wrote:
> There are several issues with the current locking design of cpufreq.  Most
> notably is the panic reported here:
> 
> http://marc.info/?l=linux-kernel=140622451625236=2
> 
> which was introduced by commit 955ef4833574636819cd269cfbae12f79cbde63a,
> cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT, which introduces
> a race in the changing of the cpufreq policy.  This change was introduced
> because of a lockdep deadlock warning that can be reproduced (on x86 with
> the acpi_cpufreq driver) via the following debug patch:
> 
> iff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
> index b0c18ed..366cfb7 100644
> --- a/drivers/cpufreq/acpi-cpufreq.c
> +++ b/drivers/cpufreq/acpi-cpufreq.c
> @@ -885,6 +885,7 @@ static struct freq_attr *acpi_cpufreq_attr[] = {
> 
>  static struct cpufreq_driver acpi_cpufreq_driver = {
> .verify = cpufreq_generic_frequency_table_verify,
> +   .flags  = CPUFREQ_HAVE_GOVERNOR_PER_POLICY,
> .target_index   = acpi_cpufreq_target,
> .bios_limit = acpi_processor_get_bios_limit,
> .init   = acpi_cpufreq_cpu_init,
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 61190f6..4cb488a 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -2195,9 +2195,7 @@ static int cpufreq_set_policy(struct cpufreq_policy 
> *polic
> /* end old governor */
> if (old_gov) {
> __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
> -   up_write(>rwsem);
> __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
> -   down_write(>rwsem);
> }
> 
> /* start new governor */
> @@ -2206,9 +2204,7 @@ static int cpufreq_set_policy(struct cpufreq_policy 
> *polic
> if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
> goto out;
> 
> -   up_write(>rwsem);
> __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
> -   down_write(>rwsem);
> }
> 
> /* new governor failed, so re-start old one */
> 
> (which causes the acpi-cpufreq driver to emulate the behaviour of the arm
> cpufreq driver), and by doing
> 
> echo ondemand > /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
> cat /sys/devices/system/cpu/cpu5/cpufreq/ondemand/*
> echo conservative > /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
> exit 0
> 
> [Question: was the original reported deadlock "real"?  Did it really happen or
> did lockdep only report it (I may have asked this question previously and
> forgot the answer)?  The reason I ask is that this situation is very similar 
> to
> USB's device removal in which the sysfs attributes are removed for a device 
> but
> not the device it was called for.  I actually think that's part of the problem
> here.]
> 
> The above, obviously, is a complete hack of the code but in a sense does
> mimic a proper locking fix.  However, even with this fix we are still left
> with a race in accessing the sysfs files.  Consider the following example,
> 
> CPU 1: accesses scaling_setspeed to set cpu speed
> 
> simultaneously,
> 
> CPU 2: accesses scaling_governor to set governor to ondemand
> 
> CPU 1 & 2 race ... and this can result in different critical situations.
> The first is that CPU 1 holds the scalling_setspeed open while CPU attempts
> to change the governor.  This results in a syfs warning about creating a
> file with an existing file name which in some cases can lead to additional
> corruption and a panic.  The second case is that CPU 1's setting of the speed
> is now done on the new governor -- which may or may not be correct.  In any
> case an argument could be made that the userspace program doing this type
> of action should be "smart" enough to confirm simultaneous changes... but
> in any case the kernel should not panic or corrupt data.
> 
> The locking is insufficient here, Viresh.  I no longer believe that fixes
> to this locking scheme are the right way to move forward here.  I'm wondering
> if we can look at other alternatives such as maintaining a refcount or
> perhaps using a queuing mechanism for governor and policy related changes.
> 

Uh ... I meant this as "I'm willing to modify the code to do this but I'd like
to know what everyone else thinks before I do anything" ;)

P.

> P.
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Locking issues with cpufreq and sysfs

2014-10-13 Thread Prarit Bhargava


On 10/13/2014 09:11 AM, Prarit Bhargava wrote:
 There are several issues with the current locking design of cpufreq.  Most
 notably is the panic reported here:
 
 http://marc.info/?l=linux-kernelm=140622451625236w=2
 
 which was introduced by commit 955ef4833574636819cd269cfbae12f79cbde63a,
 cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT, which introduces
 a race in the changing of the cpufreq policy.  This change was introduced
 because of a lockdep deadlock warning that can be reproduced (on x86 with
 the acpi_cpufreq driver) via the following debug patch:
 
 iff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
 index b0c18ed..366cfb7 100644
 --- a/drivers/cpufreq/acpi-cpufreq.c
 +++ b/drivers/cpufreq/acpi-cpufreq.c
 @@ -885,6 +885,7 @@ static struct freq_attr *acpi_cpufreq_attr[] = {
 
  static struct cpufreq_driver acpi_cpufreq_driver = {
 .verify = cpufreq_generic_frequency_table_verify,
 +   .flags  = CPUFREQ_HAVE_GOVERNOR_PER_POLICY,
 .target_index   = acpi_cpufreq_target,
 .bios_limit = acpi_processor_get_bios_limit,
 .init   = acpi_cpufreq_cpu_init,
 diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
 index 61190f6..4cb488a 100644
 --- a/drivers/cpufreq/cpufreq.c
 +++ b/drivers/cpufreq/cpufreq.c
 @@ -2195,9 +2195,7 @@ static int cpufreq_set_policy(struct cpufreq_policy 
 *polic
 /* end old governor */
 if (old_gov) {
 __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
 -   up_write(policy-rwsem);
 __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
 -   down_write(policy-rwsem);
 }
 
 /* start new governor */
 @@ -2206,9 +2204,7 @@ static int cpufreq_set_policy(struct cpufreq_policy 
 *polic
 if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
 goto out;
 
 -   up_write(policy-rwsem);
 __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
 -   down_write(policy-rwsem);
 }
 
 /* new governor failed, so re-start old one */
 
 (which causes the acpi-cpufreq driver to emulate the behaviour of the arm
 cpufreq driver), and by doing
 
 echo ondemand  /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
 cat /sys/devices/system/cpu/cpu5/cpufreq/ondemand/*
 echo conservative  /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
 exit 0
 
 [Question: was the original reported deadlock real?  Did it really happen or
 did lockdep only report it (I may have asked this question previously and
 forgot the answer)?  The reason I ask is that this situation is very similar 
 to
 USB's device removal in which the sysfs attributes are removed for a device 
 but
 not the device it was called for.  I actually think that's part of the problem
 here.]
 
 The above, obviously, is a complete hack of the code but in a sense does
 mimic a proper locking fix.  However, even with this fix we are still left
 with a race in accessing the sysfs files.  Consider the following example,
 
 CPU 1: accesses scaling_setspeed to set cpu speed
 
 simultaneously,
 
 CPU 2: accesses scaling_governor to set governor to ondemand
 
 CPU 1  2 race ... and this can result in different critical situations.
 The first is that CPU 1 holds the scalling_setspeed open while CPU attempts
 to change the governor.  This results in a syfs warning about creating a
 file with an existing file name which in some cases can lead to additional
 corruption and a panic.  The second case is that CPU 1's setting of the speed
 is now done on the new governor -- which may or may not be correct.  In any
 case an argument could be made that the userspace program doing this type
 of action should be smart enough to confirm simultaneous changes... but
 in any case the kernel should not panic or corrupt data.
 
 The locking is insufficient here, Viresh.  I no longer believe that fixes
 to this locking scheme are the right way to move forward here.  I'm wondering
 if we can look at other alternatives such as maintaining a refcount or
 perhaps using a queuing mechanism for governor and policy related changes.
 

Uh ... I meant this as I'm willing to modify the code to do this but I'd like
to know what everyone else thinks before I do anything ;)

P.

 P.
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Locking issues with cpufreq and sysfs

2014-10-13 Thread Rafael J. Wysocki
On Monday, October 13, 2014 09:22:49 AM Prarit Bhargava wrote:
 
 On 10/13/2014 09:11 AM, Prarit Bhargava wrote:
  There are several issues with the current locking design of cpufreq.  Most
  notably is the panic reported here:
  
  http://marc.info/?l=linux-kernelm=140622451625236w=2
  
  which was introduced by commit 955ef4833574636819cd269cfbae12f79cbde63a,
  cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT, which introduces
  a race in the changing of the cpufreq policy.  This change was introduced
  because of a lockdep deadlock warning that can be reproduced (on x86 with
  the acpi_cpufreq driver) via the following debug patch:
  
  iff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
  index b0c18ed..366cfb7 100644
  --- a/drivers/cpufreq/acpi-cpufreq.c
  +++ b/drivers/cpufreq/acpi-cpufreq.c
  @@ -885,6 +885,7 @@ static struct freq_attr *acpi_cpufreq_attr[] = {
  
   static struct cpufreq_driver acpi_cpufreq_driver = {
  .verify = cpufreq_generic_frequency_table_verify,
  +   .flags  = CPUFREQ_HAVE_GOVERNOR_PER_POLICY,
  .target_index   = acpi_cpufreq_target,
  .bios_limit = acpi_processor_get_bios_limit,
  .init   = acpi_cpufreq_cpu_init,
  diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
  index 61190f6..4cb488a 100644
  --- a/drivers/cpufreq/cpufreq.c
  +++ b/drivers/cpufreq/cpufreq.c
  @@ -2195,9 +2195,7 @@ static int cpufreq_set_policy(struct cpufreq_policy 
  *polic
  /* end old governor */
  if (old_gov) {
  __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
  -   up_write(policy-rwsem);
  __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
  -   down_write(policy-rwsem);
  }
  
  /* start new governor */
  @@ -2206,9 +2204,7 @@ static int cpufreq_set_policy(struct cpufreq_policy 
  *polic
  if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
  goto out;
  
  -   up_write(policy-rwsem);
  __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
  -   down_write(policy-rwsem);
  }
  
  /* new governor failed, so re-start old one */
  
  (which causes the acpi-cpufreq driver to emulate the behaviour of the arm
  cpufreq driver), and by doing
  
  echo ondemand  /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
  cat /sys/devices/system/cpu/cpu5/cpufreq/ondemand/*
  echo conservative  /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
  exit 0
  
  [Question: was the original reported deadlock real?  Did it really happen 
  or
  did lockdep only report it (I may have asked this question previously and
  forgot the answer)?  The reason I ask is that this situation is very 
  similar to
  USB's device removal in which the sysfs attributes are removed for a device 
  but
  not the device it was called for.  I actually think that's part of the 
  problem
  here.]
  
  The above, obviously, is a complete hack of the code but in a sense does
  mimic a proper locking fix.  However, even with this fix we are still left
  with a race in accessing the sysfs files.  Consider the following example,
  
  CPU 1: accesses scaling_setspeed to set cpu speed
  
  simultaneously,
  
  CPU 2: accesses scaling_governor to set governor to ondemand
  
  CPU 1  2 race ... and this can result in different critical situations.
  The first is that CPU 1 holds the scalling_setspeed open while CPU attempts
  to change the governor.  This results in a syfs warning about creating a
  file with an existing file name which in some cases can lead to additional
  corruption and a panic.  The second case is that CPU 1's setting of the 
  speed
  is now done on the new governor -- which may or may not be correct.  In any
  case an argument could be made that the userspace program doing this type
  of action should be smart enough to confirm simultaneous changes... but
  in any case the kernel should not panic or corrupt data.
  
  The locking is insufficient here, Viresh.  I no longer believe that fixes
  to this locking scheme are the right way to move forward here.  I'm 
  wondering
  if we can look at other alternatives such as maintaining a refcount or
  perhaps using a queuing mechanism for governor and policy related changes.
  
 
 Uh ... I meant this as I'm willing to modify the code to do this but I'd like
 to know what everyone else thinks before I do anything ;)

OK, that's constructive. :-)

Can we discuss the target design first, please?  You certainly have something
in mind, so can you describe it?

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/