Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-02 Thread Viresh Kumar
On 02-06-15, 12:26, Preeti U Murthy wrote:
> dbs_data->mutex_lock is used to serialize only START,STOP,LIMIT calls.
> It does not serialize EXIT/INIT with these operations, but that is

Yeah, we need to fix that.

To be honest, locking is in real bad shape in cpufreq core and its
required to get it fixed. I had some patches and was waiting for my
current series to get applied first, which would make life simpler.

> understandable. We need to note that EXIT can proceed in parallel with
> START/STOP/LIMIT operations which can result in null pointer
> dereferences of dbs_data.

That should be fixed, yeah.

> Yet again, this is due to the reason that you pointed out. One such case
> is __cpufreq_remove_dev_finish(). It also drops policy locks before
> calling into START/LIMIT. So this can proceed in parallel with an EXIT
> operation from a store. So dbs_data->mutex does not help serialize these
> two and START can refer a freed dbs_data. Consider the scenario today
> where CPUFREQ_HAVE_GOVERNOR_PER_POLICY is set itself.
> 
> CPU0  CPU1
> 
> cpufreq_set_policy()
> 
> __cpufreq_governor
> (CPUFREQ_GOV_POLICY_EXIT)
> since the only usage
> becomes 0.
>   __cpufreq_remove_dev_finish()
> 
> free(dbs_data)__cpufreq_governor
>   (CPUFRQ_GOV_START)
> 
>   dbs_data->mutex <= NULL dereference
> 
> This is what we hit initially.

That's why we shouldn't drop policy->rwsem lock at all for calling
governor-thing..

> So we will need the policy wide lock even for those drivers that have a
> governor per policy, before calls to __cpufreq_governor() in order to
> avoid such scenarios. So, your patch at

For all drivers actually.

> https://git.linaro.org/people/viresh.kumar/linux.git/patch/57714d5b1778f2f610bcc5c74d85b29ba1cc1995
> can lead to above races between different store operations themselves,
> won't it ?

Not sure, and I am not in real good touch right now around locking in
cpufreq core, need to spend enough time on that before getting that
resolved.

But the above patch + all the patches in that branch:
cpufreq/core/locking were an attempt to get the ABBA thing out of the
way. But I was still getting those warnings. One of the issues I faced
was that I never saw these ABBA warnings on my hardware :( and so was
difficult to get this tested.

> An EXIT on one of the policy cpus, followed by a STOP on
> another will lead to null dereference of dbs_data(For
> GOVERNOR_PER_POLICY). Today stores are serialized through the policy lock.
> 
> Anyway, since taking the policy lock leads to ABBA deadlock and

We need to solve this ABBA problem first. And things will simplify
then.

-- 
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-02 Thread Preeti U Murthy
On 06/02/2015 11:57 AM, Viresh Kumar wrote:
> On 02-06-15, 11:50, Preeti U Murthy wrote:
>>  CPU0CPU1
>>
>> store* store*
>>
>> lock(policy 1)  lock(policy 2)
>> cpufreq_set_policy()   cpufreq_set_policy()
>> EXIT() :
>> dbs-data->usage_count--
>>
>> INIT()  EXIT()
> 
> When will INIT() follow EXIT() in set_policy() for the same governor ?
> Maybe not, and so this sequence is hypothetical ?

Ah, yes, this will be hypothetical.
> 
>> dbs_data exists dbs_data->usage_count -- = 0
>> kfree(dbs_data)
>> dbs-data->usage_count++
>> *NULL dereference*
> 
> But even if this happens, it should be handled with
> dbs_data->mutex_lock, which is used at other places already.

dbs_data->mutex_lock is used to serialize only START,STOP,LIMIT calls.
It does not serialize EXIT/INIT with these operations, but that is
understandable. We need to note that EXIT can proceed in parallel with
START/STOP/LIMIT operations which can result in null pointer
dereferences of dbs_data.

Yet again, this is due to the reason that you pointed out. One such case
is __cpufreq_remove_dev_finish(). It also drops policy locks before
calling into START/LIMIT. So this can proceed in parallel with an EXIT
operation from a store. So dbs_data->mutex does not help serialize these
two and START can refer a freed dbs_data. Consider the scenario today
where CPUFREQ_HAVE_GOVERNOR_PER_POLICY is set itself.

CPU0CPU1

cpufreq_set_policy()

__cpufreq_governor
(CPUFREQ_GOV_POLICY_EXIT)
since the only usage
becomes 0.
__cpufreq_remove_dev_finish()

free(dbs_data)  __cpufreq_governor
(CPUFRQ_GOV_START)

dbs_data->mutex <= NULL dereference

This is what we hit initially.

So we will need the policy wide lock even for those drivers that have a
governor per policy, before calls to __cpufreq_governor() in order to
avoid such scenarios. So, your patch at
https://git.linaro.org/people/viresh.kumar/linux.git/patch/57714d5b1778f2f610bcc5c74d85b29ba1cc1995
can lead to above races between different store operations themselves,
won't it ? An EXIT on one of the policy cpus, followed by a STOP on
another will lead to null dereference of dbs_data(For
GOVERNOR_PER_POLICY). Today stores are serialized through the policy lock.

Anyway, since taking the policy lock leads to ABBA deadlock and
releasing it can lead to races like above, shouldn't we try another
approach at serialization ?


Regards
Preeti U Murthy
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-02 Thread Viresh Kumar
On 02-06-15, 11:50, Preeti U Murthy wrote:
>  CPU0CPU1
> 
> store* store*
> 
> lock(policy 1)  lock(policy 2)
> cpufreq_set_policy()   cpufreq_set_policy()
> EXIT() :
> dbs-data->usage_count--
> 
> INIT()  EXIT()

When will INIT() follow EXIT() in set_policy() for the same governor ?
Maybe not, and so this sequence is hypothetical ?

> dbs_data exists dbs_data->usage_count -- = 0
> kfree(dbs_data)
> dbs-data->usage_count++
> *NULL dereference*

But even if this happens, it should be handled with
dbs_data->mutex_lock, which is used at other places already.

-- 
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-02 Thread Preeti U Murthy
On 06/02/2015 11:41 AM, Viresh Kumar wrote:
> On 02-06-15, 11:33, Preeti U Murthy wrote:
>> No, dbs_data is a governor wide data structure and not a policy wide
> 
> Yeah, that's the common part which I was referring to. But normally
> its just read for policies in START/STOP, they just update per-cpu
> data for policy->cpus.
> 
>> one, which is manipulated in START/STOP calls for drivers where the
>> CPUFREQ_HAVE_GOVERNOR_PER_POLICY is not set.
>>
>> So even if we assume that we hold per-policy locks, the following race
>> is still present. Assume that we have just two cpus which do not have a
>> governor-per-policy set.
>>
>> CPU0CPU1
>>
>>  store* store*
>>
>> lock(policy 1)  lock(policy 2)
>> cpufreq_set_policy()   cpufreq_set_policy()
>> EXIT() :
>> dbs-data->usage_count--
>>
>> INIT()
>> dbs_data exists
> 
> You missed the usage_count++ here.

Ok, sorry about that. How about the below ?
> 
>> so return
>> EXIT()
>> dbs_data->usage_count -- = 0
>> kfree(dbs_data)
> 
> And so this shouldn't happen. Else we
> are missing locking in governor's
> code, rather than cpufreq.c
> 
 CPU0CPU1

store* store*

lock(policy 1)  lock(policy 2)
cpufreq_set_policy()   cpufreq_set_policy()
EXIT() :
dbs-data->usage_count--

INIT()  EXIT()
dbs_data exists dbs_data->usage_count -- = 0
kfree(dbs_data)
dbs-data->usage_count++
*NULL dereference*

The point is there are potential race conditions. Its just a matter of
interleaving ?

Regards
Preeti U Murthy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-02 Thread Viresh Kumar
On 02-06-15, 11:33, Preeti U Murthy wrote:
> No, dbs_data is a governor wide data structure and not a policy wide

Yeah, that's the common part which I was referring to. But normally
its just read for policies in START/STOP, they just update per-cpu
data for policy->cpus.

> one, which is manipulated in START/STOP calls for drivers where the
> CPUFREQ_HAVE_GOVERNOR_PER_POLICY is not set.
> 
> So even if we assume that we hold per-policy locks, the following race
> is still present. Assume that we have just two cpus which do not have a
> governor-per-policy set.
> 
> CPU0CPU1
> 
>  store* store*
> 
> lock(policy 1)  lock(policy 2)
> cpufreq_set_policy()   cpufreq_set_policy()
> EXIT() :
> dbs-data->usage_count--
> 
> INIT()
> dbs_data exists

You missed the usage_count++ here.

> so return
> EXIT()
> dbs_data->usage_count -- = 0
> kfree(dbs_data)

And so this shouldn't happen. Else we
are missing locking in governor's
code, rather than cpufreq.c

-- 
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-02 Thread Preeti U Murthy
On 06/02/2015 11:09 AM, Viresh Kumar wrote:
> On 02-06-15, 11:01, Preeti U Murthy wrote:
>> How will a policy lock help here at all, when cpus from multiple
>> policies are calling into __cpufreq_governor() ? How will a policy lock
>> serialize their entry into cpufreq_governor_dbs() ?
> 
> So different policies don't really depend on each other. The only
> thing common to them are the governor's sysfs files (only if
> governor-per-policy isn't set, i.e. in your case). Those sysfs files
> and their kernel counterpart variables aren't touched unless all the
> policies have EXITED. All these START/STOP calls touch only the data
> relevant to those policies only.

No, dbs_data is a governor wide data structure and not a policy wide
one, which is manipulated in START/STOP calls for drivers where the
CPUFREQ_HAVE_GOVERNOR_PER_POLICY is not set.

So even if we assume that we hold per-policy locks, the following race
is still present. Assume that we have just two cpus which do not have a
governor-per-policy set.

CPU0CPU1

 store* store*

lock(policy 1)  lock(policy 2)
cpufreq_set_policy()   cpufreq_set_policy()
EXIT() :
dbs-data->usage_count--

INIT()
dbs_data exists
so return
EXIT()
dbs_data->usage_count -- = 0
kfree(dbs_data)

START()
dereference dbs_data
*NULL dereference*


Regards
Preeti U Murthy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-02 Thread Viresh Kumar
On 02-06-15, 12:26, Preeti U Murthy wrote:
 dbs_data-mutex_lock is used to serialize only START,STOP,LIMIT calls.
 It does not serialize EXIT/INIT with these operations, but that is

Yeah, we need to fix that.

To be honest, locking is in real bad shape in cpufreq core and its
required to get it fixed. I had some patches and was waiting for my
current series to get applied first, which would make life simpler.

 understandable. We need to note that EXIT can proceed in parallel with
 START/STOP/LIMIT operations which can result in null pointer
 dereferences of dbs_data.

That should be fixed, yeah.

 Yet again, this is due to the reason that you pointed out. One such case
 is __cpufreq_remove_dev_finish(). It also drops policy locks before
 calling into START/LIMIT. So this can proceed in parallel with an EXIT
 operation from a store. So dbs_data-mutex does not help serialize these
 two and START can refer a freed dbs_data. Consider the scenario today
 where CPUFREQ_HAVE_GOVERNOR_PER_POLICY is set itself.
 
 CPU0  CPU1
 
 cpufreq_set_policy()
 
 __cpufreq_governor
 (CPUFREQ_GOV_POLICY_EXIT)
 since the only usage
 becomes 0.
   __cpufreq_remove_dev_finish()
 
 free(dbs_data)__cpufreq_governor
   (CPUFRQ_GOV_START)
 
   dbs_data-mutex = NULL dereference
 
 This is what we hit initially.

That's why we shouldn't drop policy-rwsem lock at all for calling
governor-thing..

 So we will need the policy wide lock even for those drivers that have a
 governor per policy, before calls to __cpufreq_governor() in order to
 avoid such scenarios. So, your patch at

For all drivers actually.

 https://git.linaro.org/people/viresh.kumar/linux.git/patch/57714d5b1778f2f610bcc5c74d85b29ba1cc1995
 can lead to above races between different store operations themselves,
 won't it ?

Not sure, and I am not in real good touch right now around locking in
cpufreq core, need to spend enough time on that before getting that
resolved.

But the above patch + all the patches in that branch:
cpufreq/core/locking were an attempt to get the ABBA thing out of the
way. But I was still getting those warnings. One of the issues I faced
was that I never saw these ABBA warnings on my hardware :( and so was
difficult to get this tested.

 An EXIT on one of the policy cpus, followed by a STOP on
 another will lead to null dereference of dbs_data(For
 GOVERNOR_PER_POLICY). Today stores are serialized through the policy lock.
 
 Anyway, since taking the policy lock leads to ABBA deadlock and

We need to solve this ABBA problem first. And things will simplify
then.

-- 
viresh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-02 Thread Preeti U Murthy
On 06/02/2015 11:57 AM, Viresh Kumar wrote:
 On 02-06-15, 11:50, Preeti U Murthy wrote:
  CPU0CPU1

 store* store*

 lock(policy 1)  lock(policy 2)
 cpufreq_set_policy()   cpufreq_set_policy()
 EXIT() :
 dbs-data-usage_count--

 INIT()  EXIT()
 
 When will INIT() follow EXIT() in set_policy() for the same governor ?
 Maybe not, and so this sequence is hypothetical ?

Ah, yes, this will be hypothetical.
 
 dbs_data exists dbs_data-usage_count -- = 0
 kfree(dbs_data)
 dbs-data-usage_count++
 *NULL dereference*
 
 But even if this happens, it should be handled with
 dbs_data-mutex_lock, which is used at other places already.

dbs_data-mutex_lock is used to serialize only START,STOP,LIMIT calls.
It does not serialize EXIT/INIT with these operations, but that is
understandable. We need to note that EXIT can proceed in parallel with
START/STOP/LIMIT operations which can result in null pointer
dereferences of dbs_data.

Yet again, this is due to the reason that you pointed out. One such case
is __cpufreq_remove_dev_finish(). It also drops policy locks before
calling into START/LIMIT. So this can proceed in parallel with an EXIT
operation from a store. So dbs_data-mutex does not help serialize these
two and START can refer a freed dbs_data. Consider the scenario today
where CPUFREQ_HAVE_GOVERNOR_PER_POLICY is set itself.

CPU0CPU1

cpufreq_set_policy()

__cpufreq_governor
(CPUFREQ_GOV_POLICY_EXIT)
since the only usage
becomes 0.
__cpufreq_remove_dev_finish()

free(dbs_data)  __cpufreq_governor
(CPUFRQ_GOV_START)

dbs_data-mutex = NULL dereference

This is what we hit initially.

So we will need the policy wide lock even for those drivers that have a
governor per policy, before calls to __cpufreq_governor() in order to
avoid such scenarios. So, your patch at
https://git.linaro.org/people/viresh.kumar/linux.git/patch/57714d5b1778f2f610bcc5c74d85b29ba1cc1995
can lead to above races between different store operations themselves,
won't it ? An EXIT on one of the policy cpus, followed by a STOP on
another will lead to null dereference of dbs_data(For
GOVERNOR_PER_POLICY). Today stores are serialized through the policy lock.

Anyway, since taking the policy lock leads to ABBA deadlock and
releasing it can lead to races like above, shouldn't we try another
approach at serialization ?


Regards
Preeti U Murthy
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-02 Thread Preeti U Murthy
On 06/02/2015 11:41 AM, Viresh Kumar wrote:
 On 02-06-15, 11:33, Preeti U Murthy wrote:
 No, dbs_data is a governor wide data structure and not a policy wide
 
 Yeah, that's the common part which I was referring to. But normally
 its just read for policies in START/STOP, they just update per-cpu
 data for policy-cpus.
 
 one, which is manipulated in START/STOP calls for drivers where the
 CPUFREQ_HAVE_GOVERNOR_PER_POLICY is not set.

 So even if we assume that we hold per-policy locks, the following race
 is still present. Assume that we have just two cpus which do not have a
 governor-per-policy set.

 CPU0CPU1

  store* store*

 lock(policy 1)  lock(policy 2)
 cpufreq_set_policy()   cpufreq_set_policy()
 EXIT() :
 dbs-data-usage_count--

 INIT()
 dbs_data exists
 
 You missed the usage_count++ here.

Ok, sorry about that. How about the below ?
 
 so return
 EXIT()
 dbs_data-usage_count -- = 0
 kfree(dbs_data)
 
 And so this shouldn't happen. Else we
 are missing locking in governor's
 code, rather than cpufreq.c
 
 CPU0CPU1

store* store*

lock(policy 1)  lock(policy 2)
cpufreq_set_policy()   cpufreq_set_policy()
EXIT() :
dbs-data-usage_count--

INIT()  EXIT()
dbs_data exists dbs_data-usage_count -- = 0
kfree(dbs_data)
dbs-data-usage_count++
*NULL dereference*

The point is there are potential race conditions. Its just a matter of
interleaving ?

Regards
Preeti U Murthy

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-02 Thread Viresh Kumar
On 02-06-15, 11:50, Preeti U Murthy wrote:
  CPU0CPU1
 
 store* store*
 
 lock(policy 1)  lock(policy 2)
 cpufreq_set_policy()   cpufreq_set_policy()
 EXIT() :
 dbs-data-usage_count--
 
 INIT()  EXIT()

When will INIT() follow EXIT() in set_policy() for the same governor ?
Maybe not, and so this sequence is hypothetical ?

 dbs_data exists dbs_data-usage_count -- = 0
 kfree(dbs_data)
 dbs-data-usage_count++
 *NULL dereference*

But even if this happens, it should be handled with
dbs_data-mutex_lock, which is used at other places already.

-- 
viresh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-02 Thread Preeti U Murthy
On 06/02/2015 11:09 AM, Viresh Kumar wrote:
 On 02-06-15, 11:01, Preeti U Murthy wrote:
 How will a policy lock help here at all, when cpus from multiple
 policies are calling into __cpufreq_governor() ? How will a policy lock
 serialize their entry into cpufreq_governor_dbs() ?
 
 So different policies don't really depend on each other. The only
 thing common to them are the governor's sysfs files (only if
 governor-per-policy isn't set, i.e. in your case). Those sysfs files
 and their kernel counterpart variables aren't touched unless all the
 policies have EXITED. All these START/STOP calls touch only the data
 relevant to those policies only.

No, dbs_data is a governor wide data structure and not a policy wide
one, which is manipulated in START/STOP calls for drivers where the
CPUFREQ_HAVE_GOVERNOR_PER_POLICY is not set.

So even if we assume that we hold per-policy locks, the following race
is still present. Assume that we have just two cpus which do not have a
governor-per-policy set.

CPU0CPU1

 store* store*

lock(policy 1)  lock(policy 2)
cpufreq_set_policy()   cpufreq_set_policy()
EXIT() :
dbs-data-usage_count--

INIT()
dbs_data exists
so return
EXIT()
dbs_data-usage_count -- = 0
kfree(dbs_data)

START()
dereference dbs_data
*NULL dereference*


Regards
Preeti U Murthy

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-02 Thread Viresh Kumar
On 02-06-15, 11:33, Preeti U Murthy wrote:
 No, dbs_data is a governor wide data structure and not a policy wide

Yeah, that's the common part which I was referring to. But normally
its just read for policies in START/STOP, they just update per-cpu
data for policy-cpus.

 one, which is manipulated in START/STOP calls for drivers where the
 CPUFREQ_HAVE_GOVERNOR_PER_POLICY is not set.
 
 So even if we assume that we hold per-policy locks, the following race
 is still present. Assume that we have just two cpus which do not have a
 governor-per-policy set.
 
 CPU0CPU1
 
  store* store*
 
 lock(policy 1)  lock(policy 2)
 cpufreq_set_policy()   cpufreq_set_policy()
 EXIT() :
 dbs-data-usage_count--
 
 INIT()
 dbs_data exists

You missed the usage_count++ here.

 so return
 EXIT()
 dbs_data-usage_count -- = 0
 kfree(dbs_data)

And so this shouldn't happen. Else we
are missing locking in governor's
code, rather than cpufreq.c

-- 
viresh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-01 Thread Viresh Kumar
On 02-06-15, 11:01, Preeti U Murthy wrote:
> How will a policy lock help here at all, when cpus from multiple
> policies are calling into __cpufreq_governor() ? How will a policy lock
> serialize their entry into cpufreq_governor_dbs() ?

So different policies don't really depend on each other. The only
thing common to them are the governor's sysfs files (only if
governor-per-policy isn't set, i.e. in your case). Those sysfs files
and their kernel counterpart variables aren't touched unless all the
policies have EXITED. All these START/STOP calls touch only the data
relevant to those policies only.

In case of per-policy  governors, even those sysfs files are separate
for each policy.

And so a policy lock should be sufficient, rest should be handled
within the governors with locks or whatever.

> > These band-aid wouldn't take us anywhere.
> 
> Why do you say that the approach mentioned in this patch is a bandaid ?
> The patch ensures that there are no interruptions in a logical sequence
> of calls into cpufreq_governor_dbs(), as it should be.

Because this happened as we are forced to drop the policy-locks.
That's the real problem. This whole thing should be performed under
locks, instead of setting variables to mark governor busy under locks.

-- 
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-01 Thread Preeti U Murthy
On 06/01/2015 12:49 PM, Viresh Kumar wrote:
> On 01-06-15, 01:40, Preeti U Murthy wrote:
> 
> I have to mention that this is somewhat inspired by:
> 
> https://git.linaro.org/people/viresh.kumar/linux.git/commit/1e37f1d6ae12f5896e4e216f986762c3050129a5
> 
> and I was waiting to finish some core-changes to make all this simple.
> 
> I am fine to you trying to finish it though :)
> 
>> The problem showed up when running hotplug operations and changing
>> governors in parallel. The crash would be at:
>>
>> [  174.319645] Unable to handle kernel paging request for data at address 
>> 0x
>> [  174.319782] Faulting instruction address: 0xc053b3e0
>> cpu 0x1: Vector: 300 (Data Access) at [c3fdb870]
>> pc: c053b3e0: __bitmap_weight+0x70/0x100
>> lr: c085a338: need_load_eval+0x38/0xf0
>> sp: c3fdbaf0
>>msr: 90019033
>>dar: 0
>>  dsisr: 4000
>>   current = 0xc3151a40
>>   paca= 0xc7da0980softe: 0irq_happened: 0x01
>> pid   = 842, comm = kworker/1:2
>> enter ? for help
>> [c3fdbb40] c085a338 need_load_eval+0x38/0xf0
>> [c3fdbb70] c0856a10 od_dbs_timer+0x90/0x1e0
>> [c3fdbbe0] c00f489c process_one_work+0x24c/0x910
>> [c3fdbc90] c00f50dc worker_thread+0x17c/0x540
>> [c3fdbd20] c00fed70 kthread+0x120/0x140
>> [c3fdbe30] c0009678 ret_from_kernel_thread+0x5c/0x64
>>
>> While debugging the issue, other problems in this area were uncovered,
>> all of them necessitating serialized calls to __cpufreq_governor().  One
>> potential race condition that can happen today is the below:
>>
>> CPU0 CPU1
>>
>> cpufreq_set_policy()
>>
>> __cpufreq_governor
>> (CPUFREQ_GOV_POLICY_EXIT)
>>  __cpufreq_remove_dev_finish()
>>
>> free(dbs_data)   __cpufreq_governor
>>  (CPUFRQ_GOV_START)
>>
>>  dbs_data->mutex <= NULL dereference
>>
>> The issue here is that calls to cpufreq_governor_dbs() is not serialized
>> and they can conflict with each other in numerous ways. One way to sort
>> this out would be to serialize all calls to cpufreq_governor_dbs()
>> by setting the governor busy if a call is in progress and
>> blocking all other calls. But this approach will not cover all loop
>> holes. Take the above scenario: CPU1 will still hit a NULL dereference if
>> care is not taken to check for a NULL dbs_data.
>>
>> To sort such scenarios, we could filter out the sequence of events: A
>> CPUFREQ_GOV_START cannot be called without an INIT, if the previous
>> event was an EXIT. However this results in analysing all possible
>> sequence of events and adding each of them as a filter. This results in
>> unmanagable code. There is high probability of missing out on a race
>> condition. Both the above approaches were tried out earlier [1]
> 
> I agree.
> 
>> Let us therefore look at the heart of the issue.
> 
> Yeah, we should :)
> 
>> It is not really about
>> serializing calls to cpufreq_governor_dbs(), it seems to be about
>> serializing entire sequence of CPUFREQ_GOV* operations. For instance, in
>> cpufreq_set_policy(), we STOP,EXIT the old policy and INIT and START the
>> new policy. Between the EXIT and INIT, there must not be
>> anybody else starting the policy. And between INIT and START, there must
>> be nobody stopping the policy.
> 
> Hmm..
> 
>> A similar argument holds for the CPUFREQ_GOV* operations in
>>  __cpufreq_policy_dev_{prepare|finish} and cpufreq_add_policy(). Hence
>> until each of these functions complete in totality, none of the others
>> should run in parallel. The interleaving of the individual calls to
>> cpufreq_governor_dbs() is resulting in invalid operations. This patch
>> therefore tries to serialize entire cpufreq functions calling CPUFREQ_GOV*
>> operations, with respect to each other.
> 
> We were forced to put band-aids until this time and I am really
> looking into getting this fixed at the root.
> 
> The problem is that we drop policy locks before calling
> __cpufreq_governor() and that's the root cause of all these problems
> we are facing. We did that because we were getting warnings about
> circular locks (955ef4833574 ("cpufreq: Drop rwsem lock around
> CPUFREQ_GOV_POLICY_EXIT"))..
> 
> I have explained that problem here (Never sent this upstream, as I was
> waiting for some other patches to get included first):
> https://git.linaro.org/people/viresh.kumar/linux.git/commit/57714d5b1778f2f610bcc5c74d85b29ba1cc1995
> 
> The actual problem was:
> If we hold any locks, that the attribute operations grab, when
> removing the attribute, then it can result in a ABBA deadlock.
> 
> show()/store() holds the policy->rwsem lock while accessing any sysfs
> attributes under cpu/cpuX/cpufreq/ directory.
> 
> But something like what I have done is the real way to tackle all
> 

Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-01 Thread Preeti U Murthy
On 06/01/2015 12:49 PM, Viresh Kumar wrote:
> On 01-06-15, 01:40, Preeti U Murthy wrote:
> 
> I have to mention that this is somewhat inspired by:
> 
> https://git.linaro.org/people/viresh.kumar/linux.git/commit/1e37f1d6ae12f5896e4e216f986762c3050129a5
> 
> and I was waiting to finish some core-changes to make all this simple.
> 
> I am fine to you trying to finish it though :)

I am extremely sorry for not having pointed it out explicitly. I assumed
mentioning a reference to it would do. But in retrospect, I should have
stated it clearly.

I will remember to check with the previous authors about their progress
on a previously posted out patch before posting out my version of the
same. Thank you for pointing it out :)

Regards
Preeti U Murthy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-01 Thread Viresh Kumar
On 01-06-15, 01:40, Preeti U Murthy wrote:

I have to mention that this is somewhat inspired by:

https://git.linaro.org/people/viresh.kumar/linux.git/commit/1e37f1d6ae12f5896e4e216f986762c3050129a5

and I was waiting to finish some core-changes to make all this simple.

I am fine to you trying to finish it though :)

> The problem showed up when running hotplug operations and changing
> governors in parallel. The crash would be at:
> 
> [  174.319645] Unable to handle kernel paging request for data at address 
> 0x
> [  174.319782] Faulting instruction address: 0xc053b3e0
> cpu 0x1: Vector: 300 (Data Access) at [c3fdb870]
> pc: c053b3e0: __bitmap_weight+0x70/0x100
> lr: c085a338: need_load_eval+0x38/0xf0
> sp: c3fdbaf0
>msr: 90019033
>dar: 0
>  dsisr: 4000
>   current = 0xc3151a40
>   paca= 0xc7da0980 softe: 0irq_happened: 0x01
> pid   = 842, comm = kworker/1:2
> enter ? for help
> [c3fdbb40] c085a338 need_load_eval+0x38/0xf0
> [c3fdbb70] c0856a10 od_dbs_timer+0x90/0x1e0
> [c3fdbbe0] c00f489c process_one_work+0x24c/0x910
> [c3fdbc90] c00f50dc worker_thread+0x17c/0x540
> [c3fdbd20] c00fed70 kthread+0x120/0x140
> [c3fdbe30] c0009678 ret_from_kernel_thread+0x5c/0x64
> 
> While debugging the issue, other problems in this area were uncovered,
> all of them necessitating serialized calls to __cpufreq_governor().  One
> potential race condition that can happen today is the below:
> 
> CPU0  CPU1
> 
> cpufreq_set_policy()
> 
> __cpufreq_governor
> (CPUFREQ_GOV_POLICY_EXIT)
>   __cpufreq_remove_dev_finish()
> 
> free(dbs_data)__cpufreq_governor
>   (CPUFRQ_GOV_START)
> 
>   dbs_data->mutex <= NULL dereference
> 
> The issue here is that calls to cpufreq_governor_dbs() is not serialized
> and they can conflict with each other in numerous ways. One way to sort
> this out would be to serialize all calls to cpufreq_governor_dbs()
> by setting the governor busy if a call is in progress and
> blocking all other calls. But this approach will not cover all loop
> holes. Take the above scenario: CPU1 will still hit a NULL dereference if
> care is not taken to check for a NULL dbs_data.
> 
> To sort such scenarios, we could filter out the sequence of events: A
> CPUFREQ_GOV_START cannot be called without an INIT, if the previous
> event was an EXIT. However this results in analysing all possible
> sequence of events and adding each of them as a filter. This results in
> unmanagable code. There is high probability of missing out on a race
> condition. Both the above approaches were tried out earlier [1]

I agree.

> Let us therefore look at the heart of the issue.

Yeah, we should :)

> It is not really about
> serializing calls to cpufreq_governor_dbs(), it seems to be about
> serializing entire sequence of CPUFREQ_GOV* operations. For instance, in
> cpufreq_set_policy(), we STOP,EXIT the old policy and INIT and START the
> new policy. Between the EXIT and INIT, there must not be
> anybody else starting the policy. And between INIT and START, there must
> be nobody stopping the policy.

Hmm..

> A similar argument holds for the CPUFREQ_GOV* operations in
>  __cpufreq_policy_dev_{prepare|finish} and cpufreq_add_policy(). Hence
> until each of these functions complete in totality, none of the others
> should run in parallel. The interleaving of the individual calls to
> cpufreq_governor_dbs() is resulting in invalid operations. This patch
> therefore tries to serialize entire cpufreq functions calling CPUFREQ_GOV*
> operations, with respect to each other.

We were forced to put band-aids until this time and I am really
looking into getting this fixed at the root.

The problem is that we drop policy locks before calling
__cpufreq_governor() and that's the root cause of all these problems
we are facing. We did that because we were getting warnings about
circular locks (955ef4833574 ("cpufreq: Drop rwsem lock around
CPUFREQ_GOV_POLICY_EXIT"))..

I have explained that problem here (Never sent this upstream, as I was
waiting for some other patches to get included first):
https://git.linaro.org/people/viresh.kumar/linux.git/commit/57714d5b1778f2f610bcc5c74d85b29ba1cc1995

The actual problem was:
If we hold any locks, that the attribute operations grab, when
removing the attribute, then it can result in a ABBA deadlock.

show()/store() holds the policy->rwsem lock while accessing any sysfs
attributes under cpu/cpuX/cpufreq/ directory.

But something like what I have done is the real way to tackle all
these problems.

These band-aid wouldn't take us anywhere.

-- 
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to 

Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-01 Thread Viresh Kumar
On 01-06-15, 01:40, Preeti U Murthy wrote:

I have to mention that this is somewhat inspired by:

https://git.linaro.org/people/viresh.kumar/linux.git/commit/1e37f1d6ae12f5896e4e216f986762c3050129a5

and I was waiting to finish some core-changes to make all this simple.

I am fine to you trying to finish it though :)

 The problem showed up when running hotplug operations and changing
 governors in parallel. The crash would be at:
 
 [  174.319645] Unable to handle kernel paging request for data at address 
 0x
 [  174.319782] Faulting instruction address: 0xc053b3e0
 cpu 0x1: Vector: 300 (Data Access) at [c3fdb870]
 pc: c053b3e0: __bitmap_weight+0x70/0x100
 lr: c085a338: need_load_eval+0x38/0xf0
 sp: c3fdbaf0
msr: 90019033
dar: 0
  dsisr: 4000
   current = 0xc3151a40
   paca= 0xc7da0980 softe: 0irq_happened: 0x01
 pid   = 842, comm = kworker/1:2
 enter ? for help
 [c3fdbb40] c085a338 need_load_eval+0x38/0xf0
 [c3fdbb70] c0856a10 od_dbs_timer+0x90/0x1e0
 [c3fdbbe0] c00f489c process_one_work+0x24c/0x910
 [c3fdbc90] c00f50dc worker_thread+0x17c/0x540
 [c3fdbd20] c00fed70 kthread+0x120/0x140
 [c3fdbe30] c0009678 ret_from_kernel_thread+0x5c/0x64
 
 While debugging the issue, other problems in this area were uncovered,
 all of them necessitating serialized calls to __cpufreq_governor().  One
 potential race condition that can happen today is the below:
 
 CPU0  CPU1
 
 cpufreq_set_policy()
 
 __cpufreq_governor
 (CPUFREQ_GOV_POLICY_EXIT)
   __cpufreq_remove_dev_finish()
 
 free(dbs_data)__cpufreq_governor
   (CPUFRQ_GOV_START)
 
   dbs_data-mutex = NULL dereference
 
 The issue here is that calls to cpufreq_governor_dbs() is not serialized
 and they can conflict with each other in numerous ways. One way to sort
 this out would be to serialize all calls to cpufreq_governor_dbs()
 by setting the governor busy if a call is in progress and
 blocking all other calls. But this approach will not cover all loop
 holes. Take the above scenario: CPU1 will still hit a NULL dereference if
 care is not taken to check for a NULL dbs_data.
 
 To sort such scenarios, we could filter out the sequence of events: A
 CPUFREQ_GOV_START cannot be called without an INIT, if the previous
 event was an EXIT. However this results in analysing all possible
 sequence of events and adding each of them as a filter. This results in
 unmanagable code. There is high probability of missing out on a race
 condition. Both the above approaches were tried out earlier [1]

I agree.

 Let us therefore look at the heart of the issue.

Yeah, we should :)

 It is not really about
 serializing calls to cpufreq_governor_dbs(), it seems to be about
 serializing entire sequence of CPUFREQ_GOV* operations. For instance, in
 cpufreq_set_policy(), we STOP,EXIT the old policy and INIT and START the
 new policy. Between the EXIT and INIT, there must not be
 anybody else starting the policy. And between INIT and START, there must
 be nobody stopping the policy.

Hmm..

 A similar argument holds for the CPUFREQ_GOV* operations in
  __cpufreq_policy_dev_{prepare|finish} and cpufreq_add_policy(). Hence
 until each of these functions complete in totality, none of the others
 should run in parallel. The interleaving of the individual calls to
 cpufreq_governor_dbs() is resulting in invalid operations. This patch
 therefore tries to serialize entire cpufreq functions calling CPUFREQ_GOV*
 operations, with respect to each other.

We were forced to put band-aids until this time and I am really
looking into getting this fixed at the root.

The problem is that we drop policy locks before calling
__cpufreq_governor() and that's the root cause of all these problems
we are facing. We did that because we were getting warnings about
circular locks (955ef4833574 (cpufreq: Drop rwsem lock around
CPUFREQ_GOV_POLICY_EXIT))..

I have explained that problem here (Never sent this upstream, as I was
waiting for some other patches to get included first):
https://git.linaro.org/people/viresh.kumar/linux.git/commit/57714d5b1778f2f610bcc5c74d85b29ba1cc1995

The actual problem was:
If we hold any locks, that the attribute operations grab, when
removing the attribute, then it can result in a ABBA deadlock.

show()/store() holds the policy-rwsem lock while accessing any sysfs
attributes under cpu/cpuX/cpufreq/ directory.

But something like what I have done is the real way to tackle all
these problems.

These band-aid wouldn't take us anywhere.

-- 
viresh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-01 Thread Preeti U Murthy
On 06/01/2015 12:49 PM, Viresh Kumar wrote:
 On 01-06-15, 01:40, Preeti U Murthy wrote:
 
 I have to mention that this is somewhat inspired by:
 
 https://git.linaro.org/people/viresh.kumar/linux.git/commit/1e37f1d6ae12f5896e4e216f986762c3050129a5
 
 and I was waiting to finish some core-changes to make all this simple.
 
 I am fine to you trying to finish it though :)

I am extremely sorry for not having pointed it out explicitly. I assumed
mentioning a reference to it would do. But in retrospect, I should have
stated it clearly.

I will remember to check with the previous authors about their progress
on a previously posted out patch before posting out my version of the
same. Thank you for pointing it out :)

Regards
Preeti U Murthy

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-01 Thread Preeti U Murthy
On 06/01/2015 12:49 PM, Viresh Kumar wrote:
 On 01-06-15, 01:40, Preeti U Murthy wrote:
 
 I have to mention that this is somewhat inspired by:
 
 https://git.linaro.org/people/viresh.kumar/linux.git/commit/1e37f1d6ae12f5896e4e216f986762c3050129a5
 
 and I was waiting to finish some core-changes to make all this simple.
 
 I am fine to you trying to finish it though :)
 
 The problem showed up when running hotplug operations and changing
 governors in parallel. The crash would be at:

 [  174.319645] Unable to handle kernel paging request for data at address 
 0x
 [  174.319782] Faulting instruction address: 0xc053b3e0
 cpu 0x1: Vector: 300 (Data Access) at [c3fdb870]
 pc: c053b3e0: __bitmap_weight+0x70/0x100
 lr: c085a338: need_load_eval+0x38/0xf0
 sp: c3fdbaf0
msr: 90019033
dar: 0
  dsisr: 4000
   current = 0xc3151a40
   paca= 0xc7da0980softe: 0irq_happened: 0x01
 pid   = 842, comm = kworker/1:2
 enter ? for help
 [c3fdbb40] c085a338 need_load_eval+0x38/0xf0
 [c3fdbb70] c0856a10 od_dbs_timer+0x90/0x1e0
 [c3fdbbe0] c00f489c process_one_work+0x24c/0x910
 [c3fdbc90] c00f50dc worker_thread+0x17c/0x540
 [c3fdbd20] c00fed70 kthread+0x120/0x140
 [c3fdbe30] c0009678 ret_from_kernel_thread+0x5c/0x64

 While debugging the issue, other problems in this area were uncovered,
 all of them necessitating serialized calls to __cpufreq_governor().  One
 potential race condition that can happen today is the below:

 CPU0 CPU1

 cpufreq_set_policy()

 __cpufreq_governor
 (CPUFREQ_GOV_POLICY_EXIT)
  __cpufreq_remove_dev_finish()

 free(dbs_data)   __cpufreq_governor
  (CPUFRQ_GOV_START)

  dbs_data-mutex = NULL dereference

 The issue here is that calls to cpufreq_governor_dbs() is not serialized
 and they can conflict with each other in numerous ways. One way to sort
 this out would be to serialize all calls to cpufreq_governor_dbs()
 by setting the governor busy if a call is in progress and
 blocking all other calls. But this approach will not cover all loop
 holes. Take the above scenario: CPU1 will still hit a NULL dereference if
 care is not taken to check for a NULL dbs_data.

 To sort such scenarios, we could filter out the sequence of events: A
 CPUFREQ_GOV_START cannot be called without an INIT, if the previous
 event was an EXIT. However this results in analysing all possible
 sequence of events and adding each of them as a filter. This results in
 unmanagable code. There is high probability of missing out on a race
 condition. Both the above approaches were tried out earlier [1]
 
 I agree.
 
 Let us therefore look at the heart of the issue.
 
 Yeah, we should :)
 
 It is not really about
 serializing calls to cpufreq_governor_dbs(), it seems to be about
 serializing entire sequence of CPUFREQ_GOV* operations. For instance, in
 cpufreq_set_policy(), we STOP,EXIT the old policy and INIT and START the
 new policy. Between the EXIT and INIT, there must not be
 anybody else starting the policy. And between INIT and START, there must
 be nobody stopping the policy.
 
 Hmm..
 
 A similar argument holds for the CPUFREQ_GOV* operations in
  __cpufreq_policy_dev_{prepare|finish} and cpufreq_add_policy(). Hence
 until each of these functions complete in totality, none of the others
 should run in parallel. The interleaving of the individual calls to
 cpufreq_governor_dbs() is resulting in invalid operations. This patch
 therefore tries to serialize entire cpufreq functions calling CPUFREQ_GOV*
 operations, with respect to each other.
 
 We were forced to put band-aids until this time and I am really
 looking into getting this fixed at the root.
 
 The problem is that we drop policy locks before calling
 __cpufreq_governor() and that's the root cause of all these problems
 we are facing. We did that because we were getting warnings about
 circular locks (955ef4833574 (cpufreq: Drop rwsem lock around
 CPUFREQ_GOV_POLICY_EXIT))..
 
 I have explained that problem here (Never sent this upstream, as I was
 waiting for some other patches to get included first):
 https://git.linaro.org/people/viresh.kumar/linux.git/commit/57714d5b1778f2f610bcc5c74d85b29ba1cc1995
 
 The actual problem was:
 If we hold any locks, that the attribute operations grab, when
 removing the attribute, then it can result in a ABBA deadlock.
 
 show()/store() holds the policy-rwsem lock while accessing any sysfs
 attributes under cpu/cpuX/cpufreq/ directory.
 
 But something like what I have done is the real way to tackle all
 these problems.

How will a policy lock help here at all, when cpus from multiple
policies are calling into __cpufreq_governor() ? How will a policy lock
serialize their entry into 

Re: [RFC PATCH] cpufreq/hotplug: Fix cpu-hotplug cpufreq race conditions

2015-06-01 Thread Viresh Kumar
On 02-06-15, 11:01, Preeti U Murthy wrote:
 How will a policy lock help here at all, when cpus from multiple
 policies are calling into __cpufreq_governor() ? How will a policy lock
 serialize their entry into cpufreq_governor_dbs() ?

So different policies don't really depend on each other. The only
thing common to them are the governor's sysfs files (only if
governor-per-policy isn't set, i.e. in your case). Those sysfs files
and their kernel counterpart variables aren't touched unless all the
policies have EXITED. All these START/STOP calls touch only the data
relevant to those policies only.

In case of per-policy  governors, even those sysfs files are separate
for each policy.

And so a policy lock should be sufficient, rest should be handled
within the governors with locks or whatever.

  These band-aid wouldn't take us anywhere.
 
 Why do you say that the approach mentioned in this patch is a bandaid ?
 The patch ensures that there are no interruptions in a logical sequence
 of calls into cpufreq_governor_dbs(), as it should be.

Because this happened as we are forced to drop the policy-locks.
That's the real problem. This whole thing should be performed under
locks, instead of setting variables to mark governor busy under locks.

-- 
viresh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/