Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-03 Thread Saravana Kannan

On 02/02/2016 10:54 PM, Viresh Kumar wrote:

On 02-02-16, 17:32, Saravana Kannan wrote:

But if we are expecting sched dvfs to come in, why make it worse for it. It
would be completely pointless to try and shoehorn sched dvfs to use
cpufreq_governor.c


We can move the common part to cpufreq core and not make sched-dvfs
reuse cpufreq_governor.c



Let's do this please.

-Saravana

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-03 Thread Saravana Kannan

On 02/02/2016 10:57 PM, Viresh Kumar wrote:

On 02-02-16, 20:03, Saravana Kannan wrote:

This is the hotplug case I mentioned. The sysfs file removals will happen
only for the last CPU in that policy (we thankfully optimized that part last
year). We also know that multiple CPUs can't be hotplugged at the same time.
So, in the start of cpufreq_offline_prepare, we just need to check if this
is the last CPU in the policy and if that's the case, do the gov sysfs
remove and then grab the policy lock and do all our crap. If a read is going
on, that's going to finish before the sysfs attr remove can go ahead and it
can grab the policy lock if it needs to and that still won't cause a
deadlock because we haven't yet grabbed the policy lock in
cpufreq_offline_prepare(). If the read comes after the sysfs remove, then
the read is obviously going to fail (we can depend on the sysfs framework on
doing its job there).


IMHO, these are all dirty hacks we should stay away from. Adding such
hunks in code is considered a band-aid kind of solution and hurts
readability badly. The new solution (new governor show/store)
implement this in a very clean and proper way I feel..

Others are free to disagree though :)



I think it looks clean since we haven't sorted out the race conditions 
that Juri pointed out. So, it's early to call this series clean :)


Also, I don't see it as a dirty hack at all. What's so hacky about it? 
We are just identifying conditions when we'll have to remove the sysfs 
files and removing them before grabbing the policy lock.


The unlock/lock that we have now is what is a dirty hack.

-Saravana

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-03 Thread Rafael J. Wysocki
On Wed, Feb 3, 2016 at 2:21 PM, Viresh Kumar  wrote:
> On 03-02-16, 13:42, Rafael J. Wysocki wrote:
>> > +static ssize_t governor_show(struct kobject *kobj, struct attribute *attr,
>> > +char *buf)
>> > +{
>> > +   struct dbs_data *dbs_data = to_dbs_data(kobj);
>> > +   struct governor_attr *gattr = to_gov_attr(attr);
>> > +   int ret = -EIO;
>> > +
>> > +   down_read(_data->rwsem);
>> > +
>> > +   if (gattr->show)
>> > +   ret = gattr->show(dbs_data, buf);
>> > +
>> > +   up_read(_data->rwsem);
>>
>> Do we need the lock here too?
>>
>> show() is only going to read the value, isn't it?  And everything u32
>> or smaller is read atomically anyway.
>
> Okay, will drop it for now.
>
>> Apart from this it looks good to me.
>>
>> When you're ready, please resend the whole series without patch [5/5]
>> which is premature IMO.
>
> I have changed that patch a bit, and am dropping just the lock now and
> not governor_enable thing. That should be sane enough I believe.

In any case this is not suitable for 4.5 IMO.

> Anyway I will be posting 7 patches now, pick only first 4 if you
> aren't confident about the rest.

OK

Thanks,
Rafael


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-03 Thread Viresh Kumar
On 03-02-16, 13:42, Rafael J. Wysocki wrote:
> > +static ssize_t governor_show(struct kobject *kobj, struct attribute *attr,
> > +char *buf)
> > +{
> > +   struct dbs_data *dbs_data = to_dbs_data(kobj);
> > +   struct governor_attr *gattr = to_gov_attr(attr);
> > +   int ret = -EIO;
> > +
> > +   down_read(_data->rwsem);
> > +
> > +   if (gattr->show)
> > +   ret = gattr->show(dbs_data, buf);
> > +
> > +   up_read(_data->rwsem);
> 
> Do we need the lock here too?
> 
> show() is only going to read the value, isn't it?  And everything u32
> or smaller is read atomically anyway.

Okay, will drop it for now.

> Apart from this it looks good to me.
> 
> When you're ready, please resend the whole series without patch [5/5]
> which is premature IMO.

I have changed that patch a bit, and am dropping just the lock now and
not governor_enable thing. That should be sane enough I believe.

Anyway I will be posting 7 patches now, pick only first 4 if you
aren't confident about the rest.

-- 
viresh


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-03 Thread Rafael J. Wysocki
On Wed, Feb 3, 2016 at 7:58 AM, Viresh Kumar  wrote:
> On 02-02-16, 22:23, Rafael J. Wysocki wrote:
>> On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar  
>> wrote:
>
>> "The ondemand and conservative governors use the global-attr or
>> freq-attr structures to represent sysfs attributes corresponding to
>> their tunables
>
>> (which of them is actually used depends on whether or
>> not different policy objects can use different governors at the same
>> time
>
> Not exactly. Different policies can always use different governors.
> What made the difference was that different policies using same
> governor, with different tunables or separate governor directories.
>
> I have reworded this para like:
>
> The ondemand and conservative governors use the global-attr or freq-attr
> structures to represent sysfs attributes corresponding to their tunables
> (which of them is actually used depends on whether or not different
> policy objects can use same governor with different tunables at the same
> time and, consequently, on where those attributes are located in sysfs).
>
> Please let me know if isn't clear.

That's OK.  IMO you should say "use the same governor", but that's
easily fixable. :-)

>> > --- a/drivers/cpufreq/cpufreq_governor.c
>> > +   ret = kobject_init_and_add(_data->kobj, _data->kobj_type,
>> > +  get_governor_parent_kobj(policy),
>> > +  attr_group->name);
>> > +   if (ret) {
>> > +   pr_err("%s: failed to init dbs_data kobj: %d\n", __func__, 
>> > ret);
>>
>> pr_debug() would be better here.
>
> Its a real error, why pr_debug for that ?

What's the value of printing that on user systems?  It contains debug
information only and it is not useful to anyone unfamiliar with the
code in question anyway.

>> > goto reset_gdbs_data;
>> > +   }
>> >
>> > return 0;
>> >
>> > @@ -426,8 +457,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy 
>> > *policy,
>> > return -EBUSY;
>> >
>> > if (!--dbs_data->usage_count) {
>> > -   sysfs_remove_group(get_governor_parent_kobj(policy),
>> > -  get_sysfs_attr(dbs_data));
>> > +   kobject_put(_data->kobj);
>>
>> Don't we need a ->release callback for this kobject?
>
> There is nothing that we need to free from the ->release() callback.
> We are using the kobject here just to get separate show/store
> callbacks.

Well, I guess the answer should be that there can't be more active
references to the kobject, so it is safe to free it synchronously
later.

> Here is the new version based on the review comments received until
> now:
>
> -8<-
>
> From: Viresh Kumar 
> Date: Tue, 2 Feb 2016 12:35:01 +0530
> Subject: [PATCH] cpufreq: governor: New sysfs show/store callbacks for
>  governor tunables
>

[cut]

> @@ -22,14 +22,62 @@
>
>  #include "cpufreq_governor.h"
>
> -static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data)
> +static inline struct dbs_data *to_dbs_data(struct kobject *kobj)
>  {
> -   if (have_governor_per_policy())
> -   return dbs_data->cdata->attr_group_gov_pol;
> -   else
> -   return dbs_data->cdata->attr_group_gov_sys;
> +   return container_of(kobj, struct dbs_data, kobj);
> +}
> +
> +static inline struct governor_attr *to_gov_attr(struct attribute *attr)
> +{
> +   return container_of(attr, struct governor_attr, attr);
> +}
> +
> +static ssize_t governor_show(struct kobject *kobj, struct attribute *attr,
> +char *buf)
> +{
> +   struct dbs_data *dbs_data = to_dbs_data(kobj);
> +   struct governor_attr *gattr = to_gov_attr(attr);
> +   int ret = -EIO;
> +
> +   down_read(_data->rwsem);
> +
> +   if (gattr->show)
> +   ret = gattr->show(dbs_data, buf);
> +
> +   up_read(_data->rwsem);

Do we need the lock here too?

show() is only going to read the value, isn't it?  And everything u32
or smaller is read atomically anyway.

Apart from this it looks good to me.

When you're ready, please resend the whole series without patch [5/5]
which is premature IMO.

Thanks,
Rafael


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-03 Thread Viresh Kumar
On 03-02-16, 10:51, Juri Lelli wrote:
> I also think that sched-dvfs should not use cpufreq_governor.c.  It is
> useful boilerplate code for ondemand and conservative, as they share lot
> of data structures and how they work, but it doesn't necessarily suit
> everybody's needs, IMHO.
> 
> OTOH, fixing the current issue in the best way we can come up with has
> still value of course :).

Right. cpufreq_governor.c is more about the technique where we do load
evaluation using deferred timers and workqueues, which isn't required
for sched-dvfs.

We can just move the common parts, like, governor_show/governor_store
routines, and the new macros being added here.

-- 
viresh


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-03 Thread Juri Lelli
On 03/02/16 12:24, Viresh Kumar wrote:
> On 02-02-16, 17:32, Saravana Kannan wrote:
> > But if we are expecting sched dvfs to come in, why make it worse for it. It
> > would be completely pointless to try and shoehorn sched dvfs to use
> > cpufreq_governor.c
> 
> We can move the common part to cpufreq core and not make sched-dvfs
> reuse cpufreq_governor.c
> 

I also think that sched-dvfs should not use cpufreq_governor.c.  It is
useful boilerplate code for ondemand and conservative, as they share lot
of data structures and how they work, but it doesn't necessarily suit
everybody's needs, IMHO.

OTOH, fixing the current issue in the best way we can come up with has
still value of course :).

Best,

- Juri


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-03 Thread Viresh Kumar
On 03-02-16, 10:51, Juri Lelli wrote:
> I also think that sched-dvfs should not use cpufreq_governor.c.  It is
> useful boilerplate code for ondemand and conservative, as they share lot
> of data structures and how they work, but it doesn't necessarily suit
> everybody's needs, IMHO.
> 
> OTOH, fixing the current issue in the best way we can come up with has
> still value of course :).

Right. cpufreq_governor.c is more about the technique where we do load
evaluation using deferred timers and workqueues, which isn't required
for sched-dvfs.

We can just move the common parts, like, governor_show/governor_store
routines, and the new macros being added here.

-- 
viresh


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-03 Thread Juri Lelli
On 03/02/16 12:24, Viresh Kumar wrote:
> On 02-02-16, 17:32, Saravana Kannan wrote:
> > But if we are expecting sched dvfs to come in, why make it worse for it. It
> > would be completely pointless to try and shoehorn sched dvfs to use
> > cpufreq_governor.c
> 
> We can move the common part to cpufreq core and not make sched-dvfs
> reuse cpufreq_governor.c
> 

I also think that sched-dvfs should not use cpufreq_governor.c.  It is
useful boilerplate code for ondemand and conservative, as they share lot
of data structures and how they work, but it doesn't necessarily suit
everybody's needs, IMHO.

OTOH, fixing the current issue in the best way we can come up with has
still value of course :).

Best,

- Juri


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-03 Thread Viresh Kumar
On 03-02-16, 13:42, Rafael J. Wysocki wrote:
> > +static ssize_t governor_show(struct kobject *kobj, struct attribute *attr,
> > +char *buf)
> > +{
> > +   struct dbs_data *dbs_data = to_dbs_data(kobj);
> > +   struct governor_attr *gattr = to_gov_attr(attr);
> > +   int ret = -EIO;
> > +
> > +   down_read(_data->rwsem);
> > +
> > +   if (gattr->show)
> > +   ret = gattr->show(dbs_data, buf);
> > +
> > +   up_read(_data->rwsem);
> 
> Do we need the lock here too?
> 
> show() is only going to read the value, isn't it?  And everything u32
> or smaller is read atomically anyway.

Okay, will drop it for now.

> Apart from this it looks good to me.
> 
> When you're ready, please resend the whole series without patch [5/5]
> which is premature IMO.

I have changed that patch a bit, and am dropping just the lock now and
not governor_enable thing. That should be sane enough I believe.

Anyway I will be posting 7 patches now, pick only first 4 if you
aren't confident about the rest.

-- 
viresh


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-03 Thread Rafael J. Wysocki
On Wed, Feb 3, 2016 at 2:21 PM, Viresh Kumar  wrote:
> On 03-02-16, 13:42, Rafael J. Wysocki wrote:
>> > +static ssize_t governor_show(struct kobject *kobj, struct attribute *attr,
>> > +char *buf)
>> > +{
>> > +   struct dbs_data *dbs_data = to_dbs_data(kobj);
>> > +   struct governor_attr *gattr = to_gov_attr(attr);
>> > +   int ret = -EIO;
>> > +
>> > +   down_read(_data->rwsem);
>> > +
>> > +   if (gattr->show)
>> > +   ret = gattr->show(dbs_data, buf);
>> > +
>> > +   up_read(_data->rwsem);
>>
>> Do we need the lock here too?
>>
>> show() is only going to read the value, isn't it?  And everything u32
>> or smaller is read atomically anyway.
>
> Okay, will drop it for now.
>
>> Apart from this it looks good to me.
>>
>> When you're ready, please resend the whole series without patch [5/5]
>> which is premature IMO.
>
> I have changed that patch a bit, and am dropping just the lock now and
> not governor_enable thing. That should be sane enough I believe.

In any case this is not suitable for 4.5 IMO.

> Anyway I will be posting 7 patches now, pick only first 4 if you
> aren't confident about the rest.

OK

Thanks,
Rafael


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-03 Thread Rafael J. Wysocki
On Wed, Feb 3, 2016 at 7:58 AM, Viresh Kumar  wrote:
> On 02-02-16, 22:23, Rafael J. Wysocki wrote:
>> On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar  
>> wrote:
>
>> "The ondemand and conservative governors use the global-attr or
>> freq-attr structures to represent sysfs attributes corresponding to
>> their tunables
>
>> (which of them is actually used depends on whether or
>> not different policy objects can use different governors at the same
>> time
>
> Not exactly. Different policies can always use different governors.
> What made the difference was that different policies using same
> governor, with different tunables or separate governor directories.
>
> I have reworded this para like:
>
> The ondemand and conservative governors use the global-attr or freq-attr
> structures to represent sysfs attributes corresponding to their tunables
> (which of them is actually used depends on whether or not different
> policy objects can use same governor with different tunables at the same
> time and, consequently, on where those attributes are located in sysfs).
>
> Please let me know if isn't clear.

That's OK.  IMO you should say "use the same governor", but that's
easily fixable. :-)

>> > --- a/drivers/cpufreq/cpufreq_governor.c
>> > +   ret = kobject_init_and_add(_data->kobj, _data->kobj_type,
>> > +  get_governor_parent_kobj(policy),
>> > +  attr_group->name);
>> > +   if (ret) {
>> > +   pr_err("%s: failed to init dbs_data kobj: %d\n", __func__, 
>> > ret);
>>
>> pr_debug() would be better here.
>
> Its a real error, why pr_debug for that ?

What's the value of printing that on user systems?  It contains debug
information only and it is not useful to anyone unfamiliar with the
code in question anyway.

>> > goto reset_gdbs_data;
>> > +   }
>> >
>> > return 0;
>> >
>> > @@ -426,8 +457,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy 
>> > *policy,
>> > return -EBUSY;
>> >
>> > if (!--dbs_data->usage_count) {
>> > -   sysfs_remove_group(get_governor_parent_kobj(policy),
>> > -  get_sysfs_attr(dbs_data));
>> > +   kobject_put(_data->kobj);
>>
>> Don't we need a ->release callback for this kobject?
>
> There is nothing that we need to free from the ->release() callback.
> We are using the kobject here just to get separate show/store
> callbacks.

Well, I guess the answer should be that there can't be more active
references to the kobject, so it is safe to free it synchronously
later.

> Here is the new version based on the review comments received until
> now:
>
> -8<-
>
> From: Viresh Kumar 
> Date: Tue, 2 Feb 2016 12:35:01 +0530
> Subject: [PATCH] cpufreq: governor: New sysfs show/store callbacks for
>  governor tunables
>

[cut]

> @@ -22,14 +22,62 @@
>
>  #include "cpufreq_governor.h"
>
> -static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data)
> +static inline struct dbs_data *to_dbs_data(struct kobject *kobj)
>  {
> -   if (have_governor_per_policy())
> -   return dbs_data->cdata->attr_group_gov_pol;
> -   else
> -   return dbs_data->cdata->attr_group_gov_sys;
> +   return container_of(kobj, struct dbs_data, kobj);
> +}
> +
> +static inline struct governor_attr *to_gov_attr(struct attribute *attr)
> +{
> +   return container_of(attr, struct governor_attr, attr);
> +}
> +
> +static ssize_t governor_show(struct kobject *kobj, struct attribute *attr,
> +char *buf)
> +{
> +   struct dbs_data *dbs_data = to_dbs_data(kobj);
> +   struct governor_attr *gattr = to_gov_attr(attr);
> +   int ret = -EIO;
> +
> +   down_read(_data->rwsem);
> +
> +   if (gattr->show)
> +   ret = gattr->show(dbs_data, buf);
> +
> +   up_read(_data->rwsem);

Do we need the lock here too?

show() is only going to read the value, isn't it?  And everything u32
or smaller is read atomically anyway.

Apart from this it looks good to me.

When you're ready, please resend the whole series without patch [5/5]
which is premature IMO.

Thanks,
Rafael


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-03 Thread Saravana Kannan

On 02/02/2016 10:57 PM, Viresh Kumar wrote:

On 02-02-16, 20:03, Saravana Kannan wrote:

This is the hotplug case I mentioned. The sysfs file removals will happen
only for the last CPU in that policy (we thankfully optimized that part last
year). We also know that multiple CPUs can't be hotplugged at the same time.
So, in the start of cpufreq_offline_prepare, we just need to check if this
is the last CPU in the policy and if that's the case, do the gov sysfs
remove and then grab the policy lock and do all our crap. If a read is going
on, that's going to finish before the sysfs attr remove can go ahead and it
can grab the policy lock if it needs to and that still won't cause a
deadlock because we haven't yet grabbed the policy lock in
cpufreq_offline_prepare(). If the read comes after the sysfs remove, then
the read is obviously going to fail (we can depend on the sysfs framework on
doing its job there).


IMHO, these are all dirty hacks we should stay away from. Adding such
hunks in code is considered a band-aid kind of solution and hurts
readability badly. The new solution (new governor show/store)
implement this in a very clean and proper way I feel..

Others are free to disagree though :)



I think it looks clean since we haven't sorted out the race conditions 
that Juri pointed out. So, it's early to call this series clean :)


Also, I don't see it as a dirty hack at all. What's so hacky about it? 
We are just identifying conditions when we'll have to remove the sysfs 
files and removing them before grabbing the policy lock.


The unlock/lock that we have now is what is a dirty hack.

-Saravana

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-03 Thread Saravana Kannan

On 02/02/2016 10:54 PM, Viresh Kumar wrote:

On 02-02-16, 17:32, Saravana Kannan wrote:

But if we are expecting sched dvfs to come in, why make it worse for it. It
would be completely pointless to try and shoehorn sched dvfs to use
cpufreq_governor.c


We can move the common part to cpufreq core and not make sched-dvfs
reuse cpufreq_governor.c



Let's do this please.

-Saravana

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Viresh Kumar
On 02-02-16, 22:23, Rafael J. Wysocki wrote:
> On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar  wrote:

> "The ondemand and conservative governors use the global-attr or
> freq-attr structures to represent sysfs attributes corresponding to
> their tunables

> (which of them is actually used depends on whether or
> not different policy objects can use different governors at the same
> time

Not exactly. Different policies can always use different governors.
What made the difference was that different policies using same
governor, with different tunables or separate governor directories.

I have reworded this para like:

The ondemand and conservative governors use the global-attr or freq-attr
structures to represent sysfs attributes corresponding to their tunables
(which of them is actually used depends on whether or not different
policy objects can use same governor with different tunables at the same
time and, consequently, on where those attributes are located in sysfs).

Please let me know if isn't clear.

> > --- a/drivers/cpufreq/cpufreq_governor.c
> > +   ret = kobject_init_and_add(_data->kobj, _data->kobj_type,
> > +  get_governor_parent_kobj(policy),
> > +  attr_group->name);
> > +   if (ret) {
> > +   pr_err("%s: failed to init dbs_data kobj: %d\n", __func__, 
> > ret);
> 
> pr_debug() would be better here.

Its a real error, why pr_debug for that ?

> > goto reset_gdbs_data;
> > +   }
> >
> > return 0;
> >
> > @@ -426,8 +457,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy 
> > *policy,
> > return -EBUSY;
> >
> > if (!--dbs_data->usage_count) {
> > -   sysfs_remove_group(get_governor_parent_kobj(policy),
> > -  get_sysfs_attr(dbs_data));
> > +   kobject_put(_data->kobj);
> 
> Don't we need a ->release callback for this kobject?

There is nothing that we need to free from the ->release() callback.
We are using the kobject here just to get separate show/store
callbacks.

Here is the new version based on the review comments received until
now:

-8<-

From: Viresh Kumar 
Date: Tue, 2 Feb 2016 12:35:01 +0530
Subject: [PATCH] cpufreq: governor: New sysfs show/store callbacks for
 governor tunables

The ondemand and conservative governors use the global-attr or freq-attr
structures to represent sysfs attributes corresponding to their tunables
(which of them is actually used depends on whether or not different
policy objects can use same governor with different tunables at the same
time and, consequently, on where those attributes are located in sysfs).

Unfortunately, in the freq-attr case, the standard cpufreq show/store
sysfs attribute callbacks are applied to the governor tunable attributes
and they always acquire the policy->rwsem lock before carrying out the
operation.  That may lead to an ABBA deadlock if governor tunable
attributes are removed under policy->rwsem while one of them is being
accessed concurrently (if sysfs attributes removal wins the race, it
will wait for the access to complete with policy->rwsem held while the
attribute callback will block on policy->rwsem indefinitely).

We attempted to address this issue by dropping policy->rwsem around
governor tunable attributes removal (that is, around invocations of the
->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT)
in cpufreq_set_policy(), but that opened up race conditions that had not
been possible with policy->rwsem held all the time.  Therefore
policy->rwsem cannot be dropped in cpufreq_set_policy() at any point,
but the deadlock situation described above must be avoided too.

To that end, use the observation that in principle governor tunables may
be represented by the same data type regardless of whether the governor
is system-wide or per-policy and introduce a new structure, struct
governor_attr, for representing them and new corresponding macros for
creating show/store sysfs callbacks for them.  Also make their parent
kobject use a new kobject type whose default show/store callbacks are
not related to the standard core cpufreq ones in any way (and they don't
acquire policy->rwsem in particular).

[ Rafael: Written changelog ]
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq_conservative.c | 73 --
 drivers/cpufreq/cpufreq_governor.c | 71 -
 drivers/cpufreq/cpufreq_governor.h | 34 ++--
 drivers/cpufreq/cpufreq_ondemand.c | 73 --
 4 files changed, 144 insertions(+), 107 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c 
b/drivers/cpufreq/cpufreq_conservative.c
index 57750367bd26..c749fb4fe5d2 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -275,54 

Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Viresh Kumar
On 02-02-16, 20:03, Saravana Kannan wrote:
> This is the hotplug case I mentioned. The sysfs file removals will happen
> only for the last CPU in that policy (we thankfully optimized that part last
> year). We also know that multiple CPUs can't be hotplugged at the same time.
> So, in the start of cpufreq_offline_prepare, we just need to check if this
> is the last CPU in the policy and if that's the case, do the gov sysfs
> remove and then grab the policy lock and do all our crap. If a read is going
> on, that's going to finish before the sysfs attr remove can go ahead and it
> can grab the policy lock if it needs to and that still won't cause a
> deadlock because we haven't yet grabbed the policy lock in
> cpufreq_offline_prepare(). If the read comes after the sysfs remove, then
> the read is obviously going to fail (we can depend on the sysfs framework on
> doing its job there).

IMHO, these are all dirty hacks we should stay away from. Adding such
hunks in code is considered a band-aid kind of solution and hurts
readability badly. The new solution (new governor show/store)
implement this in a very clean and proper way I feel..

Others are free to disagree though :)

-- 
viresh


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Viresh Kumar
On 02-02-16, 17:32, Saravana Kannan wrote:
> But if we are expecting sched dvfs to come in, why make it worse for it. It
> would be completely pointless to try and shoehorn sched dvfs to use
> cpufreq_governor.c

We can move the common part to cpufreq core and not make sched-dvfs
reuse cpufreq_governor.c

-- 
viresh


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Viresh Kumar
On 02-02-16, 14:21, Saravana Kannan wrote:
> I also don't like this patch because it forces governors to either implement
> their own macros and management of their attributes or force them to use the
> governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO
> is very ondemand and conservative governor specific and is very irrelevant
> for sched-dvfs or any other governors (hint hint).

But who is stopping us from breaking that file and moving some of it
into include/linux/cpufreq.h ?

We can do that today as well, but it would be fine to do that, when we
add more governors to the core. Though, it would only take a simple
patch if people want me to do it now.

> The only time this ABBA locking is an issue is when governor are changing
> and trying to add/remove attributes. That can easily be checked in
> store_governor

store_scaling_governor ??

> and dealt with without holding the policy rwsem if the

Are you saying that we could have taken the rwsem from the generic
cpufreq.c:store() and dropped it from store_scaling_governor() ? That
would have been something similar to what I tried earlier, which I
never posted (I gave the link to that few days back).

> governors can provide their per sys and per policy attribute arrays as part
> of registering themselves.

These per-sys and per-policy attributes really suck. There is nothing
really different in the implementation, just that the show/store
callbacks have different prototype. One accept 'kboj' as the parameter,
other accept 'policy'. I would call that a HACK as well (I only
implemented it though).

That should just die. A single list of attributes is what we should
have had initially as well.

-- 
viresh


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Viresh Kumar
On 02-02-16, 17:01, Juri Lelli wrote:
> Hi Rafael,
> 
> On 02/02/16 17:35, Rafael J. Wysocki wrote:
> > On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli  wrote:

> > > This patch cleans things up a lot, that's good.
> > >
> > > One thing I'm still concerned about, though: don't we need some locking
> > > in place for some of the store operations on governors attributes? Are
> > > store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
> > 
> > That would require some investigation I suppose.

Yeah, that protection is required. Sorry about that.

> > > It seems that we can call them from different cpus concurrently.
> > 
> > Yes, we can.
> > 
> > One quick-and-dirty way of dealing with that might be to introduce a
> > "sysfs lock" into struct dbs_data and hold that around the invocation
> > of gattr->store() in the sysfs_ops's ->store callback.

s/dirty/sane ? :)

> Can't we actually try to use the policy->rwsem (or one of the core
> locks) + wait_for_completion approach as we do in cpufreq core?

policy->rwsem will defeat the purpose of this change as it will
reintroduce the ABBA issue.

-- 
viresh


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Saravana Kannan

On 02/02/2016 05:52 PM, Rafael J. Wysocki wrote:

On Wed, Feb 3, 2016 at 2:32 AM, Saravana Kannan  wrote:

On 02/02/2016 05:07 PM, Rafael J. Wysocki wrote:


On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki 
wrote:


On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan 
wrote:


On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:



On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli  wrote:





[cut]



I also don't like this patch because it forces governors to either
implement
their own macros and management of their attributes or force them to use
the
governor structs that come with cpufreq_governor.h. cpufreq_governor.h
IMHO
is very ondemand and conservative governor specific and is very
irrelevant
for sched-dvfs or any other governors (hint hint).

The only time this ABBA locking is an issue is when governor are
changing
and trying to add/remove attributes. That can easily be checked in
store_governor and dealt with without holding the policy rwsem if the
governors can provide their per sys and per policy attribute arrays as
part
of registering themselves.

I'm sorry that I just keep talking about the idea and not sending out
the
patches.



I think you have a point, though.

The deadlock really is specific to the governors using the code in
cpufreq_governor.c.



That said no other governors in the tree use any sysfs attributes for
tunables AFAICS and the out-of-the tree ones are out of interest here.



But if we are expecting sched dvfs to come in, why make it worse for it. It
would be completely pointless to try and shoehorn sched dvfs to use
cpufreq_governor.c


Well, do you honestly think that using the existing stuff in it would
be a good idea?

If not, then why it matters at all?


Also the deadlock happens if one of the tunable attributes is accessed
while we're trying to remove it which very well may happen on read
access too.


Isn't this THE deadlock we are talking about? The removal of the attributes
only happen when governors are changes and we send a POLICY_EXIT and or all
the cores are hotplugged out.


It generally happens when the "old" governor is going away, whatever the reason.


And my suggestion would work just as well there.

Why are you prefixing your sentence with "Also"? Is there some other case
I'm not considering?


Say someone is reading sampling_rate for a policy with 1 CPU in it and
someone else is taking the CPU offline.  The governor EXIT code path
(that will trigger as a result) will try to remove the sampling_rate
attribute and (if it does that under policy->rwsem) it'll wait for the
read access to finish.  Where exactly would you put the deadlock
prevention in this case?


This is the hotplug case I mentioned. The sysfs file removals will 
happen only for the last CPU in that policy (we thankfully optimized 
that part last year). We also know that multiple CPUs can't be 
hotplugged at the same time. So, in the start of 
cpufreq_offline_prepare, we just need to check if this is the last CPU 
in the policy and if that's the case, do the gov sysfs remove and then 
grab the policy lock and do all our crap. If a read is going on, that's 
going to finish before the sysfs attr remove can go ahead and it can 
grab the policy lock if it needs to and that still won't cause a 
deadlock because we haven't yet grabbed the policy lock in 
cpufreq_offline_prepare(). If the read comes after the sysfs remove, 
then the read is obviously going to fail (we can depend on the sysfs 
framework on doing its job there).


Will that still leave any race conditions in?

-Saravana
--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Rafael J. Wysocki
On Wed, Feb 3, 2016 at 2:32 AM, Saravana Kannan  wrote:
> On 02/02/2016 05:07 PM, Rafael J. Wysocki wrote:
>>
>> On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki 
>> wrote:
>>>
>>> On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan 
>>> wrote:

 On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:
>
>
> On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli  wrote:
>>
>>
>>
>> [cut]
>>

 I also don't like this patch because it forces governors to either
 implement
 their own macros and management of their attributes or force them to use
 the
 governor structs that come with cpufreq_governor.h. cpufreq_governor.h
 IMHO
 is very ondemand and conservative governor specific and is very
 irrelevant
 for sched-dvfs or any other governors (hint hint).

 The only time this ABBA locking is an issue is when governor are
 changing
 and trying to add/remove attributes. That can easily be checked in
 store_governor and dealt with without holding the policy rwsem if the
 governors can provide their per sys and per policy attribute arrays as
 part
 of registering themselves.

 I'm sorry that I just keep talking about the idea and not sending out
 the
 patches.
>>>
>>>
>>> I think you have a point, though.
>>>
>>> The deadlock really is specific to the governors using the code in
>>> cpufreq_governor.c.
>>
>>
>> That said no other governors in the tree use any sysfs attributes for
>> tunables AFAICS and the out-of-the tree ones are out of interest here.
>
>
> But if we are expecting sched dvfs to come in, why make it worse for it. It
> would be completely pointless to try and shoehorn sched dvfs to use
> cpufreq_governor.c

Well, do you honestly think that using the existing stuff in it would
be a good idea?

If not, then why it matters at all?

>> Also the deadlock happens if one of the tunable attributes is accessed
>> while we're trying to remove it which very well may happen on read
>> access too.
>
> Isn't this THE deadlock we are talking about? The removal of the attributes
> only happen when governors are changes and we send a POLICY_EXIT and or all
> the cores are hotplugged out.

It generally happens when the "old" governor is going away, whatever the reason.

> And my suggestion would work just as well there.
>
> Why are you prefixing your sentence with "Also"? Is there some other case
> I'm not considering?

Say someone is reading sampling_rate for a policy with 1 CPU in it and
someone else is taking the CPU offline.  The governor EXIT code path
(that will trigger as a result) will try to remove the sampling_rate
attribute and (if it does that under policy->rwsem) it'll wait for the
read access to finish.  Where exactly would you put the deadlock
prevention in this case?

Thanks,
Rafael


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Saravana Kannan

On 02/02/2016 05:07 PM, Rafael J. Wysocki wrote:

On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki  wrote:

On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan  wrote:

On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:


On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli  wrote:




[cut]



I also don't like this patch because it forces governors to either implement
their own macros and management of their attributes or force them to use the
governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO
is very ondemand and conservative governor specific and is very irrelevant
for sched-dvfs or any other governors (hint hint).

The only time this ABBA locking is an issue is when governor are changing
and trying to add/remove attributes. That can easily be checked in
store_governor and dealt with without holding the policy rwsem if the
governors can provide their per sys and per policy attribute arrays as part
of registering themselves.

I'm sorry that I just keep talking about the idea and not sending out the
patches.


I think you have a point, though.

The deadlock really is specific to the governors using the code in
cpufreq_governor.c.


That said no other governors in the tree use any sysfs attributes for
tunables AFAICS and the out-of-the tree ones are out of interest here.


But if we are expecting sched dvfs to come in, why make it worse for it. 
It would be completely pointless to try and shoehorn sched dvfs to use 
cpufreq_governor.c



Also the deadlock happens if one of the tunable attributes is accessed
while we're trying to remove it which very well may happen on read
access too.


Isn't this THE deadlock we are talking about? The removal of the 
attributes only happen when governors are changes and we send a 
POLICY_EXIT and or all the cores are hotplugged out. And my suggestion 
would work just as well there.


Why are you prefixing your sentence with "Also"? Is there some other 
case I'm not considering?


-Saravana
--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Rafael J. Wysocki
On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki  wrote:
> On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan  
> wrote:
>> On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:
>>>
>>> On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli  wrote:


[cut]

>>
>> I also don't like this patch because it forces governors to either implement
>> their own macros and management of their attributes or force them to use the
>> governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO
>> is very ondemand and conservative governor specific and is very irrelevant
>> for sched-dvfs or any other governors (hint hint).
>>
>> The only time this ABBA locking is an issue is when governor are changing
>> and trying to add/remove attributes. That can easily be checked in
>> store_governor and dealt with without holding the policy rwsem if the
>> governors can provide their per sys and per policy attribute arrays as part
>> of registering themselves.
>>
>> I'm sorry that I just keep talking about the idea and not sending out the
>> patches.
>
> I think you have a point, though.
>
> The deadlock really is specific to the governors using the code in
> cpufreq_governor.c.

That said no other governors in the tree use any sysfs attributes for
tunables AFAICS and the out-of-the tree ones are out of interest here.

Also the deadlock happens if one of the tunable attributes is accessed
while we're trying to remove it which very well may happen on read
access too.

Thanks,
Rafael


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Rafael J. Wysocki
On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan  wrote:
> On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:
>>
>> On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli  wrote:
>>>
>>> Hi Rafael,
>>>
>>> On 02/02/16 17:35, Rafael J. Wysocki wrote:

 On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli  wrote:
>
> Hi Viresh,
>
> On 02/02/16 16:27, Viresh Kumar wrote:
>>
>> Until now, governors (ondemand/conservative) were using the
>> 'global-attr' or 'freq-attr', depending on the sysfs location where we
>> want to create governor's directory.
>>
>> The problem is that, in case of 'freq-attr', we are forced to use
>> show()/store() present in cpufreq.c, which always take policy->rwsem.
>>
>> And because of that we were facing some ABBA lockups during governor
>> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the
>> rwsem right before calling governor callback for
>> CPUFREQ_GOV_POLICY_EXIT
>> event.
>>
>> That caused further problems and it never worked perfectly.
>>
>> This patch attempts to fix that by creating separate sysfs-ops for
>> cpufreq governors.
>>
>> Because things got much simplified now, we don't need separate
>> show/store callbacks for governor-for-system and governor-per-policy
>> cases.
>>
>> Signed-off-by: Viresh Kumar 
>
>
> This patch cleans things up a lot, that's good.
>
> One thing I'm still concerned about, though: don't we need some locking
> in place for some of the store operations on governors attributes? Are
> store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?


 That would require some investigation I suppose.

> It seems that we can call them from different cpus concurrently.


 Yes, we can.

 One quick-and-dirty way of dealing with that might be to introduce a
 "sysfs lock" into struct dbs_data and hold that around the invocation
 of gattr->store() in the sysfs_ops's ->store callback.

>>>
>>> There is value in trying to solve this issue by using some of the
>>> existing locks, IMHO.
>>
>>
>> Some value - maybe.  I'm not sure how much of it, though.
>>
>> Finer-grained locking is generally easier to follow, because the locks
>> tend to be used for specific purposes only.
>>
>>> Can't we actually try to use the policy->rwsem (or one of the core
>>> locks) + wait_for_completion approach as we do in cpufreq core?
>>
>>
>> No.  Too many things depend on that lock already and some of them work
>> by accident rather than by design.
>
>
> Also, wait_for_completion() and complete() is just another way to implement
> a lock. So, it won't necessarily solve any deadlock issues.
>
> I also don't like this patch because it forces governors to either implement
> their own macros and management of their attributes or force them to use the
> governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO
> is very ondemand and conservative governor specific and is very irrelevant
> for sched-dvfs or any other governors (hint hint).
>
> The only time this ABBA locking is an issue is when governor are changing
> and trying to add/remove attributes. That can easily be checked in
> store_governor and dealt with without holding the policy rwsem if the
> governors can provide their per sys and per policy attribute arrays as part
> of registering themselves.
>
> I'm sorry that I just keep talking about the idea and not sending out the
> patches.

I think you have a point, though.

The deadlock really is specific to the governors using the code in
cpufreq_governor.c.

Thanks,
Rafael


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Saravana Kannan

On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:

On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli  wrote:

Hi Rafael,

On 02/02/16 17:35, Rafael J. Wysocki wrote:

On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli  wrote:

Hi Viresh,

On 02/02/16 16:27, Viresh Kumar wrote:

Until now, governors (ondemand/conservative) were using the
'global-attr' or 'freq-attr', depending on the sysfs location where we
want to create governor's directory.

The problem is that, in case of 'freq-attr', we are forced to use
show()/store() present in cpufreq.c, which always take policy->rwsem.

And because of that we were facing some ABBA lockups during governor
callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the
rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT
event.

That caused further problems and it never worked perfectly.

This patch attempts to fix that by creating separate sysfs-ops for
cpufreq governors.

Because things got much simplified now, we don't need separate
show/store callbacks for governor-for-system and governor-per-policy
cases.

Signed-off-by: Viresh Kumar 


This patch cleans things up a lot, that's good.

One thing I'm still concerned about, though: don't we need some locking
in place for some of the store operations on governors attributes? Are
store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?


That would require some investigation I suppose.


It seems that we can call them from different cpus concurrently.


Yes, we can.

One quick-and-dirty way of dealing with that might be to introduce a
"sysfs lock" into struct dbs_data and hold that around the invocation
of gattr->store() in the sysfs_ops's ->store callback.



There is value in trying to solve this issue by using some of the
existing locks, IMHO.


Some value - maybe.  I'm not sure how much of it, though.

Finer-grained locking is generally easier to follow, because the locks
tend to be used for specific purposes only.


Can't we actually try to use the policy->rwsem (or one of the core
locks) + wait_for_completion approach as we do in cpufreq core?


No.  Too many things depend on that lock already and some of them work
by accident rather than by design.


Also, wait_for_completion() and complete() is just another way to 
implement a lock. So, it won't necessarily solve any deadlock issues.


I also don't like this patch because it forces governors to either 
implement their own macros and management of their attributes or force 
them to use the governor structs that come with cpufreq_governor.h. 
cpufreq_governor.h IMHO is very ondemand and conservative governor 
specific and is very irrelevant for sched-dvfs or any other governors 
(hint hint).


The only time this ABBA locking is an issue is when governor are 
changing and trying to add/remove attributes. That can easily be checked 
in store_governor and dealt with without holding the policy rwsem if the 
governors can provide their per sys and per policy attribute arrays as 
part of registering themselves.


I'm sorry that I just keep talking about the idea and not sending out 
the patches.


-Saravana

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Rafael J. Wysocki
On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar  wrote:

First, the subject might be better.  What about something like
"cpufreq: governor: New sysfs show/store callbacks for governor
tunables", for example?

> Until now, governors (ondemand/conservative) were using the
> 'global-attr' or 'freq-attr', depending on the sysfs location where we
> want to create governor's directory.

"The ondemand and conservative governors use the global-attr or
freq-attr structures to represent sysfs attributes corresponding to
their tunables (which of them is actually used depends on whether or
not different policy objects can use different governors at the same
time and, consequently, on where those attributes are located in
sysfs).

Unfortunately, in the freq-attr case, the standard cpufreq show/store
sysfs attribute callbacks are applied to the governor tunable
attributes and they always acquire the policy->rwsem lock before
carrying out the operation.  That may lead to an ABBA deadlock if
governor tunable attributes are removed under policy->rwsem while one
of them is being accessed concurrently (if sysfs attributes removal
wins the race, it will wait for the access to complete with
policy->rwsem held while the attribute callback will block on
policy->rwsem indefinitely).

We attempted to address this issue by dropping policy->rwsem around
governor tunable attributes removal (that is, around invocations of
the ->governor callback with the event arg equal to
CPUFREQ_GOV_POLICY_EXIT) in cpufreq_set_policy(), but that opened up
race conditions that had not been possible with policy->rwsem held all
the time.  Therefore policy->rwsem cannot be dropped in
cpufreq_set_policy() at any point, but the deadlock situation
described above must be avoided too.

To that end, use the observation that in principle governor tunables
may be represented by the same data type regardless of whether the
governor is system-wide or per-policy and introduce a new structure,
struct governor_attr, for representing them and new corresponding
macros for creating show/store sysfs callbacks for them.  Also make
their parent kobject use a new kobject type whose default show/store
callbacks are not related to the standard core cpufreq ones in any way
(and they don't acquire policy->rwsem in particular)."

IOW, (1) describe the problem you're addressing so that people
unfamiliar with the code in question can understand it, (2) describe
what is done to address the problem (what's the idea and what changes
are made to implement it).

[cut]

> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -22,14 +22,37 @@
>
>  #include "cpufreq_governor.h"
>
> -static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data)
> +#define to_dbs_data(k) container_of(k, struct dbs_data, kobj)
> +#define to_attr(a) container_of(a, struct governor_attr, attr)

Please change the above to static inline routines.

> +
> +static ssize_t show(struct kobject *kobj, struct attribute *attr, char *buf)

A better name please.  Something that will correspond to the purpose.

>  {
> -   if (have_governor_per_policy())
> -   return dbs_data->cdata->attr_group_gov_pol;
> -   else
> -   return dbs_data->cdata->attr_group_gov_sys;
> +   struct dbs_data *dbs_data = to_dbs_data(kobj);
> +   struct governor_attr *gattr = to_attr(attr);
> +
> +   if (gattr->show)
> +   return gattr->show(dbs_data, buf);
> +
> +   return -EIO;
> +}
> +
> +static ssize_t store(struct kobject *kobj, struct attribute *attr,
> +const char *buf, size_t count)

Ditto.

> +{
> +   struct dbs_data *dbs_data = to_dbs_data(kobj);
> +   struct governor_attr *gattr = to_attr(attr);
> +
> +   if (gattr->store)
> +   return gattr->store(dbs_data, buf, count);

Say two instances of this run in parallel with each other, either for
the same attribute or for different attributes under the same
dbs_data.  What's the guarantee that they won't make conflicting
changes?

> +
> +   return -EIO;
>  }
>
> +static const struct sysfs_ops sysfs_ops = {
> +   .show   = show,
> +   .store  = store,
> +};

That is completely enigmatic, so please at least add a comment describing it.

> +
>  void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
>  {
> struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
> @@ -354,6 +377,7 @@ static int cpufreq_governor_init(struct cpufreq_policy 
> *policy,
>  struct dbs_data *dbs_data,
>  struct common_dbs_data *cdata)
>  {
> +   struct attribute_group *attr_group;
> int ret;
>
> /* State should be equivalent to EXIT */
> @@ -395,10 +419,17 @@ static int cpufreq_governor_init(struct cpufreq_policy 
> *policy,
>
> policy->governor_data = dbs_data;
>
> -   ret = sysfs_create_group(get_governor_parent_kobj(policy),
> -  

Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Rafael J. Wysocki
On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli  wrote:
> Hi Rafael,
>
> On 02/02/16 17:35, Rafael J. Wysocki wrote:
>> On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli  wrote:
>> > Hi Viresh,
>> >
>> > On 02/02/16 16:27, Viresh Kumar wrote:
>> >> Until now, governors (ondemand/conservative) were using the
>> >> 'global-attr' or 'freq-attr', depending on the sysfs location where we
>> >> want to create governor's directory.
>> >>
>> >> The problem is that, in case of 'freq-attr', we are forced to use
>> >> show()/store() present in cpufreq.c, which always take policy->rwsem.
>> >>
>> >> And because of that we were facing some ABBA lockups during governor
>> >> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the
>> >> rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT
>> >> event.
>> >>
>> >> That caused further problems and it never worked perfectly.
>> >>
>> >> This patch attempts to fix that by creating separate sysfs-ops for
>> >> cpufreq governors.
>> >>
>> >> Because things got much simplified now, we don't need separate
>> >> show/store callbacks for governor-for-system and governor-per-policy
>> >> cases.
>> >>
>> >> Signed-off-by: Viresh Kumar 
>> >
>> > This patch cleans things up a lot, that's good.
>> >
>> > One thing I'm still concerned about, though: don't we need some locking
>> > in place for some of the store operations on governors attributes? Are
>> > store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
>>
>> That would require some investigation I suppose.
>>
>> > It seems that we can call them from different cpus concurrently.
>>
>> Yes, we can.
>>
>> One quick-and-dirty way of dealing with that might be to introduce a
>> "sysfs lock" into struct dbs_data and hold that around the invocation
>> of gattr->store() in the sysfs_ops's ->store callback.
>>
>
> There is value in trying to solve this issue by using some of the
> existing locks, IMHO.

Some value - maybe.  I'm not sure how much of it, though.

Finer-grained locking is generally easier to follow, because the locks
tend to be used for specific purposes only.

> Can't we actually try to use the policy->rwsem (or one of the core
> locks) + wait_for_completion approach as we do in cpufreq core?

No.  Too many things depend on that lock already and some of them work
by accident rather than by design.

Thanks,
Rafael


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Juri Lelli
Hi Rafael,

On 02/02/16 17:35, Rafael J. Wysocki wrote:
> On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli  wrote:
> > Hi Viresh,
> >
> > On 02/02/16 16:27, Viresh Kumar wrote:
> >> Until now, governors (ondemand/conservative) were using the
> >> 'global-attr' or 'freq-attr', depending on the sysfs location where we
> >> want to create governor's directory.
> >>
> >> The problem is that, in case of 'freq-attr', we are forced to use
> >> show()/store() present in cpufreq.c, which always take policy->rwsem.
> >>
> >> And because of that we were facing some ABBA lockups during governor
> >> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the
> >> rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT
> >> event.
> >>
> >> That caused further problems and it never worked perfectly.
> >>
> >> This patch attempts to fix that by creating separate sysfs-ops for
> >> cpufreq governors.
> >>
> >> Because things got much simplified now, we don't need separate
> >> show/store callbacks for governor-for-system and governor-per-policy
> >> cases.
> >>
> >> Signed-off-by: Viresh Kumar 
> >
> > This patch cleans things up a lot, that's good.
> >
> > One thing I'm still concerned about, though: don't we need some locking
> > in place for some of the store operations on governors attributes? Are
> > store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
> 
> That would require some investigation I suppose.
> 
> > It seems that we can call them from different cpus concurrently.
> 
> Yes, we can.
> 
> One quick-and-dirty way of dealing with that might be to introduce a
> "sysfs lock" into struct dbs_data and hold that around the invocation
> of gattr->store() in the sysfs_ops's ->store callback.
> 

There is value in trying to solve this issue by using some of the
existing locks, IMHO.

Can't we actually try to use the policy->rwsem (or one of the core
locks) + wait_for_completion approach as we do in cpufreq core?

> BTW, you could have dropped the stuff below this line from your reply
> message.  That at least would have prevented tools like Patchwork from
> storing useless garbage.
> 

Right. Sorry for the garbage; I'll check twice that I trim my replies in
the future.

Best,

- Juri


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Rafael J. Wysocki
On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli  wrote:
> Hi Viresh,
>
> On 02/02/16 16:27, Viresh Kumar wrote:
>> Until now, governors (ondemand/conservative) were using the
>> 'global-attr' or 'freq-attr', depending on the sysfs location where we
>> want to create governor's directory.
>>
>> The problem is that, in case of 'freq-attr', we are forced to use
>> show()/store() present in cpufreq.c, which always take policy->rwsem.
>>
>> And because of that we were facing some ABBA lockups during governor
>> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the
>> rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT
>> event.
>>
>> That caused further problems and it never worked perfectly.
>>
>> This patch attempts to fix that by creating separate sysfs-ops for
>> cpufreq governors.
>>
>> Because things got much simplified now, we don't need separate
>> show/store callbacks for governor-for-system and governor-per-policy
>> cases.
>>
>> Signed-off-by: Viresh Kumar 
>
> This patch cleans things up a lot, that's good.
>
> One thing I'm still concerned about, though: don't we need some locking
> in place for some of the store operations on governors attributes? Are
> store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?

That would require some investigation I suppose.

> It seems that we can call them from different cpus concurrently.

Yes, we can.

One quick-and-dirty way of dealing with that might be to introduce a
"sysfs lock" into struct dbs_data and hold that around the invocation
of gattr->store() in the sysfs_ops's ->store callback.

>
> Best,
>
> - Juri

BTW, you could have dropped the stuff below this line from your reply
message.  That at least would have prevented tools like Patchwork from
storing useless garbage.

Thanks,
Rafael


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Juri Lelli
Hi Viresh,

On 02/02/16 16:27, Viresh Kumar wrote:
> Until now, governors (ondemand/conservative) were using the
> 'global-attr' or 'freq-attr', depending on the sysfs location where we
> want to create governor's directory.
> 
> The problem is that, in case of 'freq-attr', we are forced to use
> show()/store() present in cpufreq.c, which always take policy->rwsem.
> 
> And because of that we were facing some ABBA lockups during governor
> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the
> rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT
> event.
> 
> That caused further problems and it never worked perfectly.
> 
> This patch attempts to fix that by creating separate sysfs-ops for
> cpufreq governors.
> 
> Because things got much simplified now, we don't need separate
> show/store callbacks for governor-for-system and governor-per-policy
> cases.
> 
> Signed-off-by: Viresh Kumar 

This patch cleans things up a lot, that's good.

One thing I'm still concerned about, though: don't we need some locking
in place for some of the store operations on governors attributes? Are
store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
It seems that we can call them from different cpus concurrently.

Best,

- Juri

> ---
>  drivers/cpufreq/cpufreq_conservative.c | 71 
> +-
>  drivers/cpufreq/cpufreq_governor.c | 50 +++-
>  drivers/cpufreq/cpufreq_governor.h | 31 +--
>  drivers/cpufreq/cpufreq_ondemand.c | 71 
> +-
>  4 files changed, 122 insertions(+), 101 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq_conservative.c 
> b/drivers/cpufreq/cpufreq_conservative.c
> index 57750367bd26..980145da796a 100644
> --- a/drivers/cpufreq/cpufreq_conservative.c
> +++ b/drivers/cpufreq/cpufreq_conservative.c
> @@ -275,51 +275,35 @@ static ssize_t store_freq_step(struct dbs_data 
> *dbs_data, const char *buf,
>   return count;
>  }
>  
> -show_store_one(cs, sampling_rate);
> -show_store_one(cs, sampling_down_factor);
> -show_store_one(cs, up_threshold);
> -show_store_one(cs, down_threshold);
> -show_store_one(cs, ignore_nice_load);
> -show_store_one(cs, freq_step);
> -show_one(cs, min_sampling_rate);
> -
> -gov_sys_pol_attr_rw(sampling_rate);
> -gov_sys_pol_attr_rw(sampling_down_factor);
> -gov_sys_pol_attr_rw(up_threshold);
> -gov_sys_pol_attr_rw(down_threshold);
> -gov_sys_pol_attr_rw(ignore_nice_load);
> -gov_sys_pol_attr_rw(freq_step);
> -gov_sys_pol_attr_ro(min_sampling_rate);
> -
> -static struct attribute *dbs_attributes_gov_sys[] = {
> - _sampling_rate_gov_sys.attr,
> - _rate_gov_sys.attr,
> - _down_factor_gov_sys.attr,
> - _threshold_gov_sys.attr,
> - _threshold_gov_sys.attr,
> - _nice_load_gov_sys.attr,
> - _step_gov_sys.attr,
> +gov_show_one(cs, sampling_rate);
> +gov_show_one(cs, sampling_down_factor);
> +gov_show_one(cs, up_threshold);
> +gov_show_one(cs, down_threshold);
> +gov_show_one(cs, ignore_nice_load);
> +gov_show_one(cs, freq_step);
> +gov_show_one(cs, min_sampling_rate);
> +
> +gov_attr_rw(sampling_rate);
> +gov_attr_rw(sampling_down_factor);
> +gov_attr_rw(up_threshold);
> +gov_attr_rw(down_threshold);
> +gov_attr_rw(ignore_nice_load);
> +gov_attr_rw(freq_step);
> +gov_attr_ro(min_sampling_rate);
> +
> +static struct attribute *dbs_attributes[] = {
> + _sampling_rate.attr,
> + _rate.attr,
> + _down_factor.attr,
> + _threshold.attr,
> + _threshold.attr,
> + _nice_load.attr,
> + _step.attr,
>   NULL
>  };
>  
> -static struct attribute_group cs_attr_group_gov_sys = {
> - .attrs = dbs_attributes_gov_sys,
> - .name = "conservative",
> -};
> -
> -static struct attribute *dbs_attributes_gov_pol[] = {
> - _sampling_rate_gov_pol.attr,
> - _rate_gov_pol.attr,
> - _down_factor_gov_pol.attr,
> - _threshold_gov_pol.attr,
> - _threshold_gov_pol.attr,
> - _nice_load_gov_pol.attr,
> - _step_gov_pol.attr,
> - NULL
> -};
> -
> -static struct attribute_group cs_attr_group_gov_pol = {
> - .attrs = dbs_attributes_gov_pol,
> +static struct attribute_group cs_attr_group = {
> + .attrs = dbs_attributes,
>   .name = "conservative",
>  };
>  
> @@ -365,8 +349,7 @@ define_get_cpu_dbs_routines(cs_cpu_dbs_info);
>  
>  static struct common_dbs_data cs_dbs_cdata = {
>   .governor = GOV_CONSERVATIVE,
> - .attr_group_gov_sys = _attr_group_gov_sys,
> - .attr_group_gov_pol = _attr_group_gov_pol,
> + .attr_group = _attr_group,
>   .get_cpu_cdbs = get_cpu_cdbs,
>   .get_cpu_dbs_info_s = get_cpu_dbs_info_s,
>   .gov_dbs_timer = cs_dbs_timer,
> diff --git a/drivers/cpufreq/cpufreq_governor.c 
> b/drivers/cpufreq/cpufreq_governor.c
> index 9a7edc91ad57..e785a118cbdc 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -22,14 +22,37 @@
>  
>  #include "cpufreq_governor.h"
>  
> -static 

Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Rafael J. Wysocki
On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar  wrote:

First, the subject might be better.  What about something like
"cpufreq: governor: New sysfs show/store callbacks for governor
tunables", for example?

> Until now, governors (ondemand/conservative) were using the
> 'global-attr' or 'freq-attr', depending on the sysfs location where we
> want to create governor's directory.

"The ondemand and conservative governors use the global-attr or
freq-attr structures to represent sysfs attributes corresponding to
their tunables (which of them is actually used depends on whether or
not different policy objects can use different governors at the same
time and, consequently, on where those attributes are located in
sysfs).

Unfortunately, in the freq-attr case, the standard cpufreq show/store
sysfs attribute callbacks are applied to the governor tunable
attributes and they always acquire the policy->rwsem lock before
carrying out the operation.  That may lead to an ABBA deadlock if
governor tunable attributes are removed under policy->rwsem while one
of them is being accessed concurrently (if sysfs attributes removal
wins the race, it will wait for the access to complete with
policy->rwsem held while the attribute callback will block on
policy->rwsem indefinitely).

We attempted to address this issue by dropping policy->rwsem around
governor tunable attributes removal (that is, around invocations of
the ->governor callback with the event arg equal to
CPUFREQ_GOV_POLICY_EXIT) in cpufreq_set_policy(), but that opened up
race conditions that had not been possible with policy->rwsem held all
the time.  Therefore policy->rwsem cannot be dropped in
cpufreq_set_policy() at any point, but the deadlock situation
described above must be avoided too.

To that end, use the observation that in principle governor tunables
may be represented by the same data type regardless of whether the
governor is system-wide or per-policy and introduce a new structure,
struct governor_attr, for representing them and new corresponding
macros for creating show/store sysfs callbacks for them.  Also make
their parent kobject use a new kobject type whose default show/store
callbacks are not related to the standard core cpufreq ones in any way
(and they don't acquire policy->rwsem in particular)."

IOW, (1) describe the problem you're addressing so that people
unfamiliar with the code in question can understand it, (2) describe
what is done to address the problem (what's the idea and what changes
are made to implement it).

[cut]

> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -22,14 +22,37 @@
>
>  #include "cpufreq_governor.h"
>
> -static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data)
> +#define to_dbs_data(k) container_of(k, struct dbs_data, kobj)
> +#define to_attr(a) container_of(a, struct governor_attr, attr)

Please change the above to static inline routines.

> +
> +static ssize_t show(struct kobject *kobj, struct attribute *attr, char *buf)

A better name please.  Something that will correspond to the purpose.

>  {
> -   if (have_governor_per_policy())
> -   return dbs_data->cdata->attr_group_gov_pol;
> -   else
> -   return dbs_data->cdata->attr_group_gov_sys;
> +   struct dbs_data *dbs_data = to_dbs_data(kobj);
> +   struct governor_attr *gattr = to_attr(attr);
> +
> +   if (gattr->show)
> +   return gattr->show(dbs_data, buf);
> +
> +   return -EIO;
> +}
> +
> +static ssize_t store(struct kobject *kobj, struct attribute *attr,
> +const char *buf, size_t count)

Ditto.

> +{
> +   struct dbs_data *dbs_data = to_dbs_data(kobj);
> +   struct governor_attr *gattr = to_attr(attr);
> +
> +   if (gattr->store)
> +   return gattr->store(dbs_data, buf, count);

Say two instances of this run in parallel with each other, either for
the same attribute or for different attributes under the same
dbs_data.  What's the guarantee that they won't make conflicting
changes?

> +
> +   return -EIO;
>  }
>
> +static const struct sysfs_ops sysfs_ops = {
> +   .show   = show,
> +   .store  = store,
> +};

That is completely enigmatic, so please at least add a comment describing it.

> +
>  void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
>  {
> struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
> @@ -354,6 +377,7 @@ static int cpufreq_governor_init(struct cpufreq_policy 
> *policy,
>  struct dbs_data *dbs_data,
>  struct common_dbs_data *cdata)
>  {
> +   struct attribute_group *attr_group;
> int ret;
>
> /* State should be equivalent to EXIT */
> @@ -395,10 +419,17 @@ static int cpufreq_governor_init(struct cpufreq_policy 
> *policy,
>
> policy->governor_data = dbs_data;
>
> -   ret = sysfs_create_group(get_governor_parent_kobj(policy),

Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Saravana Kannan

On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:

On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli  wrote:

Hi Rafael,

On 02/02/16 17:35, Rafael J. Wysocki wrote:

On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli  wrote:

Hi Viresh,

On 02/02/16 16:27, Viresh Kumar wrote:

Until now, governors (ondemand/conservative) were using the
'global-attr' or 'freq-attr', depending on the sysfs location where we
want to create governor's directory.

The problem is that, in case of 'freq-attr', we are forced to use
show()/store() present in cpufreq.c, which always take policy->rwsem.

And because of that we were facing some ABBA lockups during governor
callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the
rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT
event.

That caused further problems and it never worked perfectly.

This patch attempts to fix that by creating separate sysfs-ops for
cpufreq governors.

Because things got much simplified now, we don't need separate
show/store callbacks for governor-for-system and governor-per-policy
cases.

Signed-off-by: Viresh Kumar 


This patch cleans things up a lot, that's good.

One thing I'm still concerned about, though: don't we need some locking
in place for some of the store operations on governors attributes? Are
store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?


That would require some investigation I suppose.


It seems that we can call them from different cpus concurrently.


Yes, we can.

One quick-and-dirty way of dealing with that might be to introduce a
"sysfs lock" into struct dbs_data and hold that around the invocation
of gattr->store() in the sysfs_ops's ->store callback.



There is value in trying to solve this issue by using some of the
existing locks, IMHO.


Some value - maybe.  I'm not sure how much of it, though.

Finer-grained locking is generally easier to follow, because the locks
tend to be used for specific purposes only.


Can't we actually try to use the policy->rwsem (or one of the core
locks) + wait_for_completion approach as we do in cpufreq core?


No.  Too many things depend on that lock already and some of them work
by accident rather than by design.


Also, wait_for_completion() and complete() is just another way to 
implement a lock. So, it won't necessarily solve any deadlock issues.


I also don't like this patch because it forces governors to either 
implement their own macros and management of their attributes or force 
them to use the governor structs that come with cpufreq_governor.h. 
cpufreq_governor.h IMHO is very ondemand and conservative governor 
specific and is very irrelevant for sched-dvfs or any other governors 
(hint hint).


The only time this ABBA locking is an issue is when governor are 
changing and trying to add/remove attributes. That can easily be checked 
in store_governor and dealt with without holding the policy rwsem if the 
governors can provide their per sys and per policy attribute arrays as 
part of registering themselves.


I'm sorry that I just keep talking about the idea and not sending out 
the patches.


-Saravana

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Rafael J. Wysocki
On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan  wrote:
> On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:
>>
>> On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli  wrote:
>>>
>>> Hi Rafael,
>>>
>>> On 02/02/16 17:35, Rafael J. Wysocki wrote:

 On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli  wrote:
>
> Hi Viresh,
>
> On 02/02/16 16:27, Viresh Kumar wrote:
>>
>> Until now, governors (ondemand/conservative) were using the
>> 'global-attr' or 'freq-attr', depending on the sysfs location where we
>> want to create governor's directory.
>>
>> The problem is that, in case of 'freq-attr', we are forced to use
>> show()/store() present in cpufreq.c, which always take policy->rwsem.
>>
>> And because of that we were facing some ABBA lockups during governor
>> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the
>> rwsem right before calling governor callback for
>> CPUFREQ_GOV_POLICY_EXIT
>> event.
>>
>> That caused further problems and it never worked perfectly.
>>
>> This patch attempts to fix that by creating separate sysfs-ops for
>> cpufreq governors.
>>
>> Because things got much simplified now, we don't need separate
>> show/store callbacks for governor-for-system and governor-per-policy
>> cases.
>>
>> Signed-off-by: Viresh Kumar 
>
>
> This patch cleans things up a lot, that's good.
>
> One thing I'm still concerned about, though: don't we need some locking
> in place for some of the store operations on governors attributes? Are
> store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?


 That would require some investigation I suppose.

> It seems that we can call them from different cpus concurrently.


 Yes, we can.

 One quick-and-dirty way of dealing with that might be to introduce a
 "sysfs lock" into struct dbs_data and hold that around the invocation
 of gattr->store() in the sysfs_ops's ->store callback.

>>>
>>> There is value in trying to solve this issue by using some of the
>>> existing locks, IMHO.
>>
>>
>> Some value - maybe.  I'm not sure how much of it, though.
>>
>> Finer-grained locking is generally easier to follow, because the locks
>> tend to be used for specific purposes only.
>>
>>> Can't we actually try to use the policy->rwsem (or one of the core
>>> locks) + wait_for_completion approach as we do in cpufreq core?
>>
>>
>> No.  Too many things depend on that lock already and some of them work
>> by accident rather than by design.
>
>
> Also, wait_for_completion() and complete() is just another way to implement
> a lock. So, it won't necessarily solve any deadlock issues.
>
> I also don't like this patch because it forces governors to either implement
> their own macros and management of their attributes or force them to use the
> governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO
> is very ondemand and conservative governor specific and is very irrelevant
> for sched-dvfs or any other governors (hint hint).
>
> The only time this ABBA locking is an issue is when governor are changing
> and trying to add/remove attributes. That can easily be checked in
> store_governor and dealt with without holding the policy rwsem if the
> governors can provide their per sys and per policy attribute arrays as part
> of registering themselves.
>
> I'm sorry that I just keep talking about the idea and not sending out the
> patches.

I think you have a point, though.

The deadlock really is specific to the governors using the code in
cpufreq_governor.c.

Thanks,
Rafael


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Rafael J. Wysocki
On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki  wrote:
> On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan  
> wrote:
>> On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:
>>>
>>> On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli  wrote:


[cut]

>>
>> I also don't like this patch because it forces governors to either implement
>> their own macros and management of their attributes or force them to use the
>> governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO
>> is very ondemand and conservative governor specific and is very irrelevant
>> for sched-dvfs or any other governors (hint hint).
>>
>> The only time this ABBA locking is an issue is when governor are changing
>> and trying to add/remove attributes. That can easily be checked in
>> store_governor and dealt with without holding the policy rwsem if the
>> governors can provide their per sys and per policy attribute arrays as part
>> of registering themselves.
>>
>> I'm sorry that I just keep talking about the idea and not sending out the
>> patches.
>
> I think you have a point, though.
>
> The deadlock really is specific to the governors using the code in
> cpufreq_governor.c.

That said no other governors in the tree use any sysfs attributes for
tunables AFAICS and the out-of-the tree ones are out of interest here.

Also the deadlock happens if one of the tunable attributes is accessed
while we're trying to remove it which very well may happen on read
access too.

Thanks,
Rafael


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Saravana Kannan

On 02/02/2016 05:07 PM, Rafael J. Wysocki wrote:

On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki  wrote:

On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan  wrote:

On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:


On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli  wrote:




[cut]



I also don't like this patch because it forces governors to either implement
their own macros and management of their attributes or force them to use the
governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO
is very ondemand and conservative governor specific and is very irrelevant
for sched-dvfs or any other governors (hint hint).

The only time this ABBA locking is an issue is when governor are changing
and trying to add/remove attributes. That can easily be checked in
store_governor and dealt with without holding the policy rwsem if the
governors can provide their per sys and per policy attribute arrays as part
of registering themselves.

I'm sorry that I just keep talking about the idea and not sending out the
patches.


I think you have a point, though.

The deadlock really is specific to the governors using the code in
cpufreq_governor.c.


That said no other governors in the tree use any sysfs attributes for
tunables AFAICS and the out-of-the tree ones are out of interest here.


But if we are expecting sched dvfs to come in, why make it worse for it. 
It would be completely pointless to try and shoehorn sched dvfs to use 
cpufreq_governor.c



Also the deadlock happens if one of the tunable attributes is accessed
while we're trying to remove it which very well may happen on read
access too.


Isn't this THE deadlock we are talking about? The removal of the 
attributes only happen when governors are changes and we send a 
POLICY_EXIT and or all the cores are hotplugged out. And my suggestion 
would work just as well there.


Why are you prefixing your sentence with "Also"? Is there some other 
case I'm not considering?


-Saravana
--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Rafael J. Wysocki
On Wed, Feb 3, 2016 at 2:32 AM, Saravana Kannan  wrote:
> On 02/02/2016 05:07 PM, Rafael J. Wysocki wrote:
>>
>> On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki 
>> wrote:
>>>
>>> On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan 
>>> wrote:

 On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:
>
>
> On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli  wrote:
>>
>>
>>
>> [cut]
>>

 I also don't like this patch because it forces governors to either
 implement
 their own macros and management of their attributes or force them to use
 the
 governor structs that come with cpufreq_governor.h. cpufreq_governor.h
 IMHO
 is very ondemand and conservative governor specific and is very
 irrelevant
 for sched-dvfs or any other governors (hint hint).

 The only time this ABBA locking is an issue is when governor are
 changing
 and trying to add/remove attributes. That can easily be checked in
 store_governor and dealt with without holding the policy rwsem if the
 governors can provide their per sys and per policy attribute arrays as
 part
 of registering themselves.

 I'm sorry that I just keep talking about the idea and not sending out
 the
 patches.
>>>
>>>
>>> I think you have a point, though.
>>>
>>> The deadlock really is specific to the governors using the code in
>>> cpufreq_governor.c.
>>
>>
>> That said no other governors in the tree use any sysfs attributes for
>> tunables AFAICS and the out-of-the tree ones are out of interest here.
>
>
> But if we are expecting sched dvfs to come in, why make it worse for it. It
> would be completely pointless to try and shoehorn sched dvfs to use
> cpufreq_governor.c

Well, do you honestly think that using the existing stuff in it would
be a good idea?

If not, then why it matters at all?

>> Also the deadlock happens if one of the tunable attributes is accessed
>> while we're trying to remove it which very well may happen on read
>> access too.
>
> Isn't this THE deadlock we are talking about? The removal of the attributes
> only happen when governors are changes and we send a POLICY_EXIT and or all
> the cores are hotplugged out.

It generally happens when the "old" governor is going away, whatever the reason.

> And my suggestion would work just as well there.
>
> Why are you prefixing your sentence with "Also"? Is there some other case
> I'm not considering?

Say someone is reading sampling_rate for a policy with 1 CPU in it and
someone else is taking the CPU offline.  The governor EXIT code path
(that will trigger as a result) will try to remove the sampling_rate
attribute and (if it does that under policy->rwsem) it'll wait for the
read access to finish.  Where exactly would you put the deadlock
prevention in this case?

Thanks,
Rafael


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Rafael J. Wysocki
On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli  wrote:
> Hi Rafael,
>
> On 02/02/16 17:35, Rafael J. Wysocki wrote:
>> On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli  wrote:
>> > Hi Viresh,
>> >
>> > On 02/02/16 16:27, Viresh Kumar wrote:
>> >> Until now, governors (ondemand/conservative) were using the
>> >> 'global-attr' or 'freq-attr', depending on the sysfs location where we
>> >> want to create governor's directory.
>> >>
>> >> The problem is that, in case of 'freq-attr', we are forced to use
>> >> show()/store() present in cpufreq.c, which always take policy->rwsem.
>> >>
>> >> And because of that we were facing some ABBA lockups during governor
>> >> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the
>> >> rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT
>> >> event.
>> >>
>> >> That caused further problems and it never worked perfectly.
>> >>
>> >> This patch attempts to fix that by creating separate sysfs-ops for
>> >> cpufreq governors.
>> >>
>> >> Because things got much simplified now, we don't need separate
>> >> show/store callbacks for governor-for-system and governor-per-policy
>> >> cases.
>> >>
>> >> Signed-off-by: Viresh Kumar 
>> >
>> > This patch cleans things up a lot, that's good.
>> >
>> > One thing I'm still concerned about, though: don't we need some locking
>> > in place for some of the store operations on governors attributes? Are
>> > store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
>>
>> That would require some investigation I suppose.
>>
>> > It seems that we can call them from different cpus concurrently.
>>
>> Yes, we can.
>>
>> One quick-and-dirty way of dealing with that might be to introduce a
>> "sysfs lock" into struct dbs_data and hold that around the invocation
>> of gattr->store() in the sysfs_ops's ->store callback.
>>
>
> There is value in trying to solve this issue by using some of the
> existing locks, IMHO.

Some value - maybe.  I'm not sure how much of it, though.

Finer-grained locking is generally easier to follow, because the locks
tend to be used for specific purposes only.

> Can't we actually try to use the policy->rwsem (or one of the core
> locks) + wait_for_completion approach as we do in cpufreq core?

No.  Too many things depend on that lock already and some of them work
by accident rather than by design.

Thanks,
Rafael


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Juri Lelli
Hi Viresh,

On 02/02/16 16:27, Viresh Kumar wrote:
> Until now, governors (ondemand/conservative) were using the
> 'global-attr' or 'freq-attr', depending on the sysfs location where we
> want to create governor's directory.
> 
> The problem is that, in case of 'freq-attr', we are forced to use
> show()/store() present in cpufreq.c, which always take policy->rwsem.
> 
> And because of that we were facing some ABBA lockups during governor
> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the
> rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT
> event.
> 
> That caused further problems and it never worked perfectly.
> 
> This patch attempts to fix that by creating separate sysfs-ops for
> cpufreq governors.
> 
> Because things got much simplified now, we don't need separate
> show/store callbacks for governor-for-system and governor-per-policy
> cases.
> 
> Signed-off-by: Viresh Kumar 

This patch cleans things up a lot, that's good.

One thing I'm still concerned about, though: don't we need some locking
in place for some of the store operations on governors attributes? Are
store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
It seems that we can call them from different cpus concurrently.

Best,

- Juri

> ---
>  drivers/cpufreq/cpufreq_conservative.c | 71 
> +-
>  drivers/cpufreq/cpufreq_governor.c | 50 +++-
>  drivers/cpufreq/cpufreq_governor.h | 31 +--
>  drivers/cpufreq/cpufreq_ondemand.c | 71 
> +-
>  4 files changed, 122 insertions(+), 101 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq_conservative.c 
> b/drivers/cpufreq/cpufreq_conservative.c
> index 57750367bd26..980145da796a 100644
> --- a/drivers/cpufreq/cpufreq_conservative.c
> +++ b/drivers/cpufreq/cpufreq_conservative.c
> @@ -275,51 +275,35 @@ static ssize_t store_freq_step(struct dbs_data 
> *dbs_data, const char *buf,
>   return count;
>  }
>  
> -show_store_one(cs, sampling_rate);
> -show_store_one(cs, sampling_down_factor);
> -show_store_one(cs, up_threshold);
> -show_store_one(cs, down_threshold);
> -show_store_one(cs, ignore_nice_load);
> -show_store_one(cs, freq_step);
> -show_one(cs, min_sampling_rate);
> -
> -gov_sys_pol_attr_rw(sampling_rate);
> -gov_sys_pol_attr_rw(sampling_down_factor);
> -gov_sys_pol_attr_rw(up_threshold);
> -gov_sys_pol_attr_rw(down_threshold);
> -gov_sys_pol_attr_rw(ignore_nice_load);
> -gov_sys_pol_attr_rw(freq_step);
> -gov_sys_pol_attr_ro(min_sampling_rate);
> -
> -static struct attribute *dbs_attributes_gov_sys[] = {
> - _sampling_rate_gov_sys.attr,
> - _rate_gov_sys.attr,
> - _down_factor_gov_sys.attr,
> - _threshold_gov_sys.attr,
> - _threshold_gov_sys.attr,
> - _nice_load_gov_sys.attr,
> - _step_gov_sys.attr,
> +gov_show_one(cs, sampling_rate);
> +gov_show_one(cs, sampling_down_factor);
> +gov_show_one(cs, up_threshold);
> +gov_show_one(cs, down_threshold);
> +gov_show_one(cs, ignore_nice_load);
> +gov_show_one(cs, freq_step);
> +gov_show_one(cs, min_sampling_rate);
> +
> +gov_attr_rw(sampling_rate);
> +gov_attr_rw(sampling_down_factor);
> +gov_attr_rw(up_threshold);
> +gov_attr_rw(down_threshold);
> +gov_attr_rw(ignore_nice_load);
> +gov_attr_rw(freq_step);
> +gov_attr_ro(min_sampling_rate);
> +
> +static struct attribute *dbs_attributes[] = {
> + _sampling_rate.attr,
> + _rate.attr,
> + _down_factor.attr,
> + _threshold.attr,
> + _threshold.attr,
> + _nice_load.attr,
> + _step.attr,
>   NULL
>  };
>  
> -static struct attribute_group cs_attr_group_gov_sys = {
> - .attrs = dbs_attributes_gov_sys,
> - .name = "conservative",
> -};
> -
> -static struct attribute *dbs_attributes_gov_pol[] = {
> - _sampling_rate_gov_pol.attr,
> - _rate_gov_pol.attr,
> - _down_factor_gov_pol.attr,
> - _threshold_gov_pol.attr,
> - _threshold_gov_pol.attr,
> - _nice_load_gov_pol.attr,
> - _step_gov_pol.attr,
> - NULL
> -};
> -
> -static struct attribute_group cs_attr_group_gov_pol = {
> - .attrs = dbs_attributes_gov_pol,
> +static struct attribute_group cs_attr_group = {
> + .attrs = dbs_attributes,
>   .name = "conservative",
>  };
>  
> @@ -365,8 +349,7 @@ define_get_cpu_dbs_routines(cs_cpu_dbs_info);
>  
>  static struct common_dbs_data cs_dbs_cdata = {
>   .governor = GOV_CONSERVATIVE,
> - .attr_group_gov_sys = _attr_group_gov_sys,
> - .attr_group_gov_pol = _attr_group_gov_pol,
> + .attr_group = _attr_group,
>   .get_cpu_cdbs = get_cpu_cdbs,
>   .get_cpu_dbs_info_s = get_cpu_dbs_info_s,
>   .gov_dbs_timer = cs_dbs_timer,
> diff --git a/drivers/cpufreq/cpufreq_governor.c 
> b/drivers/cpufreq/cpufreq_governor.c
> index 9a7edc91ad57..e785a118cbdc 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -22,14 +22,37 @@
>  
>  #include 

Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Juri Lelli
Hi Rafael,

On 02/02/16 17:35, Rafael J. Wysocki wrote:
> On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli  wrote:
> > Hi Viresh,
> >
> > On 02/02/16 16:27, Viresh Kumar wrote:
> >> Until now, governors (ondemand/conservative) were using the
> >> 'global-attr' or 'freq-attr', depending on the sysfs location where we
> >> want to create governor's directory.
> >>
> >> The problem is that, in case of 'freq-attr', we are forced to use
> >> show()/store() present in cpufreq.c, which always take policy->rwsem.
> >>
> >> And because of that we were facing some ABBA lockups during governor
> >> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the
> >> rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT
> >> event.
> >>
> >> That caused further problems and it never worked perfectly.
> >>
> >> This patch attempts to fix that by creating separate sysfs-ops for
> >> cpufreq governors.
> >>
> >> Because things got much simplified now, we don't need separate
> >> show/store callbacks for governor-for-system and governor-per-policy
> >> cases.
> >>
> >> Signed-off-by: Viresh Kumar 
> >
> > This patch cleans things up a lot, that's good.
> >
> > One thing I'm still concerned about, though: don't we need some locking
> > in place for some of the store operations on governors attributes? Are
> > store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
> 
> That would require some investigation I suppose.
> 
> > It seems that we can call them from different cpus concurrently.
> 
> Yes, we can.
> 
> One quick-and-dirty way of dealing with that might be to introduce a
> "sysfs lock" into struct dbs_data and hold that around the invocation
> of gattr->store() in the sysfs_ops's ->store callback.
> 

There is value in trying to solve this issue by using some of the
existing locks, IMHO.

Can't we actually try to use the policy->rwsem (or one of the core
locks) + wait_for_completion approach as we do in cpufreq core?

> BTW, you could have dropped the stuff below this line from your reply
> message.  That at least would have prevented tools like Patchwork from
> storing useless garbage.
> 

Right. Sorry for the garbage; I'll check twice that I trim my replies in
the future.

Best,

- Juri


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Rafael J. Wysocki
On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli  wrote:
> Hi Viresh,
>
> On 02/02/16 16:27, Viresh Kumar wrote:
>> Until now, governors (ondemand/conservative) were using the
>> 'global-attr' or 'freq-attr', depending on the sysfs location where we
>> want to create governor's directory.
>>
>> The problem is that, in case of 'freq-attr', we are forced to use
>> show()/store() present in cpufreq.c, which always take policy->rwsem.
>>
>> And because of that we were facing some ABBA lockups during governor
>> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the
>> rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT
>> event.
>>
>> That caused further problems and it never worked perfectly.
>>
>> This patch attempts to fix that by creating separate sysfs-ops for
>> cpufreq governors.
>>
>> Because things got much simplified now, we don't need separate
>> show/store callbacks for governor-for-system and governor-per-policy
>> cases.
>>
>> Signed-off-by: Viresh Kumar 
>
> This patch cleans things up a lot, that's good.
>
> One thing I'm still concerned about, though: don't we need some locking
> in place for some of the store operations on governors attributes? Are
> store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?

That would require some investigation I suppose.

> It seems that we can call them from different cpus concurrently.

Yes, we can.

One quick-and-dirty way of dealing with that might be to introduce a
"sysfs lock" into struct dbs_data and hold that around the invocation
of gattr->store() in the sysfs_ops's ->store callback.

>
> Best,
>
> - Juri

BTW, you could have dropped the stuff below this line from your reply
message.  That at least would have prevented tools like Patchwork from
storing useless garbage.

Thanks,
Rafael


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Viresh Kumar
On 02-02-16, 17:01, Juri Lelli wrote:
> Hi Rafael,
> 
> On 02/02/16 17:35, Rafael J. Wysocki wrote:
> > On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli  wrote:

> > > This patch cleans things up a lot, that's good.
> > >
> > > One thing I'm still concerned about, though: don't we need some locking
> > > in place for some of the store operations on governors attributes? Are
> > > store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
> > 
> > That would require some investigation I suppose.

Yeah, that protection is required. Sorry about that.

> > > It seems that we can call them from different cpus concurrently.
> > 
> > Yes, we can.
> > 
> > One quick-and-dirty way of dealing with that might be to introduce a
> > "sysfs lock" into struct dbs_data and hold that around the invocation
> > of gattr->store() in the sysfs_ops's ->store callback.

s/dirty/sane ? :)

> Can't we actually try to use the policy->rwsem (or one of the core
> locks) + wait_for_completion approach as we do in cpufreq core?

policy->rwsem will defeat the purpose of this change as it will
reintroduce the ABBA issue.

-- 
viresh


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Viresh Kumar
On 02-02-16, 14:21, Saravana Kannan wrote:
> I also don't like this patch because it forces governors to either implement
> their own macros and management of their attributes or force them to use the
> governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO
> is very ondemand and conservative governor specific and is very irrelevant
> for sched-dvfs or any other governors (hint hint).

But who is stopping us from breaking that file and moving some of it
into include/linux/cpufreq.h ?

We can do that today as well, but it would be fine to do that, when we
add more governors to the core. Though, it would only take a simple
patch if people want me to do it now.

> The only time this ABBA locking is an issue is when governor are changing
> and trying to add/remove attributes. That can easily be checked in
> store_governor

store_scaling_governor ??

> and dealt with without holding the policy rwsem if the

Are you saying that we could have taken the rwsem from the generic
cpufreq.c:store() and dropped it from store_scaling_governor() ? That
would have been something similar to what I tried earlier, which I
never posted (I gave the link to that few days back).

> governors can provide their per sys and per policy attribute arrays as part
> of registering themselves.

These per-sys and per-policy attributes really suck. There is nothing
really different in the implementation, just that the show/store
callbacks have different prototype. One accept 'kboj' as the parameter,
other accept 'policy'. I would call that a HACK as well (I only
implemented it though).

That should just die. A single list of attributes is what we should
have had initially as well.

-- 
viresh


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Viresh Kumar
On 02-02-16, 17:32, Saravana Kannan wrote:
> But if we are expecting sched dvfs to come in, why make it worse for it. It
> would be completely pointless to try and shoehorn sched dvfs to use
> cpufreq_governor.c

We can move the common part to cpufreq core and not make sched-dvfs
reuse cpufreq_governor.c

-- 
viresh


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Viresh Kumar
On 02-02-16, 20:03, Saravana Kannan wrote:
> This is the hotplug case I mentioned. The sysfs file removals will happen
> only for the last CPU in that policy (we thankfully optimized that part last
> year). We also know that multiple CPUs can't be hotplugged at the same time.
> So, in the start of cpufreq_offline_prepare, we just need to check if this
> is the last CPU in the policy and if that's the case, do the gov sysfs
> remove and then grab the policy lock and do all our crap. If a read is going
> on, that's going to finish before the sysfs attr remove can go ahead and it
> can grab the policy lock if it needs to and that still won't cause a
> deadlock because we haven't yet grabbed the policy lock in
> cpufreq_offline_prepare(). If the read comes after the sysfs remove, then
> the read is obviously going to fail (we can depend on the sysfs framework on
> doing its job there).

IMHO, these are all dirty hacks we should stay away from. Adding such
hunks in code is considered a band-aid kind of solution and hurts
readability badly. The new solution (new governor show/store)
implement this in a very clean and proper way I feel..

Others are free to disagree though :)

-- 
viresh


Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Viresh Kumar
On 02-02-16, 22:23, Rafael J. Wysocki wrote:
> On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar  wrote:

> "The ondemand and conservative governors use the global-attr or
> freq-attr structures to represent sysfs attributes corresponding to
> their tunables

> (which of them is actually used depends on whether or
> not different policy objects can use different governors at the same
> time

Not exactly. Different policies can always use different governors.
What made the difference was that different policies using same
governor, with different tunables or separate governor directories.

I have reworded this para like:

The ondemand and conservative governors use the global-attr or freq-attr
structures to represent sysfs attributes corresponding to their tunables
(which of them is actually used depends on whether or not different
policy objects can use same governor with different tunables at the same
time and, consequently, on where those attributes are located in sysfs).

Please let me know if isn't clear.

> > --- a/drivers/cpufreq/cpufreq_governor.c
> > +   ret = kobject_init_and_add(_data->kobj, _data->kobj_type,
> > +  get_governor_parent_kobj(policy),
> > +  attr_group->name);
> > +   if (ret) {
> > +   pr_err("%s: failed to init dbs_data kobj: %d\n", __func__, 
> > ret);
> 
> pr_debug() would be better here.

Its a real error, why pr_debug for that ?

> > goto reset_gdbs_data;
> > +   }
> >
> > return 0;
> >
> > @@ -426,8 +457,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy 
> > *policy,
> > return -EBUSY;
> >
> > if (!--dbs_data->usage_count) {
> > -   sysfs_remove_group(get_governor_parent_kobj(policy),
> > -  get_sysfs_attr(dbs_data));
> > +   kobject_put(_data->kobj);
> 
> Don't we need a ->release callback for this kobject?

There is nothing that we need to free from the ->release() callback.
We are using the kobject here just to get separate show/store
callbacks.

Here is the new version based on the review comments received until
now:

-8<-

From: Viresh Kumar 
Date: Tue, 2 Feb 2016 12:35:01 +0530
Subject: [PATCH] cpufreq: governor: New sysfs show/store callbacks for
 governor tunables

The ondemand and conservative governors use the global-attr or freq-attr
structures to represent sysfs attributes corresponding to their tunables
(which of them is actually used depends on whether or not different
policy objects can use same governor with different tunables at the same
time and, consequently, on where those attributes are located in sysfs).

Unfortunately, in the freq-attr case, the standard cpufreq show/store
sysfs attribute callbacks are applied to the governor tunable attributes
and they always acquire the policy->rwsem lock before carrying out the
operation.  That may lead to an ABBA deadlock if governor tunable
attributes are removed under policy->rwsem while one of them is being
accessed concurrently (if sysfs attributes removal wins the race, it
will wait for the access to complete with policy->rwsem held while the
attribute callback will block on policy->rwsem indefinitely).

We attempted to address this issue by dropping policy->rwsem around
governor tunable attributes removal (that is, around invocations of the
->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT)
in cpufreq_set_policy(), but that opened up race conditions that had not
been possible with policy->rwsem held all the time.  Therefore
policy->rwsem cannot be dropped in cpufreq_set_policy() at any point,
but the deadlock situation described above must be avoided too.

To that end, use the observation that in principle governor tunables may
be represented by the same data type regardless of whether the governor
is system-wide or per-policy and introduce a new structure, struct
governor_attr, for representing them and new corresponding macros for
creating show/store sysfs callbacks for them.  Also make their parent
kobject use a new kobject type whose default show/store callbacks are
not related to the standard core cpufreq ones in any way (and they don't
acquire policy->rwsem in particular).

[ Rafael: Written changelog ]
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq_conservative.c | 73 --
 drivers/cpufreq/cpufreq_governor.c | 71 -
 drivers/cpufreq/cpufreq_governor.h | 34 ++--
 drivers/cpufreq/cpufreq_ondemand.c | 73 --
 4 files changed, 144 insertions(+), 107 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c 
b/drivers/cpufreq/cpufreq_conservative.c
index 57750367bd26..c749fb4fe5d2 100644
--- 

Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops

2016-02-02 Thread Saravana Kannan

On 02/02/2016 05:52 PM, Rafael J. Wysocki wrote:

On Wed, Feb 3, 2016 at 2:32 AM, Saravana Kannan  wrote:

On 02/02/2016 05:07 PM, Rafael J. Wysocki wrote:


On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki 
wrote:


On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan 
wrote:


On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:



On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli  wrote:





[cut]



I also don't like this patch because it forces governors to either
implement
their own macros and management of their attributes or force them to use
the
governor structs that come with cpufreq_governor.h. cpufreq_governor.h
IMHO
is very ondemand and conservative governor specific and is very
irrelevant
for sched-dvfs or any other governors (hint hint).

The only time this ABBA locking is an issue is when governor are
changing
and trying to add/remove attributes. That can easily be checked in
store_governor and dealt with without holding the policy rwsem if the
governors can provide their per sys and per policy attribute arrays as
part
of registering themselves.

I'm sorry that I just keep talking about the idea and not sending out
the
patches.



I think you have a point, though.

The deadlock really is specific to the governors using the code in
cpufreq_governor.c.



That said no other governors in the tree use any sysfs attributes for
tunables AFAICS and the out-of-the tree ones are out of interest here.



But if we are expecting sched dvfs to come in, why make it worse for it. It
would be completely pointless to try and shoehorn sched dvfs to use
cpufreq_governor.c


Well, do you honestly think that using the existing stuff in it would
be a good idea?

If not, then why it matters at all?


Also the deadlock happens if one of the tunable attributes is accessed
while we're trying to remove it which very well may happen on read
access too.


Isn't this THE deadlock we are talking about? The removal of the attributes
only happen when governors are changes and we send a POLICY_EXIT and or all
the cores are hotplugged out.


It generally happens when the "old" governor is going away, whatever the reason.


And my suggestion would work just as well there.

Why are you prefixing your sentence with "Also"? Is there some other case
I'm not considering?


Say someone is reading sampling_rate for a policy with 1 CPU in it and
someone else is taking the CPU offline.  The governor EXIT code path
(that will trigger as a result) will try to remove the sampling_rate
attribute and (if it does that under policy->rwsem) it'll wait for the
read access to finish.  Where exactly would you put the deadlock
prevention in this case?


This is the hotplug case I mentioned. The sysfs file removals will 
happen only for the last CPU in that policy (we thankfully optimized 
that part last year). We also know that multiple CPUs can't be 
hotplugged at the same time. So, in the start of 
cpufreq_offline_prepare, we just need to check if this is the last CPU 
in the policy and if that's the case, do the gov sysfs remove and then 
grab the policy lock and do all our crap. If a read is going on, that's 
going to finish before the sysfs attr remove can go ahead and it can 
grab the policy lock if it needs to and that still won't cause a 
deadlock because we haven't yet grabbed the policy lock in 
cpufreq_offline_prepare(). If the read comes after the sysfs remove, 
then the read is obviously going to fail (we can depend on the sysfs 
framework on doing its job there).


Will that still leave any race conditions in?

-Saravana
--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project