Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02/02/2016 10:54 PM, Viresh Kumar wrote: On 02-02-16, 17:32, Saravana Kannan wrote: But if we are expecting sched dvfs to come in, why make it worse for it. It would be completely pointless to try and shoehorn sched dvfs to use cpufreq_governor.c We can move the common part to cpufreq core and not make sched-dvfs reuse cpufreq_governor.c Let's do this please. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02/02/2016 10:57 PM, Viresh Kumar wrote: On 02-02-16, 20:03, Saravana Kannan wrote: This is the hotplug case I mentioned. The sysfs file removals will happen only for the last CPU in that policy (we thankfully optimized that part last year). We also know that multiple CPUs can't be hotplugged at the same time. So, in the start of cpufreq_offline_prepare, we just need to check if this is the last CPU in the policy and if that's the case, do the gov sysfs remove and then grab the policy lock and do all our crap. If a read is going on, that's going to finish before the sysfs attr remove can go ahead and it can grab the policy lock if it needs to and that still won't cause a deadlock because we haven't yet grabbed the policy lock in cpufreq_offline_prepare(). If the read comes after the sysfs remove, then the read is obviously going to fail (we can depend on the sysfs framework on doing its job there). IMHO, these are all dirty hacks we should stay away from. Adding such hunks in code is considered a band-aid kind of solution and hurts readability badly. The new solution (new governor show/store) implement this in a very clean and proper way I feel.. Others are free to disagree though :) I think it looks clean since we haven't sorted out the race conditions that Juri pointed out. So, it's early to call this series clean :) Also, I don't see it as a dirty hack at all. What's so hacky about it? We are just identifying conditions when we'll have to remove the sysfs files and removing them before grabbing the policy lock. The unlock/lock that we have now is what is a dirty hack. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On Wed, Feb 3, 2016 at 2:21 PM, Viresh Kumar wrote: > On 03-02-16, 13:42, Rafael J. Wysocki wrote: >> > +static ssize_t governor_show(struct kobject *kobj, struct attribute *attr, >> > +char *buf) >> > +{ >> > + struct dbs_data *dbs_data = to_dbs_data(kobj); >> > + struct governor_attr *gattr = to_gov_attr(attr); >> > + int ret = -EIO; >> > + >> > + down_read(&dbs_data->rwsem); >> > + >> > + if (gattr->show) >> > + ret = gattr->show(dbs_data, buf); >> > + >> > + up_read(&dbs_data->rwsem); >> >> Do we need the lock here too? >> >> show() is only going to read the value, isn't it? And everything u32 >> or smaller is read atomically anyway. > > Okay, will drop it for now. > >> Apart from this it looks good to me. >> >> When you're ready, please resend the whole series without patch [5/5] >> which is premature IMO. > > I have changed that patch a bit, and am dropping just the lock now and > not governor_enable thing. That should be sane enough I believe. In any case this is not suitable for 4.5 IMO. > Anyway I will be posting 7 patches now, pick only first 4 if you > aren't confident about the rest. OK Thanks, Rafael
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 03-02-16, 13:42, Rafael J. Wysocki wrote: > > +static ssize_t governor_show(struct kobject *kobj, struct attribute *attr, > > +char *buf) > > +{ > > + struct dbs_data *dbs_data = to_dbs_data(kobj); > > + struct governor_attr *gattr = to_gov_attr(attr); > > + int ret = -EIO; > > + > > + down_read(&dbs_data->rwsem); > > + > > + if (gattr->show) > > + ret = gattr->show(dbs_data, buf); > > + > > + up_read(&dbs_data->rwsem); > > Do we need the lock here too? > > show() is only going to read the value, isn't it? And everything u32 > or smaller is read atomically anyway. Okay, will drop it for now. > Apart from this it looks good to me. > > When you're ready, please resend the whole series without patch [5/5] > which is premature IMO. I have changed that patch a bit, and am dropping just the lock now and not governor_enable thing. That should be sane enough I believe. Anyway I will be posting 7 patches now, pick only first 4 if you aren't confident about the rest. -- viresh
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On Wed, Feb 3, 2016 at 7:58 AM, Viresh Kumar wrote: > On 02-02-16, 22:23, Rafael J. Wysocki wrote: >> On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar >> wrote: > >> "The ondemand and conservative governors use the global-attr or >> freq-attr structures to represent sysfs attributes corresponding to >> their tunables > >> (which of them is actually used depends on whether or >> not different policy objects can use different governors at the same >> time > > Not exactly. Different policies can always use different governors. > What made the difference was that different policies using same > governor, with different tunables or separate governor directories. > > I have reworded this para like: > > The ondemand and conservative governors use the global-attr or freq-attr > structures to represent sysfs attributes corresponding to their tunables > (which of them is actually used depends on whether or not different > policy objects can use same governor with different tunables at the same > time and, consequently, on where those attributes are located in sysfs). > > Please let me know if isn't clear. That's OK. IMO you should say "use the same governor", but that's easily fixable. :-) >> > --- a/drivers/cpufreq/cpufreq_governor.c >> > + ret = kobject_init_and_add(&dbs_data->kobj, &dbs_data->kobj_type, >> > + get_governor_parent_kobj(policy), >> > + attr_group->name); >> > + if (ret) { >> > + pr_err("%s: failed to init dbs_data kobj: %d\n", __func__, >> > ret); >> >> pr_debug() would be better here. > > Its a real error, why pr_debug for that ? What's the value of printing that on user systems? It contains debug information only and it is not useful to anyone unfamiliar with the code in question anyway. >> > goto reset_gdbs_data; >> > + } >> > >> > return 0; >> > >> > @@ -426,8 +457,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy >> > *policy, >> > return -EBUSY; >> > >> > if (!--dbs_data->usage_count) { >> > - sysfs_remove_group(get_governor_parent_kobj(policy), >> > - get_sysfs_attr(dbs_data)); >> > + kobject_put(&dbs_data->kobj); >> >> Don't we need a ->release callback for this kobject? > > There is nothing that we need to free from the ->release() callback. > We are using the kobject here just to get separate show/store > callbacks. Well, I guess the answer should be that there can't be more active references to the kobject, so it is safe to free it synchronously later. > Here is the new version based on the review comments received until > now: > > -8<- > > From: Viresh Kumar > Date: Tue, 2 Feb 2016 12:35:01 +0530 > Subject: [PATCH] cpufreq: governor: New sysfs show/store callbacks for > governor tunables > [cut] > @@ -22,14 +22,62 @@ > > #include "cpufreq_governor.h" > > -static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data) > +static inline struct dbs_data *to_dbs_data(struct kobject *kobj) > { > - if (have_governor_per_policy()) > - return dbs_data->cdata->attr_group_gov_pol; > - else > - return dbs_data->cdata->attr_group_gov_sys; > + return container_of(kobj, struct dbs_data, kobj); > +} > + > +static inline struct governor_attr *to_gov_attr(struct attribute *attr) > +{ > + return container_of(attr, struct governor_attr, attr); > +} > + > +static ssize_t governor_show(struct kobject *kobj, struct attribute *attr, > +char *buf) > +{ > + struct dbs_data *dbs_data = to_dbs_data(kobj); > + struct governor_attr *gattr = to_gov_attr(attr); > + int ret = -EIO; > + > + down_read(&dbs_data->rwsem); > + > + if (gattr->show) > + ret = gattr->show(dbs_data, buf); > + > + up_read(&dbs_data->rwsem); Do we need the lock here too? show() is only going to read the value, isn't it? And everything u32 or smaller is read atomically anyway. Apart from this it looks good to me. When you're ready, please resend the whole series without patch [5/5] which is premature IMO. Thanks, Rafael
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 03-02-16, 10:51, Juri Lelli wrote: > I also think that sched-dvfs should not use cpufreq_governor.c. It is > useful boilerplate code for ondemand and conservative, as they share lot > of data structures and how they work, but it doesn't necessarily suit > everybody's needs, IMHO. > > OTOH, fixing the current issue in the best way we can come up with has > still value of course :). Right. cpufreq_governor.c is more about the technique where we do load evaluation using deferred timers and workqueues, which isn't required for sched-dvfs. We can just move the common parts, like, governor_show/governor_store routines, and the new macros being added here. -- viresh
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 03/02/16 12:24, Viresh Kumar wrote: > On 02-02-16, 17:32, Saravana Kannan wrote: > > But if we are expecting sched dvfs to come in, why make it worse for it. It > > would be completely pointless to try and shoehorn sched dvfs to use > > cpufreq_governor.c > > We can move the common part to cpufreq core and not make sched-dvfs > reuse cpufreq_governor.c > I also think that sched-dvfs should not use cpufreq_governor.c. It is useful boilerplate code for ondemand and conservative, as they share lot of data structures and how they work, but it doesn't necessarily suit everybody's needs, IMHO. OTOH, fixing the current issue in the best way we can come up with has still value of course :). Best, - Juri
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02-02-16, 22:23, Rafael J. Wysocki wrote: > On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar wrote: > "The ondemand and conservative governors use the global-attr or > freq-attr structures to represent sysfs attributes corresponding to > their tunables > (which of them is actually used depends on whether or > not different policy objects can use different governors at the same > time Not exactly. Different policies can always use different governors. What made the difference was that different policies using same governor, with different tunables or separate governor directories. I have reworded this para like: The ondemand and conservative governors use the global-attr or freq-attr structures to represent sysfs attributes corresponding to their tunables (which of them is actually used depends on whether or not different policy objects can use same governor with different tunables at the same time and, consequently, on where those attributes are located in sysfs). Please let me know if isn't clear. > > --- a/drivers/cpufreq/cpufreq_governor.c > > + ret = kobject_init_and_add(&dbs_data->kobj, &dbs_data->kobj_type, > > + get_governor_parent_kobj(policy), > > + attr_group->name); > > + if (ret) { > > + pr_err("%s: failed to init dbs_data kobj: %d\n", __func__, > > ret); > > pr_debug() would be better here. Its a real error, why pr_debug for that ? > > goto reset_gdbs_data; > > + } > > > > return 0; > > > > @@ -426,8 +457,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy > > *policy, > > return -EBUSY; > > > > if (!--dbs_data->usage_count) { > > - sysfs_remove_group(get_governor_parent_kobj(policy), > > - get_sysfs_attr(dbs_data)); > > + kobject_put(&dbs_data->kobj); > > Don't we need a ->release callback for this kobject? There is nothing that we need to free from the ->release() callback. We are using the kobject here just to get separate show/store callbacks. Here is the new version based on the review comments received until now: -8<- From: Viresh Kumar Date: Tue, 2 Feb 2016 12:35:01 +0530 Subject: [PATCH] cpufreq: governor: New sysfs show/store callbacks for governor tunables The ondemand and conservative governors use the global-attr or freq-attr structures to represent sysfs attributes corresponding to their tunables (which of them is actually used depends on whether or not different policy objects can use same governor with different tunables at the same time and, consequently, on where those attributes are located in sysfs). Unfortunately, in the freq-attr case, the standard cpufreq show/store sysfs attribute callbacks are applied to the governor tunable attributes and they always acquire the policy->rwsem lock before carrying out the operation. That may lead to an ABBA deadlock if governor tunable attributes are removed under policy->rwsem while one of them is being accessed concurrently (if sysfs attributes removal wins the race, it will wait for the access to complete with policy->rwsem held while the attribute callback will block on policy->rwsem indefinitely). We attempted to address this issue by dropping policy->rwsem around governor tunable attributes removal (that is, around invocations of the ->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT) in cpufreq_set_policy(), but that opened up race conditions that had not been possible with policy->rwsem held all the time. Therefore policy->rwsem cannot be dropped in cpufreq_set_policy() at any point, but the deadlock situation described above must be avoided too. To that end, use the observation that in principle governor tunables may be represented by the same data type regardless of whether the governor is system-wide or per-policy and introduce a new structure, struct governor_attr, for representing them and new corresponding macros for creating show/store sysfs callbacks for them. Also make their parent kobject use a new kobject type whose default show/store callbacks are not related to the standard core cpufreq ones in any way (and they don't acquire policy->rwsem in particular). [ Rafael: Written changelog ] Signed-off-by: Viresh Kumar --- drivers/cpufreq/cpufreq_conservative.c | 73 -- drivers/cpufreq/cpufreq_governor.c | 71 - drivers/cpufreq/cpufreq_governor.h | 34 ++-- drivers/cpufreq/cpufreq_ondemand.c | 73 -- 4 files changed, 144 insertions(+), 107 deletions(-) diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index 57750367bd26..c749fb4fe5d2 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02-02-16, 20:03, Saravana Kannan wrote: > This is the hotplug case I mentioned. The sysfs file removals will happen > only for the last CPU in that policy (we thankfully optimized that part last > year). We also know that multiple CPUs can't be hotplugged at the same time. > So, in the start of cpufreq_offline_prepare, we just need to check if this > is the last CPU in the policy and if that's the case, do the gov sysfs > remove and then grab the policy lock and do all our crap. If a read is going > on, that's going to finish before the sysfs attr remove can go ahead and it > can grab the policy lock if it needs to and that still won't cause a > deadlock because we haven't yet grabbed the policy lock in > cpufreq_offline_prepare(). If the read comes after the sysfs remove, then > the read is obviously going to fail (we can depend on the sysfs framework on > doing its job there). IMHO, these are all dirty hacks we should stay away from. Adding such hunks in code is considered a band-aid kind of solution and hurts readability badly. The new solution (new governor show/store) implement this in a very clean and proper way I feel.. Others are free to disagree though :) -- viresh
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02-02-16, 17:32, Saravana Kannan wrote: > But if we are expecting sched dvfs to come in, why make it worse for it. It > would be completely pointless to try and shoehorn sched dvfs to use > cpufreq_governor.c We can move the common part to cpufreq core and not make sched-dvfs reuse cpufreq_governor.c -- viresh
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02-02-16, 14:21, Saravana Kannan wrote: > I also don't like this patch because it forces governors to either implement > their own macros and management of their attributes or force them to use the > governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO > is very ondemand and conservative governor specific and is very irrelevant > for sched-dvfs or any other governors (hint hint). But who is stopping us from breaking that file and moving some of it into include/linux/cpufreq.h ? We can do that today as well, but it would be fine to do that, when we add more governors to the core. Though, it would only take a simple patch if people want me to do it now. > The only time this ABBA locking is an issue is when governor are changing > and trying to add/remove attributes. That can easily be checked in > store_governor store_scaling_governor ?? > and dealt with without holding the policy rwsem if the Are you saying that we could have taken the rwsem from the generic cpufreq.c:store() and dropped it from store_scaling_governor() ? That would have been something similar to what I tried earlier, which I never posted (I gave the link to that few days back). > governors can provide their per sys and per policy attribute arrays as part > of registering themselves. These per-sys and per-policy attributes really suck. There is nothing really different in the implementation, just that the show/store callbacks have different prototype. One accept 'kboj' as the parameter, other accept 'policy'. I would call that a HACK as well (I only implemented it though). That should just die. A single list of attributes is what we should have had initially as well. -- viresh
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02-02-16, 17:01, Juri Lelli wrote: > Hi Rafael, > > On 02/02/16 17:35, Rafael J. Wysocki wrote: > > On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli wrote: > > > This patch cleans things up a lot, that's good. > > > > > > One thing I'm still concerned about, though: don't we need some locking > > > in place for some of the store operations on governors attributes? Are > > > store_{ignore_nice_load, sampling_down_fact, etc} safe without locking? > > > > That would require some investigation I suppose. Yeah, that protection is required. Sorry about that. > > > It seems that we can call them from different cpus concurrently. > > > > Yes, we can. > > > > One quick-and-dirty way of dealing with that might be to introduce a > > "sysfs lock" into struct dbs_data and hold that around the invocation > > of gattr->store() in the sysfs_ops's ->store callback. s/dirty/sane ? :) > Can't we actually try to use the policy->rwsem (or one of the core > locks) + wait_for_completion approach as we do in cpufreq core? policy->rwsem will defeat the purpose of this change as it will reintroduce the ABBA issue. -- viresh
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02/02/2016 05:52 PM, Rafael J. Wysocki wrote: On Wed, Feb 3, 2016 at 2:32 AM, Saravana Kannan wrote: On 02/02/2016 05:07 PM, Rafael J. Wysocki wrote: On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki wrote: On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan wrote: On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote: On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli wrote: [cut] I also don't like this patch because it forces governors to either implement their own macros and management of their attributes or force them to use the governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO is very ondemand and conservative governor specific and is very irrelevant for sched-dvfs or any other governors (hint hint). The only time this ABBA locking is an issue is when governor are changing and trying to add/remove attributes. That can easily be checked in store_governor and dealt with without holding the policy rwsem if the governors can provide their per sys and per policy attribute arrays as part of registering themselves. I'm sorry that I just keep talking about the idea and not sending out the patches. I think you have a point, though. The deadlock really is specific to the governors using the code in cpufreq_governor.c. That said no other governors in the tree use any sysfs attributes for tunables AFAICS and the out-of-the tree ones are out of interest here. But if we are expecting sched dvfs to come in, why make it worse for it. It would be completely pointless to try and shoehorn sched dvfs to use cpufreq_governor.c Well, do you honestly think that using the existing stuff in it would be a good idea? If not, then why it matters at all? Also the deadlock happens if one of the tunable attributes is accessed while we're trying to remove it which very well may happen on read access too. Isn't this THE deadlock we are talking about? The removal of the attributes only happen when governors are changes and we send a POLICY_EXIT and or all the cores are hotplugged out. It generally happens when the "old" governor is going away, whatever the reason. And my suggestion would work just as well there. Why are you prefixing your sentence with "Also"? Is there some other case I'm not considering? Say someone is reading sampling_rate for a policy with 1 CPU in it and someone else is taking the CPU offline. The governor EXIT code path (that will trigger as a result) will try to remove the sampling_rate attribute and (if it does that under policy->rwsem) it'll wait for the read access to finish. Where exactly would you put the deadlock prevention in this case? This is the hotplug case I mentioned. The sysfs file removals will happen only for the last CPU in that policy (we thankfully optimized that part last year). We also know that multiple CPUs can't be hotplugged at the same time. So, in the start of cpufreq_offline_prepare, we just need to check if this is the last CPU in the policy and if that's the case, do the gov sysfs remove and then grab the policy lock and do all our crap. If a read is going on, that's going to finish before the sysfs attr remove can go ahead and it can grab the policy lock if it needs to and that still won't cause a deadlock because we haven't yet grabbed the policy lock in cpufreq_offline_prepare(). If the read comes after the sysfs remove, then the read is obviously going to fail (we can depend on the sysfs framework on doing its job there). Will that still leave any race conditions in? -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On Wed, Feb 3, 2016 at 2:32 AM, Saravana Kannan wrote: > On 02/02/2016 05:07 PM, Rafael J. Wysocki wrote: >> >> On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki >> wrote: >>> >>> On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan >>> wrote: On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote: > > > On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli wrote: >> >> >> >> [cut] >> I also don't like this patch because it forces governors to either implement their own macros and management of their attributes or force them to use the governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO is very ondemand and conservative governor specific and is very irrelevant for sched-dvfs or any other governors (hint hint). The only time this ABBA locking is an issue is when governor are changing and trying to add/remove attributes. That can easily be checked in store_governor and dealt with without holding the policy rwsem if the governors can provide their per sys and per policy attribute arrays as part of registering themselves. I'm sorry that I just keep talking about the idea and not sending out the patches. >>> >>> >>> I think you have a point, though. >>> >>> The deadlock really is specific to the governors using the code in >>> cpufreq_governor.c. >> >> >> That said no other governors in the tree use any sysfs attributes for >> tunables AFAICS and the out-of-the tree ones are out of interest here. > > > But if we are expecting sched dvfs to come in, why make it worse for it. It > would be completely pointless to try and shoehorn sched dvfs to use > cpufreq_governor.c Well, do you honestly think that using the existing stuff in it would be a good idea? If not, then why it matters at all? >> Also the deadlock happens if one of the tunable attributes is accessed >> while we're trying to remove it which very well may happen on read >> access too. > > Isn't this THE deadlock we are talking about? The removal of the attributes > only happen when governors are changes and we send a POLICY_EXIT and or all > the cores are hotplugged out. It generally happens when the "old" governor is going away, whatever the reason. > And my suggestion would work just as well there. > > Why are you prefixing your sentence with "Also"? Is there some other case > I'm not considering? Say someone is reading sampling_rate for a policy with 1 CPU in it and someone else is taking the CPU offline. The governor EXIT code path (that will trigger as a result) will try to remove the sampling_rate attribute and (if it does that under policy->rwsem) it'll wait for the read access to finish. Where exactly would you put the deadlock prevention in this case? Thanks, Rafael
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02/02/2016 05:07 PM, Rafael J. Wysocki wrote: On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki wrote: On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan wrote: On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote: On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli wrote: [cut] I also don't like this patch because it forces governors to either implement their own macros and management of their attributes or force them to use the governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO is very ondemand and conservative governor specific and is very irrelevant for sched-dvfs or any other governors (hint hint). The only time this ABBA locking is an issue is when governor are changing and trying to add/remove attributes. That can easily be checked in store_governor and dealt with without holding the policy rwsem if the governors can provide their per sys and per policy attribute arrays as part of registering themselves. I'm sorry that I just keep talking about the idea and not sending out the patches. I think you have a point, though. The deadlock really is specific to the governors using the code in cpufreq_governor.c. That said no other governors in the tree use any sysfs attributes for tunables AFAICS and the out-of-the tree ones are out of interest here. But if we are expecting sched dvfs to come in, why make it worse for it. It would be completely pointless to try and shoehorn sched dvfs to use cpufreq_governor.c Also the deadlock happens if one of the tunable attributes is accessed while we're trying to remove it which very well may happen on read access too. Isn't this THE deadlock we are talking about? The removal of the attributes only happen when governors are changes and we send a POLICY_EXIT and or all the cores are hotplugged out. And my suggestion would work just as well there. Why are you prefixing your sentence with "Also"? Is there some other case I'm not considering? -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki wrote: > On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan > wrote: >> On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote: >>> >>> On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli wrote: [cut] >> >> I also don't like this patch because it forces governors to either implement >> their own macros and management of their attributes or force them to use the >> governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO >> is very ondemand and conservative governor specific and is very irrelevant >> for sched-dvfs or any other governors (hint hint). >> >> The only time this ABBA locking is an issue is when governor are changing >> and trying to add/remove attributes. That can easily be checked in >> store_governor and dealt with without holding the policy rwsem if the >> governors can provide their per sys and per policy attribute arrays as part >> of registering themselves. >> >> I'm sorry that I just keep talking about the idea and not sending out the >> patches. > > I think you have a point, though. > > The deadlock really is specific to the governors using the code in > cpufreq_governor.c. That said no other governors in the tree use any sysfs attributes for tunables AFAICS and the out-of-the tree ones are out of interest here. Also the deadlock happens if one of the tunable attributes is accessed while we're trying to remove it which very well may happen on read access too. Thanks, Rafael
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan wrote: > On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote: >> >> On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli wrote: >>> >>> Hi Rafael, >>> >>> On 02/02/16 17:35, Rafael J. Wysocki wrote: On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli wrote: > > Hi Viresh, > > On 02/02/16 16:27, Viresh Kumar wrote: >> >> Until now, governors (ondemand/conservative) were using the >> 'global-attr' or 'freq-attr', depending on the sysfs location where we >> want to create governor's directory. >> >> The problem is that, in case of 'freq-attr', we are forced to use >> show()/store() present in cpufreq.c, which always take policy->rwsem. >> >> And because of that we were facing some ABBA lockups during governor >> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the >> rwsem right before calling governor callback for >> CPUFREQ_GOV_POLICY_EXIT >> event. >> >> That caused further problems and it never worked perfectly. >> >> This patch attempts to fix that by creating separate sysfs-ops for >> cpufreq governors. >> >> Because things got much simplified now, we don't need separate >> show/store callbacks for governor-for-system and governor-per-policy >> cases. >> >> Signed-off-by: Viresh Kumar > > > This patch cleans things up a lot, that's good. > > One thing I'm still concerned about, though: don't we need some locking > in place for some of the store operations on governors attributes? Are > store_{ignore_nice_load, sampling_down_fact, etc} safe without locking? That would require some investigation I suppose. > It seems that we can call them from different cpus concurrently. Yes, we can. One quick-and-dirty way of dealing with that might be to introduce a "sysfs lock" into struct dbs_data and hold that around the invocation of gattr->store() in the sysfs_ops's ->store callback. >>> >>> There is value in trying to solve this issue by using some of the >>> existing locks, IMHO. >> >> >> Some value - maybe. I'm not sure how much of it, though. >> >> Finer-grained locking is generally easier to follow, because the locks >> tend to be used for specific purposes only. >> >>> Can't we actually try to use the policy->rwsem (or one of the core >>> locks) + wait_for_completion approach as we do in cpufreq core? >> >> >> No. Too many things depend on that lock already and some of them work >> by accident rather than by design. > > > Also, wait_for_completion() and complete() is just another way to implement > a lock. So, it won't necessarily solve any deadlock issues. > > I also don't like this patch because it forces governors to either implement > their own macros and management of their attributes or force them to use the > governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO > is very ondemand and conservative governor specific and is very irrelevant > for sched-dvfs or any other governors (hint hint). > > The only time this ABBA locking is an issue is when governor are changing > and trying to add/remove attributes. That can easily be checked in > store_governor and dealt with without holding the policy rwsem if the > governors can provide their per sys and per policy attribute arrays as part > of registering themselves. > > I'm sorry that I just keep talking about the idea and not sending out the > patches. I think you have a point, though. The deadlock really is specific to the governors using the code in cpufreq_governor.c. Thanks, Rafael
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote: On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli wrote: Hi Rafael, On 02/02/16 17:35, Rafael J. Wysocki wrote: On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli wrote: Hi Viresh, On 02/02/16 16:27, Viresh Kumar wrote: Until now, governors (ondemand/conservative) were using the 'global-attr' or 'freq-attr', depending on the sysfs location where we want to create governor's directory. The problem is that, in case of 'freq-attr', we are forced to use show()/store() present in cpufreq.c, which always take policy->rwsem. And because of that we were facing some ABBA lockups during governor callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT event. That caused further problems and it never worked perfectly. This patch attempts to fix that by creating separate sysfs-ops for cpufreq governors. Because things got much simplified now, we don't need separate show/store callbacks for governor-for-system and governor-per-policy cases. Signed-off-by: Viresh Kumar This patch cleans things up a lot, that's good. One thing I'm still concerned about, though: don't we need some locking in place for some of the store operations on governors attributes? Are store_{ignore_nice_load, sampling_down_fact, etc} safe without locking? That would require some investigation I suppose. It seems that we can call them from different cpus concurrently. Yes, we can. One quick-and-dirty way of dealing with that might be to introduce a "sysfs lock" into struct dbs_data and hold that around the invocation of gattr->store() in the sysfs_ops's ->store callback. There is value in trying to solve this issue by using some of the existing locks, IMHO. Some value - maybe. I'm not sure how much of it, though. Finer-grained locking is generally easier to follow, because the locks tend to be used for specific purposes only. Can't we actually try to use the policy->rwsem (or one of the core locks) + wait_for_completion approach as we do in cpufreq core? No. Too many things depend on that lock already and some of them work by accident rather than by design. Also, wait_for_completion() and complete() is just another way to implement a lock. So, it won't necessarily solve any deadlock issues. I also don't like this patch because it forces governors to either implement their own macros and management of their attributes or force them to use the governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO is very ondemand and conservative governor specific and is very irrelevant for sched-dvfs or any other governors (hint hint). The only time this ABBA locking is an issue is when governor are changing and trying to add/remove attributes. That can easily be checked in store_governor and dealt with without holding the policy rwsem if the governors can provide their per sys and per policy attribute arrays as part of registering themselves. I'm sorry that I just keep talking about the idea and not sending out the patches. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar wrote: First, the subject might be better. What about something like "cpufreq: governor: New sysfs show/store callbacks for governor tunables", for example? > Until now, governors (ondemand/conservative) were using the > 'global-attr' or 'freq-attr', depending on the sysfs location where we > want to create governor's directory. "The ondemand and conservative governors use the global-attr or freq-attr structures to represent sysfs attributes corresponding to their tunables (which of them is actually used depends on whether or not different policy objects can use different governors at the same time and, consequently, on where those attributes are located in sysfs). Unfortunately, in the freq-attr case, the standard cpufreq show/store sysfs attribute callbacks are applied to the governor tunable attributes and they always acquire the policy->rwsem lock before carrying out the operation. That may lead to an ABBA deadlock if governor tunable attributes are removed under policy->rwsem while one of them is being accessed concurrently (if sysfs attributes removal wins the race, it will wait for the access to complete with policy->rwsem held while the attribute callback will block on policy->rwsem indefinitely). We attempted to address this issue by dropping policy->rwsem around governor tunable attributes removal (that is, around invocations of the ->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT) in cpufreq_set_policy(), but that opened up race conditions that had not been possible with policy->rwsem held all the time. Therefore policy->rwsem cannot be dropped in cpufreq_set_policy() at any point, but the deadlock situation described above must be avoided too. To that end, use the observation that in principle governor tunables may be represented by the same data type regardless of whether the governor is system-wide or per-policy and introduce a new structure, struct governor_attr, for representing them and new corresponding macros for creating show/store sysfs callbacks for them. Also make their parent kobject use a new kobject type whose default show/store callbacks are not related to the standard core cpufreq ones in any way (and they don't acquire policy->rwsem in particular)." IOW, (1) describe the problem you're addressing so that people unfamiliar with the code in question can understand it, (2) describe what is done to address the problem (what's the idea and what changes are made to implement it). [cut] > --- a/drivers/cpufreq/cpufreq_governor.c > +++ b/drivers/cpufreq/cpufreq_governor.c > @@ -22,14 +22,37 @@ > > #include "cpufreq_governor.h" > > -static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data) > +#define to_dbs_data(k) container_of(k, struct dbs_data, kobj) > +#define to_attr(a) container_of(a, struct governor_attr, attr) Please change the above to static inline routines. > + > +static ssize_t show(struct kobject *kobj, struct attribute *attr, char *buf) A better name please. Something that will correspond to the purpose. > { > - if (have_governor_per_policy()) > - return dbs_data->cdata->attr_group_gov_pol; > - else > - return dbs_data->cdata->attr_group_gov_sys; > + struct dbs_data *dbs_data = to_dbs_data(kobj); > + struct governor_attr *gattr = to_attr(attr); > + > + if (gattr->show) > + return gattr->show(dbs_data, buf); > + > + return -EIO; > +} > + > +static ssize_t store(struct kobject *kobj, struct attribute *attr, > +const char *buf, size_t count) Ditto. > +{ > + struct dbs_data *dbs_data = to_dbs_data(kobj); > + struct governor_attr *gattr = to_attr(attr); > + > + if (gattr->store) > + return gattr->store(dbs_data, buf, count); Say two instances of this run in parallel with each other, either for the same attribute or for different attributes under the same dbs_data. What's the guarantee that they won't make conflicting changes? > + > + return -EIO; > } > > +static const struct sysfs_ops sysfs_ops = { > + .show = show, > + .store = store, > +}; That is completely enigmatic, so please at least add a comment describing it. > + > void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) > { > struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu); > @@ -354,6 +377,7 @@ static int cpufreq_governor_init(struct cpufreq_policy > *policy, > struct dbs_data *dbs_data, > struct common_dbs_data *cdata) > { > + struct attribute_group *attr_group; > int ret; > > /* State should be equivalent to EXIT */ > @@ -395,10 +419,17 @@ static int cpufreq_governor_init(struct cpufreq_policy > *policy, > > policy->governor_data = dbs_data; > > - ret = sysfs_create_group(get_governor_parent_kobj(policy), > -
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli wrote: > Hi Rafael, > > On 02/02/16 17:35, Rafael J. Wysocki wrote: >> On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli wrote: >> > Hi Viresh, >> > >> > On 02/02/16 16:27, Viresh Kumar wrote: >> >> Until now, governors (ondemand/conservative) were using the >> >> 'global-attr' or 'freq-attr', depending on the sysfs location where we >> >> want to create governor's directory. >> >> >> >> The problem is that, in case of 'freq-attr', we are forced to use >> >> show()/store() present in cpufreq.c, which always take policy->rwsem. >> >> >> >> And because of that we were facing some ABBA lockups during governor >> >> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the >> >> rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT >> >> event. >> >> >> >> That caused further problems and it never worked perfectly. >> >> >> >> This patch attempts to fix that by creating separate sysfs-ops for >> >> cpufreq governors. >> >> >> >> Because things got much simplified now, we don't need separate >> >> show/store callbacks for governor-for-system and governor-per-policy >> >> cases. >> >> >> >> Signed-off-by: Viresh Kumar >> > >> > This patch cleans things up a lot, that's good. >> > >> > One thing I'm still concerned about, though: don't we need some locking >> > in place for some of the store operations on governors attributes? Are >> > store_{ignore_nice_load, sampling_down_fact, etc} safe without locking? >> >> That would require some investigation I suppose. >> >> > It seems that we can call them from different cpus concurrently. >> >> Yes, we can. >> >> One quick-and-dirty way of dealing with that might be to introduce a >> "sysfs lock" into struct dbs_data and hold that around the invocation >> of gattr->store() in the sysfs_ops's ->store callback. >> > > There is value in trying to solve this issue by using some of the > existing locks, IMHO. Some value - maybe. I'm not sure how much of it, though. Finer-grained locking is generally easier to follow, because the locks tend to be used for specific purposes only. > Can't we actually try to use the policy->rwsem (or one of the core > locks) + wait_for_completion approach as we do in cpufreq core? No. Too many things depend on that lock already and some of them work by accident rather than by design. Thanks, Rafael
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
Hi Rafael, On 02/02/16 17:35, Rafael J. Wysocki wrote: > On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli wrote: > > Hi Viresh, > > > > On 02/02/16 16:27, Viresh Kumar wrote: > >> Until now, governors (ondemand/conservative) were using the > >> 'global-attr' or 'freq-attr', depending on the sysfs location where we > >> want to create governor's directory. > >> > >> The problem is that, in case of 'freq-attr', we are forced to use > >> show()/store() present in cpufreq.c, which always take policy->rwsem. > >> > >> And because of that we were facing some ABBA lockups during governor > >> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the > >> rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT > >> event. > >> > >> That caused further problems and it never worked perfectly. > >> > >> This patch attempts to fix that by creating separate sysfs-ops for > >> cpufreq governors. > >> > >> Because things got much simplified now, we don't need separate > >> show/store callbacks for governor-for-system and governor-per-policy > >> cases. > >> > >> Signed-off-by: Viresh Kumar > > > > This patch cleans things up a lot, that's good. > > > > One thing I'm still concerned about, though: don't we need some locking > > in place for some of the store operations on governors attributes? Are > > store_{ignore_nice_load, sampling_down_fact, etc} safe without locking? > > That would require some investigation I suppose. > > > It seems that we can call them from different cpus concurrently. > > Yes, we can. > > One quick-and-dirty way of dealing with that might be to introduce a > "sysfs lock" into struct dbs_data and hold that around the invocation > of gattr->store() in the sysfs_ops's ->store callback. > There is value in trying to solve this issue by using some of the existing locks, IMHO. Can't we actually try to use the policy->rwsem (or one of the core locks) + wait_for_completion approach as we do in cpufreq core? > BTW, you could have dropped the stuff below this line from your reply > message. That at least would have prevented tools like Patchwork from > storing useless garbage. > Right. Sorry for the garbage; I'll check twice that I trim my replies in the future. Best, - Juri
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli wrote: > Hi Viresh, > > On 02/02/16 16:27, Viresh Kumar wrote: >> Until now, governors (ondemand/conservative) were using the >> 'global-attr' or 'freq-attr', depending on the sysfs location where we >> want to create governor's directory. >> >> The problem is that, in case of 'freq-attr', we are forced to use >> show()/store() present in cpufreq.c, which always take policy->rwsem. >> >> And because of that we were facing some ABBA lockups during governor >> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the >> rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT >> event. >> >> That caused further problems and it never worked perfectly. >> >> This patch attempts to fix that by creating separate sysfs-ops for >> cpufreq governors. >> >> Because things got much simplified now, we don't need separate >> show/store callbacks for governor-for-system and governor-per-policy >> cases. >> >> Signed-off-by: Viresh Kumar > > This patch cleans things up a lot, that's good. > > One thing I'm still concerned about, though: don't we need some locking > in place for some of the store operations on governors attributes? Are > store_{ignore_nice_load, sampling_down_fact, etc} safe without locking? That would require some investigation I suppose. > It seems that we can call them from different cpus concurrently. Yes, we can. One quick-and-dirty way of dealing with that might be to introduce a "sysfs lock" into struct dbs_data and hold that around the invocation of gattr->store() in the sysfs_ops's ->store callback. > > Best, > > - Juri BTW, you could have dropped the stuff below this line from your reply message. That at least would have prevented tools like Patchwork from storing useless garbage. Thanks, Rafael
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
Hi Viresh, On 02/02/16 16:27, Viresh Kumar wrote: > Until now, governors (ondemand/conservative) were using the > 'global-attr' or 'freq-attr', depending on the sysfs location where we > want to create governor's directory. > > The problem is that, in case of 'freq-attr', we are forced to use > show()/store() present in cpufreq.c, which always take policy->rwsem. > > And because of that we were facing some ABBA lockups during governor > callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the > rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT > event. > > That caused further problems and it never worked perfectly. > > This patch attempts to fix that by creating separate sysfs-ops for > cpufreq governors. > > Because things got much simplified now, we don't need separate > show/store callbacks for governor-for-system and governor-per-policy > cases. > > Signed-off-by: Viresh Kumar This patch cleans things up a lot, that's good. One thing I'm still concerned about, though: don't we need some locking in place for some of the store operations on governors attributes? Are store_{ignore_nice_load, sampling_down_fact, etc} safe without locking? It seems that we can call them from different cpus concurrently. Best, - Juri > --- > drivers/cpufreq/cpufreq_conservative.c | 71 > +- > drivers/cpufreq/cpufreq_governor.c | 50 +++- > drivers/cpufreq/cpufreq_governor.h | 31 +-- > drivers/cpufreq/cpufreq_ondemand.c | 71 > +- > 4 files changed, 122 insertions(+), 101 deletions(-) > > diff --git a/drivers/cpufreq/cpufreq_conservative.c > b/drivers/cpufreq/cpufreq_conservative.c > index 57750367bd26..980145da796a 100644 > --- a/drivers/cpufreq/cpufreq_conservative.c > +++ b/drivers/cpufreq/cpufreq_conservative.c > @@ -275,51 +275,35 @@ static ssize_t store_freq_step(struct dbs_data > *dbs_data, const char *buf, > return count; > } > > -show_store_one(cs, sampling_rate); > -show_store_one(cs, sampling_down_factor); > -show_store_one(cs, up_threshold); > -show_store_one(cs, down_threshold); > -show_store_one(cs, ignore_nice_load); > -show_store_one(cs, freq_step); > -show_one(cs, min_sampling_rate); > - > -gov_sys_pol_attr_rw(sampling_rate); > -gov_sys_pol_attr_rw(sampling_down_factor); > -gov_sys_pol_attr_rw(up_threshold); > -gov_sys_pol_attr_rw(down_threshold); > -gov_sys_pol_attr_rw(ignore_nice_load); > -gov_sys_pol_attr_rw(freq_step); > -gov_sys_pol_attr_ro(min_sampling_rate); > - > -static struct attribute *dbs_attributes_gov_sys[] = { > - &min_sampling_rate_gov_sys.attr, > - &sampling_rate_gov_sys.attr, > - &sampling_down_factor_gov_sys.attr, > - &up_threshold_gov_sys.attr, > - &down_threshold_gov_sys.attr, > - &ignore_nice_load_gov_sys.attr, > - &freq_step_gov_sys.attr, > +gov_show_one(cs, sampling_rate); > +gov_show_one(cs, sampling_down_factor); > +gov_show_one(cs, up_threshold); > +gov_show_one(cs, down_threshold); > +gov_show_one(cs, ignore_nice_load); > +gov_show_one(cs, freq_step); > +gov_show_one(cs, min_sampling_rate); > + > +gov_attr_rw(sampling_rate); > +gov_attr_rw(sampling_down_factor); > +gov_attr_rw(up_threshold); > +gov_attr_rw(down_threshold); > +gov_attr_rw(ignore_nice_load); > +gov_attr_rw(freq_step); > +gov_attr_ro(min_sampling_rate); > + > +static struct attribute *dbs_attributes[] = { > + &min_sampling_rate.attr, > + &sampling_rate.attr, > + &sampling_down_factor.attr, > + &up_threshold.attr, > + &down_threshold.attr, > + &ignore_nice_load.attr, > + &freq_step.attr, > NULL > }; > > -static struct attribute_group cs_attr_group_gov_sys = { > - .attrs = dbs_attributes_gov_sys, > - .name = "conservative", > -}; > - > -static struct attribute *dbs_attributes_gov_pol[] = { > - &min_sampling_rate_gov_pol.attr, > - &sampling_rate_gov_pol.attr, > - &sampling_down_factor_gov_pol.attr, > - &up_threshold_gov_pol.attr, > - &down_threshold_gov_pol.attr, > - &ignore_nice_load_gov_pol.attr, > - &freq_step_gov_pol.attr, > - NULL > -}; > - > -static struct attribute_group cs_attr_group_gov_pol = { > - .attrs = dbs_attributes_gov_pol, > +static struct attribute_group cs_attr_group = { > + .attrs = dbs_attributes, > .name = "conservative", > }; > > @@ -365,8 +349,7 @@ define_get_cpu_dbs_routines(cs_cpu_dbs_info); > > static struct common_dbs_data cs_dbs_cdata = { > .governor = GOV_CONSERVATIVE, > - .attr_group_gov_sys = &cs_attr_group_gov_sys, > - .attr_group_gov_pol = &cs_attr_group_gov_pol, > + .attr_group = &cs_attr_group, > .get_cpu_cdbs = get_cpu_cdbs, > .get_cpu_dbs_info_s = get_cpu_dbs_info_s, > .gov_dbs_timer = cs_dbs_timer, > diff --git a/drivers/cpufreq/cpufreq_governor.c > b/drivers/cpufreq/cpufreq_governor.c > index 9a7edc91ad57..e785a118cbdc 100644 > --- a/drivers/cpufreq/cpufr