Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On Thursday, September 04, 2014 04:07:28 PM Viresh Kumar wrote: > On 4 September 2014 15:33, Preeti U Murthy wrote: > > I think Rafael's point was that since no driver that had implemented the > > target_index callback was using it at the time that this patch was > > proposed, it was be best to couple the check on existence of stop_cpu > > callback with the the check on the kind of cpufreq driver. However > > powerpc is also in need of this today and we implement the target_index > > callback and find it convenient to use the stop CPU callback. > > No, this is what he said.. > > " > So to me, (1) the new ->stop() should *only* be called for ->setpolicy > drivers, > because the purpose of it should be to "allow ->setpolicy drivers to do what > the > GOV_STOP will do for regular drivers" > " > > > Rafael, in which case would it not make sense to remove the check on > > driver->setpolicy above? > > > > Besides, I don't understand very well why we had this double check in > > the first place. Only if the drivers are in need of the functionality > > like stop_cpu, would they have implemented this callback right? If we > > are to assume that the drivers which have implemented the target_index > > callback should never be needing it, they would not have implemented the > > stop CPU callback either. So what was that, which was blatantly wrong > > with just having a check on stop_cpu? I did go through the discussion > > but did not find a convincing answer to this. > > The idea was to get something similar to GOV_STOP for setpolicy drivers. > But in the end we didn't get to that. What we do in GOV_STOP is stop > changing CPUs frequency, but here in stop_cpu() we can stop changing > CPUs frequency OR take it to minimum, whatever we want.. > > As I said earlier, probably we should just do what you did in your patch + > some documentation changes. OK, if that works for everybody. For one, I wouldn't like to end up with a callback used for different things in every drvier implementing it. -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 4 September 2014 15:33, Preeti U Murthy wrote: > I think Rafael's point was that since no driver that had implemented the > target_index callback was using it at the time that this patch was > proposed, it was be best to couple the check on existence of stop_cpu > callback with the the check on the kind of cpufreq driver. However > powerpc is also in need of this today and we implement the target_index > callback and find it convenient to use the stop CPU callback. No, this is what he said.. " So to me, (1) the new ->stop() should *only* be called for ->setpolicy drivers, because the purpose of it should be to "allow ->setpolicy drivers to do what the GOV_STOP will do for regular drivers" " > Rafael, in which case would it not make sense to remove the check on > driver->setpolicy above? > > Besides, I don't understand very well why we had this double check in > the first place. Only if the drivers are in need of the functionality > like stop_cpu, would they have implemented this callback right? If we > are to assume that the drivers which have implemented the target_index > callback should never be needing it, they would not have implemented the > stop CPU callback either. So what was that, which was blatantly wrong > with just having a check on stop_cpu? I did go through the discussion > but did not find a convincing answer to this. The idea was to get something similar to GOV_STOP for setpolicy drivers. But in the end we didn't get to that. What we do in GOV_STOP is stop changing CPUs frequency, but here in stop_cpu() we can stop changing CPUs frequency OR take it to minimum, whatever we want.. As I said earlier, probably we should just do what you did in your patch + some documentation changes. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 09/04/2014 02:46 PM, Viresh Kumar wrote: > On 4 September 2014 14:40, Preeti U Murthy wrote: >> cpufreq: Allow stop CPU callback to be used by all cpufreq drivers >> >> Commit 367dc4aa introduced the stop CPU callback for intel_pstate >> drivers. During the CPU_DOWN_PREPARE stage, this callback is invoked >> so that drivers can take some action on the pstate of the cpu >> before it is taken offline. This callback was assumed to be useful >> only for those drivers which have implemented the set_policy CPU >> callback because they have no other way to take action about the >> cpufreq of a CPU which is being hotplugged out except in the exit >> callback which is called very late in the offline process. >> >> The drivers which implement the target/target_index callbacks were >> expected to take care of requirements like the ones that commit >> 367dc4aa addresses in the GOV_STOP notification event. But there >> are disadvantages to restricting the usage of stop CPU callback >> to cpufreq drivers that implement the set_policy callbacks and who >> want to take explicit action on the setting the cpufreq during a >> hotplug operation. >> >> 1.GOV_STOP gets called for every CPU offline and drivers would usually >> want to take action when the last cpu in the policy->cpus mask >> is taken offline. As long as there is more than one cpu in the >> policy->cpus mask, cpufreq core itself makes sure that the freq >> for the other cpus in this mask is set according to the maximum load. >> This is sensible and drivers which implement the target_index callback >> would mostly not want to modify that. However the cpufreq core leaves a >> loose end when the cpu in the policy->cpus mask is the last one to go >> offline; >> it does nothing explicit to the frequency of the core. Drivers may need >> a way to take some action here and stop CPU callback mechanism is the >> best way to do it today. >> >> 2.We cannot implement driver specific actions in the GOV_STOP mechanism. >> So we will need another driver callback which is invoked from here which is >> unnecessary. >> >> Therefore this patch extends the usage of stop CPU callback to be used >> by all cpufreq drivers as long as they have this callback implemented >> and irrespective of whether they are set_policy/target_index drivers. >> The assumption is if the drivers find the GOV_STOP path to be a suitable >> way of implementing what they want to do with the freq of the cpu >> going offine,they will not implement the stop CPU callback at all. >> >> Signed-off-by: Preeti U Murthy >> >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c >> index d9fdedd..6463f35 100644 >> --- a/drivers/cpufreq/cpufreq.c >> +++ b/drivers/cpufreq/cpufreq.c >> @@ -1380,7 +1380,7 @@ static int __cpufreq_remove_dev_prepare(struct device >> *dev, >> if (!cpufreq_suspended) >> pr_debug("%s: policy Kobject moved to cpu: %d from: >> %d\n", >> __func__, new_cpu, cpu); >> - } else if (cpufreq_driver->stop_cpu && cpufreq_driver->setpolicy) { >> + } else if (cpufreq_driver->stop_cpu) { >> cpufreq_driver->stop_cpu(policy); >> } > > Rafael explicitly said earlier that he want to see a separate callback for > ->target() drivers, don't know why.. I think Rafael's point was that since no driver that had implemented the target_index callback was using it at the time that this patch was proposed, it was be best to couple the check on existence of stop_cpu callback with the the check on the kind of cpufreq driver. However powerpc is also in need of this today and we implement the target_index callback and find it convenient to use the stop CPU callback. Rafael, in which case would it not make sense to remove the check on driver->setpolicy above? Besides, I don't understand very well why we had this double check in the first place. Only if the drivers are in need of the functionality like stop_cpu, would they have implemented this callback right? If we are to assume that the drivers which have implemented the target_index callback should never be needing it, they would not have implemented the stop CPU callback either. So what was that, which was blatantly wrong with just having a check on stop_cpu? I did go through the discussion but did not find a convincing answer to this. Rafael? Regards Preeti U Murthy > > It looks fine to me though. > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 4 September 2014 14:40, Preeti U Murthy wrote: > cpufreq: Allow stop CPU callback to be used by all cpufreq drivers > > Commit 367dc4aa introduced the stop CPU callback for intel_pstate > drivers. During the CPU_DOWN_PREPARE stage, this callback is invoked > so that drivers can take some action on the pstate of the cpu > before it is taken offline. This callback was assumed to be useful > only for those drivers which have implemented the set_policy CPU > callback because they have no other way to take action about the > cpufreq of a CPU which is being hotplugged out except in the exit > callback which is called very late in the offline process. > > The drivers which implement the target/target_index callbacks were > expected to take care of requirements like the ones that commit > 367dc4aa addresses in the GOV_STOP notification event. But there > are disadvantages to restricting the usage of stop CPU callback > to cpufreq drivers that implement the set_policy callbacks and who > want to take explicit action on the setting the cpufreq during a > hotplug operation. > > 1.GOV_STOP gets called for every CPU offline and drivers would usually > want to take action when the last cpu in the policy->cpus mask > is taken offline. As long as there is more than one cpu in the > policy->cpus mask, cpufreq core itself makes sure that the freq > for the other cpus in this mask is set according to the maximum load. > This is sensible and drivers which implement the target_index callback > would mostly not want to modify that. However the cpufreq core leaves a > loose end when the cpu in the policy->cpus mask is the last one to go > offline; > it does nothing explicit to the frequency of the core. Drivers may need > a way to take some action here and stop CPU callback mechanism is the > best way to do it today. > > 2.We cannot implement driver specific actions in the GOV_STOP mechanism. > So we will need another driver callback which is invoked from here which is > unnecessary. > > Therefore this patch extends the usage of stop CPU callback to be used > by all cpufreq drivers as long as they have this callback implemented > and irrespective of whether they are set_policy/target_index drivers. > The assumption is if the drivers find the GOV_STOP path to be a suitable > way of implementing what they want to do with the freq of the cpu > going offine,they will not implement the stop CPU callback at all. > > Signed-off-by: Preeti U Murthy > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index d9fdedd..6463f35 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -1380,7 +1380,7 @@ static int __cpufreq_remove_dev_prepare(struct device > *dev, > if (!cpufreq_suspended) > pr_debug("%s: policy Kobject moved to cpu: %d from: > %d\n", > __func__, new_cpu, cpu); > - } else if (cpufreq_driver->stop_cpu && cpufreq_driver->setpolicy) { > + } else if (cpufreq_driver->stop_cpu) { > cpufreq_driver->stop_cpu(policy); > } Rafael explicitly said earlier that he want to see a separate callback for ->target() drivers, don't know why.. It looks fine to me though. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
Hi, I went through the cpufreq code and the previous discussion in the context of this patch and I propose the below patch. Please let me know if it misses something that you all had discussed. On 09/04/2014 11:38 AM, Viresh Kumar wrote: > On 19 March 2014 19:49, Rafael J. Wysocki wrote: > >> That said, for the intel_pstate case ->stop() as proposed by Dirk is >> demonstrably >> sufficient and there are no other ->setpolicy drivers in sight wanting or >> needing >> anything else. >> >> So to me, (1) the new ->stop() should *only* be called for ->setpolicy >> drivers, >> because the purpose of it should be to "allow ->setpolicy drivers to do what >> the >> GOV_STOP will do for regular drivers" as you put it above, and (2) some code >> in >> the original intel_pstate's ->exit() may/should stay in there (instead of >> being >> moved to the new ->stop()), which is the only possibly remaining issue here. >> >> The whole discussion about possibly re-using ->stop() for ->target drivers >> goes >> in a totally wrong direction, because *if* ->target drivers need a new >> callback >> to be executed around where ->stop() is called for ->setpolicy drivers, >> *then* >> that has to be a *different* callback. >> >> And by the way, ->get() in fact has a different meaning for ->setpolicy >> drivers, >> so it would be good to consider logical separation of ->setpolicy and >> ->target >> drivers so that each kind has its own separate set of callbacks with no >> overlaps. >> That would make it easier to avoid breakage resulting from changes made with >> ->setpolicy drivers that also affect ->target drivers in unpredictable ways >> and >> the other way around. > > Okay, I have picked up a very old thread but it looks more sensible to start > replying here.. > > Preeti (Cc'd) wants to do something similar, i.e. reduce freq of a > core before it > goes down. And the driver is probably: drivers/cpufreq/powernv-cpufreq.c, > which > is ->target() type. > > Now should we reuse the same callback ->stop_cpu() or implement a new one? > I don't know if adding a new callback would be a good idea here.. > > -- > viresh > cpufreq: Allow stop CPU callback to be used by all cpufreq drivers Commit 367dc4aa introduced the stop CPU callback for intel_pstate drivers. During the CPU_DOWN_PREPARE stage, this callback is invoked so that drivers can take some action on the pstate of the cpu before it is taken offline. This callback was assumed to be useful only for those drivers which have implemented the set_policy CPU callback because they have no other way to take action about the cpufreq of a CPU which is being hotplugged out except in the exit callback which is called very late in the offline process. The drivers which implement the target/target_index callbacks were expected to take care of requirements like the ones that commit 367dc4aa addresses in the GOV_STOP notification event. But there are disadvantages to restricting the usage of stop CPU callback to cpufreq drivers that implement the set_policy callbacks and who want to take explicit action on the setting the cpufreq during a hotplug operation. 1.GOV_STOP gets called for every CPU offline and drivers would usually want to take action when the last cpu in the policy->cpus mask is taken offline. As long as there is more than one cpu in the policy->cpus mask, cpufreq core itself makes sure that the freq for the other cpus in this mask is set according to the maximum load. This is sensible and drivers which implement the target_index callback would mostly not want to modify that. However the cpufreq core leaves a loose end when the cpu in the policy->cpus mask is the last one to go offline; it does nothing explicit to the frequency of the core. Drivers may need a way to take some action here and stop CPU callback mechanism is the best way to do it today. 2.We cannot implement driver specific actions in the GOV_STOP mechanism. So we will need another driver callback which is invoked from here which is unnecessary. Therefore this patch extends the usage of stop CPU callback to be used by all cpufreq drivers as long as they have this callback implemented and irrespective of whether they are set_policy/target_index drivers. The assumption is if the drivers find the GOV_STOP path to be a suitable way of implementing what they want to do with the freq of the cpu going offine,they will not implement the stop CPU callback at all. Signed-off-by: Preeti U Murthy diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index d9fdedd..6463f35 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1380,7 +1380,7 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, if (!cpufreq_suspended) pr_debug("%s: policy Kobject moved to cpu: %d from: %d\n", __func__, new_cpu, cpu); - } else if
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 19 March 2014 19:49, Rafael J. Wysocki wrote: > That said, for the intel_pstate case ->stop() as proposed by Dirk is > demonstrably > sufficient and there are no other ->setpolicy drivers in sight wanting or > needing > anything else. > > So to me, (1) the new ->stop() should *only* be called for ->setpolicy > drivers, > because the purpose of it should be to "allow ->setpolicy drivers to do what > the > GOV_STOP will do for regular drivers" as you put it above, and (2) some code > in > the original intel_pstate's ->exit() may/should stay in there (instead of > being > moved to the new ->stop()), which is the only possibly remaining issue here. > > The whole discussion about possibly re-using ->stop() for ->target drivers > goes > in a totally wrong direction, because *if* ->target drivers need a new > callback > to be executed around where ->stop() is called for ->setpolicy drivers, *then* > that has to be a *different* callback. > > And by the way, ->get() in fact has a different meaning for ->setpolicy > drivers, > so it would be good to consider logical separation of ->setpolicy and ->target > drivers so that each kind has its own separate set of callbacks with no > overlaps. > That would make it easier to avoid breakage resulting from changes made with > ->setpolicy drivers that also affect ->target drivers in unpredictable ways > and > the other way around. Okay, I have picked up a very old thread but it looks more sensible to start replying here.. Preeti (Cc'd) wants to do something similar, i.e. reduce freq of a core before it goes down. And the driver is probably: drivers/cpufreq/powernv-cpufreq.c, which is ->target() type. Now should we reuse the same callback ->stop_cpu() or implement a new one? I don't know if adding a new callback would be a good idea here.. -- viresh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On Thursday, September 04, 2014 04:07:28 PM Viresh Kumar wrote: On 4 September 2014 15:33, Preeti U Murthy pre...@linux.vnet.ibm.com wrote: I think Rafael's point was that since no driver that had implemented the target_index callback was using it at the time that this patch was proposed, it was be best to couple the check on existence of stop_cpu callback with the the check on the kind of cpufreq driver. However powerpc is also in need of this today and we implement the target_index callback and find it convenient to use the stop CPU callback. No, this is what he said.. So to me, (1) the new -stop() should *only* be called for -setpolicy drivers, because the purpose of it should be to allow -setpolicy drivers to do what the GOV_STOP will do for regular drivers Rafael, in which case would it not make sense to remove the check on driver-setpolicy above? Besides, I don't understand very well why we had this double check in the first place. Only if the drivers are in need of the functionality like stop_cpu, would they have implemented this callback right? If we are to assume that the drivers which have implemented the target_index callback should never be needing it, they would not have implemented the stop CPU callback either. So what was that, which was blatantly wrong with just having a check on stop_cpu? I did go through the discussion but did not find a convincing answer to this. The idea was to get something similar to GOV_STOP for setpolicy drivers. But in the end we didn't get to that. What we do in GOV_STOP is stop changing CPUs frequency, but here in stop_cpu() we can stop changing CPUs frequency OR take it to minimum, whatever we want.. As I said earlier, probably we should just do what you did in your patch + some documentation changes. OK, if that works for everybody. For one, I wouldn't like to end up with a callback used for different things in every drvier implementing it. -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 19 March 2014 19:49, Rafael J. Wysocki r...@rjwysocki.net wrote: That said, for the intel_pstate case -stop() as proposed by Dirk is demonstrably sufficient and there are no other -setpolicy drivers in sight wanting or needing anything else. So to me, (1) the new -stop() should *only* be called for -setpolicy drivers, because the purpose of it should be to allow -setpolicy drivers to do what the GOV_STOP will do for regular drivers as you put it above, and (2) some code in the original intel_pstate's -exit() may/should stay in there (instead of being moved to the new -stop()), which is the only possibly remaining issue here. The whole discussion about possibly re-using -stop() for -target drivers goes in a totally wrong direction, because *if* -target drivers need a new callback to be executed around where -stop() is called for -setpolicy drivers, *then* that has to be a *different* callback. And by the way, -get() in fact has a different meaning for -setpolicy drivers, so it would be good to consider logical separation of -setpolicy and -target drivers so that each kind has its own separate set of callbacks with no overlaps. That would make it easier to avoid breakage resulting from changes made with -setpolicy drivers that also affect -target drivers in unpredictable ways and the other way around. Okay, I have picked up a very old thread but it looks more sensible to start replying here.. Preeti (Cc'd) wants to do something similar, i.e. reduce freq of a core before it goes down. And the driver is probably: drivers/cpufreq/powernv-cpufreq.c, which is -target() type. Now should we reuse the same callback -stop_cpu() or implement a new one? I don't know if adding a new callback would be a good idea here.. -- viresh -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
Hi, I went through the cpufreq code and the previous discussion in the context of this patch and I propose the below patch. Please let me know if it misses something that you all had discussed. On 09/04/2014 11:38 AM, Viresh Kumar wrote: On 19 March 2014 19:49, Rafael J. Wysocki r...@rjwysocki.net wrote: That said, for the intel_pstate case -stop() as proposed by Dirk is demonstrably sufficient and there are no other -setpolicy drivers in sight wanting or needing anything else. So to me, (1) the new -stop() should *only* be called for -setpolicy drivers, because the purpose of it should be to allow -setpolicy drivers to do what the GOV_STOP will do for regular drivers as you put it above, and (2) some code in the original intel_pstate's -exit() may/should stay in there (instead of being moved to the new -stop()), which is the only possibly remaining issue here. The whole discussion about possibly re-using -stop() for -target drivers goes in a totally wrong direction, because *if* -target drivers need a new callback to be executed around where -stop() is called for -setpolicy drivers, *then* that has to be a *different* callback. And by the way, -get() in fact has a different meaning for -setpolicy drivers, so it would be good to consider logical separation of -setpolicy and -target drivers so that each kind has its own separate set of callbacks with no overlaps. That would make it easier to avoid breakage resulting from changes made with -setpolicy drivers that also affect -target drivers in unpredictable ways and the other way around. Okay, I have picked up a very old thread but it looks more sensible to start replying here.. Preeti (Cc'd) wants to do something similar, i.e. reduce freq of a core before it goes down. And the driver is probably: drivers/cpufreq/powernv-cpufreq.c, which is -target() type. Now should we reuse the same callback -stop_cpu() or implement a new one? I don't know if adding a new callback would be a good idea here.. -- viresh cpufreq: Allow stop CPU callback to be used by all cpufreq drivers Commit 367dc4aa introduced the stop CPU callback for intel_pstate drivers. During the CPU_DOWN_PREPARE stage, this callback is invoked so that drivers can take some action on the pstate of the cpu before it is taken offline. This callback was assumed to be useful only for those drivers which have implemented the set_policy CPU callback because they have no other way to take action about the cpufreq of a CPU which is being hotplugged out except in the exit callback which is called very late in the offline process. The drivers which implement the target/target_index callbacks were expected to take care of requirements like the ones that commit 367dc4aa addresses in the GOV_STOP notification event. But there are disadvantages to restricting the usage of stop CPU callback to cpufreq drivers that implement the set_policy callbacks and who want to take explicit action on the setting the cpufreq during a hotplug operation. 1.GOV_STOP gets called for every CPU offline and drivers would usually want to take action when the last cpu in the policy-cpus mask is taken offline. As long as there is more than one cpu in the policy-cpus mask, cpufreq core itself makes sure that the freq for the other cpus in this mask is set according to the maximum load. This is sensible and drivers which implement the target_index callback would mostly not want to modify that. However the cpufreq core leaves a loose end when the cpu in the policy-cpus mask is the last one to go offline; it does nothing explicit to the frequency of the core. Drivers may need a way to take some action here and stop CPU callback mechanism is the best way to do it today. 2.We cannot implement driver specific actions in the GOV_STOP mechanism. So we will need another driver callback which is invoked from here which is unnecessary. Therefore this patch extends the usage of stop CPU callback to be used by all cpufreq drivers as long as they have this callback implemented and irrespective of whether they are set_policy/target_index drivers. The assumption is if the drivers find the GOV_STOP path to be a suitable way of implementing what they want to do with the freq of the cpu going offine,they will not implement the stop CPU callback at all. Signed-off-by: Preeti U Murthy pre...@linux.vnet.ibm.com diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index d9fdedd..6463f35 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1380,7 +1380,7 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, if (!cpufreq_suspended) pr_debug(%s: policy Kobject moved to cpu: %d from: %d\n, __func__, new_cpu, cpu); - } else if (cpufreq_driver-stop_cpu cpufreq_driver-setpolicy) { + } else if
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 4 September 2014 14:40, Preeti U Murthy pre...@linux.vnet.ibm.com wrote: cpufreq: Allow stop CPU callback to be used by all cpufreq drivers Commit 367dc4aa introduced the stop CPU callback for intel_pstate drivers. During the CPU_DOWN_PREPARE stage, this callback is invoked so that drivers can take some action on the pstate of the cpu before it is taken offline. This callback was assumed to be useful only for those drivers which have implemented the set_policy CPU callback because they have no other way to take action about the cpufreq of a CPU which is being hotplugged out except in the exit callback which is called very late in the offline process. The drivers which implement the target/target_index callbacks were expected to take care of requirements like the ones that commit 367dc4aa addresses in the GOV_STOP notification event. But there are disadvantages to restricting the usage of stop CPU callback to cpufreq drivers that implement the set_policy callbacks and who want to take explicit action on the setting the cpufreq during a hotplug operation. 1.GOV_STOP gets called for every CPU offline and drivers would usually want to take action when the last cpu in the policy-cpus mask is taken offline. As long as there is more than one cpu in the policy-cpus mask, cpufreq core itself makes sure that the freq for the other cpus in this mask is set according to the maximum load. This is sensible and drivers which implement the target_index callback would mostly not want to modify that. However the cpufreq core leaves a loose end when the cpu in the policy-cpus mask is the last one to go offline; it does nothing explicit to the frequency of the core. Drivers may need a way to take some action here and stop CPU callback mechanism is the best way to do it today. 2.We cannot implement driver specific actions in the GOV_STOP mechanism. So we will need another driver callback which is invoked from here which is unnecessary. Therefore this patch extends the usage of stop CPU callback to be used by all cpufreq drivers as long as they have this callback implemented and irrespective of whether they are set_policy/target_index drivers. The assumption is if the drivers find the GOV_STOP path to be a suitable way of implementing what they want to do with the freq of the cpu going offine,they will not implement the stop CPU callback at all. Signed-off-by: Preeti U Murthy pre...@linux.vnet.ibm.com diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index d9fdedd..6463f35 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1380,7 +1380,7 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, if (!cpufreq_suspended) pr_debug(%s: policy Kobject moved to cpu: %d from: %d\n, __func__, new_cpu, cpu); - } else if (cpufreq_driver-stop_cpu cpufreq_driver-setpolicy) { + } else if (cpufreq_driver-stop_cpu) { cpufreq_driver-stop_cpu(policy); } Rafael explicitly said earlier that he want to see a separate callback for -target() drivers, don't know why.. It looks fine to me though. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 09/04/2014 02:46 PM, Viresh Kumar wrote: On 4 September 2014 14:40, Preeti U Murthy pre...@linux.vnet.ibm.com wrote: cpufreq: Allow stop CPU callback to be used by all cpufreq drivers Commit 367dc4aa introduced the stop CPU callback for intel_pstate drivers. During the CPU_DOWN_PREPARE stage, this callback is invoked so that drivers can take some action on the pstate of the cpu before it is taken offline. This callback was assumed to be useful only for those drivers which have implemented the set_policy CPU callback because they have no other way to take action about the cpufreq of a CPU which is being hotplugged out except in the exit callback which is called very late in the offline process. The drivers which implement the target/target_index callbacks were expected to take care of requirements like the ones that commit 367dc4aa addresses in the GOV_STOP notification event. But there are disadvantages to restricting the usage of stop CPU callback to cpufreq drivers that implement the set_policy callbacks and who want to take explicit action on the setting the cpufreq during a hotplug operation. 1.GOV_STOP gets called for every CPU offline and drivers would usually want to take action when the last cpu in the policy-cpus mask is taken offline. As long as there is more than one cpu in the policy-cpus mask, cpufreq core itself makes sure that the freq for the other cpus in this mask is set according to the maximum load. This is sensible and drivers which implement the target_index callback would mostly not want to modify that. However the cpufreq core leaves a loose end when the cpu in the policy-cpus mask is the last one to go offline; it does nothing explicit to the frequency of the core. Drivers may need a way to take some action here and stop CPU callback mechanism is the best way to do it today. 2.We cannot implement driver specific actions in the GOV_STOP mechanism. So we will need another driver callback which is invoked from here which is unnecessary. Therefore this patch extends the usage of stop CPU callback to be used by all cpufreq drivers as long as they have this callback implemented and irrespective of whether they are set_policy/target_index drivers. The assumption is if the drivers find the GOV_STOP path to be a suitable way of implementing what they want to do with the freq of the cpu going offine,they will not implement the stop CPU callback at all. Signed-off-by: Preeti U Murthy pre...@linux.vnet.ibm.com diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index d9fdedd..6463f35 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1380,7 +1380,7 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, if (!cpufreq_suspended) pr_debug(%s: policy Kobject moved to cpu: %d from: %d\n, __func__, new_cpu, cpu); - } else if (cpufreq_driver-stop_cpu cpufreq_driver-setpolicy) { + } else if (cpufreq_driver-stop_cpu) { cpufreq_driver-stop_cpu(policy); } Rafael explicitly said earlier that he want to see a separate callback for -target() drivers, don't know why.. I think Rafael's point was that since no driver that had implemented the target_index callback was using it at the time that this patch was proposed, it was be best to couple the check on existence of stop_cpu callback with the the check on the kind of cpufreq driver. However powerpc is also in need of this today and we implement the target_index callback and find it convenient to use the stop CPU callback. Rafael, in which case would it not make sense to remove the check on driver-setpolicy above? Besides, I don't understand very well why we had this double check in the first place. Only if the drivers are in need of the functionality like stop_cpu, would they have implemented this callback right? If we are to assume that the drivers which have implemented the target_index callback should never be needing it, they would not have implemented the stop CPU callback either. So what was that, which was blatantly wrong with just having a check on stop_cpu? I did go through the discussion but did not find a convincing answer to this. Rafael? Regards Preeti U Murthy It looks fine to me though. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 4 September 2014 15:33, Preeti U Murthy pre...@linux.vnet.ibm.com wrote: I think Rafael's point was that since no driver that had implemented the target_index callback was using it at the time that this patch was proposed, it was be best to couple the check on existence of stop_cpu callback with the the check on the kind of cpufreq driver. However powerpc is also in need of this today and we implement the target_index callback and find it convenient to use the stop CPU callback. No, this is what he said.. So to me, (1) the new -stop() should *only* be called for -setpolicy drivers, because the purpose of it should be to allow -setpolicy drivers to do what the GOV_STOP will do for regular drivers Rafael, in which case would it not make sense to remove the check on driver-setpolicy above? Besides, I don't understand very well why we had this double check in the first place. Only if the drivers are in need of the functionality like stop_cpu, would they have implemented this callback right? If we are to assume that the drivers which have implemented the target_index callback should never be needing it, they would not have implemented the stop CPU callback either. So what was that, which was blatantly wrong with just having a check on stop_cpu? I did go through the discussion but did not find a convincing answer to this. The idea was to get something similar to GOV_STOP for setpolicy drivers. But in the end we didn't get to that. What we do in GOV_STOP is stop changing CPUs frequency, but here in stop_cpu() we can stop changing CPUs frequency OR take it to minimum, whatever we want.. As I said earlier, probably we should just do what you did in your patch + some documentation changes. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On Wednesday, March 19, 2014 03:30:48 PM Srivatsa S. Bhat wrote: > On 03/19/2014 10:33 AM, Viresh Kumar wrote: > > On 18 March 2014 17:46, Srivatsa S. Bhat > > wrote: > >> Agreed. As far as I understand, for ->target drivers, today we use GOV_STOP > >> to stop managing the CPU going offline. And for ->setpolicy drivers, we > >> will > >> use this new callback to achieve the same goal. > > > > So a better question would be: What's the purpose of ->stop() call for a > > policy? > > Ideally, it should remove the outgoing CPU from the policy and "stop managing > that CPU", whatever that means to the driver (for intel_pstate, it means > setting it to min P state and destroying the timer). > > > Stop managing CPUs of that policy? > > Stop managing only the particular CPU going offline. IOW, we should somehow > communicate to the ->stop() callback that we are taking CPU 'x' offline. > > If adding a ->stop() callback in the cpufreq_driver is not the best way to > achieve it, then lets think of an alternative. The way I look at it, this > new mechanism what we want, should allow ->setpolicy drivers to do what the > GOV_STOP will do for regular drivers. That is, allow it to "shutdown the > CPU from a cpufreq perspective", whatever that means to the driver. > We can think of a completely different way of achieving it, if ->stop() > is not suitable for that purpose. I agree. That said, for the intel_pstate case ->stop() as proposed by Dirk is demonstrably sufficient and there are no other ->setpolicy drivers in sight wanting or needing anything else. So to me, (1) the new ->stop() should *only* be called for ->setpolicy drivers, because the purpose of it should be to "allow ->setpolicy drivers to do what the GOV_STOP will do for regular drivers" as you put it above, and (2) some code in the original intel_pstate's ->exit() may/should stay in there (instead of being moved to the new ->stop()), which is the only possibly remaining issue here. The whole discussion about possibly re-using ->stop() for ->target drivers goes in a totally wrong direction, because *if* ->target drivers need a new callback to be executed around where ->stop() is called for ->setpolicy drivers, *then* that has to be a *different* callback. And by the way, ->get() in fact has a different meaning for ->setpolicy drivers, so it would be good to consider logical separation of ->setpolicy and ->target drivers so that each kind has its own separate set of callbacks with no overlaps. That would make it easier to avoid breakage resulting from changes made with ->setpolicy drivers that also affect ->target drivers in unpredictable ways and the other way around. -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 03/19/2014 10:33 AM, Viresh Kumar wrote: > On 18 March 2014 17:46, Srivatsa S. Bhat > wrote: >> Agreed. As far as I understand, for ->target drivers, today we use GOV_STOP >> to stop managing the CPU going offline. And for ->setpolicy drivers, we will >> use this new callback to achieve the same goal. > > So a better question would be: What's the purpose of ->stop() call for a > policy? Ideally, it should remove the outgoing CPU from the policy and "stop managing that CPU", whatever that means to the driver (for intel_pstate, it means setting it to min P state and destroying the timer). > Stop managing CPUs of that policy? Stop managing only the particular CPU going offline. IOW, we should somehow communicate to the ->stop() callback that we are taking CPU 'x' offline. If adding a ->stop() callback in the cpufreq_driver is not the best way to achieve it, then lets think of an alternative. The way I look at it, this new mechanism what we want, should allow ->setpolicy drivers to do what the GOV_STOP will do for regular drivers. That is, allow it to "shutdown the CPU from a cpufreq perspective", whatever that means to the driver. We can think of a completely different way of achieving it, if ->stop() is not suitable for that purpose. > Or even do something on CPUs of a policy > before CPUs are offlined? > > Probably in the current solution Dirk is doing both these things.. > > And so I thought maybe its better not to restrict ->stop() to just > setpolicy() drivers. Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 03/19/2014 10:33 AM, Viresh Kumar wrote: On 18 March 2014 17:46, Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote: Agreed. As far as I understand, for -target drivers, today we use GOV_STOP to stop managing the CPU going offline. And for -setpolicy drivers, we will use this new callback to achieve the same goal. So a better question would be: What's the purpose of -stop() call for a policy? Ideally, it should remove the outgoing CPU from the policy and stop managing that CPU, whatever that means to the driver (for intel_pstate, it means setting it to min P state and destroying the timer). Stop managing CPUs of that policy? Stop managing only the particular CPU going offline. IOW, we should somehow communicate to the -stop() callback that we are taking CPU 'x' offline. If adding a -stop() callback in the cpufreq_driver is not the best way to achieve it, then lets think of an alternative. The way I look at it, this new mechanism what we want, should allow -setpolicy drivers to do what the GOV_STOP will do for regular drivers. That is, allow it to shutdown the CPU from a cpufreq perspective, whatever that means to the driver. We can think of a completely different way of achieving it, if -stop() is not suitable for that purpose. Or even do something on CPUs of a policy before CPUs are offlined? Probably in the current solution Dirk is doing both these things.. And so I thought maybe its better not to restrict -stop() to just setpolicy() drivers. Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On Wednesday, March 19, 2014 03:30:48 PM Srivatsa S. Bhat wrote: On 03/19/2014 10:33 AM, Viresh Kumar wrote: On 18 March 2014 17:46, Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote: Agreed. As far as I understand, for -target drivers, today we use GOV_STOP to stop managing the CPU going offline. And for -setpolicy drivers, we will use this new callback to achieve the same goal. So a better question would be: What's the purpose of -stop() call for a policy? Ideally, it should remove the outgoing CPU from the policy and stop managing that CPU, whatever that means to the driver (for intel_pstate, it means setting it to min P state and destroying the timer). Stop managing CPUs of that policy? Stop managing only the particular CPU going offline. IOW, we should somehow communicate to the -stop() callback that we are taking CPU 'x' offline. If adding a -stop() callback in the cpufreq_driver is not the best way to achieve it, then lets think of an alternative. The way I look at it, this new mechanism what we want, should allow -setpolicy drivers to do what the GOV_STOP will do for regular drivers. That is, allow it to shutdown the CPU from a cpufreq perspective, whatever that means to the driver. We can think of a completely different way of achieving it, if -stop() is not suitable for that purpose. I agree. That said, for the intel_pstate case -stop() as proposed by Dirk is demonstrably sufficient and there are no other -setpolicy drivers in sight wanting or needing anything else. So to me, (1) the new -stop() should *only* be called for -setpolicy drivers, because the purpose of it should be to allow -setpolicy drivers to do what the GOV_STOP will do for regular drivers as you put it above, and (2) some code in the original intel_pstate's -exit() may/should stay in there (instead of being moved to the new -stop()), which is the only possibly remaining issue here. The whole discussion about possibly re-using -stop() for -target drivers goes in a totally wrong direction, because *if* -target drivers need a new callback to be executed around where -stop() is called for -setpolicy drivers, *then* that has to be a *different* callback. And by the way, -get() in fact has a different meaning for -setpolicy drivers, so it would be good to consider logical separation of -setpolicy and -target drivers so that each kind has its own separate set of callbacks with no overlaps. That would make it easier to avoid breakage resulting from changes made with -setpolicy drivers that also affect -target drivers in unpredictable ways and the other way around. -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 18 March 2014 17:46, Srivatsa S. Bhat wrote: > Agreed. As far as I understand, for ->target drivers, today we use GOV_STOP > to stop managing the CPU going offline. And for ->setpolicy drivers, we will > use this new callback to achieve the same goal. So a better question would be: What's the purpose of ->stop() call for a policy? Stop managing CPUs of that policy? Or even do something on CPUs of a policy before CPUs are offlined? Probably in the current solution Dirk is doing both these things.. And so I thought maybe its better not to restrict ->stop() to just setpolicy() drivers. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 03/15/2014 07:29 AM, Rafael J. Wysocki wrote: > On Friday, March 14, 2014 11:29:04 AM Dirk Brandewie wrote: >> On 03/14/2014 10:07 AM, Viresh Kumar wrote: >>> On 14 March 2014 20:40, Dirk Brandewie wrote: Are you proposing adding cpufreq_generic_suspend() to the core I can not find it in the mainline code. >>> >>> Its already there in linux-next. I am suggesting to reuse that >>> infrastructure with >>> some necessary modification to support both suspend and hotplug. >> >> Suspend and hotplug are two very different things and if we start >> crossing those wires bad things are going to happen IMHO. >> >> In "normal" operation using the suspend path to do this work could >> work in principal but doesn't handle the case where the user does >> echo 0 | sudo tee /sys/devices/system/cpu/cpuX/online >> >> Trying force hotplug and suspend into a common mechanism would >> lead to a bunch of special case code or a significant rework of the >> core code IMHO. >> >> >>> > Over that I don't think Dirk's solution is going to work if we twist > the systems a bit. Could you explain what "twist the systems a bit" means? >>> >>> The one I explained in the below paragraph. >>> > For example, Dirk probably wanted to set P-STATE of every core to MIN > when it goes down. But his solution probably doesn't do that right now. > No, I wanted to set the core that was being off-lined to min P state. >>> >>> Sorry, probably my words were a bit confusing. I meant exactly what >>> you just wrote. Core going down will set its freq to min. >>> > As exit() is called only when the policy expires or all the CPUs of that > policy > are down. Suppose only one CPU out of 4 goes down from a policy, then > pstate driver would never know that happened. And that core wouldn't go > to min state. My patch does not change the semantics of exit() or when it is called. For intel_pstate their is one cpu per policy so I moved all the cleanup to >>> >>> I didn't knew that its guaranteed by pstate driver. I thought it would >>> still be >>> hardware dependent as some cores might share clock line. >> >> This is guaranteed by the hardware. Each core has its own MSR for P state >> request. Any coordination that is required between cores to select the >> package P state is handled by the hardware. >> >>> exit_prepare() if there were work I needed to do at CPU_POST_DEAD I would have continued to export the *optional* exit callback. The callback name exit_prepare() was probably unfortunate and might be causing some confusion. I will be changing the name per Rafael's feedback. >>> >>> Don't know.. There is another problem here that exit_prepare() would be >>> called >>> for each CPU whereas exit() would be called for each policy. >> >> Granted but I don't see this as a problem in this case there is a 1:1 >> relationship. If a driver chooses to use the *optional* exit_prepare() >> callback >> and knows that there is a many:1 relationship between the policy and CPUs >> then it would have to deal with it. > > Actually, I think we should make it clear that the new callback is for > ->setpolicy drivers only, which will make things a bit clearer. > Agreed. As far as I understand, for ->target drivers, today we use GOV_STOP to stop managing the CPU going offline. And for ->setpolicy drivers, we will use this new callback to achieve the same goal. Regards, Srivatsa S. Bhat > We seem to get caught by the difference between ->setpolicy and ->target > drivers on a regular basis, so it might be a good idea to make the distinction > more clear in the code. I have an idea how to do that, but need some time > to prototype it. > >>> And I strongly feel that we shouldn't give another callback here but instead >>> just set core to a specific freq as mentioned by driver with some other >>> field. >>> > I think we have two solutions here: > - If its just about setting core a particular freq when it goes down, I > think it > looks a generic enough problem and so better fix core for that. Probably > with > help of flags field/suspend-freq (maybe renamed) and without calling > drivers > exit() at all.. ATM the only thing that needs to be done in this case is to allow intel_pstate to set the P state on the core when it is going done. My solution from the cores point of view is more generic, it allows any driver that needs to do work during CPU_DOWN_PREPARE to do it without adding any new logic to the core. >>> >>> Yeah, do we really need to give that freedom right now? Would be better >>> to get this into core as that would be more generic and people looking to >>> set >>> core to some freq at shutdown wouldn't be replicating that code. > > Question is if it needs to be more generic. > > I honestly don't think that ->target drivers will ever do anything like it, >
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 03/15/2014 07:29 AM, Rafael J. Wysocki wrote: On Friday, March 14, 2014 11:29:04 AM Dirk Brandewie wrote: On 03/14/2014 10:07 AM, Viresh Kumar wrote: On 14 March 2014 20:40, Dirk Brandewie dirk.brande...@gmail.com wrote: Are you proposing adding cpufreq_generic_suspend() to the core I can not find it in the mainline code. Its already there in linux-next. I am suggesting to reuse that infrastructure with some necessary modification to support both suspend and hotplug. Suspend and hotplug are two very different things and if we start crossing those wires bad things are going to happen IMHO. In normal operation using the suspend path to do this work could work in principal but doesn't handle the case where the user does echo 0 | sudo tee /sys/devices/system/cpu/cpuX/online Trying force hotplug and suspend into a common mechanism would lead to a bunch of special case code or a significant rework of the core code IMHO. Over that I don't think Dirk's solution is going to work if we twist the systems a bit. Could you explain what twist the systems a bit means? The one I explained in the below paragraph. For example, Dirk probably wanted to set P-STATE of every core to MIN when it goes down. But his solution probably doesn't do that right now. No, I wanted to set the core that was being off-lined to min P state. Sorry, probably my words were a bit confusing. I meant exactly what you just wrote. Core going down will set its freq to min. As exit() is called only when the policy expires or all the CPUs of that policy are down. Suppose only one CPU out of 4 goes down from a policy, then pstate driver would never know that happened. And that core wouldn't go to min state. My patch does not change the semantics of exit() or when it is called. For intel_pstate their is one cpu per policy so I moved all the cleanup to I didn't knew that its guaranteed by pstate driver. I thought it would still be hardware dependent as some cores might share clock line. This is guaranteed by the hardware. Each core has its own MSR for P state request. Any coordination that is required between cores to select the package P state is handled by the hardware. exit_prepare() if there were work I needed to do at CPU_POST_DEAD I would have continued to export the *optional* exit callback. The callback name exit_prepare() was probably unfortunate and might be causing some confusion. I will be changing the name per Rafael's feedback. Don't know.. There is another problem here that exit_prepare() would be called for each CPU whereas exit() would be called for each policy. Granted but I don't see this as a problem in this case there is a 1:1 relationship. If a driver chooses to use the *optional* exit_prepare() callback and knows that there is a many:1 relationship between the policy and CPUs then it would have to deal with it. Actually, I think we should make it clear that the new callback is for -setpolicy drivers only, which will make things a bit clearer. Agreed. As far as I understand, for -target drivers, today we use GOV_STOP to stop managing the CPU going offline. And for -setpolicy drivers, we will use this new callback to achieve the same goal. Regards, Srivatsa S. Bhat We seem to get caught by the difference between -setpolicy and -target drivers on a regular basis, so it might be a good idea to make the distinction more clear in the code. I have an idea how to do that, but need some time to prototype it. And I strongly feel that we shouldn't give another callback here but instead just set core to a specific freq as mentioned by driver with some other field. I think we have two solutions here: - If its just about setting core a particular freq when it goes down, I think it looks a generic enough problem and so better fix core for that. Probably with help of flags field/suspend-freq (maybe renamed) and without calling drivers exit() at all.. ATM the only thing that needs to be done in this case is to allow intel_pstate to set the P state on the core when it is going done. My solution from the cores point of view is more generic, it allows any driver that needs to do work during CPU_DOWN_PREPARE to do it without adding any new logic to the core. Yeah, do we really need to give that freedom right now? Would be better to get this into core as that would be more generic and people looking to set core to some freq at shutdown wouldn't be replicating that code. Question is if it needs to be more generic. I honestly don't think that -target drivers will ever do anything like it, because they need the governor to exit before. So we are talking about the only two -setpolicy drivers in the tree here. IMHO yes and it would be hard to be more generic, if your platform needs to do architecture specific during the PREPARE phase of cpu hotplug use this callback or not. BTW now that you have added a path
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 18 March 2014 17:46, Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote: Agreed. As far as I understand, for -target drivers, today we use GOV_STOP to stop managing the CPU going offline. And for -setpolicy drivers, we will use this new callback to achieve the same goal. So a better question would be: What's the purpose of -stop() call for a policy? Stop managing CPUs of that policy? Or even do something on CPUs of a policy before CPUs are offlined? Probably in the current solution Dirk is doing both these things.. And so I thought maybe its better not to restrict -stop() to just setpolicy() drivers. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On Sat, Mar 15, 2014 at 7:29 AM, Rafael J. Wysocki wrote: > I honestly don't think that ->target drivers will ever do anything like it, > because they need the governor to "exit" before. So we are talking about the > only two ->setpolicy drivers in the tree here. I don't have a example platform which would need it but as I can see there are platforms which want to do this when CPUs goes down from disable_nonboot_cpus(), I would be in favor to keep this option available for them as well.. They might want to do this when last CPU of a policy goes down, they might be able to save some power over there. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
Hi, It was a long weekend in India due to some holidays and so couldn't reply. On Fri, Mar 14, 2014 at 11:59 PM, Dirk Brandewie wrote: > On 03/14/2014 10:07 AM, Viresh Kumar wrote: > Suspend and hotplug are two very different things and if we start > crossing those wires bad things are going to happen IMHO. > > In "normal" operation using the suspend path to do this work could > work in principal but doesn't handle the case where the user does >echo 0 | sudo tee /sys/devices/system/cpu/cpuX/online > > Trying force hotplug and suspend into a common mechanism would > lead to a bunch of special case code or a significant rework of the > core code IMHO. What you said is correct, we shouldn't do it. But what I am asking for is a bit different. The stuff we are doing in core on system suspend isn't actually related to suspend but only CPU online/offline. There are platforms which want to set CPUs to a particular frequency before they are taken out by disable_nonboot_cpus. And then there are platforms which want to do similar thing when CPUs are taken down with help of sysfs files. But there is a common baseline there: Set CPUs to a particular P-state before they are taken down. And so I wanted to keep a common solution for both these requirements. > This is guaranteed by the hardware. Each core has its own MSR for P state > request. Any coordination that is required between cores to select the > package P state is handled by the hardware. I see.. Let me send some patches which I have in my mind and then we can decide which set looks more reasonable :) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
Hi, It was a long weekend in India due to some holidays and so couldn't reply. On Fri, Mar 14, 2014 at 11:59 PM, Dirk Brandewie dirk.brande...@gmail.com wrote: On 03/14/2014 10:07 AM, Viresh Kumar wrote: Suspend and hotplug are two very different things and if we start crossing those wires bad things are going to happen IMHO. In normal operation using the suspend path to do this work could work in principal but doesn't handle the case where the user does echo 0 | sudo tee /sys/devices/system/cpu/cpuX/online Trying force hotplug and suspend into a common mechanism would lead to a bunch of special case code or a significant rework of the core code IMHO. What you said is correct, we shouldn't do it. But what I am asking for is a bit different. The stuff we are doing in core on system suspend isn't actually related to suspend but only CPU online/offline. There are platforms which want to set CPUs to a particular frequency before they are taken out by disable_nonboot_cpus. And then there are platforms which want to do similar thing when CPUs are taken down with help of sysfs files. But there is a common baseline there: Set CPUs to a particular P-state before they are taken down. And so I wanted to keep a common solution for both these requirements. This is guaranteed by the hardware. Each core has its own MSR for P state request. Any coordination that is required between cores to select the package P state is handled by the hardware. I see.. Let me send some patches which I have in my mind and then we can decide which set looks more reasonable :) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On Sat, Mar 15, 2014 at 7:29 AM, Rafael J. Wysocki r...@rjwysocki.net wrote: I honestly don't think that -target drivers will ever do anything like it, because they need the governor to exit before. So we are talking about the only two -setpolicy drivers in the tree here. I don't have a example platform which would need it but as I can see there are platforms which want to do this when CPUs goes down from disable_nonboot_cpus(), I would be in favor to keep this option available for them as well.. They might want to do this when last CPU of a policy goes down, they might be able to save some power over there. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On Friday, March 14, 2014 11:29:04 AM Dirk Brandewie wrote: > On 03/14/2014 10:07 AM, Viresh Kumar wrote: > > On 14 March 2014 20:40, Dirk Brandewie wrote: > >> Are you proposing adding cpufreq_generic_suspend() to the core I can not > >> find > >> it in the mainline code. > > > > Its already there in linux-next. I am suggesting to reuse that > > infrastructure with > > some necessary modification to support both suspend and hotplug. > > Suspend and hotplug are two very different things and if we start > crossing those wires bad things are going to happen IMHO. > > In "normal" operation using the suspend path to do this work could > work in principal but doesn't handle the case where the user does > echo 0 | sudo tee /sys/devices/system/cpu/cpuX/online > > Trying force hotplug and suspend into a common mechanism would > lead to a bunch of special case code or a significant rework of the > core code IMHO. > > > > > >>> Over that I don't think Dirk's solution is going to work if we twist > >>> the systems a bit. > >> > >> Could you explain what "twist the systems a bit" means? > > > > The one I explained in the below paragraph. > > > >>> For example, Dirk probably wanted to set P-STATE of every core to MIN > >>> when it goes down. But his solution probably doesn't do that right now. > >>> > >> > >> No, I wanted to set the core that was being off-lined to min P state. > > > > Sorry, probably my words were a bit confusing. I meant exactly what > > you just wrote. Core going down will set its freq to min. > > > >>> As exit() is called only when the policy expires or all the CPUs of that > >>> policy > >>> are down. Suppose only one CPU out of 4 goes down from a policy, then > >>> pstate driver would never know that happened. And that core wouldn't go > >>> to min state. > >> > >> My patch does not change the semantics of exit() or when it is called. For > >> intel_pstate their is one cpu per policy so I moved all the cleanup to > > > > I didn't knew that its guaranteed by pstate driver. I thought it would > > still be > > hardware dependent as some cores might share clock line. > > This is guaranteed by the hardware. Each core has its own MSR for P state > request. Any coordination that is required between cores to select the > package P state is handled by the hardware. > > > > >> exit_prepare() if there were work I needed to do at CPU_POST_DEAD I would > >> have > >> continued to export the *optional* exit callback. > >> > >> The callback name exit_prepare() was probably unfortunate and might be > >> causing > >> some confusion. I will be changing the name per Rafael's feedback. > > > > Don't know.. There is another problem here that exit_prepare() would be > > called > > for each CPU whereas exit() would be called for each policy. > > Granted but I don't see this as a problem in this case there is a 1:1 > relationship. If a driver chooses to use the *optional* exit_prepare() > callback > and knows that there is a many:1 relationship between the policy and CPUs > then it would have to deal with it. Actually, I think we should make it clear that the new callback is for ->setpolicy drivers only, which will make things a bit clearer. We seem to get caught by the difference between ->setpolicy and ->target drivers on a regular basis, so it might be a good idea to make the distinction more clear in the code. I have an idea how to do that, but need some time to prototype it. > > And I strongly feel that we shouldn't give another callback here but instead > > just set core to a specific freq as mentioned by driver with some other > > field. > > > >>> I think we have two solutions here: > >>> - If its just about setting core a particular freq when it goes down, I > >>> think it > >>> looks a generic enough problem and so better fix core for that. Probably > >>> with > >>> help of flags field/suspend-freq (maybe renamed) and without calling > >>> drivers > >>> exit() at all.. > >> > >> > >> ATM the only thing that needs to be done in this case is to allow > >> intel_pstate > >> to set the P state on the core when it is going done. My solution from the > >> cores point of view is more generic, it allows any driver that needs to do > >> work > >> during CPU_DOWN_PREPARE to do it without adding any new logic to the core. > > > > Yeah, do we really need to give that freedom right now? Would be better > > to get this into core as that would be more generic and people looking to > > set > > core to some freq at shutdown wouldn't be replicating that code. Question is if it needs to be more generic. I honestly don't think that ->target drivers will ever do anything like it, because they need the governor to "exit" before. So we are talking about the only two ->setpolicy drivers in the tree here. > IMHO yes and it would be hard to be more generic, if your platform needs to > do architecture specific during the PREPARE phase of cpu hotplug use this > callback or not. > > BTW now that you have
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 03/14/2014 10:07 AM, Viresh Kumar wrote: On 14 March 2014 20:40, Dirk Brandewie wrote: Are you proposing adding cpufreq_generic_suspend() to the core I can not find it in the mainline code. Its already there in linux-next. I am suggesting to reuse that infrastructure with some necessary modification to support both suspend and hotplug. Suspend and hotplug are two very different things and if we start crossing those wires bad things are going to happen IMHO. In "normal" operation using the suspend path to do this work could work in principal but doesn't handle the case where the user does echo 0 | sudo tee /sys/devices/system/cpu/cpuX/online Trying force hotplug and suspend into a common mechanism would lead to a bunch of special case code or a significant rework of the core code IMHO. Over that I don't think Dirk's solution is going to work if we twist the systems a bit. Could you explain what "twist the systems a bit" means? The one I explained in the below paragraph. For example, Dirk probably wanted to set P-STATE of every core to MIN when it goes down. But his solution probably doesn't do that right now. No, I wanted to set the core that was being off-lined to min P state. Sorry, probably my words were a bit confusing. I meant exactly what you just wrote. Core going down will set its freq to min. As exit() is called only when the policy expires or all the CPUs of that policy are down. Suppose only one CPU out of 4 goes down from a policy, then pstate driver would never know that happened. And that core wouldn't go to min state. My patch does not change the semantics of exit() or when it is called. For intel_pstate their is one cpu per policy so I moved all the cleanup to I didn't knew that its guaranteed by pstate driver. I thought it would still be hardware dependent as some cores might share clock line. This is guaranteed by the hardware. Each core has its own MSR for P state request. Any coordination that is required between cores to select the package P state is handled by the hardware. exit_prepare() if there were work I needed to do at CPU_POST_DEAD I would have continued to export the *optional* exit callback. The callback name exit_prepare() was probably unfortunate and might be causing some confusion. I will be changing the name per Rafael's feedback. Don't know.. There is another problem here that exit_prepare() would be called for each CPU whereas exit() would be called for each policy. Granted but I don't see this as a problem in this case there is a 1:1 relationship. If a driver chooses to use the *optional* exit_prepare() callback and knows that there is a many:1 relationship between the policy and CPUs then it would have to deal with it. And I strongly feel that we shouldn't give another callback here but instead just set core to a specific freq as mentioned by driver with some other field. I think we have two solutions here: - If its just about setting core a particular freq when it goes down, I think it looks a generic enough problem and so better fix core for that. Probably with help of flags field/suspend-freq (maybe renamed) and without calling drivers exit() at all.. ATM the only thing that needs to be done in this case is to allow intel_pstate to set the P state on the core when it is going done. My solution from the cores point of view is more generic, it allows any driver that needs to do work during CPU_DOWN_PREPARE to do it without adding any new logic to the core. Yeah, do we really need to give that freedom right now? Would be better to get this into core as that would be more generic and people looking to set core to some freq at shutdown wouldn't be replicating that code. IMHO yes and it would be hard to be more generic, if your platform needs to do architecture specific during the PREPARE phase of cpu hotplug use this callback or not. BTW now that you have added a path where the cpufreq_suspend() could fail it return a value and be checked in dpm_suspend() instead of printing an error and just continuing. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 14 March 2014 20:40, Dirk Brandewie wrote: > Are you proposing adding cpufreq_generic_suspend() to the core I can not > find > it in the mainline code. Its already there in linux-next. I am suggesting to reuse that infrastructure with some necessary modification to support both suspend and hotplug. >> Over that I don't think Dirk's solution is going to work if we twist >> the systems a bit. > > Could you explain what "twist the systems a bit" means? The one I explained in the below paragraph. >> For example, Dirk probably wanted to set P-STATE of every core to MIN >> when it goes down. But his solution probably doesn't do that right now. >> > > No, I wanted to set the core that was being off-lined to min P state. Sorry, probably my words were a bit confusing. I meant exactly what you just wrote. Core going down will set its freq to min. >> As exit() is called only when the policy expires or all the CPUs of that >> policy >> are down. Suppose only one CPU out of 4 goes down from a policy, then >> pstate driver would never know that happened. And that core wouldn't go >> to min state. > > My patch does not change the semantics of exit() or when it is called. For > intel_pstate their is one cpu per policy so I moved all the cleanup to I didn't knew that its guaranteed by pstate driver. I thought it would still be hardware dependent as some cores might share clock line. > exit_prepare() if there were work I needed to do at CPU_POST_DEAD I would > have > continued to export the *optional* exit callback. > > The callback name exit_prepare() was probably unfortunate and might be > causing > some confusion. I will be changing the name per Rafael's feedback. Don't know.. There is another problem here that exit_prepare() would be called for each CPU whereas exit() would be called for each policy. And I strongly feel that we shouldn't give another callback here but instead just set core to a specific freq as mentioned by driver with some other field. >> I think we have two solutions here: >> - If its just about setting core a particular freq when it goes down, I >> think it >> looks a generic enough problem and so better fix core for that. Probably >> with >> help of flags field/suspend-freq (maybe renamed) and without calling >> drivers >> exit() at all.. > > > ATM the only thing that needs to be done in this case is to allow > intel_pstate > to set the P state on the core when it is going done. My solution from the > cores point of view is more generic, it allows any driver that needs to do > work > during CPU_DOWN_PREPARE to do it without adding any new logic to the core. Yeah, do we really need to give that freedom right now? Would be better to get this into core as that would be more generic and people looking to set core to some freq at shutdown wouldn't be replicating that code. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 03/13/2014 09:59 PM, Viresh Kumar wrote: On Thu, Mar 13, 2014 at 11:06 PM, wrote: From: Dirk Brandewie Some drivers (intel_pstate) need to modify state on a core before it is completely offline. The ->exit() callback is executed during the CPU_POST_DEAD phase of the cpu offline process which is too late to change the state of the core. Patch 1 adds an optional callback function to the cpufreq_driver interface which is called by the core during the CPU_DOWN_PREPARE phase of cpu offline in __cpufreq_remove_dev_prepare(). Patch 2 changes intel_pstate to use the ->exit_prepare callback to do its cleanup during cpu offline. Copying stuff from other mail thread here so that we can discuss on a single mail chain. On 14 March 2014 03:09, Rafael J. Wysocki wrote: On Thursday, March 13, 2014 12:56:02 PM Viresh Kumar wrote: On 13 March 2014 08:07, Rafael J. Wysocki wrote: On Wednesday, March 12, 2014 02:27:07 PM Dirk Brandewie wrote: I see two possibilities: 1. Move the exit() callback to __cpufreq_remove_dev_prepare(). I don't have a good understanding of what carnage this would cause in the core or other scaling drivers. 2. Add another callback to the cpufreq_driver interface that would be call from __cpufreq_remove_dev_prepare() if the callback was set. I prefer 2, the reason being that it pretty much is guaranteed not to break anything. For the record, I'm not a fan of adding new cpufreq driver callbacks, but in this particular case it seems we can't really do anything better. I haven't thought a lot about which one of these two looks better, probably Rafael might be correct. But I can see another way out here as this is very much driver specific. Why can't we do a register_hotcpu_notifier() from pstate driver alone? Why would that be better than adding a new callback? Because its becoming more and more confusing. Probably we got the problem right but have wrong solutions for it. But having considered this issue in detail now, I have more inputs. All Dirk and Patrick wanted is to set core to min P-state before it gets offlined. We already have some infrastructure in core which is called this today: cpufreq_generic_suspend(). We can rework on that to get a more ideal solution for both the problems. Are you proposing adding cpufreq_generic_suspend() to the core I can not find it in the mainline code. Over that I don't think Dirk's solution is going to work if we twist the systems a bit. Could you explain what "twist the systems a bit" means? For example, Dirk probably wanted to set P-STATE of every core to MIN when it goes down. But his solution probably doesn't do that right now. No, I wanted to set the core that was being off-lined to min P state. As exit() is called only when the policy expires or all the CPUs of that policy are down. Suppose only one CPU out of 4 goes down from a policy, then pstate driver would never know that happened. And that core wouldn't go to min state. My patch does not change the semantics of exit() or when it is called. For intel_pstate their is one cpu per policy so I moved all the cleanup to exit_prepare() if there were work I needed to do at CPU_POST_DEAD I would have continued to export the *optional* exit callback. The callback name exit_prepare() was probably unfortunate and might be causing some confusion. I will be changing the name per Rafael's feedback. I think we have two solutions here: - If its just about setting core a particular freq when it goes down, I think it looks a generic enough problem and so better fix core for that. Probably with help of flags field/suspend-freq (maybe renamed) and without calling drivers exit() at all.. ATM the only thing that needs to be done in this case is to allow intel_pstate to set the P state on the core when it is going done. My solution from the cores point of view is more generic, it allows any driver that needs to do work during CPU_DOWN_PREPARE to do it without adding any new logic to the core. - If this is highly driver specific (which doesn't look like if all we have to do is setting freq to MIN), That is why intel_pstate is the *only* driver using exit_prepare(). If other drivers need to do work during this phase of the cpu hotplug process then they can export the callback otherwise it has no affect on existing or future scaling drivers. then better have something like register_hotcpu_notifier() with priority set to -1, so that it gets called after cpufreq. intel_pstate or any scaling driver having its own hotcpu_notifier makes possible for the core and the driver to disagree on the state of the driver. Having the driver take hotplug notifications from two directions is a bad idea IMHO. -- viresh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 03/13/2014 09:59 PM, Viresh Kumar wrote: On Thu, Mar 13, 2014 at 11:06 PM, dirk.brande...@gmail.com wrote: From: Dirk Brandewie dirk.j.brande...@intel.com Some drivers (intel_pstate) need to modify state on a core before it is completely offline. The -exit() callback is executed during the CPU_POST_DEAD phase of the cpu offline process which is too late to change the state of the core. Patch 1 adds an optional callback function to the cpufreq_driver interface which is called by the core during the CPU_DOWN_PREPARE phase of cpu offline in __cpufreq_remove_dev_prepare(). Patch 2 changes intel_pstate to use the -exit_prepare callback to do its cleanup during cpu offline. Copying stuff from other mail thread here so that we can discuss on a single mail chain. On 14 March 2014 03:09, Rafael J. Wysocki r...@rjwysocki.net wrote: On Thursday, March 13, 2014 12:56:02 PM Viresh Kumar wrote: On 13 March 2014 08:07, Rafael J. Wysocki r...@rjwysocki.net wrote: On Wednesday, March 12, 2014 02:27:07 PM Dirk Brandewie wrote: I see two possibilities: 1. Move the exit() callback to __cpufreq_remove_dev_prepare(). I don't have a good understanding of what carnage this would cause in the core or other scaling drivers. 2. Add another callback to the cpufreq_driver interface that would be call from __cpufreq_remove_dev_prepare() if the callback was set. I prefer 2, the reason being that it pretty much is guaranteed not to break anything. For the record, I'm not a fan of adding new cpufreq driver callbacks, but in this particular case it seems we can't really do anything better. I haven't thought a lot about which one of these two looks better, probably Rafael might be correct. But I can see another way out here as this is very much driver specific. Why can't we do a register_hotcpu_notifier() from pstate driver alone? Why would that be better than adding a new callback? Because its becoming more and more confusing. Probably we got the problem right but have wrong solutions for it. But having considered this issue in detail now, I have more inputs. All Dirk and Patrick wanted is to set core to min P-state before it gets offlined. We already have some infrastructure in core which is called this today: cpufreq_generic_suspend(). We can rework on that to get a more ideal solution for both the problems. Are you proposing adding cpufreq_generic_suspend() to the core I can not find it in the mainline code. Over that I don't think Dirk's solution is going to work if we twist the systems a bit. Could you explain what twist the systems a bit means? For example, Dirk probably wanted to set P-STATE of every core to MIN when it goes down. But his solution probably doesn't do that right now. No, I wanted to set the core that was being off-lined to min P state. As exit() is called only when the policy expires or all the CPUs of that policy are down. Suppose only one CPU out of 4 goes down from a policy, then pstate driver would never know that happened. And that core wouldn't go to min state. My patch does not change the semantics of exit() or when it is called. For intel_pstate their is one cpu per policy so I moved all the cleanup to exit_prepare() if there were work I needed to do at CPU_POST_DEAD I would have continued to export the *optional* exit callback. The callback name exit_prepare() was probably unfortunate and might be causing some confusion. I will be changing the name per Rafael's feedback. I think we have two solutions here: - If its just about setting core a particular freq when it goes down, I think it looks a generic enough problem and so better fix core for that. Probably with help of flags field/suspend-freq (maybe renamed) and without calling drivers exit() at all.. ATM the only thing that needs to be done in this case is to allow intel_pstate to set the P state on the core when it is going done. My solution from the cores point of view is more generic, it allows any driver that needs to do work during CPU_DOWN_PREPARE to do it without adding any new logic to the core. - If this is highly driver specific (which doesn't look like if all we have to do is setting freq to MIN), That is why intel_pstate is the *only* driver using exit_prepare(). If other drivers need to do work during this phase of the cpu hotplug process then they can export the callback otherwise it has no affect on existing or future scaling drivers. then better have something like register_hotcpu_notifier() with priority set to -1, so that it gets called after cpufreq. intel_pstate or any scaling driver having its own hotcpu_notifier makes possible for the core and the driver to disagree on the state of the driver. Having the driver take hotplug notifications from two directions is a bad idea IMHO. -- viresh -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 14 March 2014 20:40, Dirk Brandewie dirk.brande...@gmail.com wrote: Are you proposing adding cpufreq_generic_suspend() to the core I can not find it in the mainline code. Its already there in linux-next. I am suggesting to reuse that infrastructure with some necessary modification to support both suspend and hotplug. Over that I don't think Dirk's solution is going to work if we twist the systems a bit. Could you explain what twist the systems a bit means? The one I explained in the below paragraph. For example, Dirk probably wanted to set P-STATE of every core to MIN when it goes down. But his solution probably doesn't do that right now. No, I wanted to set the core that was being off-lined to min P state. Sorry, probably my words were a bit confusing. I meant exactly what you just wrote. Core going down will set its freq to min. As exit() is called only when the policy expires or all the CPUs of that policy are down. Suppose only one CPU out of 4 goes down from a policy, then pstate driver would never know that happened. And that core wouldn't go to min state. My patch does not change the semantics of exit() or when it is called. For intel_pstate their is one cpu per policy so I moved all the cleanup to I didn't knew that its guaranteed by pstate driver. I thought it would still be hardware dependent as some cores might share clock line. exit_prepare() if there were work I needed to do at CPU_POST_DEAD I would have continued to export the *optional* exit callback. The callback name exit_prepare() was probably unfortunate and might be causing some confusion. I will be changing the name per Rafael's feedback. Don't know.. There is another problem here that exit_prepare() would be called for each CPU whereas exit() would be called for each policy. And I strongly feel that we shouldn't give another callback here but instead just set core to a specific freq as mentioned by driver with some other field. I think we have two solutions here: - If its just about setting core a particular freq when it goes down, I think it looks a generic enough problem and so better fix core for that. Probably with help of flags field/suspend-freq (maybe renamed) and without calling drivers exit() at all.. ATM the only thing that needs to be done in this case is to allow intel_pstate to set the P state on the core when it is going done. My solution from the cores point of view is more generic, it allows any driver that needs to do work during CPU_DOWN_PREPARE to do it without adding any new logic to the core. Yeah, do we really need to give that freedom right now? Would be better to get this into core as that would be more generic and people looking to set core to some freq at shutdown wouldn't be replicating that code. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On 03/14/2014 10:07 AM, Viresh Kumar wrote: On 14 March 2014 20:40, Dirk Brandewie dirk.brande...@gmail.com wrote: Are you proposing adding cpufreq_generic_suspend() to the core I can not find it in the mainline code. Its already there in linux-next. I am suggesting to reuse that infrastructure with some necessary modification to support both suspend and hotplug. Suspend and hotplug are two very different things and if we start crossing those wires bad things are going to happen IMHO. In normal operation using the suspend path to do this work could work in principal but doesn't handle the case where the user does echo 0 | sudo tee /sys/devices/system/cpu/cpuX/online Trying force hotplug and suspend into a common mechanism would lead to a bunch of special case code or a significant rework of the core code IMHO. Over that I don't think Dirk's solution is going to work if we twist the systems a bit. Could you explain what twist the systems a bit means? The one I explained in the below paragraph. For example, Dirk probably wanted to set P-STATE of every core to MIN when it goes down. But his solution probably doesn't do that right now. No, I wanted to set the core that was being off-lined to min P state. Sorry, probably my words were a bit confusing. I meant exactly what you just wrote. Core going down will set its freq to min. As exit() is called only when the policy expires or all the CPUs of that policy are down. Suppose only one CPU out of 4 goes down from a policy, then pstate driver would never know that happened. And that core wouldn't go to min state. My patch does not change the semantics of exit() or when it is called. For intel_pstate their is one cpu per policy so I moved all the cleanup to I didn't knew that its guaranteed by pstate driver. I thought it would still be hardware dependent as some cores might share clock line. This is guaranteed by the hardware. Each core has its own MSR for P state request. Any coordination that is required between cores to select the package P state is handled by the hardware. exit_prepare() if there were work I needed to do at CPU_POST_DEAD I would have continued to export the *optional* exit callback. The callback name exit_prepare() was probably unfortunate and might be causing some confusion. I will be changing the name per Rafael's feedback. Don't know.. There is another problem here that exit_prepare() would be called for each CPU whereas exit() would be called for each policy. Granted but I don't see this as a problem in this case there is a 1:1 relationship. If a driver chooses to use the *optional* exit_prepare() callback and knows that there is a many:1 relationship between the policy and CPUs then it would have to deal with it. And I strongly feel that we shouldn't give another callback here but instead just set core to a specific freq as mentioned by driver with some other field. I think we have two solutions here: - If its just about setting core a particular freq when it goes down, I think it looks a generic enough problem and so better fix core for that. Probably with help of flags field/suspend-freq (maybe renamed) and without calling drivers exit() at all.. ATM the only thing that needs to be done in this case is to allow intel_pstate to set the P state on the core when it is going done. My solution from the cores point of view is more generic, it allows any driver that needs to do work during CPU_DOWN_PREPARE to do it without adding any new logic to the core. Yeah, do we really need to give that freedom right now? Would be better to get this into core as that would be more generic and people looking to set core to some freq at shutdown wouldn't be replicating that code. IMHO yes and it would be hard to be more generic, if your platform needs to do architecture specific during the PREPARE phase of cpu hotplug use this callback or not. BTW now that you have added a path where the cpufreq_suspend() could fail it return a value and be checked in dpm_suspend() instead of printing an error and just continuing. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On Friday, March 14, 2014 11:29:04 AM Dirk Brandewie wrote: On 03/14/2014 10:07 AM, Viresh Kumar wrote: On 14 March 2014 20:40, Dirk Brandewie dirk.brande...@gmail.com wrote: Are you proposing adding cpufreq_generic_suspend() to the core I can not find it in the mainline code. Its already there in linux-next. I am suggesting to reuse that infrastructure with some necessary modification to support both suspend and hotplug. Suspend and hotplug are two very different things and if we start crossing those wires bad things are going to happen IMHO. In normal operation using the suspend path to do this work could work in principal but doesn't handle the case where the user does echo 0 | sudo tee /sys/devices/system/cpu/cpuX/online Trying force hotplug and suspend into a common mechanism would lead to a bunch of special case code or a significant rework of the core code IMHO. Over that I don't think Dirk's solution is going to work if we twist the systems a bit. Could you explain what twist the systems a bit means? The one I explained in the below paragraph. For example, Dirk probably wanted to set P-STATE of every core to MIN when it goes down. But his solution probably doesn't do that right now. No, I wanted to set the core that was being off-lined to min P state. Sorry, probably my words were a bit confusing. I meant exactly what you just wrote. Core going down will set its freq to min. As exit() is called only when the policy expires or all the CPUs of that policy are down. Suppose only one CPU out of 4 goes down from a policy, then pstate driver would never know that happened. And that core wouldn't go to min state. My patch does not change the semantics of exit() or when it is called. For intel_pstate their is one cpu per policy so I moved all the cleanup to I didn't knew that its guaranteed by pstate driver. I thought it would still be hardware dependent as some cores might share clock line. This is guaranteed by the hardware. Each core has its own MSR for P state request. Any coordination that is required between cores to select the package P state is handled by the hardware. exit_prepare() if there were work I needed to do at CPU_POST_DEAD I would have continued to export the *optional* exit callback. The callback name exit_prepare() was probably unfortunate and might be causing some confusion. I will be changing the name per Rafael's feedback. Don't know.. There is another problem here that exit_prepare() would be called for each CPU whereas exit() would be called for each policy. Granted but I don't see this as a problem in this case there is a 1:1 relationship. If a driver chooses to use the *optional* exit_prepare() callback and knows that there is a many:1 relationship between the policy and CPUs then it would have to deal with it. Actually, I think we should make it clear that the new callback is for -setpolicy drivers only, which will make things a bit clearer. We seem to get caught by the difference between -setpolicy and -target drivers on a regular basis, so it might be a good idea to make the distinction more clear in the code. I have an idea how to do that, but need some time to prototype it. And I strongly feel that we shouldn't give another callback here but instead just set core to a specific freq as mentioned by driver with some other field. I think we have two solutions here: - If its just about setting core a particular freq when it goes down, I think it looks a generic enough problem and so better fix core for that. Probably with help of flags field/suspend-freq (maybe renamed) and without calling drivers exit() at all.. ATM the only thing that needs to be done in this case is to allow intel_pstate to set the P state on the core when it is going done. My solution from the cores point of view is more generic, it allows any driver that needs to do work during CPU_DOWN_PREPARE to do it without adding any new logic to the core. Yeah, do we really need to give that freedom right now? Would be better to get this into core as that would be more generic and people looking to set core to some freq at shutdown wouldn't be replicating that code. Question is if it needs to be more generic. I honestly don't think that -target drivers will ever do anything like it, because they need the governor to exit before. So we are talking about the only two -setpolicy drivers in the tree here. IMHO yes and it would be hard to be more generic, if your platform needs to do architecture specific during the PREPARE phase of cpu hotplug use this callback or not. BTW now that you have added a path where the cpufreq_suspend() could fail it return a value and be checked in dpm_suspend() instead of printing an error and just continuing. I'm not sure what you mean? Are you saying that it might be a good idea
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On Thu, Mar 13, 2014 at 11:06 PM, wrote: > From: Dirk Brandewie > > Some drivers (intel_pstate) need to modify state on a core before it > is completely offline. The ->exit() callback is executed during the > CPU_POST_DEAD phase of the cpu offline process which is too late to > change the state of the core. > > Patch 1 adds an optional callback function to the cpufreq_driver > interface which is called by the core during the CPU_DOWN_PREPARE > phase of cpu offline in __cpufreq_remove_dev_prepare(). > > Patch 2 changes intel_pstate to use the ->exit_prepare callback to do > its cleanup during cpu offline. Copying stuff from other mail thread here so that we can discuss on a single mail chain. On 14 March 2014 03:09, Rafael J. Wysocki wrote: > On Thursday, March 13, 2014 12:56:02 PM Viresh Kumar wrote: >> On 13 March 2014 08:07, Rafael J. Wysocki wrote: >> > On Wednesday, March 12, 2014 02:27:07 PM Dirk Brandewie wrote: >> >> >> > I see two possibilities: >> >> > 1. Move the exit() callback to __cpufreq_remove_dev_prepare(). I >> >> > don't >> >> > have a good understanding of what carnage this would cause in the >> >> > core >> >> > or other scaling drivers. >> >> > >> >> > 2. Add another callback to the cpufreq_driver interface that would be >> >> > call >> >> > from __cpufreq_remove_dev_prepare() if the callback was set. >> > >> > I prefer 2, the reason being that it pretty much is guaranteed not to break >> > anything. For the record, I'm not a fan of adding new cpufreq driver >> > callbacks, >> > but in this particular case it seems we can't really do anything better. >> >> I haven't thought a lot about which one of these two looks better, probably >> Rafael might be correct. But I can see another way out here as this is very >> much driver specific. Why can't we do a register_hotcpu_notifier() from >> pstate driver alone? > > Why would that be better than adding a new callback? Because its becoming more and more confusing. Probably we got the problem right but have wrong solutions for it. But having considered this issue in detail now, I have more inputs. All Dirk and Patrick wanted is to set core to min P-state before it gets offlined. We already have some infrastructure in core which is called this today: cpufreq_generic_suspend(). We can rework on that to get a more ideal solution for both the problems. Over that I don't think Dirk's solution is going to work if we twist the systems a bit. For example, Dirk probably wanted to set P-STATE of every core to MIN when it goes down. But his solution probably doesn't do that right now. As exit() is called only when the policy expires or all the CPUs of that policy are down. Suppose only one CPU out of 4 goes down from a policy, then pstate driver would never know that happened. And that core wouldn't go to min state. I think we have two solutions here: - If its just about setting core a particular freq when it goes down, I think it looks a generic enough problem and so better fix core for that. Probably with help of flags field/suspend-freq (maybe renamed) and without calling drivers exit() at all.. - If this is highly driver specific (which doesn't look like if all we have to do is setting freq to MIN), then better have something like register_hotcpu_notifier() with priority set to -1, so that it gets called after cpufreq. -- viresh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
From: Dirk Brandewie Some drivers (intel_pstate) need to modify state on a core before it is completely offline. The ->exit() callback is executed during the CPU_POST_DEAD phase of the cpu offline process which is too late to change the state of the core. Patch 1 adds an optional callback function to the cpufreq_driver interface which is called by the core during the CPU_DOWN_PREPARE phase of cpu offline in __cpufreq_remove_dev_prepare(). Patch 2 changes intel_pstate to use the ->exit_prepare callback to do its cleanup during cpu offline. Dirk Brandewie (2): cpufreq: Add exit_prepare callback to cpufreq_driver interface intel_pstate: Set core to min P state during core offline Documentation/cpu-freq/cpu-drivers.txt | 8 +++- drivers/cpufreq/cpufreq.c | 3 +++ drivers/cpufreq/intel_pstate.c | 20 +--- include/linux/cpufreq.h| 1 + 4 files changed, 24 insertions(+), 8 deletions(-) -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
From: Dirk Brandewie dirk.j.brande...@intel.com Some drivers (intel_pstate) need to modify state on a core before it is completely offline. The -exit() callback is executed during the CPU_POST_DEAD phase of the cpu offline process which is too late to change the state of the core. Patch 1 adds an optional callback function to the cpufreq_driver interface which is called by the core during the CPU_DOWN_PREPARE phase of cpu offline in __cpufreq_remove_dev_prepare(). Patch 2 changes intel_pstate to use the -exit_prepare callback to do its cleanup during cpu offline. Dirk Brandewie (2): cpufreq: Add exit_prepare callback to cpufreq_driver interface intel_pstate: Set core to min P state during core offline Documentation/cpu-freq/cpu-drivers.txt | 8 +++- drivers/cpufreq/cpufreq.c | 3 +++ drivers/cpufreq/intel_pstate.c | 20 +--- include/linux/cpufreq.h| 1 + 4 files changed, 24 insertions(+), 8 deletions(-) -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.
On Thu, Mar 13, 2014 at 11:06 PM, dirk.brande...@gmail.com wrote: From: Dirk Brandewie dirk.j.brande...@intel.com Some drivers (intel_pstate) need to modify state on a core before it is completely offline. The -exit() callback is executed during the CPU_POST_DEAD phase of the cpu offline process which is too late to change the state of the core. Patch 1 adds an optional callback function to the cpufreq_driver interface which is called by the core during the CPU_DOWN_PREPARE phase of cpu offline in __cpufreq_remove_dev_prepare(). Patch 2 changes intel_pstate to use the -exit_prepare callback to do its cleanup during cpu offline. Copying stuff from other mail thread here so that we can discuss on a single mail chain. On 14 March 2014 03:09, Rafael J. Wysocki r...@rjwysocki.net wrote: On Thursday, March 13, 2014 12:56:02 PM Viresh Kumar wrote: On 13 March 2014 08:07, Rafael J. Wysocki r...@rjwysocki.net wrote: On Wednesday, March 12, 2014 02:27:07 PM Dirk Brandewie wrote: I see two possibilities: 1. Move the exit() callback to __cpufreq_remove_dev_prepare(). I don't have a good understanding of what carnage this would cause in the core or other scaling drivers. 2. Add another callback to the cpufreq_driver interface that would be call from __cpufreq_remove_dev_prepare() if the callback was set. I prefer 2, the reason being that it pretty much is guaranteed not to break anything. For the record, I'm not a fan of adding new cpufreq driver callbacks, but in this particular case it seems we can't really do anything better. I haven't thought a lot about which one of these two looks better, probably Rafael might be correct. But I can see another way out here as this is very much driver specific. Why can't we do a register_hotcpu_notifier() from pstate driver alone? Why would that be better than adding a new callback? Because its becoming more and more confusing. Probably we got the problem right but have wrong solutions for it. But having considered this issue in detail now, I have more inputs. All Dirk and Patrick wanted is to set core to min P-state before it gets offlined. We already have some infrastructure in core which is called this today: cpufreq_generic_suspend(). We can rework on that to get a more ideal solution for both the problems. Over that I don't think Dirk's solution is going to work if we twist the systems a bit. For example, Dirk probably wanted to set P-STATE of every core to MIN when it goes down. But his solution probably doesn't do that right now. As exit() is called only when the policy expires or all the CPUs of that policy are down. Suppose only one CPU out of 4 goes down from a policy, then pstate driver would never know that happened. And that core wouldn't go to min state. I think we have two solutions here: - If its just about setting core a particular freq when it goes down, I think it looks a generic enough problem and so better fix core for that. Probably with help of flags field/suspend-freq (maybe renamed) and without calling drivers exit() at all.. - If this is highly driver specific (which doesn't look like if all we have to do is setting freq to MIN), then better have something like register_hotcpu_notifier() with priority set to -1, so that it gets called after cpufreq. -- viresh -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/