Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs
On Wed, Apr 27, 2011 at 01:48:52PM -0700, Colin Cross wrote:

> OPP currently has opp_enable and opp_disable functions. I don't
> understand why these are needed, they are only used at init time to
> determine available voltages, which could be handled by never passing
> unavailable voltages to the dvfs implementation.

I queried this when OPP was originally added. The motivation which was
given (which seemed fairly reasonable) was to reduce the number of data
tables for similar parts and board designs. That did seem like
something which it was reasonable to factor out in some way, though
possibly with a different mechanism.
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs
On Thu, Apr 28, 2011 at 4:06 PM, Colin Cross wrote:
> On Wed, Apr 27, 2011 at 11:50 PM, MyungJoo Ham wrote:
>> On Thu, Apr 28, 2011 at 3:44 PM, Colin Cross wrote:
>>> On Wed, Apr 27, 2011 at 11:12 PM, MyungJoo Ham wrote:
>>>> On Thu, Apr 28, 2011 at 5:48 AM, Colin Cross wrote:
>>>>> OPP currently has opp_enable and opp_disable functions. I don't
>>>>> understand why these are needed, they are only used at init time to
>>>>> determine available voltages, which could be handled by never passing
>>>>> unavailable voltages to the dvfs implementation.
>>>>
>>>> We need them in runtime.
>>>>
>>>> A device "a" may want to guarantee that a device "b" to be at least
>>>> "200MHz" or faster while it does some operations. Then, "a" will
>>>> opp_disable("b", 100MHz and others); and opp_enable("b", them) later
>>>> on. We have similar issues with multimedia blocks (MFC, Camera, FB,
>>>> GPU) and CPU/Memory Bus. Ondemand governor of CPUFREQ has some delay
>>>> on catching up a workload (1.5x the sampling rate in average, <2.0x
>>>> the sampling rate in worst cases), which may incur flickering/tearing
>>>> issues with multimedia streams. On the other hand, a general thermal
>>>> monitor or battery manager might want to limit energy usage by
>>>> disabling top performance clocks if it is too hot or the battery level
>>>> is low.
>>>
>>> That sounds like a very strange api, when what you really mean is
>>> clk_set_min_rate or clk_set_max_rate.
>>
>> Essentially, that's what needed.
>> However, with clk_set_min/max_rate, don't we need to let another
>> device to be consumer of other devices' clocks? Not just introducing a
>> device to other devices?
>
> Yes, but that's effectively what you're doing through a backwards api
> anyways. The question is, for these complicated clock scenarios where
> the final frequency of a clock depends on so many factors, should that
> control go through the clock framework, or through some sort of global
> clock governor (which is where OPP would reappear).
>

In the use cases of runtime clock setting by devfreq or other devices
mentioned above, we are controlling the device's performance through
the representative clock of the device, not through one specific clock
among the clocks that the device has. For a device "A" with clocks "a1"
and "a2", another device "B" would not control both "a1" and "a2"
directly to get the guaranteed performance from "A". Besides, "B"
should not do so if "A" needs specific orders, delays, and other
controls to change its performance properly.

Therefore, my answer is that it would be preferable to control this
through some wrapper or interface that is connected to the device
owning the controlled clocks (and let the device's callback or similar
control its clocks), not through the clock framework directly.

In this version of devfreq+OPP, these are handled by the "target"
callback.

Cheers!

- MyungJoo

--
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
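The per-device "target" callback idea in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions: struct sim_dev, sim_dev_a_target() and sim_request_performance() are hypothetical names, not part of devfreq or OPP, and the 2:1 ratio between "a1" and "a2" is invented for the example.

```c
#include <stddef.h>

/* Hypothetical device "A" with two clocks, a1 (representative) and a2. */
struct sim_dev {
    unsigned long a1_hz;
    unsigned long a2_hz;
    /* Device-owned callback: the only code that touches a1 and a2. */
    int (*target)(struct sim_dev *dev, unsigned long freq_hz);
};

/*
 * Example callback for device "A". The ordering (a2 first, then a1)
 * and the 2:1 ratio are assumptions made up for this sketch.
 */
int sim_dev_a_target(struct sim_dev *dev, unsigned long freq_hz)
{
    dev->a2_hz = freq_hz / 2;   /* dependent clock follows the request */
    dev->a1_hz = freq_hz;       /* representative clock set last */
    return 0;
}

/*
 * What another device "B" would call: it names only the device and a
 * representative frequency, never the individual clocks.
 */
int sim_request_performance(struct sim_dev *dev, unsigned long freq_hz)
{
    if (dev == NULL || dev->target == NULL)
        return -1;
    return dev->target(dev, freq_hz);
}
```

With this shape, "B" never needs to become a consumer of a1 or a2; any ordering or delay requirement stays inside the callback that belongs to "A".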
Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs
On Thu, Apr 28, 2011 at 3:43 PM, Colin Cross wrote:
> I understand the need for some sort of governor that can use device
> state to determine the necessary clock frequencies. Where I disagree
> is the connection to voltages. The governor should ONLY determine the
> frequencies desired, and the voltage required to meet those
> frequencies should be determined by the clock framework, based only on
> the clock and the frequency.

Yes, as long as AVS (Adaptive Voltage Scaling) is not involved, devfreq
does not need to care about voltages and can let the device driver
(such as the target callback or its callee) take care of them. Besides,
my impression is that AVS does not depend on a software DVFS scheme, at
least judging from some AVS tests on S5PC110.

So, I'd say that it's safe to let the devfreq framework handle
frequency only and let the target callback handle everything else apart
from choosing the representative clock frequency. However, if we are
going to detach devfreq from OPP, we only need to provide a frequency
list at init and
{ an interface to control max/min freq or an interface to look up
max/min freq of the corresponding representative clock. }

> ___
> linux-pm mailing list
> linux...@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/linux-pm
>

ps. In our AVS test, the device drivers had nothing to do with voltage
scaling except for initializing devices. The H/W did everything about
voltage scaling dynamically.

Thanks,
MyungJoo.
--
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs
On Wed, Apr 27, 2011 at 11:50 PM, MyungJoo Ham wrote:
> On Thu, Apr 28, 2011 at 3:44 PM, Colin Cross wrote:
>> On Wed, Apr 27, 2011 at 11:12 PM, MyungJoo Ham wrote:
>>> On Thu, Apr 28, 2011 at 5:48 AM, Colin Cross wrote:
>>>> OPP currently has opp_enable and opp_disable functions. I don't
>>>> understand why these are needed, they are only used at init time to
>>>> determine available voltages, which could be handled by never passing
>>>> unavailable voltages to the dvfs implementation.
>>>
>>> We need them in runtime.
>>>
>>> A device "a" may want to guarantee that a device "b" to be at least
>>> "200MHz" or faster while it does some operations. Then, "a" will
>>> opp_disable("b", 100MHz and others); and opp_enable("b", them) later
>>> on. We have similar issues with multimedia blocks (MFC, Camera, FB,
>>> GPU) and CPU/Memory Bus. Ondemand governor of CPUFREQ has some delay
>>> on catching up a workload (1.5x the sampling rate in average, <2.0x
>>> the sampling rate in worst cases), which may incur flickering/tearing
>>> issues with multimedia streams. On the other hand, a general thermal
>>> monitor or battery manager might want to limit energy usage by
>>> disabling top performance clocks if it is too hot or the battery level
>>> is low.
>>
>> That sounds like a very strange api, when what you really mean is
>> clk_set_min_rate or clk_set_max_rate.
>
> Essentially, that's what needed.
> However, with clk_set_min/max_rate, don't we need to let another
> device to be consumer of other devices' clocks? Not just introducing a
> device to other devices?

Yes, but that's effectively what you're doing through a backwards api
anyways. The question is, for these complicated clock scenarios where
the final frequency of a clock depends on so many factors, should that
control go through the clock framework, or through some sort of global
clock governor (which is where OPP would reappear).
Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs
On Thu, Apr 28, 2011 at 3:44 PM, Colin Cross wrote:
> On Wed, Apr 27, 2011 at 11:12 PM, MyungJoo Ham wrote:
>> On Thu, Apr 28, 2011 at 5:48 AM, Colin Cross wrote:
>>> OPP currently has opp_enable and opp_disable functions. I don't
>>> understand why these are needed, they are only used at init time to
>>> determine available voltages, which could be handled by never passing
>>> unavailable voltages to the dvfs implementation.
>>
>> We need them in runtime.
>>
>> A device "a" may want to guarantee that a device "b" to be at least
>> "200MHz" or faster while it does some operations. Then, "a" will
>> opp_disable("b", 100MHz and others); and opp_enable("b", them) later
>> on. We have similar issues with multimedia blocks (MFC, Camera, FB,
>> GPU) and CPU/Memory Bus. Ondemand governor of CPUFREQ has some delay
>> on catching up a workload (1.5x the sampling rate in average, <2.0x
>> the sampling rate in worst cases), which may incur flickering/tearing
>> issues with multimedia streams. On the other hand, a general thermal
>> monitor or battery manager might want to limit energy usage by
>> disabling top performance clocks if it is too hot or the battery level
>> is low.
>
> That sounds like a very strange api, when what you really mean is
> clk_set_min_rate or clk_set_max_rate.

Essentially, that's what is needed.
However, with clk_set_min/max_rate, don't we need to let a device
become a consumer of other devices' clocks, rather than just
introducing one device to another?

--
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs
On Wed, Apr 27, 2011 at 11:12 PM, MyungJoo Ham wrote:
> On Thu, Apr 28, 2011 at 5:48 AM, Colin Cross wrote:
>> OPP currently has opp_enable and opp_disable functions. I don't
>> understand why these are needed, they are only used at init time to
>> determine available voltages, which could be handled by never passing
>> unavailable voltages to the dvfs implementation.
>
> We need them in runtime.
>
> A device "a" may want to guarantee that a device "b" to be at least
> "200MHz" or faster while it does some operations. Then, "a" will
> opp_disable("b", 100MHz and others); and opp_enable("b", them) later
> on. We have similar issues with multimedia blocks (MFC, Camera, FB,
> GPU) and CPU/Memory Bus. Ondemand governor of CPUFREQ has some delay
> on catching up a workload (1.5x the sampling rate in average, <2.0x
> the sampling rate in worst cases), which may incur flickering/tearing
> issues with multimedia streams. On the other hand, a general thermal
> monitor or battery manager might want to limit energy usage by
> disabling top performance clocks if it is too hot or the battery level
> is low.

That sounds like a very strange api, when what you really mean is
clk_set_min_rate or clk_set_max_rate.
Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs
On Wed, Apr 27, 2011 at 10:59 PM, MyungJoo Ham wrote:
> What one instance of DVFS (devfreq) controls are clocks and
> regulators. (a device may have multiple regulators as well as multiple
> clocks)
> What one instance of DVFS (devfreq) monitors (device load and/or
> temperature) is a device that uses the clocks and regulators.
>
> If we focus on the things that are controlled by DVFS, connecting DVFS
> with clock seems fine; however, DVFS's decision is based on the status
> of the device and the decision (monitoring result) configures a set of
> clocks and regulators. The clocks are not configured independently
> from others if the clocks are used by a DVFS-capable device. The
> frequency/voltage pair (OPP in this patch) associated with a device
> becomes a representative value of a specific configuration that
> configures the set of clocks and regulators.
>
> This is quite similar with CPUFREQ. CPUFREQ provides a single
> frequency value as a result of monitoring; however the machine's
> cpufreq driver may set multiple clocks and multiple voltage regulators
> based on the representative value (which is usually the core clock)
> although the cpufreq driver may need to control many more clocks with
> different frequencies.
>
> With multiple clocks of a device, if there is a clock that is required
> to be set independently from the "representative" clock with DVFS, it
> means that the DVFS monitoring result (load/temperature) is not a
> scalar value but a vector (multi-dimensional value). That implies that
> we need to monitor different and independent values, which in turn,
> implies that we need separated devices. Note that the DVFS monitor
> result from load and temperature combined is not a multi-dimensional
> value because the temperature limits "maximum possible frequency or
> voltage" and the load gives "preferred lower bound of frequency" that
> can be overridden by the limit set by temperature.
>
> Therefore, having one DVFS per clock where multiple clocks are
> attached to a device will create multiple monitors that monitor the
> same object(device behavior) with same metrics (load and temperature).
>
> Besides, the reason I've started with "target" callback, not clk and
> regulator names or pointers is that a device may have multiple clks
> and regulators and the OPP may only show the representative
> clock/regulators as CPUFREQ does. Especially when the order of
> transitions of those multiple clocks and regulators matter (if they
> are in a single device, it sometimes does), running a DVFS per clock,
> not per device, will be bothersome if not disasterous.

I understand the need for some sort of governor that can use device
state to determine the necessary clock frequencies. Where I disagree
is the connection to voltages. The governor should ONLY determine the
frequencies desired, and the voltage required to meet those
frequencies should be determined by the clock framework, based only on
the clock and the frequency.
Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs
On Thu, Apr 28, 2011 at 5:48 AM, Colin Cross wrote:
> On Wed, Apr 27, 2011 at 12:26 PM, Thomas Gleixner wrote:
>> Forget OMAP implementation details for a while, sit back and look at
>> the big picture.
>
> Here's my proposal for DVFS:
> - DVFS is implemented in drivers/clk/dvfs.c, and is called by the
> common clock implementation to adjust the voltages, if necessary, on
> regular clk_* calls.
> - Platform code provides mappings in the form (clk, regulator, max
> frequency, min voltage) to the dvfs code.
> - Everything that is in OPP today gets converted to helper functions
> inside the dvfs implementation, and is never called from SoC code
> (except to pass tables at init), or from drivers.
> - OPP can be recreated in the future as a upper level policy manager
> for clocks that need to move together, if that is ever necessary. It
> would not know anything about voltages.
> - A few common policy implementations need to be added to the common
> clock implementation, like temperature limits.

I hope that my previous reply answered this.

>
> For Tegra:
> - DVFS continues to be accessed by calling clk_* functions
>
> For OMAP:
> - DVFS is triggered by hwmod through clk_* functions. Any cross-arch
> driver can continue to call clk_* functions.
>
> OPP currently has opp_enable and opp_disable functions. I don't
> understand why these are needed, they are only used at init time to
> determine available voltages, which could be handled by never passing
> unavailable voltages to the dvfs implementation.

We need them at runtime.

A device "a" may want to guarantee that a device "b" runs at "200MHz"
or faster while it does some operations. Then, "a" will
opp_disable("b", 100MHz and others); and opp_enable("b", them) later
on. We have similar issues with multimedia blocks (MFC, Camera, FB,
GPU) and CPU/Memory Bus. The ondemand governor of CPUFREQ has some
delay in catching up with a workload (1.5x the sampling rate on
average, <2.0x the sampling rate in the worst case), which may cause
flickering/tearing issues with multimedia streams. On the other hand,
a general thermal monitor or battery manager might want to limit energy
usage by disabling the top performance clocks if it is too hot or the
battery level is low.

--
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
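The runtime use of enable/disable described above can be pictured with a small stand-alone model. This is a sketch only: struct sim_opp, sim_opp_set_available() and sim_opp_find_freq_ceil() are hypothetical stand-ins that mimic the idea of opp_enable, opp_disable and opp_find_freq_ceil on a plain array; they are not the kernel OPP functions and take no struct device.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical OPP entry: a frequency plus an availability flag. */
struct sim_opp {
    unsigned long freq_hz;
    bool available;
};

/*
 * Enable or disable the entry whose frequency matches freq_hz.
 * Returns 0 on success, -1 if the table has no such frequency.
 */
int sim_opp_set_available(struct sim_opp *table, size_t n,
                          unsigned long freq_hz, bool available)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].freq_hz == freq_hz) {
            table[i].available = available;
            return 0;
        }
    }
    return -1;
}

/*
 * Return the lowest available frequency that is >= freq_hz, or 0 if
 * none qualifies. The table is assumed sorted by ascending frequency.
 */
unsigned long sim_opp_find_freq_ceil(const struct sim_opp *table, size_t n,
                                     unsigned long freq_hz)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].available && table[i].freq_hz >= freq_hz)
            return table[i].freq_hz;
    }
    return 0;
}
```

In this model, device "a" disabling the 100MHz entry of "b" means a governor asking for 100MHz is rounded up to 200MHz until the entry is enabled again, which is the same effect a minimum-rate constraint would have.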
Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs
On Thu, Apr 28, 2011 at 3:37 AM, Colin Cross wrote:
> (sorry, missent the earlier one)
>
> On Wed, Apr 27, 2011 at 11:07 AM, Menon, Nishanth wrote:
>> On Wed, Apr 27, 2011 at 12:49, Colin Cross wrote:
>> +l-o
>>
>>> I'm a little confused about the design for this, and OPP as well. OPP
>>> matches a struct device * and a frequency to a voltage, which is not a
>>> generically useful pairing, as far as I can tell. On Tegra, it is
>>> quite possible for a single device to have multiple clocks that each
>>> have different voltage requirements, for example the display block can
>>> have an interface clock as well as a pixel clock. Simplifying this to
>>> dev + freq = voltage seems very OMAP specific, and will be difficult
>>> or impossible to adapt to Tegra.
>> We have the same requirements as well(iclk,fclk,pixclk etc..)! We
>> group them under voltage domains in OMAP ;). if your issue was a
>> ability to have a single freq to a OPP, it is upto SoC to do the
>> proper mapping. Concept of an OPP still remains consistent - which is
>> for a voltage, there is only so much freq you can drive that specific
>> module to.
> No, that is still wrong. You don't drive a module at a frequency, you
> drive a clock. You can't map struct device * 1-1 to a clock. Look at
> omap2_set_init_voltage:
> static int __init omap2_set_init_voltage(char *vdd_name, char *clk_name,
>                                          struct device *dev) {
> ...
>         clk = clk_get(NULL, clk_name);
>         freq = clk->rate;
>         opp = opp_find_freq_ceil(dev, &freq);
> ...
> }
>
> What happens if I have a dev with two frequencies? I can only pass a
> dev into opp. It makes infinitely more sense to pass in a clock:
> opp_find_freq_ceil(clk, &freq).

What one instance of DVFS (devfreq) controls are clocks and
regulators. (A device may have multiple regulators as well as multiple
clocks.)
What one instance of DVFS (devfreq) monitors (device load and/or
temperature) is a device that uses the clocks and regulators.

If we focus on the things that are controlled by DVFS, connecting DVFS
with clock seems fine; however, DVFS's decision is based on the status
of the device, and the decision (monitoring result) configures a set of
clocks and regulators. The clocks are not configured independently
from others if the clocks are used by a DVFS-capable device. The
frequency/voltage pair (OPP in this patch) associated with a device
becomes a representative value of a specific configuration that
configures the set of clocks and regulators.

This is quite similar to CPUFREQ. CPUFREQ provides a single frequency
value as a result of monitoring; however, the machine's cpufreq driver
may set multiple clocks and multiple voltage regulators based on the
representative value (which is usually the core clock), although the
cpufreq driver may need to control many more clocks with different
frequencies.

With multiple clocks of a device, if there is a clock that is required
to be set independently from the "representative" clock with DVFS, it
means that the DVFS monitoring result (load/temperature) is not a
scalar value but a vector (multi-dimensional value). That implies that
we need to monitor different and independent values, which in turn
implies that we need separate devices. Note that the DVFS monitor
result from load and temperature combined is not a multi-dimensional
value, because the temperature limits the "maximum possible frequency
or voltage" and the load gives a "preferred lower bound of frequency"
that can be overridden by the limit set by temperature.

Therefore, having one DVFS per clock where multiple clocks are
attached to a device will create multiple monitors that monitor the
same object (device behavior) with the same metrics (load and
temperature).

Besides, the reason I've started with a "target" callback, not clk and
regulator names or pointers, is that a device may have multiple clks
and regulators and the OPP may only show the representative
clock/regulators, as CPUFREQ does. Especially when the order of
transitions of those multiple clocks and regulators matters (if they
are in a single device, it sometimes does), running a DVFS per clock,
not per device, will be bothersome if not disastrous.

>
>> It is upto SoC frameworks to implement the transitions. E.g. lets look
>> at scalability: How'd the mechanism proposed work with temperature
>> variances: Example: I dont want to hit 1.5GHz if temp >70C - wont it
>> be an SoC specific hack I'd need to introduce?
> No, because you're putting it in the wrong place, that is a policy
> decision. Handle it in the clock framework, or handle it in the
> device driver. That's a bad example either way - what happens if you
> are already at 1.5GHz when the temperature crosses 70C? You need an
> interrupt that tells you the temperature is too high, and than needs
> to affect a policy decision at a much higher level than dvfs.
>
>>
>> All OPP framework does is store that maps, and leave
Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs
On Wed, Apr 27, 2011 at 12:26 PM, Thomas Gleixner wrote:
> Forget OMAP implementation details for a while, sit back and look at
> the big picture.

Here's my proposal for DVFS:
- DVFS is implemented in drivers/clk/dvfs.c, and is called by the
common clock implementation to adjust the voltages, if necessary, on
regular clk_* calls.
- Platform code provides mappings in the form (clk, regulator, max
frequency, min voltage) to the dvfs code.
- Everything that is in OPP today gets converted to helper functions
inside the dvfs implementation, and is never called from SoC code
(except to pass tables at init), or from drivers.
- OPP can be recreated in the future as an upper level policy manager
for clocks that need to move together, if that is ever necessary. It
would not know anything about voltages.
- A few common policy implementations need to be added to the common
clock implementation, like temperature limits.

For Tegra:
- DVFS continues to be accessed by calling clk_* functions

For OMAP:
- DVFS is triggered by hwmod through clk_* functions. Any cross-arch
driver can continue to call clk_* functions.

OPP currently has opp_enable and opp_disable functions. I don't
understand why these are needed, they are only used at init time to
determine available voltages, which could be handled by never passing
unavailable voltages to the dvfs implementation.
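One way to picture the (clk, regulator, max frequency, min voltage) mapping from the proposal above is a per-clock table with a lookup helper. This is a minimal sketch, not code from the proposal: struct dvfs_entry, dvfs_min_uv_for_rate(), the clock and regulator names, and every number in the table are assumptions made up for illustration.

```c
#include <stddef.h>

/* Hypothetical row: up to max_rate_hz, the clock needs at least min_uv. */
struct dvfs_entry {
    const char *clk_name;        /* clock the row applies to */
    const char *regulator_name;  /* supply feeding that clock's logic */
    unsigned long max_rate_hz;   /* highest rate allowed at this voltage */
    int min_uv;                  /* minimum supply voltage, in microvolts */
};

/* Example table for one clock, sorted by ascending max_rate_hz. */
const struct dvfs_entry example_cpu_table[] = {
    { "cpu", "vdd_cpu",  200000000UL,  950000 },
    { "cpu", "vdd_cpu",  800000000UL, 1100000 },
    { "cpu", "vdd_cpu", 1000000000UL, 1250000 },
};

/*
 * Return the minimum voltage (in microvolts) needed to run at rate_hz,
 * or -1 if the table has no row covering that rate.
 */
int dvfs_min_uv_for_rate(const struct dvfs_entry *table, size_t n,
                         unsigned long rate_hz)
{
    for (size_t i = 0; i < n; i++) {
        if (rate_hz <= table[i].max_rate_hz)
            return table[i].min_uv;
    }
    return -1;
}
```

A board with a part that cannot reach the top voltage would, under this scheme, simply pass a shorter table at init instead of disabling entries later.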
Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs
On Wed, 27 Apr 2011, Menon, Nishanth wrote:
> OPP table is just a storage and retrieval mechanism, it is upto SoC
> frameworks to choose the most adequate of solutions - e.g. OMAP has
> omap_device, hwmod and a clock framework for more intricate control to
> work in conjunction with cpuidle frameworks as well.

Can you please stop thinking about OMAP for a minute?

A clock framework is nothing SoC specific. A framework is an
abstraction of common HW functionality, which implements general
functionality and relies on the HW specific part to configure it and
to provide access to the hardware itself.

Clocks are ordered as trees in HW, simply because you cannot have a
clock consumer be driven by more than one active clock at the same
time. A clock consumer may select a different clock producer, but that
merely changes the tree structure, nothing else.

So why should every SoC implement its own (differently buggy) version
of tree handling and call it a framework?

Yes, I know you might argue that some devices need two clocks enabled
to be functional. That's correct, but coupling those clocks at the
framework level is the wrong thing to do. If a device needs both an
interface clock and a separate interconnect clock to work, then it
needs to enable both clocks and become a consumer of them.

> There is cross domain dependency which OMAP (yet to be pushed to
> mainline) has - example: when OMAP4's MPUs are at a certain OPP, L3
> (OMAP's SoC bus) needs to be at least a certain OPP - these are
> framework which may be very custom to OMAP itself.

Wrong again. That's not a framework when you hack SoC specific
decision functions into it. It's the OMAP internal hackery to make
stuff work, but that's far from a framework.

What you are describing is a restriction which can be expressed in
tables or rules which are fed into a general framework.

Look at generic irqs, generic timekeeping, generic clockevents and
tons of other real frameworks in the kernel. They abstract out
concepts and provide generic interfaces rather than claiming that the
problem is unique to a particular piece of silicon.

Forget OMAP implementation details for a while, sit back and look at
the big picture.

Thanks,

	tglx
Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs
On Wed, 27 Apr 2011, Menon, Nishanth wrote:
> On Wed, Apr 27, 2011 at 12:49, Colin Cross wrote:
> > I proposed in a different thread on LKML that DVFS be handled within
> > the generic clock implementation. Platforms would register a
> > regulator and a table of voltages for each struct clock that required
> > DVFS, and the voltages would be changed on normal clk_* requests.
> > This maintains compatibility with existing clk_* calls.
>
> It is upto SoC frameworks to implement the transitions. E.g. lets look
> at scalability: How'd the mechanism proposed work with temperature
> variances: Example: I dont want to hit 1.5GHz if temp >70C - wont it
> be an SoC specific hack I'd need to introduce?

Why is limiting the max core frequency depending on temperature a SoC
specific problem? Everyone wants to do that. x86 does it in hardware /
SMM, other architectures want the kernel to take care of it.

So the decision is simple. Something wants to set core freq to 1.5
GHz, so it calls clk_set_rate() and there we consult the DVFS code
first to validate that setting. If it can be set, fine, then DVFS will
set the voltages _before_ we change the frequency, or it will simply
veto the change because one of the prerequisites for such a change is
not met.

Please stop thinking that your SoC is sooo special. It's NOT. The HW
concepts are quite similar all over the place, they are just named
differently and use different IP blocks with slightly different
functionality, but the problems are not unique to a particular SoC at
all.

> All OPP framework does is store that maps, and leaves it to users to
> choose regulators, clock framework variances, SoC temperature sensors
> or what ever mechanisms they choose to allow through a transition.

That's how it's implemented, but that does not say that the design is
correct and usable for more than the usecase it was modeled after.

We are looking into a common clock framework, which abstracts out the
duplicated functionality of the various implementations and reduces
them to the real thing: hardware drivers.

So we really need to look into that DVFS problem as well, simply
because it is tightly coupled and not a completely separate entity.
And looking at the struct clk disaster we really don't want another
incarnation in terms of DVFS where we end up with the same decision
functions in various SoCs over and over.

Thanks,

	tglx
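The "consult DVFS first, raise the voltage before the frequency, or veto" flow described above can be modelled in a few lines. This is a sketch under stated assumptions, not the common clock framework: struct sim_clk, sim_required_uv() and sim_clk_set_rate() are hypothetical, the voltage steps are invented, ceiling_hz stands in for whatever policy limit (for example thermal) is in force, and lowering the voltage after a slowdown is an addition of this sketch rather than something the message spells out.

```c
/* Hypothetical frequency-to-voltage steps; -1 means "not reachable". */
int sim_required_uv(unsigned long rate_hz)
{
    if (rate_hz <= 200000000UL)
        return 950000;
    if (rate_hz <= 800000000UL)
        return 1100000;
    if (rate_hz <= 1500000000UL)
        return 1250000;
    return -1;
}

/* Simulated clock: current rate, current supply, and a policy ceiling. */
struct sim_clk {
    unsigned long rate_hz;
    int uv;
    unsigned long ceiling_hz;   /* e.g. lowered by a thermal policy */
};

/*
 * Validate the request first. On success, raise the voltage before
 * speeding up and lower it only after slowing down, so the clock never
 * runs faster than its current supply allows. Returns 0, or -1 on veto
 * (in which case nothing is changed).
 */
int sim_clk_set_rate(struct sim_clk *clk, unsigned long rate_hz)
{
    int need_uv = sim_required_uv(rate_hz);

    if (need_uv < 0 || rate_hz > clk->ceiling_hz)
        return -1;              /* veto: a precondition is not met */

    if (need_uv > clk->uv)
        clk->uv = need_uv;      /* voltage up before frequency up */

    clk->rate_hz = rate_hz;

    if (need_uv < clk->uv)
        clk->uv = need_uv;      /* voltage down after frequency down */

    return 0;
}
```

In this model the "no 1.5GHz above 70C" case is just a lower ceiling_hz; the caller still uses the ordinary set-rate call and sees an error if the request is vetoed.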
Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs
On Wed, Apr 27, 2011 at 11:48 AM, Menon, Nishanth wrote:
> On Wed, Apr 27, 2011 at 13:29, Colin Cross wrote:
>> On Wed, Apr 27, 2011 at 11:07 AM, Menon, Nishanth wrote:
>>> On Wed, Apr 27, 2011 at 12:49, Colin Cross wrote:
>>> +l-o
>>>
>>>> I'm a little confused about the design for this, and OPP as well. OPP
>>>> matches a struct device * and a frequency to a voltage, which is not a
>>>> generically useful pairing, as far as I can tell. On Tegra, it is
>>>> quite possible for a single device to have multiple clocks that each
>>>> have different voltage requirements, for example the display block can
>>>> have an interface clock as well as a pixel clock. Simplifying this to
>>>> dev + freq = voltage seems very OMAP specific, and will be difficult
>>>> or impossible to adapt to Tegra.
>>> We have the same requirements as well(iclk,fclk,pixclk etc..)! We
>>> group them under voltage domains in OMAP ;). if your issue was a
>>> ability to have a single freq to a OPP, it is upto SoC to do the
>>> proper mapping. Concept of an OPP still remains consistent - which is
>>> for a voltage, there is only so much freq you can drive that specific
>>> module to.
>> No, that is still wrong. You don't drive a module at a frequency, you
>> drive a clock. You can't map struct device * 1-1 to a clock. Look at
> Agreed, module runs on clocks - Lets say n clocks provide a module
> it's functionality.
>
>> omap2_set_init_voltage:
>> static int __init omap2_set_init_voltage(char *vdd_name, char *clk_name,
>>                                          struct device *dev) {
>>
>>         clk = clk_get(NULL, clk_name);
>>         freq = clk->rate;
>>         opp = opp_find_freq_ceil(dev, &freq);
>> ...
>> }
>>
>> Now what happens if I have a dev with two frequencies,
> we do have it - it depends on what the OPP table represents. we do
> have modules which have both interface and functional clocks on OMAP
> as well. for a module(represented by struct device *) which has n
> clocks, choose the scheme of representation of clock that depends on
> voltage for the module.
> in the example you provided "the display block can have an interface
> clock as well as a pixel clock" - I suppose you mean:
> {.pclk = x, .iclk = y, .v = z}
> The question I'd ask is this : for a voltage z, is the dependency on
> pclk or iclk? I can expect a dependency of pclk to iclk requirement
> (considering pixel clock drives an external display for example). the
> table reduces to just
> {.iclk = y, .v = z} and a different table that has divisor for .iclk
> to pclk which is SoC based.

No, there can be voltage requirements on both, and the higher voltage
requirement of the two must be used.

> OPP table is just a storage and retrieval mechanism, it is upto SoC
> frameworks to choose the most adequate of solutions - e.g. OMAP has
> omap_device, hwmod and a clock framework for more intricate control to
> work in conjunction with cpuidle frameworks as well.
>
> There is cross domain dependency which OMAP (yet to be pushed to
> mainline) has - example: when OMAP4's MPUs are at a certain OPP, L3
> (OMAP's SoC bus) needs to be at least a certain OPP - these are
> framework which may be very custom to OMAP itself.
>
> ---
> Regards,
> Nishanth Menon
>
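The rule stated in the reply above (each clock has its own voltage requirement, and the highest one wins) can be sketched as a simple reduction over the clocks sharing a supply. The names struct rail_consumer and rail_target_uv(), and the voltages used in the usage note, are hypothetical and only illustrate the rule.

```c
#include <stddef.h>

/* Hypothetical record: one clock on a shared rail and what it needs now. */
struct rail_consumer {
    const char *clk_name;   /* e.g. a pixel clock or an interface clock */
    int required_uv;        /* voltage needed at the clock's current rate */
};

/*
 * The rail must satisfy every consumer, so the target is the maximum
 * of the individual requirements. Returns 0 for an empty list.
 */
int rail_target_uv(const struct rail_consumer *consumers, size_t n)
{
    int target = 0;

    for (size_t i = 0; i < n; i++) {
        if (consumers[i].required_uv > target)
            target = consumers[i].required_uv;
    }
    return target;
}
```

For a display block whose pixel clock needs 1100000 uV and whose interface clock needs 1000000 uV, the rail target is 1100000 uV; if the pixel clock later slows down and needs less, the interface clock's requirement becomes the one that sets the rail.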
Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs
On Wed, Apr 27, 2011 at 13:29, Colin Cross wrote:
> On Wed, Apr 27, 2011 at 11:07 AM, Menon, Nishanth wrote:
>> On Wed, Apr 27, 2011 at 12:49, Colin Cross wrote:
>> +l-o
>>
>>> I'm a little confused about the design for this, and OPP as well. OPP
>>> matches a struct device * and a frequency to a voltage, which is not a
>>> generically useful pairing, as far as I can tell. On Tegra, it is
>>> quite possible for a single device to have multiple clocks that each
>>> have different voltage requirements, for example the display block can
>>> have an interface clock as well as a pixel clock. Simplifying this to
>>> dev + freq = voltage seems very OMAP specific, and will be difficult
>>> or impossible to adapt to Tegra.
>> We have the same requirements as well (iclk, fclk, pixclk, etc.)! We
>> group them under voltage domains in OMAP ;). If your issue was an
>> ability to have a single freq to an OPP, it is up to the SoC to do the
>> proper mapping. The concept of an OPP still remains consistent: for a
>> given voltage, there is only so much freq you can drive that specific
>> module to.
> No, that is still wrong. You don't drive a module at a frequency, you
> drive a clock. You can't map struct device * 1-1 to a clock. Look at

Agreed, a module runs on clocks - let's say n clocks provide a module
its functionality.

> omap2_set_init_voltage:
> static int __init omap2_set_init_voltage(char *vdd_name, char *clk_name,
>                 struct device *dev) {
>
>         clk = clk_get(NULL, clk_name);
>         freq = clk->rate;
>         opp = opp_find_freq_ceil(dev, &freq);
>         ...
> }
>
> Now what happens if I have a dev with two frequencies,

We do have it - it depends on what the OPP table represents. We do
have modules which have both interface and functional clocks on OMAP
as well. For a module (represented by struct device *) which has n
clocks, choose the scheme of representation based on the clock that
the voltage depends on for the module.

In the example you provided, "the display block can have an interface
clock as well as a pixel clock", I suppose you mean:
{.pclk = x, .iclk = y, .v = z}
The question I'd ask is this: for a voltage z, is the dependency on
pclk or iclk? I can expect a dependency of pclk on the iclk requirement
(considering the pixel clock drives an external display, for example).
The table reduces to just
{.iclk = y, .v = z} and a different table that has the divisor from
.iclk to pclk, which is SoC based.

The OPP table is just a storage and retrieval mechanism; it is up to
SoC frameworks to choose the most adequate of solutions - e.g. OMAP has
omap_device, hwmod and a clock framework for more intricate control to
work in conjunction with cpuidle frameworks as well.

There is a cross-domain dependency which OMAP (yet to be pushed to
mainline) has - example: when OMAP4's MPUs are at a certain OPP, L3
(OMAP's SoC bus) needs to be at least at a certain OPP - these are
frameworks which may be very custom to OMAP itself.

---
Regards,
Nishanth Menon
Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs
(sorry, missent the earlier one)

On Wed, Apr 27, 2011 at 11:07 AM, Menon, Nishanth wrote:
> On Wed, Apr 27, 2011 at 12:49, Colin Cross wrote:
> +l-o
>
>> I'm a little confused about the design for this, and OPP as well. OPP
>> matches a struct device * and a frequency to a voltage, which is not a
>> generically useful pairing, as far as I can tell. On Tegra, it is
>> quite possible for a single device to have multiple clocks that each
>> have different voltage requirements, for example the display block can
>> have an interface clock as well as a pixel clock. Simplifying this to
>> dev + freq = voltage seems very OMAP specific, and will be difficult
>> or impossible to adapt to Tegra.
> We have the same requirements as well (iclk, fclk, pixclk, etc.)! We
> group them under voltage domains in OMAP ;). If your issue was an
> ability to have a single freq to an OPP, it is up to the SoC to do the
> proper mapping. The concept of an OPP still remains consistent: for a
> given voltage, there is only so much freq you can drive that specific
> module to.

No, that is still wrong. You don't drive a module at a frequency, you
drive a clock. You can't map struct device * 1-1 to a clock. Look at
omap2_set_init_voltage:

static int __init omap2_set_init_voltage(char *vdd_name, char *clk_name,
                struct device *dev) {
        ...
        clk = clk_get(NULL, clk_name);
        freq = clk->rate;
        opp = opp_find_freq_ceil(dev, &freq);
        ...
}

What happens if I have a dev with two frequencies? I can only pass a
dev into opp. It makes infinitely more sense to pass in a clock:
opp_find_freq_ceil(clk, &freq).

> It is up to SoC frameworks to implement the transitions. E.g. let's
> look at scalability: how would the mechanism proposed work with
> temperature variances? Example: I don't want to hit 1.5GHz if temp
> >70C - won't it be an SoC-specific hack I'd need to introduce?

No, because you're putting it in the wrong place; that is a policy
decision. Handle it in the clock framework, or handle it in the device
driver. That's a bad example either way - what happens if you are
already at 1.5GHz when the temperature crosses 70C? You need an
interrupt that tells you the temperature is too high, and that needs to
affect a policy decision at a much higher level than dvfs.

>
> All the OPP framework does is store that map, and leave it to users to
> choose regulators, clock framework variances, SoC temperature sensors
> or whatever mechanisms they choose to allow through a transition.

I understand it's just a map, but it's a map between two things that
don't have a direct mapping in many SoCs. I think if you changed every
usage of struct device * in opp to struct clk *, it would make much more
sense. There is already a mapping from struct device * to struct clk *;
it's called clk_get, and it takes a second parameter to allow devices to
have multiple clocks.
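The clock-keyed lookup suggested in this message can be sketched in plain userspace C. This is only an illustration under stated assumptions: mock_clk_get() imitates the device plus connection-id lookup that clk_get() provides, and mock_opp_find_freq_ceil() is a hypothetical clock-keyed variant of the lookup, loosely modeled on the dev-keyed opp_find_freq_ceil() quoted above. All mock_* types, tables and numbers are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical frequency/voltage pair, sorted by ascending freq_hz. */
struct mock_opp {
	unsigned long freq_hz;
	unsigned long u_volt;
};

/* Hypothetical clock carrying its own table (not the kernel's struct clk). */
struct mock_clk {
	const char *con_id;
	const struct mock_opp *opps;
	size_t n_opps;
};

/* Hypothetical device that owns several clocks. */
struct mock_dev {
	const char *name;
	struct mock_clk *clks;
	size_t n_clks;
};

/* Made-up tables: the two clocks of one block have different requirements. */
const struct mock_opp mock_iclk_opps[] = {
	{ 100000000UL, 1000000UL },
	{ 200000000UL, 1100000UL },
};
const struct mock_opp mock_pclk_opps[] = {
	{  75000000UL, 1000000UL },
	{ 150000000UL, 1200000UL },
};

/* Mock of the dev + connection id -> clock mapping; NULL if not found. */
struct mock_clk *mock_clk_get(struct mock_dev *dev, const char *con_id)
{
	size_t i;

	for (i = 0; i < dev->n_clks; i++)
		if (strcmp(dev->clks[i].con_id, con_id) == 0)
			return &dev->clks[i];
	return NULL;
}

/* Clock-keyed ceiling lookup: returns the lowest entry whose frequency is
 * >= *freq and writes that frequency back; NULL (and *freq untouched) if
 * the clock is NULL or no entry is high enough. */
const struct mock_opp *mock_opp_find_freq_ceil(const struct mock_clk *clk,
					       unsigned long *freq)
{
	size_t i;

	assert(freq != NULL);
	if (clk == NULL)
		return NULL;
	for (i = 0; i < clk->n_opps; i++) {
		if (clk->opps[i].freq_hz >= *freq) {
			*freq = clk->opps[i].freq_hz;
			return &clk->opps[i];
		}
	}
	return NULL;
}
```

Because the table hangs off the clock rather than the device, asking for 120 MHz on the same display device resolves differently per clock: the mock iclk rounds up to 200 MHz at 1.1 V, the mock pclk to 150 MHz at 1.2 V.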
Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs
On Wed, Apr 27, 2011 at 11:07 AM, Menon, Nishanth wrote:
> On Wed, Apr 27, 2011 at 12:49, Colin Cross wrote:
> +l-o
>
>> I'm a little confused about the design for this, and OPP as well. OPP
>> matches a struct device * and a frequency to a voltage, which is not a
>> generically useful pairing, as far as I can tell. On Tegra, it is
>> quite possible for a single device to have multiple clocks that each
>> have different voltage requirements, for example the display block can
>> have an interface clock as well as a pixel clock. Simplifying this to
>> dev + freq = voltage seems very OMAP specific, and will be difficult
>> or impossible to adapt to Tegra.
> We have the same requirements as well (iclk, fclk, pixclk, etc.)! We
> group them under voltage domains in OMAP ;). If your issue was an
> ability to have a single freq to an OPP, it is up to the SoC to do the
> proper mapping. The concept of an OPP still remains consistent: for a
> given voltage, there is only so much freq you can drive that specific
> module to.

No, that is still wrong. You don't drive a module at a frequency, you
drive a clock. You can't map struct device * 1-1 to a clock. Look at
omap2_set_init_voltage:

static int __init omap2_set_init_voltage(char *vdd_name, char *clk_name,
                struct device *dev) {

        clk = clk_get(NULL, clk_name);
        freq = clk->rate;
        opp = opp_find_freq_ceil(dev, &freq);
        ...
}

Now what happens if I have a dev with two frequencies,

>> Moreover, from a silicon perspective, there is always a simple link
>> from a single frequency to a minimum voltage for a given circuit.
>> There is no need to group them into OPPs, which seem to have a group
>> of clocks and their frequencies that map to a single voltage. That is
>> an artifact of the way TI specifies voltages.
>>
>> I don't think DVFS is even the right place for any sort of governor.
>> DVFS is very simple - to increase to a specific clock speed, the
>> voltage must immediately be raised, with minimum or no delay, to a
>> specified value that is specific to that clock. When the frequency is
>> lowered, the voltage should be decreased. There is a tiny bit of
>> policy to determine when to delay dropping the voltage in case the
>> frequency will immediately be raised again, but nowhere near the
>> complexity of what is shown here.
>>
>> I proposed in a different thread on LKML that DVFS be handled within
>> the generic clock implementation. Platforms would register a
>> regulator and a table of voltages for each struct clock that required
>> DVFS, and the voltages would be changed on normal clk_* requests.
>> This maintains compatibility with existing clk_* calls.
>
> It is up to SoC frameworks to implement the transitions. E.g. let's
> look at scalability: how would the mechanism proposed work with
> temperature variances? Example: I don't want to hit 1.5GHz if temp
> >70C - won't it be an SoC-specific hack I'd need to introduce?
>
> All the OPP framework does is store that map, and leave it to users to
> choose regulators, clock framework variances, SoC temperature sensors
> or whatever mechanisms they choose to allow through a transition.
>
>> There is a place for a GPU, etc., frequency governor, but it is a
>> completely separate issue from DVFS, and should not be mixed in. I
>> could have a GPU that is not voltage scalable, but could still benefit
>> from lowering the frequency when it is not in use. A devfreq
>> interface sounds perfect for this, as long as it only ends up calling
>> clk_* functions, and those functions handle getting the voltage
>> correct.
>
> Regards,
> Nishanth Menon
> PS:
> https://lists.linux-foundation.org/pipermail/linux-pm/2011-April/031113.html
> for start of thread
Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs
On Wed, Apr 27, 2011 at 12:49, Colin Cross wrote:
+l-o

> I'm a little confused about the design for this, and OPP as well. OPP
> matches a struct device * and a frequency to a voltage, which is not a
> generically useful pairing, as far as I can tell. On Tegra, it is
> quite possible for a single device to have multiple clocks that each
> have different voltage requirements, for example the display block can
> have an interface clock as well as a pixel clock. Simplifying this to
> dev + freq = voltage seems very OMAP specific, and will be difficult
> or impossible to adapt to Tegra.

We have the same requirements as well (iclk, fclk, pixclk, etc.)! We
group them under voltage domains in OMAP ;). If your issue was an
ability to have a single freq to an OPP, it is up to the SoC to do the
proper mapping. The concept of an OPP still remains consistent: for a
given voltage, there is only so much freq you can drive that specific
module to.

> Moreover, from a silicon perspective, there is always a simple link
> from a single frequency to a minimum voltage for a given circuit.
> There is no need to group them into OPPs, which seem to have a group
> of clocks and their frequencies that map to a single voltage. That is
> an artifact of the way TI specifies voltages.
>
> I don't think DVFS is even the right place for any sort of governor.
> DVFS is very simple - to increase to a specific clock speed, the
> voltage must immediately be raised, with minimum or no delay, to a
> specified value that is specific to that clock. When the frequency is
> lowered, the voltage should be decreased. There is a tiny bit of
> policy to determine when to delay dropping the voltage in case the
> frequency will immediately be raised again, but nowhere near the
> complexity of what is shown here.
>
> I proposed in a different thread on LKML that DVFS be handled within
> the generic clock implementation. Platforms would register a
> regulator and a table of voltages for each struct clock that required
> DVFS, and the voltages would be changed on normal clk_* requests.
> This maintains compatibility with existing clk_* calls.

It is up to SoC frameworks to implement the transitions. E.g. let's
look at scalability: how would the mechanism proposed work with
temperature variances? Example: I don't want to hit 1.5GHz if temp
>70C - won't it be an SoC-specific hack I'd need to introduce?

All the OPP framework does is store that map, and leave it to users to
choose regulators, clock framework variances, SoC temperature sensors
or whatever mechanisms they choose to allow through a transition.

> There is a place for a GPU, etc., frequency governor, but it is a
> completely separate issue from DVFS, and should not be mixed in. I
> could have a GPU that is not voltage scalable, but could still benefit
> from lowering the frequency when it is not in use. A devfreq
> interface sounds perfect for this, as long as it only ends up calling
> clk_* functions, and those functions handle getting the voltage
> correct.

Regards,
Nishanth Menon
PS: https://lists.linux-foundation.org/pipermail/linux-pm/2011-April/031113.html
for start of thread
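For readers following the quoted proposal (DVFS handled inside the clock implementation, with a regulator and a voltage table registered per clock), a rough userspace sketch of the ordering it implies might look like the following. It assumes one clock per regulator and instantaneous voltage changes. The dvfs_* and mock_* names and the table values are hypothetical and are not taken from any posted patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: up to max_hz the clock needs min_uv microvolts. */
struct dvfs_entry {
	unsigned long max_hz;
	int min_uv;
};

/* Hypothetical regulator that just remembers its output voltage. */
struct mock_regulator {
	int uv;
};

/* Which resource was touched first in the last successful transition. */
enum dvfs_order { DVFS_VOLT_FIRST, DVFS_RATE_FIRST };

/* Hypothetical clock with its registered regulator and voltage table. */
struct dvfs_clk {
	unsigned long rate_hz;
	struct mock_regulator *reg;
	const struct dvfs_entry *table;	/* ascending max_hz */
	size_t table_len;
	enum dvfs_order last_order;
};

/* Made-up table for illustration only. */
const struct dvfs_entry dvfs_demo_table[] = {
	{ 100000000UL, 1000000 },
	{ 200000000UL, 1100000 },
	{ 300000000UL, 1200000 },
};

/* Minimum voltage for a rate, or -1 if the rate is above the table. */
int dvfs_lookup_uv(const struct dvfs_clk *clk, unsigned long hz)
{
	size_t i;

	assert(clk != NULL && clk->table != NULL);
	for (i = 0; i < clk->table_len; i++)
		if (hz <= clk->table[i].max_hz)
			return clk->table[i].min_uv;
	return -1;
}

/* Rate change with the voltage handled underneath: raise the voltage
 * before speeding up, lower it only after slowing down. Returns 0 on
 * success, -1 (state unchanged) for an unsupported rate. */
int dvfs_clk_set_rate(struct dvfs_clk *clk, unsigned long hz)
{
	int uv = dvfs_lookup_uv(clk, hz);

	if (uv < 0)
		return -1;
	if (hz > clk->rate_hz) {
		clk->reg->uv = uv;	/* voltage up first */
		clk->rate_hz = hz;
		clk->last_order = DVFS_VOLT_FIRST;
	} else {
		clk->rate_hz = hz;	/* frequency down first */
		clk->reg->uv = uv;
		clk->last_order = DVFS_RATE_FIRST;
	}
	return 0;
}
```

A caller only ever issues the rate request, which is the compatibility point made in the quoted text. Policy such as the thermal cap raised in this reply, or delaying the voltage drop, is deliberately left out of the sketch.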