Changes in v2:
**************
- Fixed patch 01/03 to actually enable CONFIG_ENERGY_MODEL
- Added "depends on ENERGY_MODEL" to IPA (Daniel)
- Added check to bail out if the freq table is unsorted (Viresh)

Cover letter:
*************

The Intelligent Power Allocator (IPA) thermal governor uses an Energy
Model (EM) of the CPUs to re-distribute the power budget. To do so, it
builds a table of <frequency, power> tuples where the power values are
computed using the 'dynamic-power-coefficient' DT property. All of this
is done in, and only for, the thermal subsystem, and more specifically
for CPUs; the power of other types of devices is obtained differently.

Recently, the CPU scheduler has seen the introduction of Energy Aware
Scheduling (EAS) patches, which also rely on an EM of the CPUs. This EM,
however, is managed by an independent framework, called PM_EM, intended
for use by all kernel subsystems interested in the power consumed by
CPUs, and not only the scheduler.

This patch series follows this logic and removes the (now redundant)
thermal-specific EM computation code to migrate IPA to use PM_EM
instead.

Doing so should have no visible functional impact for existing users of
IPA since:

 - during the 5.1 development cycle, a series of patches [1] introduced
   in PM_OPP some infrastructure (dev_pm_opp_of_register_em()) enabling
   the registration of EMs in PM_EM using the DT property used by IPA;

 - the existing upstream cpufreq drivers marked with the
   'CPUFREQ_IS_COOLING_DEV' flag all call dev_pm_opp_of_register_em(),
   which means they all support PM_EM (the only two exceptions are
   qoriq-cpufreq, which doesn't in fact use an EM, and scmi-cpufreq,
   which already supports PM_EM without using the PM_OPP infrastructure
   because it reads power costs directly from firmware).

So, migrating IPA to using PM_EM should effectively be just plumbing
since for the existing IPA users the PM_EM tables will contain the
exact same power values that IPA used to compute on its own until now.
The only new dependency is to compile in CONFIG_ENERGY_MODEL.

Why is this migration still a good thing? For three main reasons:

 1. it removes redundant code;

 2. it introduces an abstraction layer between IPA and the EM
    computation. PM_EM offers to EAS and IPA (and potentially other
    clients) standardized EM tables and hides 'how' these tables have
    been obtained. PM_EM as of now supports power values either coming
    from the 'dynamic-power-coefficient' DT property or obtained
    directly from firmware using SCMI. The latter is a new feature for
    IPA that comes 'for free' with the migration. This will also be
    true in the future every time PM_EM gets support for other ways of
    loading the EM. Moreover, PM_EM is documented and has a debugfs
    interface, which should help adding support for new platforms;

 3. it builds a consistent view of the EM of CPUs across kernel
    subsystems, which is a prerequisite for any kind of future work
    aiming at a smarter power allocation using scheduler knowledge
    about the system, for example.

[1] https://lore.kernel.org/lkml/20190204110952.16025-1-quentin.per...@arm.com/

Quentin Perret (3):
  arm64: defconfig: Enable CONFIG_ENERGY_MODEL
  PM / EM: Expose perf domain struct
  thermal: cpu_cooling: Migrate to using the EM framework

 arch/arm64/configs/defconfig  |   1 +
 drivers/thermal/Kconfig       |   1 +
 drivers/thermal/cpu_cooling.c | 237 ++++++++++++----------------------
 include/linux/energy_model.h  |   3 +-
 4 files changed, 83 insertions(+), 159 deletions(-)

--
2.21.0
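
For readers unfamiliar with the <frequency, power> table mentioned in
the cover letter, here is a minimal user-space sketch of how such a
table could be derived from a dynamic power coefficient. It is not the
code from this series: the struct and function names (freq_power,
dyn_power_mw, build_power_table) are hypothetical, and it assumes the
coefficient is expressed in uW/MHz/V^2, frequencies in kHz and voltages
in mV, with P = C * f * V^2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for this sketch; not a kernel structure. */
struct freq_power {
	uint32_t freq_khz;	/* operating point frequency, in kHz */
	uint32_t power_mw;	/* estimated dynamic power, in mW */
};

/*
 * Estimate dynamic power as P = C * f * V^2.
 * Assumed units: coeff in uW/MHz/V^2, frequency in kHz, voltage in mV.
 * coeff * MHz * mV^2 is in units of 10^-6 uW, so dividing by 10^9
 * yields mW.
 */
static uint32_t dyn_power_mw(uint32_t coeff, uint32_t freq_khz,
			     uint32_t volt_mv)
{
	uint64_t p = (uint64_t)coeff * (freq_khz / 1000) * volt_mv * volt_mv;

	return (uint32_t)(p / 1000000000ULL);
}

/* Fill 'table' with one <frequency, power> tuple per operating point. */
static void build_power_table(struct freq_power *table, size_t n,
			      const uint32_t *freq_khz,
			      const uint32_t *volt_mv, uint32_t coeff)
{
	size_t i;

	for (i = 0; i < n; i++) {
		table[i].freq_khz = freq_khz[i];
		table[i].power_mw = dyn_power_mw(coeff, freq_khz[i],
						 volt_mv[i]);
	}
}
```

With a coefficient of 100, an operating point at 1 GHz and 1000 mV
evaluates to 100 mW under these assumptions. With PM_EM, a client such
as IPA only consumes the resulting tuples and does not need to know
whether they came from this formula or from firmware.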