Hi Al,

On Tue, Jul 12, 2016 at 11:16:11AM -0600, Al Stone wrote:
> When CPPC is being used by ACPI on arm64, user space tools such as
> cpupower report CPU frequency values from sysfs that are incorrect.
>
> What the driver was doing was reporting the values given by ACPI tables
> in whatever scale was used to provide them. However, the ACPI spec
> defines the CPPC values as unitless abstract numbers. Internal kernel
> structures such as struct perf_cap, in contrast, expect these values
> to be in KHz. When these struct values get reported via sysfs, the
> user space tools also assume they are in KHz, causing them to report
> incorrect values (for example, reporting a CPU frequency of 1MHz when
> it should be 1.8GHz).
>
> While the investigation for a long term fix proceeds (several options
> are being explored, some of which may require spec changes or other
> much more invasive fixes), this patch forces the values read by CPPC
> to be read in KHz, regardless of what they actually represent.
>
> The downside is that this approach has some assumptions:
>
>    (1) It relies on SMBIOS3 being used, *and* that the Max Frequency
>    value for a processor is set to a non-zero value.
>
>    (2) It assumes that all processors run at the same speed, or that
>    the CPPC values have all been scaled to reflect relative speed.
>    This patch retrieves the largest CPU Max Frequency from a type 4 DMI
>    record that it can find. This may not be an issue, however, as a
>    sampling of DMI data on x86 and arm64 indicates there is often only
>    one such record regardless. Since CPPC is relatively new, it is
>    unclear if the ACPI ASL will always be written to reflect any sort
>    of relative performance of processors of differing speeds.
>
>    (3) It assumes that performance and frequency both scale linearly.
>
> For arm64 servers, this may be sufficient, but it does rely on
> firmware values being set correctly. Hence, other approaches are
> also being considered.
>
> This has been tested on three arm64 servers, with and without DMI, with
> and without CPPC support.
>
> Changes for v4:
>     -- Replaced magic constants with #defines (Rafael Wysocki)
>     -- Renamed cppc_unitless_to_khz() to cppc_to_khz() (Rafael Wysocki)
>     -- Replaced hidden initialization with a clearer form (Rafael Wysocki)
>     -- Instead of picking up the first Max Speed value from DMI, we will
>        now get the largest Max Speed; still an approximation, but slightly
>        less subject to error (Rafael Wysocki)
>     -- Kconfig for cppc_cpufreq now depends on DMI, instead of selecting
>        it, in order to make sure DMI is set up properly (Rafael Wysocki)
>
> Changes for v3:
>     -- Added clarifying commentary re short-term vs long-term fix (Alexey
>        Klimov)
>     -- Added range checking code to ensure proper arithmetic occurs,
>        especially no division by zero (Alexey Klimov)
>
> Changes for v2:
>     -- Corrected thinko: needed to have DEPENDS on DMI in Kconfig.arm,
>        not SELECT DMI (found by build daemon)
>
> Signed-off-by: Al Stone <a...@redhat.com>
> ---
>  drivers/acpi/cppc_acpi.c    | 106 +++++++++++++++++++++++++++++++++++++++++---
>  drivers/cpufreq/Kconfig.arm |   2 +-
>  2 files changed, 102 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> index 8adac69..6e6df9c 100644
> --- a/drivers/acpi/cppc_acpi.c
> +++ b/drivers/acpi/cppc_acpi.c
> @@ -40,8 +40,18 @@
>  #include <linux/cpufreq.h>
>  #include <linux/delay.h>
>  #include <linux/ktime.h>
> +#include <linux/dmi.h>
> +
> +#include <asm/unaligned.h>
>
>  #include <acpi/cppc_acpi.h>
> +
> +/* Minimum struct length needed for the DMI processor entry we want */
> +#define DMI_ENTRY_PROCESSOR_MIN_LENGTH	48
> +
> +/* Offest in the DMI processor structure for the max frequency */
> +#define DMI_PROCESSOR_MAX_SPEED		0x14
> +
>  /*
>   * Lock to provide mutually exclusive access to the PCC
>   * channel. e.g. When the remote updates the shared region
> @@ -709,6 +719,56 @@ static int cpc_write(struct cpc_reg *reg, u64 val)
>  	return ret_val;
>  }
>
> +static u64 cppc_dmi_khz;
> +
> +static void cppc_find_dmi_mhz(const struct dmi_header *dm, void *private)
> +{
> +	const u8 *dmi_data = (const u8 *)dm;
> +	u16 *mhz = (u16 *)private;
> +
> +	if (dm->type == DMI_ENTRY_PROCESSOR &&
> +	    dm->length >= DMI_ENTRY_PROCESSOR_MIN_LENGTH) {
> +		u16 val = (u16)get_unaligned((const u16 *)
> +				(dmi_data + DMI_PROCESSOR_MAX_SPEED));
> +		*mhz = val > *mhz ? val : *mhz;
> +	}
> +}
> +
> +
> +static u64 cppc_get_dmi_khz(void)
> +{
> +	u16 mhz = 0;
> +
> +	dmi_walk(cppc_find_dmi_mhz, &mhz);
> +
> +	/*
> +	 * Real stupid fallback value, just in case there is no
> +	 * actual value set.
> +	 */
> +	mhz = mhz ? mhz : 1;
> +
> +	return (1000 * mhz);
> +}
> +
> +static u64 cppc_to_khz(u64 min_in, u64 max_in, u64 val)
> +{
> +	/*
> +	 * The incoming val should be min <= val <= max. Our
> +	 * job is to convert that to KHz so it can be properly
> +	 * reported to user space via cpufreq_policy.
> +	 */
> +	u64 curval = val;
> +	u64 maxf = max_in;
> +	u64 minf = min_in;
> +
> +	/* range check the input values */
> +	curval = curval < minf ? minf : curval;
> +	curval = curval > maxf ? maxf : curval;
> +	minf = minf >= maxf ? maxf - 1 : minf;
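
The range checks quoted above silently correct bad inputs. As a rough
illustration only, here is a user-space sketch of the same clamping that
also records when it had to fix something. It is not kernel code:
cppc_to_khz_checked(), the dmi_khz parameter (standing in for the
patch's cppc_dmi_khz) and the warned flag are names made up for this
example, fprintf() marks where a kernel version might use pr_warn(), and
the early return for max == 0 is my own assumption about what would be
reasonable there.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical user-space model of the quoted cppc_to_khz().
 * Clamps in the same order as the patch, but sets *warned (and prints
 * a message) whenever an input had to be corrected.
 */
static uint64_t cppc_to_khz_checked(uint64_t min_in, uint64_t max_in,
				    uint64_t val, uint64_t dmi_khz,
				    int *warned)
{
	uint64_t minf = min_in;
	uint64_t maxf = max_in;
	uint64_t curval = val;

	*warned = 0;

	/* Assumption: with max == 0 there is nothing sensible to scale. */
	if (maxf == 0) {
		fprintf(stderr, "CPPC: max perf is 0, cannot scale\n");
		*warned = 1;
		return 0;
	}

	/* Same clamping as the patch, plus a diagnostic. */
	if (curval < minf || curval > maxf) {
		fprintf(stderr, "CPPC: perf %llu outside [%llu, %llu]\n",
			(unsigned long long)curval,
			(unsigned long long)minf,
			(unsigned long long)maxf);
		*warned = 1;
		curval = curval < minf ? minf : curval;
		curval = curval > maxf ? maxf : curval;
	}

	/* min >= max would divide by zero (or go negative) below. */
	if (minf >= maxf) {
		fprintf(stderr, "CPPC: min perf %llu >= max perf %llu\n",
			(unsigned long long)minf,
			(unsigned long long)maxf);
		*warned = 1;
		minf = maxf - 1;
	}

	return ((curval - minf) * dmi_khz) / (maxf - minf);
}
```

With sane inputs this behaves like the quoted function; the only
difference is that a caller can tell when the firmware values looked
wrong.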

In a pedantic world, the kernel should warn in dmesg when the nominal
value is out of range, or when min is larger than max. Not really an
issue, but it would help with debugging.

> +	return ((curval - minf) * cppc_dmi_khz) / (maxf - minf);
> +}
> +
>  /**
>   * cppc_get_perf_caps - Get a CPUs performance capabilities.
>   * @cpunum: CPU from which to get capabilities info.
> @@ -748,17 +808,53 @@ int cppc_get_perf_caps(int cpunum, struct cppc_perf_caps *perf_caps)
>  		}
>  	}
>
> -	cpc_read(&highest_reg->cpc_entry.reg, &high);
> -	perf_caps->highest_perf = high;
> +	/*
> +	 * Since these values in perf_caps will be used in setting
> +	 * up the cpufreq policy, they must always be stored in units
> +	 * of KHz. If they are not, user space tools will become very
> +	 * confused since they assume these are in KHz when reading
> +	 * sysfs.
> +	 *
> +	 * NB: there may be better approaches to this problem that, as
> +	 * of this writing, are still being explored. Ideally, this is
> +	 * a short term solution since correlating CPPC abstract values
> +	 * with CPU frequency may or may not reflect actual performance.
> +	 *
> +	 * The reason longer term solutions are being explored is because
> +	 * this solution requires we make the following assumptions:
> +	 *
> +	 * (1) It relies on SMBIOS3 being used, *and* that the Max
> +	 *     Frequency value for a processor is set to a non-zero value.
> +	 *
> +	 * (2) It assumes that all processors run at the same speed, or
> +	 *     that the CPPC values have all been scaled to reflect any
> +	 *     relative differences. This code retrieves the largest CPU
> +	 *     Max Frequency from a type 4 DMI record that it can find.
> +	 *     This may not be an issue, however, as a sampling of DMI
> +	 *     data on x86 and arm64 indicates there is often only one
> +	 *     such record regardless.
> +	 *
> +	 * (3) It assumes that performance and frequency both scale
> +	 *     linearly.
> +	 *
> +	 * None of these are particularly horrible assumptions. But, they
> +	 * are assumptions and ultimately we'd like to be able to report
> +	 * performance without quite so many of them.
> +	 *
> +	 */
> +	cppc_dmi_khz = cppc_get_dmi_khz();
>
> +	cpc_read(&highest_reg->cpc_entry.reg, &high);
>  	cpc_read(&lowest_reg->cpc_entry.reg, &low);
> -	perf_caps->lowest_perf = low;
> +
> +	perf_caps->highest_perf = cppc_to_khz(low, high, high);
> +	perf_caps->lowest_perf = cppc_to_khz(low, high, low);

Just to check: do I understand correctly that the cpufreq subsystem is
populated with these converted values (policy->min and policy->max), and
that cpufreq then sends requests to set a new target_freq, in converted
units, to CPPC, which in turn is not aware of the conversion? Or am I
missing something?

Shouldn't there be a conversion back to the abstract scale, so that CPPC
can correctly understand and handle a request to set a new desired
performance?

>
>  	cpc_read(&ref_perf->cpc_entry.reg, &ref);
> -	perf_caps->reference_perf = ref;
> +	perf_caps->reference_perf = cppc_to_khz(low, high, ref);
>
>  	cpc_read(&nom_perf->cpc_entry.reg, &nom);
> -	perf_caps->nominal_perf = nom;
> +	perf_caps->nominal_perf = cppc_to_khz(low, high, nom);
>
>  	if (!ref)
>  		perf_caps->reference_perf = perf_caps->nominal_perf;
> diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
> index 14b1f93..b4aae52 100644
> --- a/drivers/cpufreq/Kconfig.arm
> +++ b/drivers/cpufreq/Kconfig.arm
> @@ -253,7 +253,7 @@ config ARM_PXA2xx_CPUFREQ
>
>  config ACPI_CPPC_CPUFREQ
>  	tristate "CPUFreq driver based on the ACPI CPPC spec"
> -	depends on ACPI
> +	depends on ACPI && DMI
>  	select ACPI_CPPC_LIB
>  	default n
>  	help
> --

Best regards,
Alexey
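
P.S. To make the question about converting back a bit more concrete,
here is a rough user-space sketch of the kind of inverse mapping I have
in mind. Nothing here is from the patch or from the existing driver:
perf_to_khz() just restates the quoted formula, khz_to_perf() is a
hypothetical helper, and max_khz stands in for cppc_dmi_khz. It assumes
lo < hi and that the inputs were already range checked.

```c
#include <assert.h>
#include <stdint.h>

/* Forward mapping, same formula as the quoted cppc_to_khz(). */
static uint64_t perf_to_khz(uint64_t lo, uint64_t hi, uint64_t perf,
			    uint64_t max_khz)
{
	return ((perf - lo) * max_khz) / (hi - lo);
}

/*
 * Hypothetical inverse: map a frequency in KHz back onto the abstract
 * CPPC scale [lo, hi], rounding to the nearest abstract value so that
 * the truncation in the forward mapping does not shift the result.
 */
static uint64_t khz_to_perf(uint64_t lo, uint64_t hi, uint64_t khz,
			    uint64_t max_khz)
{
	if (max_khz == 0)
		return lo;		/* nothing to scale against */
	if (khz > max_khz)
		khz = max_khz;		/* clamp requests above the maximum */

	return lo + (khz * (hi - lo) + max_khz / 2) / max_khz;
}
```

For example, with lo = 10, hi = 100 and max_khz = 1800000, an abstract
value of 55 maps to 900000 KHz, and 900000 KHz maps back to 55. Something
along these lines would be needed wherever target_freq is turned into a
desired performance request.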