Re: [PATCH] cpupower: add Makefile dependencies for install targets
Am Donnerstag, 7. Januar 2021, 22:15:16 CET schrieb Ivan Babrou: > On Thu, Jan 7, 2021 at 12:59 PM Thomas Renninger wrote: > > Am Donnerstag, 7. Januar 2021, 18:42:25 CET schrieb Ivan Babrou: > > > On Thu, Jan 7, 2021 at 2:07 AM Thomas Renninger wrote: > > > > Am Dienstag, 5. Januar 2021, 00:57:18 CET schrieb Ivan Babrou: > > > > > This allows building cpupower in parallel rather than serially. ... > > Can you please show the make calls, ideally with a timing to better > > understand and also to reproduce the advantages this patch introduces. > > From what I can see, it only helps if one calls "sub-install" targets > > directly? > > That's exactly what we do: make install directly: > > /linux-5.10.5$ make -C ./tools/power/cpupower DESTDIR=/tmp/cpupower > install -j $(nproc) This makes sense then. @Shuah @Rafael: Can this patch be queued up, please. Thanks for the patch and the details, Thomas
Re: [PATCH] cpupower: add Makefile dependencies for install targets
Am Donnerstag, 7. Januar 2021, 18:42:25 CET schrieb Ivan Babrou: > On Thu, Jan 7, 2021 at 2:07 AM Thomas Renninger wrote: > > Am Dienstag, 5. Januar 2021, 00:57:18 CET schrieb Ivan Babrou: > > > This allows building cpupower in parallel rather than serially. > > > > cpupower is built serially: > > > > [ make clean ] > > > > time make > > real0m3,742s > > user0m3,330s > > sys 0m1,105s > > > > [ make clean ] > > > > time make -j10 > > real0m1,045s > > user0m3,153s > > sys 0m1,037s > > > > Only advantage I see is that you can call > > make install-xy > > targets without calling the corresponding build target > > make xy > > similar to the general install target: > > install: all install-lib ... > > > > Not sure anyone needs this and whether all targets > > successfully work this way. > > If you'd show a useful usecase example... > > We build a bunch of kernel related tools (perf, cpupower, bpftool, > etc.) from our own top level Makefile, propagating parallelism > downwards like one should. I still do not understand why you do not simply build: Also if I call this from /tools directory I get a quick build: make -j20 cpupower Can you please show the make calls, ideally with a timing to better understand and also to reproduce the advantages this patch introduces. >From what I can see, it only helps if one calls "sub-install" targets directly? And I still do not understand why things should be more parallel now. > Without this patch we have to remove parallelism for cpupower, Why? > which doesn't seem like a very clean thing > to do, especially if you consider that it's 3x faster with parallelism > enabled in wall clock terms. Sure, you want to build in parallel. I still do not understand how this patch helps in this regard. BTW, I recently had a bunch of userspace tools Makefile patches. I'd like to add you to CC for a review if they are not submitted already. Thomas
Re: [PATCH] cpupower: add Makefile dependencies for install targets
Am Dienstag, 5. Januar 2021, 00:57:18 CET schrieb Ivan Babrou: > This allows building cpupower in parallel rather than serially. cpupower is built serially: [ make clean ] time make real0m3,742s user0m3,330s sys 0m1,105s [ make clean ] time make -j10 real0m1,045s user0m3,153s sys 0m1,037s Only advantage I see is that you can call make install-xy targets without calling the corresponding build target make xy similar to the general install target: install: all install-lib ... Not sure anyone needs this and whether all targets successfully work this way. If you'd show a useful usecase example... Thomas > > Signed-off-by: Ivan Babrou > --- > tools/power/cpupower/Makefile | 8 > tools/power/cpupower/bench/Makefile | 2 +- > 2 files changed, 5 insertions(+), 5 deletions(-) > > diff --git a/tools/power/cpupower/Makefile b/tools/power/cpupower/Makefile > index c7bcddbd486d..3b1594447f29 100644 > --- a/tools/power/cpupower/Makefile > +++ b/tools/power/cpupower/Makefile > @@ -270,14 +270,14 @@ clean: > $(MAKE) -C bench O=$(OUTPUT) clean > > > -install-lib: > +install-lib: libcpupower > $(INSTALL) -d $(DESTDIR)${libdir} > $(CP) $(OUTPUT)libcpupower.so* $(DESTDIR)${libdir}/ > $(INSTALL) -d $(DESTDIR)${includedir} > $(INSTALL_DATA) lib/cpufreq.h $(DESTDIR)${includedir}/cpufreq.h > $(INSTALL_DATA) lib/cpuidle.h $(DESTDIR)${includedir}/cpuidle.h > > -install-tools: > +install-tools: $(OUTPUT)cpupower > $(INSTALL) -d $(DESTDIR)${bindir} > $(INSTALL_PROGRAM) $(OUTPUT)cpupower $(DESTDIR)${bindir} > $(INSTALL) -d $(DESTDIR)${bash_completion_dir} > @@ -293,14 +293,14 @@ install-man: > $(INSTALL_DATA) -D man/cpupower-info.1 > $(DESTDIR)${mandir}/man1/cpupower-info.1 $(INSTALL_DATA) -D > man/cpupower-monitor.1 $(DESTDIR)${mandir}/man1/cpupower-monitor.1 > > -install-gmo: > +install-gmo: create-gmo > $(INSTALL) -d $(DESTDIR)${localedir} > for HLANG in $(LANGUAGES); do \ > echo '$(INSTALL_DATA) -D $(OUTPUT)po/$$HLANG.gmo > $(DESTDIR)${localedir}/$$HLANG/LC_MESSAGES/cpupower.mo'; \ $(INSTALL_DATA) > -D $(OUTPUT)po/$$HLANG.gmo > $(DESTDIR)${localedir}/$$HLANG/LC_MESSAGES/cpupower.mo; \ done; > > -install-bench: > +install-bench: compile-bench > @#DESTDIR must be set from outside to survive > @sbindir=$(sbindir) bindir=$(bindir) docdir=$(docdir) confdir=$ (confdir) > $(MAKE) -C bench O=$(OUTPUT) install > > diff --git a/tools/power/cpupower/bench/Makefile > b/tools/power/cpupower/bench/Makefile index f68b4bc55273..d9d9923af85c > 100644 > --- a/tools/power/cpupower/bench/Makefile > +++ b/tools/power/cpupower/bench/Makefile > @@ -27,7 +27,7 @@ $(OUTPUT)cpufreq-bench: $(OBJS) > > all: $(OUTPUT)cpufreq-bench > > -install: > +install: $(OUTPUT)cpufreq-bench > mkdir -p $(DESTDIR)/$(sbindir) > mkdir -p $(DESTDIR)/$(bindir) > mkdir -p $(DESTDIR)/$(docdir)
Re: [PATCH] cpufreq: stats: Switch to ktime and msec instead of jiffies and usertime
Am Mittwoch, 11. November 2020, 06:13:50 CET schrieb Viresh Kumar: > On 10-11-20, 13:53, Thomas Renninger wrote: > > Am Dienstag, 10. November 2020, 12:07:37 CET schrieb Viresh Kumar: > > > The cpufreq and thermal core, both provide sysfs statistics to help > > > userspace learn about the behavior of frequencies and cooling states. > > > > > > This is how they look: > > > /sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state:120 399 > > > > > > The results look like this after this commit: > > > /sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state:120 3830 > > > > How would userspace know whether it's ms or 10ms? Again: How would userspace know whether it's ms or 10ms? > > whatabout a new file with the same convention as cooling devices (adding > > ms): > Keeping two files for same stuff is not great, and renaming the file > breaks userspace ABI. No exactly the other way around: - Renaming, breaks the userspace ABI. - Two files would be the super correct way to go: - Deprecate the old file and keep the 10ms around for some years ./Documentation/ABI/obsolete - Add the new interface and document it in: ./Documentation/ABI/testing As this is about a minor cpufreq_stat debug file, it is enough if you rename to: > /sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state_ms Ideally document it in ./Documentation/ABI/testing But I guess for this one this is also not mandatory. Then userspace can do: MS_FILE="/sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state_ms" 10MS_FILE="/sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state" if [ -r "$MS_FILE" ]; then POLICY0_MS=$(<"$MS_FILE") echo "Found ms stats file" elif [ -r "$10MS_FILE" ]; then echo "Found 10ms stats file, converting..." POLICY0_MS=$(<"$10MS_FILE") POLICY0_MS=$(echo "$POLICY0_MS 10 /" |dc) else echo "cpufreq stat not compiled in?" fi > I am not sure what's the right thing to do here. Use a new *_ms name. If you are unsure how many people this still might use, keep the old file and mark it deprecated and remove it in some years. You could also add: pr_info("%s userspace process accessed deprecated sysfs file %s", task->comm, OLD_SYSFS_FILE_PATH); To find other userspace apps making use of it. ... > I already fixed this recently and stats don't appear empty for fast > switch anymore. Then cpufreq_stats could be a module again? Thomas
Re: [PATCH] cpufreq: stats: Switch to ktime and msec instead of jiffies and usertime
Am Dienstag, 10. November 2020, 12:07:37 CET schrieb Viresh Kumar: > The cpufreq and thermal core, both provide sysfs statistics to help > userspace learn about the behavior of frequencies and cooling states. > > This is how they look: > /sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state:120 399 > The results look like this after this commit: > /sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state:120 3830 How would userspace know whether it's ms or 10ms? whatabout a new file with the same convention as cooling devices (adding ms): > /sys/class/thermal/cooling_device0/stats/time_in_state_ms:state0 3888 > /sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state_ms:120 3830 Somewhat off-topic, some ideas: I wonder how useful these stats still are. CPU_FREQ_STAT is off on my system: config CPU_FREQ_STAT bool "CPU frequency transition statistics" help Export CPU frequency statistics information through sysfs. If in doubt, say N. Iirc this was a module at former times? commit 1aefc75b2449eb68a6fc3ca932e2a4ee353b748d Author: Rafael J. Wysocki Date: Tue May 31 22:14:44 2016 +0200 cpufreq: stats: Make the stats code non-modular outlined 2 problems with cpufreq_stats being non-modular, but also seem to fix them up: ... and drop the notifiers from it Make the stats sysfs attributes appear empty if fast frequency switching is enabled... Thomas
Re: [PATCH 29/29] tools: Avoid comma separated statements
Am Mittwoch, 26. August 2020, 16:45:24 CEST schrieb Joe Perches: > On Wed, 2020-08-26 at 11:30 +0200, Thomas Renninger wrote: > > Hi, > > > Read the block immediately below that too: > > "This does not apply if only one branch of a conditional statement is a > single statement; in the latter case use braces in both branches:" Then +1 from here. > > Afaik there isn't a specific tag, but having: > > cleanup only: No functional change > > > > in the changelog would be nice for people looking for fixes to backport. > > This is not a fix, so it's not for backporting. Mentioning that in the changelog explicitly or even in the title - cleanup only is nice imho, but should definitely not be the reason for re-posting. Thanks for doing this, Thomas
Re: [PATCH 29/29] tools: Avoid comma separated statements
Hi, getting rid of lines with multiple instructions, separated by comma is certainly a good idea. One nit pick, though: Am Dienstag, 25. August 2020, 06:56:26 CEST schrieb Joe Perches: > Use semicolons and braces. > > Signed-off-by: Joe Perches > --- > tools/lib/subcmd/help.c| 10 +- > tools/power/cpupower/utils/cpufreq-set.c | 14 +- > tools/testing/selftests/vm/gup_benchmark.c | 18 +- > tools/testing/selftests/vm/userfaultfd.c | 296 + > 4 files changed, 210 insertions(+), 128 deletions(-) > > diff --git a/tools/lib/subcmd/help.c b/tools/lib/subcmd/help.c > index 2859f107abc8..bf02d62a3b2b 100644 > --- a/tools/lib/subcmd/help.c > +++ b/tools/lib/subcmd/help.c > @@ -65,12 +65,14 @@ void exclude_cmds(struct cmdnames *cmds, struct cmdnames > *excludes) ci = cj = ei = 0; > while (ci < cmds->cnt && ei < excludes->cnt) { > cmp = strcmp(cmds->names[ci]->name, excludes->names[ei]->name); > - if (cmp < 0) > + if (cmp < 0) { > cmds->names[cj++] = cmds->names[ci++]; > - else if (cmp == 0) > - ci++, ei++; > - else if (cmp > 0) > + } else if (cmp == 0) { > + ci++; > ei++; > + } else if (cmp > 0) { > + ei++; > + } > } I can remember patches being rejected with one line statements in a condition, surounded by braces. I just read up Documentation/process/coding-style.rst, to be sure this still is up-to-date. It's not a must, but line 180 says: "Do not unnecessarily use braces where a single statement will do." So if this is about coding style cleanup, IMO you should remove braces from single line statements. I haven't reviewed every line, but I expect you only split up comma separated instructions into separate lines and added braces? Afaik there isn't a specific tag, but having: cleanup only: No functional change in the changelog would be nice for people looking for fixes to backport. Otherwise, I think this is a worthful cleanup. Thomas
Re: [PATCH 1/2] Modify cpupower to schedule itself on cores it is reading MSRs from
On Monday, October 7, 2019 11:11:30 PM CEST Natarajan, Janakarajan wrote: > On 10/5/2019 7:40 AM, Thomas Renninger wrote: > ... > >> > >> APERF/MPERF from CPL > 0) and avoid using the msr module (patch 2). > > > > And this one only exists on latest AMD cpus, right? > > Yes. The RDPRU instruction exists only on AMD cpus. > > > >> However, for systems that provide an instruction to get register values > >> from userspace, would a command-line parameter be acceptable? > > > > Parameter sounds like a good idea. In fact, there already is such a > > paramter. cpupower monitor --help > > > > -c > > > > Schedule the process on every core before starting and > > ending > > > > measuring. This could be needed for the Idle_Stats monitor when no other > > MSR based monitor (has to be run on the core that is measured) is run in > > parallel. This is to wake up the processors from deeper sleep states and > > let the kernel reaccount its cpuidle (C-state) information before reading > > the cpuidle timings from sysfs. > > > > Best is you exchange the order of your patches. The 2nd looks rather > > straight forward and you can add my reviewed-by. > > The RDPRU instruction reads the APERF/MPERF of the cpu on which it is > running. If we do not schedule it on each cpu specifically, it will read the > APERF/MPERF > of the cpu in which it runs/might happen to run on, which will not be the > correct behavior. Got it. And I also didn't fully read -c. I now remember.. For C-states accounting you want to have each CPU woken up at measure start and end for accurate measuring. It's a pity that the monitors do the per_cpu calls themselves. So a general idle-monitor param is not possible or can only done by for example by adding a flag to the cpuidle_monitor struct: struct cpuidle_monitor unsigned int needs_root:1 unsigned int per_cpu_schedule:1 not sure whether a: struct { unsigned int needs_root:1 unsigned int per_cpu_schedule:1 } flags should/must be put around in a separate cleanup patch (and needs_root users adjusted). You (and other monitors for which this might make sense) can then implement the per_cpu_schedule flag. In AMD case you might want (you have to) directly set it. All around a -b/-u (--bind-measure-to-cpu, --unbind-measure-to-cpu) parameter could be added at some point of time if it matters. And monitors having this could bind or not. This possibly could nuke out -c param. Or at least the idle state counter monitor could do it itself. But don't mind about this. What do you think? And you should be able to re-use the bind_cpu function used in -c case? Thomas
Re: [PATCH 1/2] Modify cpupower to schedule itself on cores it is reading MSRs from
Hi, On Wednesday, October 2, 2019 4:45:03 PM CEST Natarajan, Janakarajan wrote: > On 9/27/19 4:48 PM, Thomas Renninger wrote: > > > On Friday, September 27, 2019 6:07:56 PM CEST Natarajan, Janakarajan > > wrote: > > >> On 9/18/2019 11:34 AM, Natarajan, Janakarajan wrote: > On a 256 logical-cpu Rome system we see C0 value from cpupower output go > from 0.01 to ~(0.1 to 1.00) > > for all cpus with the 1st patch. > > However, this goes down to ~0.01 when we use the RDPRU instruction > (which can be used to get > > APERF/MPERF from CPL > 0) and avoid using the msr module (patch 2). And this one only exists on latest AMD cpus, right? > However, for systems that provide an instruction to get register values > from userspace, would a command-line parameter be acceptable? Parameter sounds like a good idea. In fact, there already is such a paramter. cpupower monitor --help -c Schedule the process on every core before starting and ending measuring. This could be needed for the Idle_Stats monitor when no other MSR based monitor (has to be run on the core that is measured) is run in parallel. This is to wake up the processors from deeper sleep states and let the kernel reaccount its cpuidle (C-state) information before reading the cpuidle timings from sysfs. Best is you exchange the order of your patches. The 2nd looks rather straight forward and you can add my reviewed-by. If you still need adjustings with -c param, they can be discussed separately. It would also be nice to mention in which case it makes sense to use it in the manpage or advantages/drawbacks if you don't. Thanks! Thomas
Re: [PATCH 1/2] Modify cpupower to schedule itself on cores it is reading MSRs from
On Friday, September 27, 2019 6:07:56 PM CEST Natarajan, Janakarajan wrote: > On 9/18/2019 11:34 AM, Natarajan, Janakarajan wrote: > > This is advantageous because an IPI is not generated when a read_msr() is > > executed on the local logical CPU thereby reducing the chance of having > > APERF and MPERF being out of sync. > > + if (sched_setaffinity(getpid(), sizeof(set), ) == -1) { > > + dprint("Could not migrate to cpu: %d\n", cpu); > > + return 1; On a 80 core cpu the process would be pushed around through the system quite a lot. This might affect what you are measuring or the other measure values? Otherwise it's the kernel's MSR read only, not the whole cpupower process, right? No idea about the exact overhead, though. Others in CC list should know. Afaik msr reads through msr module should be avoided anyway? Those which are worth it are abstracted through sysfs nowadays? For aperf/mperf it might make sense to define a sysfs file where you can read both, as this is what you always need? It would take a while, but could be a longterm solution which is also usable in secure boot or without msr module case. Thomas
Re: /dev/mem and secure boot
On Monday, September 9, 2019 3:09:57 PM CEST Jean Delvare wrote: > Hi Greg, ... > > Sure, feel free to not register it at all if the mode is enabled. > Now I feel sorry that I asked my question upstream when there's nothing > to be done there. I'll go bother SUSE kernel folks instead, sorry for > the noise. And thanks for the advice. I also/still think /dev/mem should vanish in secure boot mode, also upstream. There may have been strong reasons why it has been restricted to /dev/ioport which I do not know. Whatever the exact definition for kernel behaviour in secure boot mode in the UEFI books is (if there is any), it should close quite some possible doors for hijacking a machine or read sensible data and if anyhow possible secure boot mode should head for this feature (IMHO): Get rid of /dev/mem. Thanks for bringing this up, Thomas
Re: [PATCH] cpupower : Handle set and info subcommands for powerpc
On Thursday, September 12, 2019 11:43:40 AM CEST Abhishek wrote: > Hi Shuah, > > Thanks for the review. Few comments below. ... > Since these two options are not being used by any other architecture > except x86, I suggest these options should not even be shown for > other architecture. So we can do something like this in cpupower.c : > > static struct cmd_struct commands[] = { > . > +#if defined (__x86_64__) || defined (__i386__) > { "set",cmd_set,1}, > { "info",cmd_info,0}, > +#endif > .. > > Is this Okay? No, I expected you to add something meaningful for Power case... Just kidding. If this works without any side-effects in not x86 case, this approach seem to be the best solution for now. Thanks. Thomas
Re: [PATCH] cpupower : Handle set and info subcommands for powerpc
Hi Abishek, On Wednesday, September 11, 2019 11:54:24 AM CEST Abhishek Goel wrote: > Cpupower tool has set and info options which are not being used by > POWER machines. For powerpc, we will return directly for these two > subcommands. This removes the ambiguous error message while using set > option in case of power systems. > > Signed-off-by: Abhishek Goel > --- > tools/power/cpupower/utils/cpupower-info.c | 5 + > tools/power/cpupower/utils/cpupower-set.c | 5 + > 2 files changed, 10 insertions(+) > > diff --git a/tools/power/cpupower/utils/cpupower-info.c > b/tools/power/cpupower/utils/cpupower-info.c index > 4c9d342b70ff..674b707a76af 100644 > --- a/tools/power/cpupower/utils/cpupower-info.c > +++ b/tools/power/cpupower/utils/cpupower-info.c > @@ -39,6 +39,11 @@ int cmd_info(int argc, char **argv) > } params = {}; > int ret = 0; > > + #ifdef __powerpc__ > + printf(_("Cannot read info as system does not support performance bias > setting\n")); + return 0; > + #endif > + Please do no do this. cpupower info is designed to show general information related to powersaving features of your CPU. For examle there has been (see changelog): cpupower: Remove mc and smt power aware scheduler info/settings These kernel interfaces got removed by: Unfortunately only -b (perf bias on Intel only) is left right now. So if you cut this out for Power you do not see anything and the cmd is useless. Which is a pity, but for now makes sense. Ideally you provide some tag/option which makes sense on power (e.g. whether run in OPAL mode and if provide some figures otherwise tell running in VM mode). But if this is cut out something like this should do the same and is more flexible: - Still allows additional cpupower info features for other CPUs later easily - Should also cover AMD or other non-perf bias supporting CPUs to exclude perf_bias setting/info If this one works for you, can you please re-submit with also handling the set cmd similar. If it works or you only slightly adjust, feel free to already add: Acked-by: Thomas Renninger Thanks! Thomas --- tools/power/cpupower/utils/cpupower-info.c.orig 2019-09-12 11:45:02.578568335 +0200 +++ tools/power/cpupower/utils/cpupower-info.c 2019-09-12 11:46:09.618571947 +0200 @@ -55,8 +55,11 @@ } }; - if (!params.params) + if (!params.params) { params.params = 0x7; + if !(cpupower_cpu_info.caps & CPUPOWER_CAP_PERF_BIAS) + params.perf_bias = 0; + } /* Default is: show output of CPU 0 only */ if (bitmask_isallclear(cpus_chosen))
Re: [PATCH] cpupower : frequency-set -r option misses the last cpu in related cpu list
Hi, On Wednesday, May 29, 2019 2:12:34 PM CEST Gautham R Shenoy wrote: > Hi Abhishek, > > On Wed, May 29, 2019 at 3:02 PM Abhishek Goel ... > > bitmask_setbit(cpus_chosen, cpus->cpu); > > cpus = cpus->next; > > > > } > > > > + /* Set the last cpu in related cpus list */ > > + bitmask_setbit(cpus_chosen, cpus->cpu); > > Perhaps you could convert the while() loop to a do .. while(). That > should will ensure > that we terminate the loop after setting the last valid CPU. It would do exactly the same, right? IMHO it's not worth the extra hassle of resubmitting. Setting the last value outside a while loop is rather common. I do not have a CPU with related cores at hand. If you tested this it would be nice to see this pushed: Reviewed-by: Thomas Renninger Thanks! Thomas
Re: [PATCH] cpufreq/intel_pstate: Load only on Intel hardware
On Monday, April 1, 2019 5:03:45 PM CEST Borislav Petkov wrote: > From: Borislav Petkov > > This driver is Intel-only so loading on anything which is not Intel is > pointless. Prevent it from doing so. Nice. I wondered whether there are more of these to find by review, instead of waiting for the next message to show up. I ended up in the "not so straight forward" IOMMU init macros... and continued with daily work again. Anyway there are a lot files showing up when grepping the kernel for intel files/drivers, maybe someone who is involved in the one or other comes up with something similar... Thomas FWIW: Reviewed-by: Thomas Renninger > While at it, correct the "not supported" print statement to say CPU > "model" which is what that test does. > > Suggested-by: Erwan Velu > Signed-off-by: Borislav Petkov > Cc: Len Brown > Cc: linux...@vger.kernel.org > Cc: Rafael J. Wysocki > CC: Srinivas Pandruvada > Cc: Viresh Kumar > --- > drivers/cpufreq/intel_pstate.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c > index b599c7318aab..2986119dd31f 100644 > --- a/drivers/cpufreq/intel_pstate.c > +++ b/drivers/cpufreq/intel_pstate.c > @@ -2596,6 +2596,9 @@ static int __init intel_pstate_init(void) > const struct x86_cpu_id *id; > int rc; > > + if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) > + return -ENODEV; > + ...
Re: [PATCH 1/2] PM / arch: x86: Rework the MSR_IA32_ENERGY_PERF_BIAS handling
Thanks Rafael for your quick look at and all the time you spend for this! A /sys userspace knob will certainly not be enough for us. You'll need a tool installed fixing this. powertop on laptops or tuned on servers or a well hidden bootup quirk or whatsoever. The patch I sent with this part: + if (acpi_gbl_FADT.preferred_profile == PM_PERFORMANCE_SERVER || + acpi_gbl_FADT.preferred_profile == PM_ENTERPRISE_SERVER) + return; and not touching the EBP value then should at least match most of our users and OEMs who want a "performance" setting out of the box and set this on purpose. Even nicer would be compile option to not touch this at all. On Thursday, March 21, 2019 11:18:01 PM CET Rafael J. Wysocki wrote: > From: Rafael J. Wysocki ... > + * > + * Second, on many systems the initial EPB value coming from the platform > + * firmware is 0 ('performance') and at least on some of them that is > because + * the platform firmware does not initialize EPB Why does the CPU not initialize this value to 6? That would allow OEMs/BIOS to also suggest an init value for the system. We should try to get microcode people or whoever is in charge to initialize this value "properly" if Intel thinks 6 is the correct init value. > at all with the > assumption that + * the OS will do that anyway. That sometimes is > problematic, as it may cause + * the system battery to drain too fast, for > example, so it is better to adjust + * it on CPU bring-up and if the > initial EPB value for a given CPU is 0, the + * kernel changes it to 6 > ('normal'). I have an idea to let the kernel more decide about such policies. It's a nice example that it makes sense to let the kernel set default values. But not unconditionally, according to what the system is intended to do. I wanted to do this for quite some time.., I hopefully find the time and motivation now. Thanks Rafael. Sorry for the somewhat rude sounding previous mail, that was not on purpose. You helped me quite a lot in the past and you obviously still do. Thomas
Re: [PATCH] [RESEND] Do not modify perf bias performance setting by default at boot
Rafael, I top post the general things and answer in only a few sentences embedded in context below: I very much honour your work and your neutral opinions and reasoning and I always have. This patch is a resend and while I try to come up with alternative hacks, there still is no solution, not even a suggestion. I sent the patch 3 years ago: https://lkml.org/lkml/2016/2/26/675 And I found this when doing performance analysis with Mel (Gorman). This time Hannes (Reinicke) stumbled over it, while he was working on performance tests on NVE over fabrics. We need this fixed and I am going to repush this into our kernel(s) now. On Monday, March 18, 2019 11:57:08 PM CET Rafael J. Wysocki wrote: > On Mon, Mar 18, 2019 at 2:22 PM Thomas Renninger wrote: > > On Monday, March 18, 2019 12:40:46 PM CET Rafael J. Wysocki wrote: > > > On Mon, Mar 18, 2019 at 12:15 PM Thomas Renninger wrote: > > ... ... > > > > > It may not match every setup perfectly, but at least it > > > > > is consistent. Why exactly is it worse than whatever the BIOS has > > > > > set? > > > > > > > > Because there may be BIOS settings for the CPU which justify > > > > initialization > > > > of the Perf BIAS value by BIOS. > > > > > > Well, the EPB is there for users to set it via the OS. The BIOS > > > setting is not guaranteed to work for all users anyway. > > > > Who says that? > > I do. And this is the reason you do not see much patches from myself anymore over the last years. It's certainly not your fault. I had quite some discussions with Len about specification and BIOS breakages. Especially in the CPU powersave area, idle states and cpufreq drivers, Intel was doing it differently all time long the last 5 years. Ignoring their own specifications, ignoring possible BIOS settings and changing kernel and userspace interfaces all the time. And now I have the discussion again... While it is related to this patch, it gets off topic. I guess there should be a more general thread on lkml: "Do not change APIs every second day" Up to userspace, but also to BIOS. > And then we can get back to the initial setting discussion. Let's stick to this topic in this thread. There is no reason to not find a proper fix for this meanwhile. Overriding the BIOS setting should IMO only take place: - if lifetime of CPU could be affected as you mentioned. But in this case the affected CPUs should be matched - if we expect that there are BIOSes which "want to set this value to 6, but may have forgotten to do so", as mentioned in the original patch from Len. BIOSes where this is the case should get a quirk to set it to 6. Still, ideally the whole overriding and the message should vanish IMO. Last time I sent this patch your answer was: "I need to talk to Len about that," https://lkml.org/lkml/2016/3/4/371 But as this is x86 kernel core code, I guess this should be discussed and pushed by the general x86 maintainers anyway. Sigh. Thomas
Re: [PATCH] [RESEND] Do not modify perf bias performance setting by default at boot
On Monday, March 18, 2019 12:40:46 PM CET Rafael J. Wysocki wrote: > On Mon, Mar 18, 2019 at 12:15 PM Thomas Renninger wrote: ... > And who's BIOS, really? I guess you mean the OEM? Note, however, > that the user and the OEM may not agree on that, but whatever. I mean both! The OEM. And the user who might choose a "performance" BIOS setting. > > > Yes, the kernel replaces whatever the original BIOS setting is with > > > its own one. > > > > No, it only replaces the "performance" (0) value with "normal" (6). > > This does not makes sense and is broken. Both know it better than this... > OK, fair enough. > > I guess it would have been better to set it to 6 unconditionally. No, I guess keeping the original value is the only sane thing to do. Sorry, Rafael, I have to insist on this. There might be secrets Len and you cannot share with public, but you should at least explain this privately then in a way that the current code could somehow make sense. I absolutely cannot see that. > What about the systems that will misbehave when it is left at 0? Good question. What exactly happens there? The CPU is faster in general? And may consume a tiny bit more of power. I had reports that Turbo Mode is not entered on some server CPUs when perf BIAS is switched back to normal mode. It is hard to identify what Intel implemented in microcode, but it's not that much altogether. It would be nice if Intel would be a bit more verbose about this and show some performance vs powersave figures. > > > It may not match every setup perfectly, but at least it > > > is consistent. Why exactly is it worse than whatever the BIOS has > > > set? > > > > Because there may be BIOS settings for the CPU which justify > > initialization > > of the Perf BIAS value by BIOS. > > Well, the EPB is there for users to set it via the OS. The BIOS > setting is not guaranteed to work for all users anyway. Who says that? Is this documented? I would say it is exactly the other way around. Energy Perf BIAS hint is a MSR which must be changed in Ring 0 kernel environment. x86_energy_perf_policy is a nice tool to play with and to try out what this value is for. But for example in secure boot mode userspace must not modify this HW setting in any way. So it would not be possible for a secure booted server to switch this setting back to performance mode. > > What sense does it make to unconditionally set perf BIAS value from > > performance to balanced? > > Why is this done? > > Basically, for HW health reasons AFAICS. > > Apparently, on some systems EPB=0 is (was?) special and means (meant?) > very aggressive use of turbo etc. which is not healthy in general. Hmm, maybe you want to explain this privately. Performance is a valid setting. x86_energy_perf_policy tool has a "performance" string for this and I expect it is very used. I would switch to it if I am constantly connected to AC. man x86_energy_perf_policy also does not tell the user anything about "be careful with performance setting" And until today every CPU online/offline cycle or machine resume cycle switches the value back to performance (if kernel tried to switch to 6). ... > > There may be BIOSes initialzing it via BIOS options. And this is a very > > valid thing to do. > > Yes, and there may be BIOSes leaving it at 0 with the assumption that > the OS will adjust it. The kernel cannot know which is the case. Correct. The kernel cannot know. You know this better than anyone else: - A specific Microsoft version is doing it wrong. - We adopt. - Microsoft is doing it correctly with the next version - We are busted I am ok with leaving the message as a hint, I would still check ACPI prefered profile variable that we are not running on a server. On these systems the message would be wrong rather sure and performance is intended. ... > > No. You must not ignore BIOS settings. Even worse, you must not override > > these without any sane reason. > > While there are BIOS settings that better should not be overridden, > the EPB is not one of them. Again: Why or where is this documented? > > Your assumption above might be right. But we want to do it better, right? > > > > ... > > > > > The system-wide resume part will still not be working properly after > > > the reverts. > > > > But it must never blindly (unconditionally) be set to specific value. > > Correct? > > Yes. I've already said that. How about below solution then for the initial boot up sequence? > > You mean the kernel should store the pre-hibernation perf BIAS value > > in NVRAM and write it back when waking up again? > > Or in the image and yes, it should write it
Re: [PATCH] [RESEND] Do not modify perf bias performance setting by default at boot
On Monday, March 18, 2019 11:26:10 AM CET Rafael J. Wysocki wrote: > On Fri, Mar 15, 2019 at 4:36 PM Thomas Renninger wrote: > > Hi Rafael, > > ... > > On my workstation the BIOS initializes perf bias to: > > cpupower info > > analyzing CPU 0: > > perf-bias: 7 > > > > I could grep through quite some dozens of machines..., but these are > > mostly > > servers and probably show either "zero"/"performance" or "6"/"normal" > > because the Linux kernel overrides the INTENDED performance perf bias > > value to 6. > The perf bias Intended by whom? The instance who should be in charge to set/init such a value: BIOS? > Yes, the kernel replaces whatever the original BIOS setting is with > its own one. No, it only replaces the "performance" (0) value with "normal" (6). This does not makes sense and is broken. > It may not match every setup perfectly, but at least it > is consistent. Why exactly is it worse than whatever the BIOS has > set? Because there may be BIOS settings for the CPU which justify initialization of the Perf BIAS value by BIOS. What sense does it make to unconditionally set perf BIAS value from performance to balanced? Why is this done? > > So we (SUSE) are going with this patch forever. > > > > Otherwise we would run into a similar support nightmare we ran into, when > > Intel decided to ignore CPU idle states as exported by BIOS through ACPI. > > BIOS documentation of all big server vendors mentioned "performance" > > settings. With a kernel update these BIOS C states settings have been > > ignored (some long latency once were not exported on purpose). > > > > The list of breaking conventions and specifications is long... > > People mostly blame the "bad BIOS developer". In this case things have > > been > > broke by the kernel. > > I agree that the kernel should not modify the EPB on system-wide > resume and on CPU online, but I don't see why changing the BIOS > setting at init time is a problem really. I would agree if we differ a tablet/laptop system and set the performance value to normal/powersave. And on a server we set it from normal/powersave to performance. But we should not touch this value anyway. Again: Why should the kernel touch it? There may be BIOSes initialzing it via BIOS options. And this is a very valid thing to do. > > It's now (with the resume patch) broken in way, that "performance" setting > > So the point seems to be that the BIOS setting should be preserved or > people are not able to configure the systems for performance through > setting things in the BIOS. > However, that only means that setting > things in the BIOS is not sufficient to configure a system for > performance and that has always been the case AFAICS, with or without > the EPB. ?!? If the kernel unconditionally, without documentation overrides such values... (and in this case only because of a workaround of some buggy BIOSes not initialzing this value)... > If you want to configure a system for performance, you need to do that > not just in the BIOS, but also in the OS (and, quite arguably, I would > expect the latter to be sufficient). No. You must not ignore BIOS settings. Even worse, you must not override these without any sane reason. Your assumption above might be right. But we want to do it better, right? ... > The system-wide resume part will still not be working properly after > the reverts. But it must never blindly (unconditionally) be set to specific value. Correct? You mean the kernel should store the pre-hibernation perf BIAS value in NVRAM and write it back when waking up again? This would make sense. It would also mean perf BIAS never really worked, at least did not survive suspend. On servers (no hibernation) it would works but is overridden to a value you typically do not want to have on a server... So the current situation is rather broken in the kernel. Thomas
Re: [PATCH] [RESEND] Do not modify perf bias performance setting by default at boot
Hi Rafael, On Thursday, March 14, 2019 11:08:03 PM CET Rafael J. Wysocki wrote: > On Thu, Mar 14, 2019 at 3:42 PM Thomas Renninger wrote: > > This is a revert of mainline git commits: > > commit b51ef52df71cb28e9d90cd1d48b79bf19f0bab06 > > commit 17edf2d79f1ea6dfdb4c444801d928953b9f98d6 > > commit abe48b108247e9b90b4c6739662a2e5c765ed114 > > I'm not quite convinced that reverting these is the right thing to do here. Ok, I try harder. > > It is about this kernel message showing up on quite a lot servers: > > [0.072652] ENERGY_PERF_BIAS: Set to 'normal', was 'performance' > > [0.076003] ENERGY_PERF_BIAS: View and update with > > x86_energy_perf_policy(8) > What happens on boot is a matter of convention and the convention by > which the EPB is set to "neutral" at that point has been used for > quite a while. Changing it now may confuse some users or even worse. SUSE has this patch included for quite a while (in all kind of SLE 12 SP flavors), possibly other distributions as well. Unfortunately the patch got removed recently, because my commit (this one) was ignored by Len a while ago. I also like to see more examples of BIOSes not initializing this value. This message is rare (so the setting back does not happen often, probably only wrongly with latest BIOSes overriding intended BIOS performance settings on servers). On my workstation the BIOS initializes perf bias to: cpupower info analyzing CPU 0: perf-bias: 7 I could grep through quite some dozens of machines..., but these are mostly servers and probably show either "zero"/"performance" or "6"/"normal" because the Linux kernel overrides the INTENDED performance perf bias value to 6. So we (SUSE) are going with this patch forever. Otherwise we would run into a similar support nightmare we ran into, when Intel decided to ignore CPU idle states as exported by BIOS through ACPI. BIOS documentation of all big server vendors mentioned "performance" settings. With a kernel update these BIOS C states settings have been ignored (some long latency once were not exported on purpose). The list of breaking conventions and specifications is long... People mostly blame the "bad BIOS developer". In this case things have been broke by the kernel. It's now (with the resume patch) broken in way, that "performance" setting in the mainline kernel is unusable for years already. Can we please get this finally properly fixed! > That is a bug. There is no reason to change the EPB on CPU > offline/online and I agree that this needs to be fixed. The reverts > would make this problem go away, but that's not the only possible way > to fix it. It is the only proper way to fix this. > Moreover, what is done during system-wide resume appears to be a bug > too. Whatever EPB value is there during suspend should be saved and > it should be restored during resume. Reverting the commits above will > not fix that particular issue. Someone could have a look what happens on a recent system after suspend (with this patch on top). I do not have laptops to test suspend/resume with latest kernels. These broken patches have been added a bit too quickly. I doubt much BIOSes "forgot to initialize this value" and I have my doubts that resume let's the CPU fall back to "uninitialized" value. I more expect BIOS resets performance value after resume and the kernel is doing it all wrong. Hm, in fact this could be proofed by modifying perf bias value, do a suspend resume cyle. If the value is kept we should avoid another try of complicating things and simply reverting touching perf bias value altogether, correct? BIOS probably does not store perf BIAS value and re-applies it after suspend resume. Some might re-set it to the intial value (performance..., D'oh). Thomas
[PATCH] [RESEND] Do not modify perf bias performance setting by default at boot
This is a revert of mainline git commits: commit b51ef52df71cb28e9d90cd1d48b79bf19f0bab06 commit 17edf2d79f1ea6dfdb4c444801d928953b9f98d6 commit abe48b108247e9b90b4c6739662a2e5c765ed114 It is about this kernel message showing up on quite a lot servers: [0.072652] ENERGY_PERF_BIAS: Set to 'normal', was 'performance' [0.076003] ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8) The background afaik were some BIOS which "forgot to set a default value". Certainly some (one?) broken laptop BIOS was meant, because the value zero (performance) is quite a reasonable default value on server systems and this is how a lot CPUs typically come up there. Another sever bug with above patches: If you really set performance BIAS MSR value to 'performance' as mentioned in kernel boot log, it is reset on next CPU offline/online cycle (see below). Unfortunately one does not see this in the logs anymore, because of the printk_once() usage: dmesg |grep -i bias [0.026642] ENERGY_PERF_BIAS: Set to 'normal', was 'performance' [0.028002] ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8) liszt:~/:[0]# x86_energy_perf_policy cpu0: EPB 6 cpu1: EPB 6 cpu2: EPB 6 cpu3: EPB 6 liszt:~/:[0]# x86_energy_perf_policy 0 liszt:~/:[0]# x86_energy_perf_policy cpu0: EPB 0 cpu1: EPB 0 cpu2: EPB 0 cpu3: EPB 0 liszt:~/:[0]# echo offline > /sys/devices/system/cpu/cpu1/online liszt:~/:[0]# echo online > /sys/devices/system/cpu/cpu1/online liszt:~/:[0]# x86_energy_perf_policy cpu0: EPB 0 cpu1: EPB 6 cpu2: EPB 0 cpu3: EPB 0 Signed-off-by: Thomas Renninger Tested-by: Simon Schricker Index: do_not_modify_perf_bias/arch/x86/kernel/cpu/common.c === --- do_not_modify_perf_bias.orig/arch/x86/kernel/cpu/common.c 2019-03-13 17:33:06.849890801 +0100 +++ do_not_modify_perf_bias/arch/x86/kernel/cpu/common.c2019-03-13 18:01:54.781983906 +0100 @@ -18,7 +18,6 @@ #include #include #include -#include #include #include @@ -1864,23 +1863,6 @@ void cpu_init(void) } #endif -static void bsp_resume(void) -{ - if (this_cpu->c_bsp_resume) - this_cpu->c_bsp_resume(_cpu_data); -} - -static struct syscore_ops cpu_syscore_ops = { - .resume = bsp_resume, -}; - -static int __init init_cpu_syscore(void) -{ - register_syscore_ops(_syscore_ops); - return 0; -} -core_initcall(init_cpu_syscore); - /* * The microcode loader calls this upon late microcode load to recheck features, * only when microcode has been updated. Caller holds microcode_mutex and CPU Index: do_not_modify_perf_bias/arch/x86/kernel/cpu/cpu.h === --- do_not_modify_perf_bias.orig/arch/x86/kernel/cpu/cpu.h 2019-03-13 17:33:06.849890801 +0100 +++ do_not_modify_perf_bias/arch/x86/kernel/cpu/cpu.h 2019-03-13 18:01:53.973983863 +0100 @@ -14,7 +14,6 @@ struct cpu_dev { void(*c_init)(struct cpuinfo_x86 *); void(*c_identify)(struct cpuinfo_x86 *); void(*c_detect_tlb)(struct cpuinfo_x86 *); - void(*c_bsp_resume)(struct cpuinfo_x86 *); int c_x86_vendor; #ifdef CONFIG_X86_32 /* Optional vendor specific routine to obtain the cache size. */ Index: do_not_modify_perf_bias/arch/x86/kernel/cpu/intel.c === --- do_not_modify_perf_bias.orig/arch/x86/kernel/cpu/intel.c2019-03-13 17:33:06.853890801 +0100 +++ do_not_modify_perf_bias/arch/x86/kernel/cpu/intel.c 2019-03-13 18:01:52.789983799 +0100 @@ -596,36 +596,6 @@ detect_keyid_bits: c->x86_phys_bits -= keyid_bits; } -static void init_intel_energy_perf(struct cpuinfo_x86 *c) -{ - u64 epb; - - /* -* Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized. -* (x86_energy_perf_policy(8) is available to change it at run-time.) -*/ - if (!cpu_has(c, X86_FEATURE_EPB)) - return; - - rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); - if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE) - return; - - pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n"); - pr_warn_once("ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)\n"); - epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL; - wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); -} - -static void intel_bsp_resume(struct cpuinfo_x86 *c) -{ - /* -* MSR_IA32_ENERGY_PERF_BIAS is lost across suspend/resume, -* so reinitialize it properly like during bootup: -*/ - init_intel_energy_perf(c); -} - static void init_cpuid_fault(struct cpuinfo_x86 *c) { u64 msr; @@ -763,8 +733,6 @@ static void init_intel(struct cpuinfo_x8 if (cpu_has(c, X86_FEATURE_TME))
Re: [PATCH] pstore: automatically dump and clean dmesg entries
On Friday, March 1, 2019 3:53:25 PM CET Thomas Renninger wrote: > From: Torsten Duwe > > Dump a previous oops or panic, which has made it to pstore, > to the new syslog after reboot, optionally deleting it. > This can happen automatically, without user land interaction. > > Signed-off-by: Torsten Duwe > CC: Thomas Renninger This is a patch which is (was) part of the SUSE kernel for quite a while. Iirc someone from Fujitsu requested this. Anyway, it has not been sent upstream yet, but looks useful. Please consider to merge this one if you like it. Otherwise it will probably get dropped to Nirvana as we only keep upstream accepted (not SuSE specific) patches. I adjusted the patch to latest linux-2.6 git HEAD state and gave it a test compile. Thanks, Thomas
[PATCH] pstore: automatically dump and clean dmesg entries
From: Torsten Duwe Dump a previous oops or panic, which has made it to pstore, to the new syslog after reboot, optionally deleting it. This can happen automatically, without user land interaction. Signed-off-by: Torsten Duwe CC: Thomas Renninger --- fs/pstore/inode.c| 6 +++--- fs/pstore/internal.h | 11 +-- fs/pstore/platform.c | 33 +++-- 3 files changed, 39 insertions(+), 11 deletions(-) Index: test_pstore/fs/pstore/inode.c === --- test_pstore.orig/fs/pstore/inode.c 2019-03-01 13:55:50.561322896 +0100 +++ test_pstore/fs/pstore/inode.c 2019-03-01 13:56:05.349323693 +0100 @@ -374,7 +374,7 @@ fail: * when we are re-scanning the backing store looking to add new * error records. */ -void pstore_get_records(int quiet) +void pstore_get_records(unsigned flags) { struct pstore_info *psi = psinfo; struct dentry *root; @@ -385,7 +385,7 @@ void pstore_get_records(int quiet) root = pstore_sb->s_root; inode_lock(d_inode(root)); - pstore_get_backend_records(psi, root, quiet); + pstore_get_backend_records(psi, root, flags); inode_unlock(d_inode(root)); } @@ -415,7 +415,7 @@ static int pstore_fill_super(struct supe if (!sb->s_root) return -ENOMEM; - pstore_get_records(0); + pstore_get_records(PGR_VERBOSE|PGR_POPULATE); return 0; } Index: test_pstore/fs/pstore/internal.h === --- test_pstore.orig/fs/pstore/internal.h 2019-03-01 13:55:50.565322897 +0100 +++ test_pstore/fs/pstore/internal.h2019-03-01 13:56:05.349323693 +0100 @@ -28,9 +28,16 @@ static inline void pstore_unregister_pms extern struct pstore_info *psinfo; extern voidpstore_set_kmsg_bytes(int); -extern voidpstore_get_records(int); +extern voidpstore_get_records(unsigned); +/* Flags for the pstore iterator pstore_get_records() */ +#define PGR_QUIET 0 +#define PGR_VERBOSE1 +#define PGR_POPULATE 2 +#define PGR_SYSLOG 4 +#define PGR_CLEAR 8 + extern voidpstore_get_backend_records(struct pstore_info *psi, - struct dentry *root, int quiet); + struct dentry *root, unsigned flags); extern int pstore_mkfile(struct dentry *root, struct pstore_record *record); extern boolpstore_is_mounted(void); Index: test_pstore/fs/pstore/platform.c === --- test_pstore.orig/fs/pstore/platform.c 2019-03-01 13:55:50.565322897 +0100 +++ test_pstore/fs/pstore/platform.c2019-03-01 13:59:06.077333431 +0100 @@ -87,6 +87,11 @@ static DECLARE_WORK(pstore_work, pstore_ static DEFINE_SPINLOCK(pstore_lock); struct pstore_info *psinfo; +static int auto_action=0; +module_param(auto_action, int, 0664); +MODULE_PARM_DESC(auto_action, "action to take on backend " +"registration: 0=nothing, 1=print, 2=print+clear"); + static char *backend; static char *compress = #ifdef CONFIG_PSTORE_COMPRESS_DEFAULT @@ -108,6 +113,8 @@ static size_t big_oops_buf_sz; /* How much of the console log to snapshot */ unsigned long kmsg_bytes = PSTORE_DEFAULT_KMSG_BYTES; +module_param(kmsg_bytes, ulong, 0644); +MODULE_PARM_DESC(kmsg_bytes, "maximum size to save of a crash dump"); void pstore_set_kmsg_bytes(int bytes) { @@ -606,7 +613,11 @@ int pstore_register(struct pstore_info * allocate_buf_for_compression(); if (pstore_is_mounted()) - pstore_get_records(0); + pstore_get_records(PGR_VERBOSE|PGR_POPULATE); + + if (auto_action) + pstore_get_records(PGR_SYSLOG| + ((auto_action>1)?PGR_CLEAR:0)); if (psi->flags & PSTORE_FLAGS_DMESG) pstore_register_kmsg(); @@ -723,7 +734,7 @@ static void decompress_record(struct pst * error records. */ void pstore_get_backend_records(struct pstore_info *psi, - struct dentry *root, int quiet) + struct dentry *root, unsigned flags) { int failed = 0; unsigned int stop_loop = 65536; @@ -742,7 +753,7 @@ void pstore_get_backend_records(struct p */ for (; stop_loop; stop_loop--) { struct pstore_record *record; - int rc; + int rc = 0; record = kzalloc(sizeof(*record), GFP_KERNEL); if (!record) { @@ -760,12 +771,23 @@ void pstore_get_backend_records(struct p } decompress_record(record); - rc = pstore_mkfile(root, record); + if (flags & PGR_POPULATE) + rc = pstore_mkfile(root, record); +
Re: [PATCH] cpupower : Auto-completion for cpupower tool
On Wednesday, December 5, 2018 12:31:04 PM CET Abhishek Goel wrote: > This script adds support for auto-completion for cpupower tool. > Added support for auto-completion of all the eight commands for > cpupower tool and their all subsequent sub-commands, wherever > possible. >From what I see all changes I proposed in a previous review round with a "not that broad audience" have been addressed. I already integrated the patch (slightly different, without functional changes) into the latest cpupower sources of our build service and things work out nice. Shuah: Can you pick this up for next cpupower submission round, please. Thanks, Thomas
Re: [PATCH] cpupower : Auto-completion for cpupower tool
On Wednesday, December 5, 2018 12:31:04 PM CET Abhishek Goel wrote: > This script adds support for auto-completion for cpupower tool. > Added support for auto-completion of all the eight commands for > cpupower tool and their all subsequent sub-commands, wherever > possible. >From what I see all changes I proposed in a previous review round with a "not that broad audience" have been addressed. I already integrated the patch (slightly different, without functional changes) into the latest cpupower sources of our build service and things work out nice. Shuah: Can you pick this up for next cpupower submission round, please. Thanks, Thomas
Re: [PATCH 00/10] tools: Various build flags fixes
On Tuesday, October 16, 2018 5:06:05 PM CEST Jiri Olsa wrote: > hi, > while hardening some of the tools rpm, we noticed we > can't pass build flags to some of them. > > Sending separate tools fixes for what we found. It's > mostly override for CFLAGS and adding LDFLAGS to the > build commands. Looks fine for the cpupower patches 3/10 and 4/10. (Feel free to add a Reviewed-by or Acked-by for them if you have to resend). Thanks, Thomas
Re: [PATCH 00/10] tools: Various build flags fixes
On Tuesday, October 16, 2018 5:06:05 PM CEST Jiri Olsa wrote: > hi, > while hardening some of the tools rpm, we noticed we > can't pass build flags to some of them. > > Sending separate tools fixes for what we found. It's > mostly override for CFLAGS and adding LDFLAGS to the > build commands. Looks fine for the cpupower patches 3/10 and 4/10. (Feel free to add a Reviewed-by or Acked-by for them if you have to resend). Thanks, Thomas
Re: [PATCH] tools/power/cpupower: fix compilation with STATIC=true
On Tuesday, October 16, 2018 10:56:26 AM CEST Konstantin Khlebnikov wrote: > Rename duplicate sysfs_read_file into cpupower_read_sysfs and fix linking. > > Signed-off-by: Konstantin Khlebnikov Acked-by: Thomas Renninger Thanks! > - $(OUTPUT)../lib/cpufreq.o $(OUTPUT)../lib/sysfs.o > + $(OUTPUT)../lib/cpufreq.o $(OUTPUT)../lib/cpupower.o I also looked at this and would only have fixed half of it (overseeing above old leftover from cpupower lib introduction). I guess, this patch from Konstantin is better and the correct solution.
Re: [PATCH] tools/power/cpupower: fix compilation with STATIC=true
On Tuesday, October 16, 2018 10:56:26 AM CEST Konstantin Khlebnikov wrote: > Rename duplicate sysfs_read_file into cpupower_read_sysfs and fix linking. > > Signed-off-by: Konstantin Khlebnikov Acked-by: Thomas Renninger Thanks! > - $(OUTPUT)../lib/cpufreq.o $(OUTPUT)../lib/sysfs.o > + $(OUTPUT)../lib/cpufreq.o $(OUTPUT)../lib/cpupower.o I also looked at this and would only have fixed half of it (overseeing above old leftover from cpupower lib introduction). I guess, this patch from Konstantin is better and the correct solution.
Re: [PATCH] MAINTAINERS: add maintainer for tools/power/cpupower
Hi Shuah, On Thursday, November 02, 2017 01:19:47 PM Shuah Khan wrote: > Based on discussions with Rafael J. Wysocki, cpupower is need of an > active maintainer. I decided to on take the task of maintaining this > tool. Thanks. It is very much appreciated! I had a motorcycle accident and had to stop typing at all for several weeks/months. My plan is to be more active again on lists, do reviews again and submit some patches I have lying around for a long time, etc.,... Do not hesitate to contact me directly, I try to answer in time, may still take a while on more complex things... Good luck and have fun..., Thomas > Patches will flow through the pm sub-systems to the mainline. > > Suggested-by: Rafael J. Wysocki <raf...@kernel.org> > Signed-off-by: Shuah Khan <shua...@osg.samsung.com> Acked-by: Thomas Renninger <tr...@suse.com> > --- > MAINTAINERS | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index af0cb69f6a3e..9fd3ce23095a 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -3636,6 +3636,8 @@ F: drivers/cpufreq/arm_big_little_dt.c > > CPU POWER MONITORING SUBSYSTEM > M: Thomas Renninger <tr...@suse.com> > +M: Shuah Khan <shua...@osg.samsung.com> > +M: Shuah Khan <sh...@kernel.org> > L: linux...@vger.kernel.org > S: Maintained > F: tools/power/cpupower/ >
Re: [PATCH] MAINTAINERS: add maintainer for tools/power/cpupower
Hi Shuah, On Thursday, November 02, 2017 01:19:47 PM Shuah Khan wrote: > Based on discussions with Rafael J. Wysocki, cpupower is need of an > active maintainer. I decided to on take the task of maintaining this > tool. Thanks. It is very much appreciated! I had a motorcycle accident and had to stop typing at all for several weeks/months. My plan is to be more active again on lists, do reviews again and submit some patches I have lying around for a long time, etc.,... Do not hesitate to contact me directly, I try to answer in time, may still take a while on more complex things... Good luck and have fun..., Thomas > Patches will flow through the pm sub-systems to the mainline. > > Suggested-by: Rafael J. Wysocki > Signed-off-by: Shuah Khan Acked-by: Thomas Renninger > --- > MAINTAINERS | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index af0cb69f6a3e..9fd3ce23095a 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -3636,6 +3636,8 @@ F: drivers/cpufreq/arm_big_little_dt.c > > CPU POWER MONITORING SUBSYSTEM > M: Thomas Renninger > +M: Shuah Khan > +M: Shuah Khan > L: linux...@vger.kernel.org > S: Maintained > F: tools/power/cpupower/ >
Re: [RFC PATCH 2/3] Implement sysfs based cpuinfo for x86 cpus.
On Thursday, June 08, 2017 08:24:01 PM Greg KH wrote: > On Thu, Jun 08, 2017 at 06:56:14PM +0200, Felix Schnizlein wrote: > > --- > > arch/x86/kernel/Makefile| 1 + > > arch/x86/kernel/cpuinfo_sysfs.c | 166 > > drivers/base/cpuinfo.c | 4 - > > 3 files changed, 167 insertions(+), 4 deletions(-) > > create mode 100644 arch/x86/kernel/cpuinfo_sysfs.c > > > > When you add new sysfs entries, you have to also add new > Documentation/ABI/ entries. This one seem to be rather unmaintained? There even is: Documentation/ABI/stable/sysfs-devices-system-cpu this patchset would have to add: Documentation/ABI/stable/sysfs-devices-system-cpu-cpuinfo then. But: Documentation/ABI/stable/sysfs-devices-system-cpu describes /sys/devices/system/cpu/dscr_default and /sys/devices/system/cpu/cpu[0-9]+/dscr which I have never seen. Much more important values in there like: /sys/devices/system/cpu/topology /sys/devices/system/cpu/microcode /sys/devices/system/cpu/cpuidle /sys/devices/system/cpu/cpufreq /sys/devices/system/cpu/{online,offline} are not described there at all. I wonder whether .../cpu/dscr_default still exists in the kernel, a quick grep did not reveal anything. A Source: tag would be a nice non-optional addition in the README. We can also send definitions for topology/microcode/cache/.. to catch up a bit there again, not sure that makes much sense. Be aware that the output of /proc/cpuinfo is rather arbitrary depending on architecure and some build options. Thomas
Re: [RFC PATCH 2/3] Implement sysfs based cpuinfo for x86 cpus.
On Thursday, June 08, 2017 08:24:01 PM Greg KH wrote: > On Thu, Jun 08, 2017 at 06:56:14PM +0200, Felix Schnizlein wrote: > > --- > > arch/x86/kernel/Makefile| 1 + > > arch/x86/kernel/cpuinfo_sysfs.c | 166 > > drivers/base/cpuinfo.c | 4 - > > 3 files changed, 167 insertions(+), 4 deletions(-) > > create mode 100644 arch/x86/kernel/cpuinfo_sysfs.c > > > > When you add new sysfs entries, you have to also add new > Documentation/ABI/ entries. This one seem to be rather unmaintained? There even is: Documentation/ABI/stable/sysfs-devices-system-cpu this patchset would have to add: Documentation/ABI/stable/sysfs-devices-system-cpu-cpuinfo then. But: Documentation/ABI/stable/sysfs-devices-system-cpu describes /sys/devices/system/cpu/dscr_default and /sys/devices/system/cpu/cpu[0-9]+/dscr which I have never seen. Much more important values in there like: /sys/devices/system/cpu/topology /sys/devices/system/cpu/microcode /sys/devices/system/cpu/cpuidle /sys/devices/system/cpu/cpufreq /sys/devices/system/cpu/{online,offline} are not described there at all. I wonder whether .../cpu/dscr_default still exists in the kernel, a quick grep did not reveal anything. A Source: tag would be a nice non-optional addition in the README. We can also send definitions for topology/microcode/cache/.. to catch up a bit there again, not sure that makes much sense. Be aware that the output of /proc/cpuinfo is rather arbitrary depending on architecure and some build options. Thomas
Re: [RESEND][PATCHv2] cpupower: Correct return type of cpu_power_is_cpu_online in cpufreq
On Thursday, October 20, 2016 01:42:49 PM Shilpasri G Bhat wrote: > On 10/20/2016 04:23 AM, Laura Abbott wrote: > > When converting to a shared library in ac5a181d065d ("cpupower: Add > > cpuidle parts into library"), cpu_freq_cpu_exists was converted to > > cpupower_is_cpu_online. cpu_req_cpu_exists returned 0 on success and > > -ENOSYS on failure whereas cpupower_is_cpu_online returns 1 on success. > > Check for the correct return value in cpufreq-set. > > > > See https://bugzilla.redhat.com/show_bug.cgi?id=1374212 > > > > Fixes: ac5a181d065d ("cpupower: Add cpuidle parts into library") > > Reported-by: Julian Seward <jsew...@acm.org> > > Signed-off-by: Laura Abbott <labb...@redhat.com> > > --- > > v2: Drop another redundant cpupower_is_cpu_online > > Reviewed-by:Shilpasri G Bhat <shilpa.b...@linux.vnet.ibm.com> Acked-by: Thomas Renninger <tr...@suse.com> I wanted to go through all recently sent patches and adjust rebase them to my lates changes. But I guess the fact that I push this from day to day because of other stuff should not hold off such important fixes. I collect and review all others as well the next day(s).. Thomas
Re: [RESEND][PATCHv2] cpupower: Correct return type of cpu_power_is_cpu_online in cpufreq
On Thursday, October 20, 2016 01:42:49 PM Shilpasri G Bhat wrote: > On 10/20/2016 04:23 AM, Laura Abbott wrote: > > When converting to a shared library in ac5a181d065d ("cpupower: Add > > cpuidle parts into library"), cpu_freq_cpu_exists was converted to > > cpupower_is_cpu_online. cpu_req_cpu_exists returned 0 on success and > > -ENOSYS on failure whereas cpupower_is_cpu_online returns 1 on success. > > Check for the correct return value in cpufreq-set. > > > > See https://bugzilla.redhat.com/show_bug.cgi?id=1374212 > > > > Fixes: ac5a181d065d ("cpupower: Add cpuidle parts into library") > > Reported-by: Julian Seward > > Signed-off-by: Laura Abbott > > --- > > v2: Drop another redundant cpupower_is_cpu_online > > Reviewed-by:Shilpasri G Bhat Acked-by: Thomas Renninger I wanted to go through all recently sent patches and adjust rebase them to my lates changes. But I guess the fact that I push this from day to day because of other stuff should not hold off such important fixes. I collect and review all others as well the next day(s).. Thomas
Re: Fail to build tools/all
On Sunday, May 01, 2016 01:11:33 PM Sean Fu wrote: > Hi guys: > I encountered a build error when running "make V=1 tools/all". > Shall we write a patch to fix it? This is not a bug. > The following is error log. > > start == > commit 05cf8077e54b20dddb756eaa26f3aeb5c38dd3cf > Merge: cf78031 db5dd0d > Author: Linus Torvalds> Date: Fri Apr 1 20:03:33 2016 -0500 > = end = > > == start === > mkdir -p > /home/sean/work/source/upstream/kernel.linus/linux/tools/power/cpupower > && make O=/home/sean/work/source/upstream/kernel.linus/linux > subdir=tools/power/cpupower --no-print-directory -C power/cpupower > gcc -DVERSION=\"4.6.rc1.190.g05cf80\" -DPACKAGE=\"cpupower\" > -DPACKAGE_BUGREPORT=\"linux...@vger.kernel.org\" -D_GNU_SOURCE -pipe > -DNLS -Wall -Wchar-subscripts -Wpointer-arith -Wsign-compare > -Wno-pointer-sign -Wdeclaration-after-statement -Wshadow -O1 -g -DDEBUG > -I./lib -I ./utils -o > /home/sean/work/source/upstream/kernel.linus/linux/utils/helpers/amd.o > -c utils/helpers/amd.c > utils/helpers/amd.c:7:21: fatal error: pci/pci.h: No such file or > directory > #include You need pciutils header and library: ftp://ftp.kernel.org/pub/software/utils/pciutils Thomas
Re: Fail to build tools/all
On Sunday, May 01, 2016 01:11:33 PM Sean Fu wrote: > Hi guys: > I encountered a build error when running "make V=1 tools/all". > Shall we write a patch to fix it? This is not a bug. > The following is error log. > > start == > commit 05cf8077e54b20dddb756eaa26f3aeb5c38dd3cf > Merge: cf78031 db5dd0d > Author: Linus Torvalds > Date: Fri Apr 1 20:03:33 2016 -0500 > = end = > > == start === > mkdir -p > /home/sean/work/source/upstream/kernel.linus/linux/tools/power/cpupower > && make O=/home/sean/work/source/upstream/kernel.linus/linux > subdir=tools/power/cpupower --no-print-directory -C power/cpupower > gcc -DVERSION=\"4.6.rc1.190.g05cf80\" -DPACKAGE=\"cpupower\" > -DPACKAGE_BUGREPORT=\"linux...@vger.kernel.org\" -D_GNU_SOURCE -pipe > -DNLS -Wall -Wchar-subscripts -Wpointer-arith -Wsign-compare > -Wno-pointer-sign -Wdeclaration-after-statement -Wshadow -O1 -g -DDEBUG > -I./lib -I ./utils -o > /home/sean/work/source/upstream/kernel.linus/linux/utils/helpers/amd.o > -c utils/helpers/amd.c > utils/helpers/amd.c:7:21: fatal error: pci/pci.h: No such file or > directory > #include You need pciutils header and library: ftp://ftp.kernel.org/pub/software/utils/pciutils Thomas
Re: [PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel
On Monday, March 07, 2016 07:50:57 PM Len Brown wrote: > > But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo > > modes if this value is not set to performance > > BDX-EP supports HWP. > Are these failing machines running in HWP mode? > > (On BDX-EP, and only on BDX-EP, EPB acts to set the BIAS for HWP, > because that processor doesn't yet have EPP.) I am not sure whether I can publish platform info. I asked for and added you to CC of the private bug a while ago. This kernel is run: SLES12 SP1, 3.12.49-4-default I grepped the whole supportconfig for -i hwp and couldn't find anything, so I very much expect this machine is/was not run in HWP mode, right? I still question the usefulness of the "initialize perf bias to normal" hack. For our distro I am pretty sure, we do not want to have this one and we prefer the CPU or BIOS initialized value, even (or especially) if it is set to performance. Are there any known platforms where this would really be an issue and how sever would it be? I already removed the "set perf bias to normal" years ago for SLE11 without getting any regression or negative reports. Now finding the workaround on the hack to run a suspend hook to adjust perf bias value on each suspend cycle... This is going into a wrong direction. I agree that if this needs resetting after each suspend cycle, userspace should not need to care about it. I could imagine a sysfs variable here: /sys/devices/system/cpu/intel_pstate/perf_bias but the static setting from 0 to 6 in the x86 core code and the suspend callback should get reverted, right? Thomas
Re: [PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel
On Monday, March 07, 2016 07:50:57 PM Len Brown wrote: > > But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo > > modes if this value is not set to performance > > BDX-EP supports HWP. > Are these failing machines running in HWP mode? > > (On BDX-EP, and only on BDX-EP, EPB acts to set the BIAS for HWP, > because that processor doesn't yet have EPP.) I am not sure whether I can publish platform info. I asked for and added you to CC of the private bug a while ago. This kernel is run: SLES12 SP1, 3.12.49-4-default I grepped the whole supportconfig for -i hwp and couldn't find anything, so I very much expect this machine is/was not run in HWP mode, right? I still question the usefulness of the "initialize perf bias to normal" hack. For our distro I am pretty sure, we do not want to have this one and we prefer the CPU or BIOS initialized value, even (or especially) if it is set to performance. Are there any known platforms where this would really be an issue and how sever would it be? I already removed the "set perf bias to normal" years ago for SLE11 without getting any regression or negative reports. Now finding the workaround on the hack to run a suspend hook to adjust perf bias value on each suspend cycle... This is going into a wrong direction. I agree that if this needs resetting after each suspend cycle, userspace should not need to care about it. I could imagine a sysfs variable here: /sys/devices/system/cpu/intel_pstate/perf_bias but the static setting from 0 to 6 in the x86 core code and the suspend callback should get reverted, right? Thomas
Re: [PATCH] Do not modify perf bias performance setting by default at boot
On Monday, March 07, 2016 02:24:54 PM Thomas Renninger wrote: > It came out that on certain CPUs perf bias MSR has to be set to performance, > so that the CPU enters turbo states at all. > > Better do not try to wrongly adjust perf bias value, > its value probably is intended by BIOS. This whole perf bias CPU feature and trying to adjust the value at boot time gets more and more a mess. I try to sum this up, try to collect some missing bits and get them published through mailing lists. Hopefully some root causes of what went wrong are identified and passed on to the right people to get things fixed. 1. It started with: commit abe48b108247e9b90b4c6739662a2e5c765ed114 Author: Len Brown <len.br...@intel.com> Date: Thu Jul 14 00:53:24 2011 -0400 x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS ... However, the typical BIOS fails to initialize the MSR, presumably because this is handled by high-volume shrink-wrap operating systems... Linux distros, on the other hand, do not yet invoke x86_energy_perf_policy(8). As a result, WSM-EP, SNB, and later hardware from Intel will run in its default hardware power-on state (performance), which assumes that users care for performance at all costs and not for energy efficiency. While that is fine for performance benchmarks, the hardware's intended default operating point is "normal" mode... Initialize the MSR to the "normal" by default during kernel boot. ... 2. Four years later it has been identified that the boot processor's perf bias switches back to 0 on suspend. I wonder whether this happens generally. Or do BIOS hooks also reset the BIAS value on suspend? 3. It has now be identified that specific CPUs can only enter Turbo modes, if the perf BIAS value is set to 0 (performance). But the performance value cannot be set anymore by BIOS because of above patches. My questions: 1. If this is true: "While that is fine for performance benchmarks, the hardware's intended default operating point is "normal" mode..." Then the HW (CPU) must initialize to 5 (normal). It's not the BIOS' fault which forgets to set something, this may always be intended. commit b51ef52df71cb28e9d90cd1d48b79bf19f0bab06 commit 17edf2d79f1ea6dfdb4c444801d928953b9f98d6 commit abe48b108247e9b90b4c6739662a2e5c765ed114 Intel must tell their microcode writers to initialize perf BIAS with 5, if this is what Intel wants. 2. Why is the value for the boot processor reset to 0 on suspend? Is this a platform/BIOS issue only? Or could this be another major misdesign in perf BIAS CPU implementation. 3. What exactly was the problem? The argumentation that the perf BIAS "should" be 5 was rather vague. What exactly are the concequences? Getting a dirty workaround into the x86 core code only by this information without any values or references is bad. Thomas PS: I will submit the patch (revert of commit b51ef52df71cb2, 17edf2d79f1ea6d and abe48b108247e9) now to our SLE code base.
Re: [PATCH] Do not modify perf bias performance setting by default at boot
On Monday, March 07, 2016 02:24:54 PM Thomas Renninger wrote: > It came out that on certain CPUs perf bias MSR has to be set to performance, > so that the CPU enters turbo states at all. > > Better do not try to wrongly adjust perf bias value, > its value probably is intended by BIOS. This whole perf bias CPU feature and trying to adjust the value at boot time gets more and more a mess. I try to sum this up, try to collect some missing bits and get them published through mailing lists. Hopefully some root causes of what went wrong are identified and passed on to the right people to get things fixed. 1. It started with: commit abe48b108247e9b90b4c6739662a2e5c765ed114 Author: Len Brown Date: Thu Jul 14 00:53:24 2011 -0400 x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS ... However, the typical BIOS fails to initialize the MSR, presumably because this is handled by high-volume shrink-wrap operating systems... Linux distros, on the other hand, do not yet invoke x86_energy_perf_policy(8). As a result, WSM-EP, SNB, and later hardware from Intel will run in its default hardware power-on state (performance), which assumes that users care for performance at all costs and not for energy efficiency. While that is fine for performance benchmarks, the hardware's intended default operating point is "normal" mode... Initialize the MSR to the "normal" by default during kernel boot. ... 2. Four years later it has been identified that the boot processor's perf bias switches back to 0 on suspend. I wonder whether this happens generally. Or do BIOS hooks also reset the BIAS value on suspend? 3. It has now be identified that specific CPUs can only enter Turbo modes, if the perf BIAS value is set to 0 (performance). But the performance value cannot be set anymore by BIOS because of above patches. My questions: 1. If this is true: "While that is fine for performance benchmarks, the hardware's intended default operating point is "normal" mode..." Then the HW (CPU) must initialize to 5 (normal). It's not the BIOS' fault which forgets to set something, this may always be intended. commit b51ef52df71cb28e9d90cd1d48b79bf19f0bab06 commit 17edf2d79f1ea6dfdb4c444801d928953b9f98d6 commit abe48b108247e9b90b4c6739662a2e5c765ed114 Intel must tell their microcode writers to initialize perf BIAS with 5, if this is what Intel wants. 2. Why is the value for the boot processor reset to 0 on suspend? Is this a platform/BIOS issue only? Or could this be another major misdesign in perf BIAS CPU implementation. 3. What exactly was the problem? The argumentation that the perf BIAS "should" be 5 was rather vague. What exactly are the concequences? Getting a dirty workaround into the x86 core code only by this information without any values or references is bad. Thomas PS: I will submit the patch (revert of commit b51ef52df71cb2, 17edf2d79f1ea6d and abe48b108247e9) now to our SLE code base.
[PATCH] Do not modify perf bias performance setting by default at boot
It came out that on certain CPUs perf bias MSR has to be set to performance, so that the CPU enters turbo states at all. Better do not try to wrongly adjust perf bias value, its value probably is intended by BIOS. This is a revert of mainline git commits: commit b51ef52df71cb28e9d90cd1d48b79bf19f0bab06 commit 17edf2d79f1ea6dfdb4c444801d928953b9f98d6 commit abe48b108247e9b90b4c6739662a2e5c765ed114 Signed-off-by: Thomas Renninger <tr...@suse.de> --- arch/x86/kernel/cpu/common.c | 18 -- arch/x86/kernel/cpu/cpu.h|1 - arch/x86/kernel/cpu/intel.c | 32 3 files changed, 51 deletions(-) --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -13,7 +13,6 @@ #include #include #include -#include #include #include @@ -1489,20 +1488,3 @@ inline bool __static_cpu_has_safe(u16 bi return boot_cpu_has(bit); } EXPORT_SYMBOL_GPL(__static_cpu_has_safe); - -static void bsp_resume(void) -{ - if (this_cpu->c_bsp_resume) - this_cpu->c_bsp_resume(_cpu_data); -} - -static struct syscore_ops cpu_syscore_ops = { - .resume = bsp_resume, -}; - -static int __init init_cpu_syscore(void) -{ - register_syscore_ops(_syscore_ops); - return 0; -} -core_initcall(init_cpu_syscore); --- a/arch/x86/kernel/cpu/cpu.h +++ b/arch/x86/kernel/cpu/cpu.h @@ -13,7 +13,6 @@ struct cpu_dev { void(*c_init)(struct cpuinfo_x86 *); void(*c_identify)(struct cpuinfo_x86 *); void(*c_detect_tlb)(struct cpuinfo_x86 *); - void(*c_bsp_resume)(struct cpuinfo_x86 *); int c_x86_vendor; #ifdef CONFIG_X86_32 /* Optional vendor specific routine to obtain the cache size. */ --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -372,36 +372,6 @@ static void detect_vmx_virtcap(struct cp } } -static void init_intel_energy_perf(struct cpuinfo_x86 *c) -{ - u64 epb; - - /* -* Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized. -* (x86_energy_perf_policy(8) is available to change it at run-time.) -*/ - if (!cpu_has(c, X86_FEATURE_EPB)) - return; - - rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); - if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE) - return; - - pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n"); - pr_warn_once("ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)\n"); - epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL; - wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); -} - -static void intel_bsp_resume(struct cpuinfo_x86 *c) -{ - /* -* MSR_IA32_ENERGY_PERF_BIAS is lost across suspend/resume, -* so reinitialize it properly like during bootup: -*/ - init_intel_energy_perf(c); -} - static void init_intel(struct cpuinfo_x86 *c) { unsigned int l2 = 0; @@ -509,7 +479,6 @@ static void init_intel(struct cpuinfo_x8 if (cpu_has(c, X86_FEATURE_VMX)) detect_vmx_virtcap(c); - init_intel_energy_perf(c); } #ifdef CONFIG_X86_32 @@ -764,7 +733,6 @@ static const struct cpu_dev intel_cpu_de .c_detect_tlb = intel_detect_tlb, .c_early_init = early_init_intel, .c_init = init_intel, - .c_bsp_resume = intel_bsp_resume, .c_x86_vendor = X86_VENDOR_INTEL, };
[PATCH] Do not modify perf bias performance setting by default at boot
It came out that on certain CPUs perf bias MSR has to be set to performance, so that the CPU enters turbo states at all. Better do not try to wrongly adjust perf bias value, its value probably is intended by BIOS. This is a revert of mainline git commits: commit b51ef52df71cb28e9d90cd1d48b79bf19f0bab06 commit 17edf2d79f1ea6dfdb4c444801d928953b9f98d6 commit abe48b108247e9b90b4c6739662a2e5c765ed114 Signed-off-by: Thomas Renninger --- arch/x86/kernel/cpu/common.c | 18 -- arch/x86/kernel/cpu/cpu.h|1 - arch/x86/kernel/cpu/intel.c | 32 3 files changed, 51 deletions(-) --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -13,7 +13,6 @@ #include #include #include -#include #include #include @@ -1489,20 +1488,3 @@ inline bool __static_cpu_has_safe(u16 bi return boot_cpu_has(bit); } EXPORT_SYMBOL_GPL(__static_cpu_has_safe); - -static void bsp_resume(void) -{ - if (this_cpu->c_bsp_resume) - this_cpu->c_bsp_resume(_cpu_data); -} - -static struct syscore_ops cpu_syscore_ops = { - .resume = bsp_resume, -}; - -static int __init init_cpu_syscore(void) -{ - register_syscore_ops(_syscore_ops); - return 0; -} -core_initcall(init_cpu_syscore); --- a/arch/x86/kernel/cpu/cpu.h +++ b/arch/x86/kernel/cpu/cpu.h @@ -13,7 +13,6 @@ struct cpu_dev { void(*c_init)(struct cpuinfo_x86 *); void(*c_identify)(struct cpuinfo_x86 *); void(*c_detect_tlb)(struct cpuinfo_x86 *); - void(*c_bsp_resume)(struct cpuinfo_x86 *); int c_x86_vendor; #ifdef CONFIG_X86_32 /* Optional vendor specific routine to obtain the cache size. */ --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -372,36 +372,6 @@ static void detect_vmx_virtcap(struct cp } } -static void init_intel_energy_perf(struct cpuinfo_x86 *c) -{ - u64 epb; - - /* -* Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized. -* (x86_energy_perf_policy(8) is available to change it at run-time.) -*/ - if (!cpu_has(c, X86_FEATURE_EPB)) - return; - - rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); - if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE) - return; - - pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n"); - pr_warn_once("ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)\n"); - epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL; - wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); -} - -static void intel_bsp_resume(struct cpuinfo_x86 *c) -{ - /* -* MSR_IA32_ENERGY_PERF_BIAS is lost across suspend/resume, -* so reinitialize it properly like during bootup: -*/ - init_intel_energy_perf(c); -} - static void init_intel(struct cpuinfo_x86 *c) { unsigned int l2 = 0; @@ -509,7 +479,6 @@ static void init_intel(struct cpuinfo_x8 if (cpu_has(c, X86_FEATURE_VMX)) detect_vmx_virtcap(c); - init_intel_energy_perf(c); } #ifdef CONFIG_X86_32 @@ -764,7 +733,6 @@ static const struct cpu_dev intel_cpu_de .c_detect_tlb = intel_detect_tlb, .c_early_init = early_init_intel, .c_init = init_intel, - .c_bsp_resume = intel_bsp_resume, .c_x86_vendor = X86_VENDOR_INTEL, };
Re: [PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel
On Wednesday, March 02, 2016 01:26:18 AM Rafael J. Wysocki wrote: > On Tuesday, March 01, 2016 01:17:37 PM Thomas Renninger wrote: > > > > if (!cpu_has(c, X86_FEATURE_EPB))z > > > > > > > > return; > > > > > > > > @@ -387,10 +391,8 @@ static void init_intel_energy_perf(struc > > > > > > > > if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE) > > > > > > > > return; > > > > > > > > - pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was > > > > 'performance'\n"); > > > > > > - pr_warn_once("ENERGY_PERF_BIAS: View and update with > > > > x86_energy_perf_policy(8)\n"); -epb = (epb & ~0xF) | > > > > ENERGY_PERF_BIAS_NORMAL; > > > > - wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); > > > > + pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n"); > > > > + pr_info_once("ENERGY_PERF_BIAS: Update with cpupower- set(8)\n"); > > > > > > This doesn't need to be cpupower-set IMO. > > > > You mean why switch the message from: > > x86_energy_perf_policy to cpupower-set > > ? > > > > IMO x86_energy_perf_policy should not exist. It has been introduce before > > cpupower set -b. > > Having an extra tool/binary for this functionality is an unneeded > > packaging > > overhead for distros. > > Also having more and more of such CPU specific tools is not userfriendly. > > cpupower supports all power relevant features of your CPU and on all > > architectures (or at least it should). People should know this one better > > than "x86_energy_perf_policy" and theoretically intuitively find it, even > > without a message. > > > > So it would be nice to get the message fixed as well. > > My point is that since "cpupower set -b" is not the only way to set this, > it doesn't seem appropriate to refer to it explicitly from a kernel message. > > I actually don't think the second message is necessary at all. Hmm, thinking a bit more about this, I think the whole init_intel_energy_perf() function check should vanish. The check should get moved into the powertop userspace tool. This one is used to optimize platform for power saving features. This would also keep the kernel core code clean... If you agree I will send the patch. Thomas
Re: [PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel
On Wednesday, March 02, 2016 01:26:18 AM Rafael J. Wysocki wrote: > On Tuesday, March 01, 2016 01:17:37 PM Thomas Renninger wrote: > > > > if (!cpu_has(c, X86_FEATURE_EPB))z > > > > > > > > return; > > > > > > > > @@ -387,10 +391,8 @@ static void init_intel_energy_perf(struc > > > > > > > > if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE) > > > > > > > > return; > > > > > > > > - pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was > > > > 'performance'\n"); > > > > > > - pr_warn_once("ENERGY_PERF_BIAS: View and update with > > > > x86_energy_perf_policy(8)\n"); -epb = (epb & ~0xF) | > > > > ENERGY_PERF_BIAS_NORMAL; > > > > - wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); > > > > + pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n"); > > > > + pr_info_once("ENERGY_PERF_BIAS: Update with cpupower- set(8)\n"); > > > > > > This doesn't need to be cpupower-set IMO. > > > > You mean why switch the message from: > > x86_energy_perf_policy to cpupower-set > > ? > > > > IMO x86_energy_perf_policy should not exist. It has been introduce before > > cpupower set -b. > > Having an extra tool/binary for this functionality is an unneeded > > packaging > > overhead for distros. > > Also having more and more of such CPU specific tools is not userfriendly. > > cpupower supports all power relevant features of your CPU and on all > > architectures (or at least it should). People should know this one better > > than "x86_energy_perf_policy" and theoretically intuitively find it, even > > without a message. > > > > So it would be nice to get the message fixed as well. > > My point is that since "cpupower set -b" is not the only way to set this, > it doesn't seem appropriate to refer to it explicitly from a kernel message. > > I actually don't think the second message is necessary at all. Hmm, thinking a bit more about this, I think the whole init_intel_energy_perf() function check should vanish. The check should get moved into the powertop userspace tool. This one is used to optimize platform for power saving features. This would also keep the kernel core code clean... If you agree I will send the patch. Thomas
Re: [PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel
On Saturday, February 27, 2016 12:15:47 AM Rafael J. Wysocki wrote: > On Friday, February 26, 2016 05:38:00 PM Thomas Renninger wrote: > > The assumption that BIOSes never want to have this register being set to > > full performance (zero) is wrong. > > > > While wrongly overruling this BIOS setting and set it from performance > > to normal did not hurt that much, because nobody really knew the effects > > inside Intel processors. > > > > But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo > > modes if this value is not set to performance. > > > > So switch logic to tell the user in a friendly way (info) that the CPU is > > in performance mode and how to switch via userspace if this is not > > intended. > > > > But otherwise trust that the BIOS has set the correct value here and do > > not > > blindly overrule. > > > > How this has been found: SLE11 had this patch, SLE12 it slipped through. > > It took quite some time to nail down that this patch missing is the reason > > for not entering turbo modes with this specific processor. > > > > Signed-off-by: Thomas Renninger <tr...@suse.com> > > > > --- a/arch/x86/kernel/cpu/intel.c 2016-02-26 17:19:55.731042972 +0100 > > +++ b/arch/x86/kernel/cpu/intel.c 2016-02-26 17:20:48.598020581 +0100 > > @@ -377,8 +377,12 @@ static void init_intel_energy_perf(struc > > > > u64 epb; > > > > /* > > > > -* Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized. > > -* (x86_energy_perf_policy(8) is available to change it at run-time.) > > +* On server platforms energy bias typically is set to > > +* performance on purpose. > > +* On other platforms it may happen that MSR_IA32_ENERGY_PERF_BIAS > > +* did not get initialized properly by BIOS. > > +* Best is to to keep BIOS settings and give the user a hint whether > > +* to change it via cpupower-set(8) userspace tool at runtime. > > > > */ > > > > if (!cpu_has(c, X86_FEATURE_EPB)) > > > > return; > > > > @@ -387,10 +391,8 @@ static void init_intel_energy_perf(struc > > > > if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE) > > > > return; > > > > - pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n"); > > - pr_warn_once("ENERGY_PERF_BIAS: View and update with > > x86_energy_perf_policy(8)\n"); -epb = (epb & ~0xF) | > > ENERGY_PERF_BIAS_NORMAL; > > - wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); > > + pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n"); > > + pr_info_once("ENERGY_PERF_BIAS: Update with cpupower-set(8)\n"); > > This doesn't need to be cpupower-set IMO. You mean why switch the message from: x86_energy_perf_policy to cpupower-set ? IMO x86_energy_perf_policy should not exist. It has been introduce before cpupower set -b. Having an extra tool/binary for this functionality is an unneeded packaging overhead for distros. Also having more and more of such CPU specific tools is not userfriendly. cpupower supports all power relevant features of your CPU and on all architectures (or at least it should). People should know this one better than "x86_energy_perf_policy" and theoretically intuitively find it, even without a message. So it would be nice to get the message fixed as well. Thomas
Re: [PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel
On Saturday, February 27, 2016 12:15:47 AM Rafael J. Wysocki wrote: > On Friday, February 26, 2016 05:38:00 PM Thomas Renninger wrote: > > The assumption that BIOSes never want to have this register being set to > > full performance (zero) is wrong. > > > > While wrongly overruling this BIOS setting and set it from performance > > to normal did not hurt that much, because nobody really knew the effects > > inside Intel processors. > > > > But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo > > modes if this value is not set to performance. > > > > So switch logic to tell the user in a friendly way (info) that the CPU is > > in performance mode and how to switch via userspace if this is not > > intended. > > > > But otherwise trust that the BIOS has set the correct value here and do > > not > > blindly overrule. > > > > How this has been found: SLE11 had this patch, SLE12 it slipped through. > > It took quite some time to nail down that this patch missing is the reason > > for not entering turbo modes with this specific processor. > > > > Signed-off-by: Thomas Renninger > > > > --- a/arch/x86/kernel/cpu/intel.c 2016-02-26 17:19:55.731042972 +0100 > > +++ b/arch/x86/kernel/cpu/intel.c 2016-02-26 17:20:48.598020581 +0100 > > @@ -377,8 +377,12 @@ static void init_intel_energy_perf(struc > > > > u64 epb; > > > > /* > > > > -* Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized. > > -* (x86_energy_perf_policy(8) is available to change it at run-time.) > > +* On server platforms energy bias typically is set to > > +* performance on purpose. > > +* On other platforms it may happen that MSR_IA32_ENERGY_PERF_BIAS > > +* did not get initialized properly by BIOS. > > +* Best is to to keep BIOS settings and give the user a hint whether > > +* to change it via cpupower-set(8) userspace tool at runtime. > > > > */ > > > > if (!cpu_has(c, X86_FEATURE_EPB)) > > > > return; > > > > @@ -387,10 +391,8 @@ static void init_intel_energy_perf(struc > > > > if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE) > > > > return; > > > > - pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n"); > > - pr_warn_once("ENERGY_PERF_BIAS: View and update with > > x86_energy_perf_policy(8)\n"); -epb = (epb & ~0xF) | > > ENERGY_PERF_BIAS_NORMAL; > > - wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); > > + pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n"); > > + pr_info_once("ENERGY_PERF_BIAS: Update with cpupower-set(8)\n"); > > This doesn't need to be cpupower-set IMO. You mean why switch the message from: x86_energy_perf_policy to cpupower-set ? IMO x86_energy_perf_policy should not exist. It has been introduce before cpupower set -b. Having an extra tool/binary for this functionality is an unneeded packaging overhead for distros. Also having more and more of such CPU specific tools is not userfriendly. cpupower supports all power relevant features of your CPU and on all architectures (or at least it should). People should know this one better than "x86_energy_perf_policy" and theoretically intuitively find it, even without a message. So it would be nice to get the message fixed as well. Thomas
Re: [PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel
This in fact is a re-send, including x86 maintainers. Even this is a PM (Power Management) issue, the code is in the x86 architecture paths. >From last submit: > > Patch is against latest linux-pm kernel. > > Rafael: Can you queue this one up, please. > Well, I'm not an x86 arch maintainer. > Can you at least CC them, please? Can someone, best would be a x86 maintainer?, take the patch please if it is considered ok. Thanks, Thomas
Re: [PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel
This in fact is a re-send, including x86 maintainers. Even this is a PM (Power Management) issue, the code is in the x86 architecture paths. >From last submit: > > Patch is against latest linux-pm kernel. > > Rafael: Can you queue this one up, please. > Well, I'm not an x86 arch maintainer. > Can you at least CC them, please? Can someone, best would be a x86 maintainer?, take the patch please if it is considered ok. Thanks, Thomas
[PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel
The assumption that BIOSes never want to have this register being set to full performance (zero) is wrong. While wrongly overruling this BIOS setting and set it from performance to normal did not hurt that much, because nobody really knew the effects inside Intel processors. But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo modes if this value is not set to performance. So switch logic to tell the user in a friendly way (info) that the CPU is in performance mode and how to switch via userspace if this is not intended. But otherwise trust that the BIOS has set the correct value here and do not blindly overrule. How this has been found: SLE11 had this patch, SLE12 it slipped through. It took quite some time to nail down that this patch missing is the reason for not entering turbo modes with this specific processor. Signed-off-by: Thomas Renninger <tr...@suse.com> --- a/arch/x86/kernel/cpu/intel.c 2016-02-26 17:19:55.731042972 +0100 +++ b/arch/x86/kernel/cpu/intel.c 2016-02-26 17:20:48.598020581 +0100 @@ -377,8 +377,12 @@ static void init_intel_energy_perf(struc u64 epb; /* -* Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized. -* (x86_energy_perf_policy(8) is available to change it at run-time.) +* On server platforms energy bias typically is set to +* performance on purpose. +* On other platforms it may happen that MSR_IA32_ENERGY_PERF_BIAS +* did not get initialized properly by BIOS. +* Best is to to keep BIOS settings and give the user a hint whether +* to change it via cpupower-set(8) userspace tool at runtime. */ if (!cpu_has(c, X86_FEATURE_EPB)) return; @@ -387,10 +391,8 @@ static void init_intel_energy_perf(struc if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE) return; - pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n"); - pr_warn_once("ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)\n"); - epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL; - wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); + pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n"); + pr_info_once("ENERGY_PERF_BIAS: Update with cpupower-set(8)\n"); } static void intel_bsp_resume(struct cpuinfo_x86 *c)
[PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel
The assumption that BIOSes never want to have this register being set to full performance (zero) is wrong. While wrongly overruling this BIOS setting and set it from performance to normal did not hurt that much, because nobody really knew the effects inside Intel processors. But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo modes if this value is not set to performance. So switch logic to tell the user in a friendly way (info) that the CPU is in performance mode and how to switch via userspace if this is not intended. But otherwise trust that the BIOS has set the correct value here and do not blindly overrule. How this has been found: SLE11 had this patch, SLE12 it slipped through. It took quite some time to nail down that this patch missing is the reason for not entering turbo modes with this specific processor. Signed-off-by: Thomas Renninger --- a/arch/x86/kernel/cpu/intel.c 2016-02-26 17:19:55.731042972 +0100 +++ b/arch/x86/kernel/cpu/intel.c 2016-02-26 17:20:48.598020581 +0100 @@ -377,8 +377,12 @@ static void init_intel_energy_perf(struc u64 epb; /* -* Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized. -* (x86_energy_perf_policy(8) is available to change it at run-time.) +* On server platforms energy bias typically is set to +* performance on purpose. +* On other platforms it may happen that MSR_IA32_ENERGY_PERF_BIAS +* did not get initialized properly by BIOS. +* Best is to to keep BIOS settings and give the user a hint whether +* to change it via cpupower-set(8) userspace tool at runtime. */ if (!cpu_has(c, X86_FEATURE_EPB)) return; @@ -387,10 +391,8 @@ static void init_intel_energy_perf(struc if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE) return; - pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n"); - pr_warn_once("ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)\n"); - epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL; - wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); + pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n"); + pr_info_once("ENERGY_PERF_BIAS: Update with cpupower-set(8)\n"); } static void intel_bsp_resume(struct cpuinfo_x86 *c)
Re: [PATCH] cpupower: Fix build error in cpufreq-info
On Monday, January 18, 2016 08:44:43 PM Shreyas B. Prabhu wrote: > Fix the following build error by including limits.h - > > utils/cpufreq-info.c: In function ‘get_latency’: > utils/cpufreq-info.c:437:29: error: ‘UINT_MAX’ undeclared (first use in > this function) > if (!latency || latency == UINT_MAX) { > ^ > Signed-off-by: Shreyas B. Prabhu <shre...@linux.vnet.ibm.com> Signed-off-by: Thomas Renninger <tr...@suse.com> Builds for me by luck, thanks! Thomas
Re: [PATCH] cpupower: Fix build error in cpufreq-info
On Monday, January 18, 2016 08:44:43 PM Shreyas B. Prabhu wrote: > Fix the following build error by including limits.h - > > utils/cpufreq-info.c: In function ‘get_latency’: > utils/cpufreq-info.c:437:29: error: ‘UINT_MAX’ undeclared (first use in > this function) > if (!latency || latency == UINT_MAX) { > ^ > Signed-off-by: Shreyas B. Prabhu Signed-off-by: Thomas Renninger Builds for me by luck, thanks! Thomas
Re: [PATCH] acpi/ec: Deny write access unless requested by module param
On Saturday, February 06, 2016 02:08:08 AM gr...@linuxhacker.ru wrote: > From: Oleg Drokin <gr...@linuxhacker.ru> > > In debugfs it's not enough to just set file mode to read-only to > deny write access to a file, instead just don't provide > the write method unless write access is really requested. > > Signed-off-by: Oleg Drokin <gr...@linuxhacker.ru> Signed-off-by: Thomas Renninger <tr...@suse.com> Thanks! > --- > I assume allowing run-time changes via /sys/module is preferrable, > opposed to forced module unload and reload to change this option, > but I can submit another patch to only depend on the module parameter > too, please let me know. > > drivers/acpi/ec_sys.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/acpi/ec_sys.c b/drivers/acpi/ec_sys.c > index bea8e42..6c7dd7a 100644 > --- a/drivers/acpi/ec_sys.c > +++ b/drivers/acpi/ec_sys.c > @@ -73,6 +73,9 @@ static ssize_t acpi_ec_write_io(struct file *f, const char > __user *buf, loff_t init_off = *off; > int err = 0; > > + if (!write_support) > + return -EINVAL; > + > if (*off >= EC_SPACE_SIZE) > return 0; > if (*off + count >= EC_SPACE_SIZE) {
Re: [PATCH] acpi/ec: Deny write access unless requested by module param
On Saturday, February 06, 2016 02:08:08 AM gr...@linuxhacker.ru wrote: > From: Oleg Drokin > > In debugfs it's not enough to just set file mode to read-only to > deny write access to a file, instead just don't provide > the write method unless write access is really requested. > > Signed-off-by: Oleg Drokin Signed-off-by: Thomas Renninger Thanks! > --- > I assume allowing run-time changes via /sys/module is preferrable, > opposed to forced module unload and reload to change this option, > but I can submit another patch to only depend on the module parameter > too, please let me know. > > drivers/acpi/ec_sys.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/acpi/ec_sys.c b/drivers/acpi/ec_sys.c > index bea8e42..6c7dd7a 100644 > --- a/drivers/acpi/ec_sys.c > +++ b/drivers/acpi/ec_sys.c > @@ -73,6 +73,9 @@ static ssize_t acpi_ec_write_io(struct file *f, const char > __user *buf, loff_t init_off = *off; > int err = 0; > > + if (!write_support) > + return -EINVAL; > + > if (*off >= EC_SPACE_SIZE) > return 0; > if (*off + count >= EC_SPACE_SIZE) {
Re: Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel
On Friday, January 22, 2016 01:09:15 PM Thomas Renninger wrote: > The assumption that BIOSes never want to have this register being set to > full performance (zero) is wrong. Patch is against latest linux-pm kernel. Rafael: Can you queue this one up, please. Thanks, Thomas
Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel
The assumption that BIOSes never want to have this register being set to full performance (zero) is wrong. While wrongly overruling this BIOS setting and set it to from performance to normal did not hurt that much, because nobody really knew the effects inside Intel processors. But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo modes if this value is not set to performance. So switch logic to tell the user in a friendly way (info) that the CPU is in performance mode and how to switch via userspace if this is not intended. But otherwise trust that the BIOS has set the correct value here and do not blindly overrule. How this has been found: SLE11 had this patch, SLE12 it slipped through. It took quite some time to nail down that this patch missing is the reason for not entering turbo modes with this specific processor. Signed-off-by: Thomas Renninger --- arch/x86/kernel/cpu/intel.c.orig2016-01-22 12:49:03.347906007 +0100 +++ arch/x86/kernel/cpu/intel.c 2016-01-22 12:53:33.043234963 +0100 @@ -377,8 +377,14 @@ u64 epb; /* -* Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized. -* (x86_energy_perf_policy(8) is available to change it at run-time.) +* On server platforms energy bias typically is set to +* performance on purpose. +* On other platforms it may happen that MSR_IA32_ENERGY_PERF_BIAS +* did not get initialized properly by BIOS. +* An additional check could be to look at pm_profile and check whether +* this is a performance oriented platform. +* But best is to to keep BIOS settings and give the user a hint whether +* to change it via cpupower-set(8) userspace tool at runtime. */ if (!cpu_has(c, X86_FEATURE_EPB)) return; @@ -387,10 +393,8 @@ if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE) return; - pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n"); - pr_warn_once("ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)\n"); - epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL; - wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); + pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n"); + pr_info_once("ENERGY_PERF_BIAS: Update with cpupower-set(8)\n"); } static void intel_bsp_resume(struct cpuinfo_x86 *c)
Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel
The assumption that BIOSes never want to have this register being set to full performance (zero) is wrong. While wrongly overruling this BIOS setting and set it to from performance to normal did not hurt that much, because nobody really knew the effects inside Intel processors. But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo modes if this value is not set to performance. So switch logic to tell the user in a friendly way (info) that the CPU is in performance mode and how to switch via userspace if this is not intended. But otherwise trust that the BIOS has set the correct value here and do not blindly overrule. How this has been found: SLE11 had this patch, SLE12 it slipped through. It took quite some time to nail down that this patch missing is the reason for not entering turbo modes with this specific processor. Signed-off-by: Thomas Renninger <tr...@suse.com> --- arch/x86/kernel/cpu/intel.c.orig2016-01-22 12:49:03.347906007 +0100 +++ arch/x86/kernel/cpu/intel.c 2016-01-22 12:53:33.043234963 +0100 @@ -377,8 +377,14 @@ u64 epb; /* -* Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized. -* (x86_energy_perf_policy(8) is available to change it at run-time.) +* On server platforms energy bias typically is set to +* performance on purpose. +* On other platforms it may happen that MSR_IA32_ENERGY_PERF_BIAS +* did not get initialized properly by BIOS. +* An additional check could be to look at pm_profile and check whether +* this is a performance oriented platform. +* But best is to to keep BIOS settings and give the user a hint whether +* to change it via cpupower-set(8) userspace tool at runtime. */ if (!cpu_has(c, X86_FEATURE_EPB)) return; @@ -387,10 +393,8 @@ if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE) return; - pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n"); - pr_warn_once("ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)\n"); - epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL; - wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb); + pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n"); + pr_info_once("ENERGY_PERF_BIAS: Update with cpupower-set(8)\n"); } static void intel_bsp_resume(struct cpuinfo_x86 *c)
Re: Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel
On Friday, January 22, 2016 01:09:15 PM Thomas Renninger wrote: > The assumption that BIOSes never want to have this register being set to > full performance (zero) is wrong. Patch is against latest linux-pm kernel. Rafael: Can you queue this one up, please. Thanks, Thomas
Re: [PATCH 1/2] cpu hotplug, add CONFIG_PERMANENT_CPU_TOPOLOGY and keep topology directory for lifetime of CPU [v2]
On Wednesday, October 21, 2015 12:06:37 PM Prarit Bhargava wrote: ... > +config PERMANENT_CPU_TOPOLOGY > + bool "Permanent CPU Topology" > + depends on HOTPLUG_CPU > + default 1 if X86 > + default 0 > + help > + This option configures CPU topology to be permanent for the lifetime > + of the CPU (until it is physically removed). I cannot see where you differ soft offlined and physically removed? The topology file is simply never removed or do I oversee something? Hm, maybe this one is even correct, but could get re-phrased a bit. Puhh, not sure, maybe this: > + This option configures CPU topology to be permanent. should be enough? It is obvious that topology info is still correct when the CPU is software offlined via: echo 0 >/sys/devices/system/cpu/cpuX/online But if someone really hit the button to eject a Numa node or so, this info might not be up-to-date (but still avail now, because we cannot differ software vs hardware events, at least not that easy). But it is up-to-date, once the newly plugged in Numa node or CPU gets added and ramped up, so this change should be very fine. Maybe it's worth to phrase above into the changelog at least? I think this is a sane change and works! If physically removed, the topology info could be outdated, but userspace knows, that the core is offlined right now. Also the info will likely be the same when something gets re-plugged. If not, topology info will be valid once the re-plugged core gets initialized, so everything should be fine. I put these two patches onto our latest kernel builds, enabled the .config option for i386 and x86_64, disabled it for other archs and things still built nicely: https://build.opensuse.org/project/show/home:trenn:kernel One issue: Why do you move topology.c into cpu.c? It is hard to see the real diff now. If moving makes sense, it would be nice to first submit a patch showing the changes for this feature and another patch saying: "Eleminating topology.c, only code move, no functional change" in the changelog. Feel free to add a Reviewed-by: Thomas Renninger if you plan to re-submit. Thanks! Thomas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] cpu hotplug, add CONFIG_PERMANENT_CPU_TOPOLOGY and keep topology directory for lifetime of CPU [v2]
On Wednesday, October 21, 2015 12:06:37 PM Prarit Bhargava wrote: ... > +config PERMANENT_CPU_TOPOLOGY > + bool "Permanent CPU Topology" > + depends on HOTPLUG_CPU > + default 1 if X86 > + default 0 > + help > + This option configures CPU topology to be permanent for the lifetime > + of the CPU (until it is physically removed). I cannot see where you differ soft offlined and physically removed? The topology file is simply never removed or do I oversee something? Hm, maybe this one is even correct, but could get re-phrased a bit. Puhh, not sure, maybe this: > + This option configures CPU topology to be permanent. should be enough? It is obvious that topology info is still correct when the CPU is software offlined via: echo 0 >/sys/devices/system/cpu/cpuX/online But if someone really hit the button to eject a Numa node or so, this info might not be up-to-date (but still avail now, because we cannot differ software vs hardware events, at least not that easy). But it is up-to-date, once the newly plugged in Numa node or CPU gets added and ramped up, so this change should be very fine. Maybe it's worth to phrase above into the changelog at least? I think this is a sane change and works! If physically removed, the topology info could be outdated, but userspace knows, that the core is offlined right now. Also the info will likely be the same when something gets re-plugged. If not, topology info will be valid once the re-plugged core gets initialized, so everything should be fine. I put these two patches onto our latest kernel builds, enabled the .config option for i386 and x86_64, disabled it for other archs and things still built nicely: https://build.opensuse.org/project/show/home:trenn:kernel One issue: Why do you move topology.c into cpu.c? It is hard to see the real diff now. If moving makes sense, it would be nice to first submit a patch showing the changes for this feature and another patch saying: "Eleminating topology.c, only code move, no functional change" in the changelog. Feel free to add a Reviewed-by: Thomas Renninger <tr...@suse.com> if you plan to re-submit. Thanks! Thomas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cpupower tools: Fix error when running cpupower monitor
On Monday, August 03, 2015 11:46:00 AM Shreyas B. Prabhu wrote: > get_cpu_topology() tries to get topology info from all cpus by reading > files in the topology sysfs dir. If a cpu is offlined, since it doesn't > have topology dir, this function fails and returns -1. This causes > functions relying on get_cpu_topology() to fail. For example- > > $ cpupower monitor > Cannot read number of available processors > > Fix this by skipping fetching topology info for offline cpus. Looks fine. Thanks! Acked-by: Thomas Renninger > > Signed-off-by: Shreyas B. Prabhu > Reported-by: Pavaman Subramaniyam > --- > tools/power/cpupower/utils/helpers/topology.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/tools/power/cpupower/utils/helpers/topology.c > b/tools/power/cpupower/utils/helpers/topology.c index > c13120af519b..cea398c176e7 100644 > --- a/tools/power/cpupower/utils/helpers/topology.c > +++ b/tools/power/cpupower/utils/helpers/topology.c > @@ -73,6 +73,8 @@ int get_cpu_topology(struct cpupower_topology *cpu_top) > for (cpu = 0; cpu < cpus; cpu++) { > cpu_top->core_info[cpu].cpu = cpu; > cpu_top->core_info[cpu].is_online = sysfs_is_cpu_online(cpu); > + if (!cpu_top->core_info[cpu].is_online) > + continue; > if(sysfs_topology_read_file( > cpu, > "physical_package_id", -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cpupower tools: Fix error when running cpupower monitor
On Monday, August 03, 2015 11:46:00 AM Shreyas B. Prabhu wrote: get_cpu_topology() tries to get topology info from all cpus by reading files in the topology sysfs dir. If a cpu is offlined, since it doesn't have topology dir, this function fails and returns -1. This causes functions relying on get_cpu_topology() to fail. For example- $ cpupower monitor Cannot read number of available processors Fix this by skipping fetching topology info for offline cpus. Looks fine. Thanks! Acked-by: Thomas Renninger tr...@suse.de Signed-off-by: Shreyas B. Prabhu shre...@linux.vnet.ibm.com Reported-by: Pavaman Subramaniyam pavsu...@linux.vnet.ibm.com --- tools/power/cpupower/utils/helpers/topology.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/power/cpupower/utils/helpers/topology.c b/tools/power/cpupower/utils/helpers/topology.c index c13120af519b..cea398c176e7 100644 --- a/tools/power/cpupower/utils/helpers/topology.c +++ b/tools/power/cpupower/utils/helpers/topology.c @@ -73,6 +73,8 @@ int get_cpu_topology(struct cpupower_topology *cpu_top) for (cpu = 0; cpu cpus; cpu++) { cpu_top-core_info[cpu].cpu = cpu; cpu_top-core_info[cpu].is_online = sysfs_is_cpu_online(cpu); + if (!cpu_top-core_info[cpu].is_online) + continue; if(sysfs_topology_read_file( cpu, physical_package_id, -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cpupower: Do not change the frequency of offline cpu
On Wednesday, July 22, 2015 01:50:49 PM Shilpasri G Bhat wrote: > Check if the cpu is online before changing the frequency/governor of > the cpu. > > Signed-off-by: Shilpasri G Bhat > --- > tools/power/cpupower/utils/cpufreq-set.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/tools/power/cpupower/utils/cpufreq-set.c > b/tools/power/cpupower/utils/cpufreq-set.c index f656e58..4e21357 100644 > --- a/tools/power/cpupower/utils/cpufreq-set.c > +++ b/tools/power/cpupower/utils/cpufreq-set.c > @@ -17,6 +17,7 @@ > > #include "cpufreq.h" > #include "helpers/helpers.h" > +#include "helpers/sysfs.h" > > #define NORM_FREQ_LEN 32 > > @@ -318,6 +319,9 @@ int cmd_freq_set(int argc, char **argv) > cpufreq_cpu_exists(cpu)) > continue; > > + if (sysfs_is_cpu_online(cpu) != 1) > + continue; > + > printf(_("Setting cpu: %d\n"), cpu); > ret = do_one_cpu(cpu, _pol, freq, policychange); > if (ret) { Thanks for catching this one and sending a patch! Acked-by: Thomas Renninger -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cpupower: Do not change the frequency of offline cpu
On Wednesday, July 22, 2015 01:50:49 PM Shilpasri G Bhat wrote: Check if the cpu is online before changing the frequency/governor of the cpu. Signed-off-by: Shilpasri G Bhat shilpa.b...@linux.vnet.ibm.com --- tools/power/cpupower/utils/cpufreq-set.c | 4 1 file changed, 4 insertions(+) diff --git a/tools/power/cpupower/utils/cpufreq-set.c b/tools/power/cpupower/utils/cpufreq-set.c index f656e58..4e21357 100644 --- a/tools/power/cpupower/utils/cpufreq-set.c +++ b/tools/power/cpupower/utils/cpufreq-set.c @@ -17,6 +17,7 @@ #include cpufreq.h #include helpers/helpers.h +#include helpers/sysfs.h #define NORM_FREQ_LEN 32 @@ -318,6 +319,9 @@ int cmd_freq_set(int argc, char **argv) cpufreq_cpu_exists(cpu)) continue; + if (sysfs_is_cpu_online(cpu) != 1) + continue; + printf(_(Setting cpu: %d\n), cpu); ret = do_one_cpu(cpu, new_pol, freq, policychange); if (ret) { Thanks for catching this one and sending a patch! Acked-by: Thomas Renninger tr...@suse.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cpupower: mperf monitor: fix output in MAX_FREQ_SYSFS mode
Hi, and sorry for the late answer, this had not highest prio... On Wednesday, May 20, 2015 02:26:47 AM Rafael J. Wysocki wrote: > On Monday, May 11, 2015 12:40:22 PM Herton R. Krzesinski wrote: > > On Mon, May 11, 2015 at 11:35:35AM -0300, Herton R. Krzesinski wrote: > > > There is clearly wrong output when mperf monitor runs in MAX_FREQ_SYSFS > > > mode: average frequency shows in kHz unit (despite the intended output > > > to be in MHz), and percentages for C state information are all wrong > > > (including high/negative values shown). > > > > > > The problem is that the max_frequency read on initialization isn't used > > > where it should have been used on mperf_get_count_percent (to estimate > > > the number of ticks in the given time period), and the value we read > > > from sysfs is in kHz, so we must divide it to get the MHz value to use > > > in current calculations. > > > > > > While at it, also I fixed another small issues in the debug output of > > > max_frequency value in mperf_get_count_freq. > > > > > > Signed-off-by: Herton R. Krzesinski > > > --- > > > > > > tools/power/cpupower/utils/idle_monitor/mperf_monitor.c | 5 +++-- > > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > Actually please consider v2 patch below, just a minor change in the debug > > output, which isn't a percentage... > > Thomas, any comments? I tried to find fitting HW without much success. The MAX_FREQ_SYSFS is used only on rare HW and it may never got a proper test. Patch looks sane. Even I cannot add a Tested-by: you may add: Acked-by: Thomas Renninger and add it. Thanks, Thomas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cpupower: mperf monitor: fix output in MAX_FREQ_SYSFS mode
Hi, and sorry for the late answer, this had not highest prio... On Wednesday, May 20, 2015 02:26:47 AM Rafael J. Wysocki wrote: On Monday, May 11, 2015 12:40:22 PM Herton R. Krzesinski wrote: On Mon, May 11, 2015 at 11:35:35AM -0300, Herton R. Krzesinski wrote: There is clearly wrong output when mperf monitor runs in MAX_FREQ_SYSFS mode: average frequency shows in kHz unit (despite the intended output to be in MHz), and percentages for C state information are all wrong (including high/negative values shown). The problem is that the max_frequency read on initialization isn't used where it should have been used on mperf_get_count_percent (to estimate the number of ticks in the given time period), and the value we read from sysfs is in kHz, so we must divide it to get the MHz value to use in current calculations. While at it, also I fixed another small issues in the debug output of max_frequency value in mperf_get_count_freq. Signed-off-by: Herton R. Krzesinski her...@redhat.com --- tools/power/cpupower/utils/idle_monitor/mperf_monitor.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) Actually please consider v2 patch below, just a minor change in the debug output, which isn't a percentage... Thomas, any comments? I tried to find fitting HW without much success. The MAX_FREQ_SYSFS is used only on rare HW and it may never got a proper test. Patch looks sane. Even I cannot add a Tested-by: you may add: Acked-by: Thomas Renninger tr...@suse.de and add it. Thanks, Thomas -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cpupower: fix breakage from libpci API change
Hi, On Monday, April 13, 2015 10:24:01 PM Lucas Stach wrote: > libpci 3.3.0 introduced an additional member in the pci_filter struct > which needs to be initialized to -1 to get the same behavior as before > the API change. Sounds not that clever, but there probably is a reason for this... I am not that familiar with the pci lib and its recent changes, but below patch looks reasonable. Acked-by: Thomas Renninger > The libpci internal helpers got updated accordingly, > but as the cpupower pci helpers initialized the struct themselves the > behavior changed. > > Use the libpci helper pci_filter_init() to fix this and guard against > similar breakages in the future. > > This fixes probing of the AMD fam12h/14h cpuidle monitor on systems > with libpci >= 3.3.0. > > Signed-off-by: Lucas Stach > --- > tools/power/cpupower/utils/helpers/pci.c | 11 +-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/tools/power/cpupower/utils/helpers/pci.c > b/tools/power/cpupower/utils/helpers/pci.c index 9690798..8b27898 100644 > --- a/tools/power/cpupower/utils/helpers/pci.c > +++ b/tools/power/cpupower/utils/helpers/pci.c > @@ -25,14 +25,21 @@ > struct pci_dev *pci_acc_init(struct pci_access **pacc, int domain, int bus, > int slot, int func, int vendor, int dev) > { > - struct pci_filter filter_nb_link = { domain, bus, slot, func, > - vendor, dev }; > + struct pci_filter filter_nb_link; > struct pci_dev *device; > > *pacc = pci_alloc(); > if (*pacc == NULL) > return NULL; > > + pci_filter_init(*pacc, _nb_link); > + filter_nb_link.domain = domain; > + filter_nb_link.bus = bus; > + filter_nb_link.slot = slot; > + filter_nb_link.func = func; > + filter_nb_link.vendor = vendor; > + filter_nb_link.device = dev; > + > pci_init(*pacc); > pci_scan_bus(*pacc); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cpupower: fix breakage from libpci API change
Hi, On Monday, April 13, 2015 10:24:01 PM Lucas Stach wrote: libpci 3.3.0 introduced an additional member in the pci_filter struct which needs to be initialized to -1 to get the same behavior as before the API change. Sounds not that clever, but there probably is a reason for this... I am not that familiar with the pci lib and its recent changes, but below patch looks reasonable. Acked-by: Thomas Renninger tr...@suse.de The libpci internal helpers got updated accordingly, but as the cpupower pci helpers initialized the struct themselves the behavior changed. Use the libpci helper pci_filter_init() to fix this and guard against similar breakages in the future. This fixes probing of the AMD fam12h/14h cpuidle monitor on systems with libpci = 3.3.0. Signed-off-by: Lucas Stach d...@lynxeye.de --- tools/power/cpupower/utils/helpers/pci.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/tools/power/cpupower/utils/helpers/pci.c b/tools/power/cpupower/utils/helpers/pci.c index 9690798..8b27898 100644 --- a/tools/power/cpupower/utils/helpers/pci.c +++ b/tools/power/cpupower/utils/helpers/pci.c @@ -25,14 +25,21 @@ struct pci_dev *pci_acc_init(struct pci_access **pacc, int domain, int bus, int slot, int func, int vendor, int dev) { - struct pci_filter filter_nb_link = { domain, bus, slot, func, - vendor, dev }; + struct pci_filter filter_nb_link; struct pci_dev *device; *pacc = pci_alloc(); if (*pacc == NULL) return NULL; + pci_filter_init(*pacc, filter_nb_link); + filter_nb_link.domain = domain; + filter_nb_link.bus = bus; + filter_nb_link.slot = slot; + filter_nb_link.func = func; + filter_nb_link.vendor = vendor; + filter_nb_link.device = dev; + pci_init(*pacc); pci_scan_bus(*pacc); -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bad rpath in cpupower with 4.0-rcX
On Tuesday, March 10, 2015 11:30:44 PM Rafael J. Wysocki wrote: > On Tuesday, March 10, 2015 08:33:54 AM Josh Boyer wrote: > > On Fri, Mar 6, 2015 at 8:47 AM, Josh Boyer wrote: > > > Hi All, > > > > > > Commit 5c1de006e8e66 (cpupower Makefile change to help run the tool > > > without 'make install') added an rpath to the cpupower binary. From > > > what I can understand, this is to make it easier to run cpupower from > > > the local build directory without having to run make install. It does > > > accomplish that, but it also leaves the binary with the rpath in it > > > which is considered bad practice. It also causes cpupower to fail in > > > rpmbuild with the following error: > > > > > > ERROR 0004: file '/usr/bin/cpupower' contains an insecure rpath './' > > > in [./] > > > error: Bad exit status from /var/tmp/rpm-tmp.A6u26r (%install) > > > > > > Bad exit status from /var/tmp/rpm-tmp.A6u26r (%install) > > > > > > I understand the want for eased development, but couldn't people just > > > set LD_LIBRARY_PATH instead? > > > > No comments on this? Should I just send a revert patch instead? I agree adding . as library path was not a good idea. This probably is also a potential security issue. What I do is building the libraries statically into the binary when testing. I will send a Makefile change introducing a static = true/false variable to do that easily. > You can do that. Thanks for sending the revert, it's the right thing to do. Thomas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bad rpath in cpupower with 4.0-rcX
On Tuesday, March 10, 2015 11:30:44 PM Rafael J. Wysocki wrote: On Tuesday, March 10, 2015 08:33:54 AM Josh Boyer wrote: On Fri, Mar 6, 2015 at 8:47 AM, Josh Boyer jwbo...@fedoraproject.org wrote: Hi All, Commit 5c1de006e8e66 (cpupower Makefile change to help run the tool without 'make install') added an rpath to the cpupower binary. From what I can understand, this is to make it easier to run cpupower from the local build directory without having to run make install. It does accomplish that, but it also leaves the binary with the rpath in it which is considered bad practice. It also causes cpupower to fail in rpmbuild with the following error: ERROR 0004: file '/usr/bin/cpupower' contains an insecure rpath './' in [./] error: Bad exit status from /var/tmp/rpm-tmp.A6u26r (%install) Bad exit status from /var/tmp/rpm-tmp.A6u26r (%install) I understand the want for eased development, but couldn't people just set LD_LIBRARY_PATH instead? No comments on this? Should I just send a revert patch instead? I agree adding . as library path was not a good idea. This probably is also a potential security issue. What I do is building the libraries statically into the binary when testing. I will send a Makefile change introducing a static = true/false variable to do that easily. You can do that. Thanks for sending the revert, it's the right thing to do. Thomas -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] Fix no idle state information return value
On Sunday, December 14, 2014 09:06:38 AM Prarit Bhargava wrote: > sysfs_get_idlestate_count() returns an unsigned int. Returning -ENODEV > is not the right thing to do here, and in any case is handled the same > way as if there are no states found. Yep, returning zero states instead of an error code makes a lot sense here. > > Cc: Thomas Renninger > Cc: Rafael J. Wysocki > Signed-off-by: Prarit Bhargava Acked-by: Thomas Renninger Thanks! Thomas > --- > tools/power/cpupower/utils/helpers/sysfs.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tools/power/cpupower/utils/helpers/sysfs.c > b/tools/power/cpupower/utils/helpers/sysfs.c index 09afe5d..4e8fe2c 100644 > --- a/tools/power/cpupower/utils/helpers/sysfs.c > +++ b/tools/power/cpupower/utils/helpers/sysfs.c > @@ -361,7 +361,7 @@ unsigned int sysfs_get_idlestate_count(unsigned int cpu) > > snprintf(file, SYSFS_PATH_MAX, PATH_TO_CPU "cpuidle"); > if (stat(file, ) != 0 || !S_ISDIR(statbuf.st_mode)) > - return -ENODEV; > + return 0; > > snprintf(file, SYSFS_PATH_MAX, PATH_TO_CPU "cpu%u/cpuidle/state0", cpu); > if (stat(file, ) != 0 || !S_ISDIR(statbuf.st_mode)) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cpupower: Correctly detect if running as root
On Sunday, December 14, 2014 10:29:24 PM Rafael J. Wysocki wrote: > On Sunday, December 14, 2014 01:36:52 PM Michal Privoznik wrote: > > Some operations, like frequency-set, need root privileges. However, > > the way that this is detected is not correct. The getuid() is called, > > while in fact geteuid() should be. This way we can allow > > distributions or users to set SETUID flags on the cpupower binary if > > they want to and let regular users change the cpu frequency governor. > > > > Signed-off-by: Michal Privoznik > > An ACK from Thomas is needed. Patch is correct, Thanks! Acked-by: Thomas Renninger Thomas > > > --- > > > > tools/power/cpupower/utils/cpupower.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/tools/power/cpupower/utils/cpupower.c > > b/tools/power/cpupower/utils/cpupower.c index 7cdcf88..9ea9143 100644 > > --- a/tools/power/cpupower/utils/cpupower.c > > +++ b/tools/power/cpupower/utils/cpupower.c > > @@ -199,7 +199,7 @@ int main(int argc, const char *argv[]) > > > > } > > > > get_cpu_info(0, _cpu_info); > > > > - run_as_root = !getuid(); > > + run_as_root = !geteuid(); > > > > if (run_as_root) { > > > > ret = uname(); > > if (!ret && !strcmp(uts.machine, "x86_64") && -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cpupower: Correctly detect if running as root
On Sunday, December 14, 2014 10:29:24 PM Rafael J. Wysocki wrote: On Sunday, December 14, 2014 01:36:52 PM Michal Privoznik wrote: Some operations, like frequency-set, need root privileges. However, the way that this is detected is not correct. The getuid() is called, while in fact geteuid() should be. This way we can allow distributions or users to set SETUID flags on the cpupower binary if they want to and let regular users change the cpu frequency governor. Signed-off-by: Michal Privoznik mpriv...@redhat.com An ACK from Thomas is needed. Patch is correct, Thanks! Acked-by: Thomas Renninger tr...@suse.de Thomas --- tools/power/cpupower/utils/cpupower.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/power/cpupower/utils/cpupower.c b/tools/power/cpupower/utils/cpupower.c index 7cdcf88..9ea9143 100644 --- a/tools/power/cpupower/utils/cpupower.c +++ b/tools/power/cpupower/utils/cpupower.c @@ -199,7 +199,7 @@ int main(int argc, const char *argv[]) } get_cpu_info(0, cpupower_cpu_info); - run_as_root = !getuid(); + run_as_root = !geteuid(); if (run_as_root) { ret = uname(uts); if (!ret !strcmp(uts.machine, x86_64) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] Fix no idle state information return value
On Sunday, December 14, 2014 09:06:38 AM Prarit Bhargava wrote: sysfs_get_idlestate_count() returns an unsigned int. Returning -ENODEV is not the right thing to do here, and in any case is handled the same way as if there are no states found. Yep, returning zero states instead of an error code makes a lot sense here. Cc: Thomas Renninger tr...@suse.de Cc: Rafael J. Wysocki rafael.j.wyso...@intel.com Signed-off-by: Prarit Bhargava pra...@redhat.com Acked-by: Thomas Renninger tr...@suse.de Thanks! Thomas --- tools/power/cpupower/utils/helpers/sysfs.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/power/cpupower/utils/helpers/sysfs.c b/tools/power/cpupower/utils/helpers/sysfs.c index 09afe5d..4e8fe2c 100644 --- a/tools/power/cpupower/utils/helpers/sysfs.c +++ b/tools/power/cpupower/utils/helpers/sysfs.c @@ -361,7 +361,7 @@ unsigned int sysfs_get_idlestate_count(unsigned int cpu) snprintf(file, SYSFS_PATH_MAX, PATH_TO_CPU cpuidle); if (stat(file, statbuf) != 0 || !S_ISDIR(statbuf.st_mode)) - return -ENODEV; + return 0; snprintf(file, SYSFS_PATH_MAX, PATH_TO_CPU cpu%u/cpuidle/state0, cpu); if (stat(file, statbuf) != 0 || !S_ISDIR(statbuf.st_mode)) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tools, cpupower, fix return checks for sysfs_get_idlestate_count()
On Monday, December 01, 2014 09:39:22 AM Prarit Bhargava wrote: > Red Hat and Fedora use a bug reporting tool that gathers data about > "broken" systems called sosreport. Among other things, it includes the > output of 'cpupower idle-info'. Executing 'cpupower idle-info' on a > system that has cpuidle disabled via 'cpuidle.off=1' results in a 300 > second hang in the cpupower application. > > ie) > [root@intel-brickland-05]# cpupower idle-info > Could not determine cpuidle driver > > Analyzing CPU 0: > Number of idle states: -19 > [hang] > > The problem is that the cpupower code only checks for a zero return from > sysfs_get_idlestate_count(). The function can return -ENODEV (-19) as > above. This patch fixes callers to sysfs_get_idlestate_count() to check > the right return values. > > Cc: Thomas Renninger > Cc: linux...@vger.kernel.org > Signed-off-by: Prarit Bhargava Signed-off-by: Thomas Renninger -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tools, cpupower, fix return checks for sysfs_get_idlestate_count()
On Monday, December 01, 2014 09:39:22 AM Prarit Bhargava wrote: Red Hat and Fedora use a bug reporting tool that gathers data about broken systems called sosreport. Among other things, it includes the output of 'cpupower idle-info'. Executing 'cpupower idle-info' on a system that has cpuidle disabled via 'cpuidle.off=1' results in a 300 second hang in the cpupower application. ie) [root@intel-brickland-05]# cpupower idle-info Could not determine cpuidle driver Analyzing CPU 0: Number of idle states: -19 [hang] The problem is that the cpupower code only checks for a zero return from sysfs_get_idlestate_count(). The function can return -ENODEV (-19) as above. This patch fixes callers to sysfs_get_idlestate_count() to check the right return values. Cc: Thomas Renninger tr...@suse.de Cc: linux...@vger.kernel.org Signed-off-by: Prarit Bhargava pra...@redhat.com Signed-off-by: Thomas Renninger tr...@suse.de -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] ipmi: Setup ipmi_devintf automatically if ipmi_msghandler gets loaded
On Friday, October 24, 2014 12:40:56 PM Wilck, Martin wrote: > Thomas, > > > > > I vote for MODULE_SOFTDEP for upstream, and modalias for distros that > > > don't support MODULE_SOFTDEP yet. > > > > I vote for ipmi_msghandler.c renaming into ipmi_handler.c > > > sorry, I am not with you here. This is a small problem that should be > handled with small effort. Why apply a 3-part patch set for a problem > that can be solved with 2 modalias lines in user space? Because it is wrong. In fact this is about a 10 line change. The file move is no code change at all. With a modalias added in userspace this will show up in different distributions again and again. I agree that a modalias makes sense for SLE and this is probably what we will go for. But it would be nice to have this fixed in mainline properly. Thanks, Thomas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] ipmi: Setup ipmi_devintf automatically if ipmi_msghandler gets loaded
On Friday, October 24, 2014 12:40:56 PM Wilck, Martin wrote: Thomas, I vote for MODULE_SOFTDEP for upstream, and modalias for distros that don't support MODULE_SOFTDEP yet. I vote for ipmi_msghandler.c renaming into ipmi_handler.c sorry, I am not with you here. This is a small problem that should be handled with small effort. Why apply a 3-part patch set for a problem that can be solved with 2 modalias lines in user space? Because it is wrong. In fact this is about a 10 line change. The file move is no code change at all. With a modalias added in userspace this will show up in different distributions again and again. I agree that a modalias makes sense for SLE and this is probably what we will go for. But it would be nice to have this fixed in mainline properly. Thanks, Thomas -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] ipmi: Setup ipmi_devintf automatically if ipmi_msghandler gets loaded
On Monday, October 20, 2014 10:28:53 AM Wilck, Martin wrote: > On Fri, 2014-10-17 at 18:14 +0200, Corey Minyard wrote: > > > > How about this. I did a little research, and there's something called > > soft module dependencies. Apparently you can add: > > > > MODULE_SOFTDEP("post: ipmi_devintf") > > > > to ipmi_msghandler.c and modprobe would load ipmi_devintf after loading > > ipmi_msghandler if it was available. I do not like this approach for 2 reasons: 1) Same as Martin mentioned: Not avail on all distros yet (not double checked, but I trust you). 2) It's still wrong or say, far away from being as it should be: Instead of autoloading via userspace (this is what I expect happens here), ipmi_devintf should directly be compiled into ipmi_msghandler module and be available as soon as ipmi_msghandler is available. This saves the overhead of an unneeded additional module (memory and loading time overhead -> going through userspace, etc.). > This is nice, but not commonly available in distro kernels so far. Sorry, it's not that nice... > AFAICS, out of the distros Fujitsu supports, only RHEL7 supports it. Bad. > I vote for MODULE_SOFTDEP for upstream, and modalias for distros that > don't support MODULE_SOFTDEP yet. I vote for ipmi_msghandler.c renaming into ipmi_handler.c I looked around a bit. Renaming is not bad and happens often. Documentation/laptops/hpfall.c got renamed into: Documentation/laptops/freefall.c history is preserved by using: git log --follow Documentation/laptops/freefall.c git blame Documentation/laptops/freefall.c works as well, showing modifications before the renaming. I'll send a patchset. Thanks, Thomas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] ipmi: Setup ipmi_devintf automatically if ipmi_msghandler gets loaded
On Monday, October 20, 2014 10:28:53 AM Wilck, Martin wrote: On Fri, 2014-10-17 at 18:14 +0200, Corey Minyard wrote: How about this. I did a little research, and there's something called soft module dependencies. Apparently you can add: MODULE_SOFTDEP(post: ipmi_devintf) to ipmi_msghandler.c and modprobe would load ipmi_devintf after loading ipmi_msghandler if it was available. I do not like this approach for 2 reasons: 1) Same as Martin mentioned: Not avail on all distros yet (not double checked, but I trust you). 2) It's still wrong or say, far away from being as it should be: Instead of autoloading via userspace (this is what I expect happens here), ipmi_devintf should directly be compiled into ipmi_msghandler module and be available as soon as ipmi_msghandler is available. This saves the overhead of an unneeded additional module (memory and loading time overhead - going through userspace, etc.). This is nice, but not commonly available in distro kernels so far. Sorry, it's not that nice... AFAICS, out of the distros Fujitsu supports, only RHEL7 supports it. Bad. I vote for MODULE_SOFTDEP for upstream, and modalias for distros that don't support MODULE_SOFTDEP yet. I vote for ipmi_msghandler.c renaming into ipmi_handler.c I looked around a bit. Renaming is not bad and happens often. Documentation/laptops/hpfall.c got renamed into: Documentation/laptops/freefall.c history is preserved by using: git log --follow Documentation/laptops/freefall.c git blame Documentation/laptops/freefall.c works as well, showing modifications before the renaming. I'll send a patchset. Thanks, Thomas -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 2/3] ipmi: Remove ipmi_major module parameter and define global IPMI_MAJOR
Hi, On Tuesday, October 14, 2014 08:26:26 PM Corey Minyard wrote: > Sorry to top post on this, but you attached the file, so it's hard to reply > inline. > > There's no way this can go in. There's not enough major device numbers > for all the devices that exist, we have mechanisms to handle dynamically > assigning numbers, and the IPMI driver just isn't important enough to > warrant it's own major device number. Hm, I always get 248, on 2 different machines. Just luck? If 248 is still free (and according to major.h it is), why not? > The particular mechanism to pass in the major number was just a temporary > measure. It is no longer really necessary and could be removed. Ok, if I read above right, you mean: The module param to specify the major number can vanish. But I must not try to invent a new MAJOR_IPMI definition in major.h. Instead the whole thing can be dynamic (use MKDEV, remember the return value and use it later to freeing again). Thanks, Thomas > -corey > > On 10/14/2014 09:40 AM, tr...@suse.de wrote: > > There should be no need to specify the major number of /dev/ipmiX via module > parameter. Others do not need this as well. > This is the way msr.c (/dev/cpu/X/msr) is doing it as well. > > Side effect of this patch will be that the userspace interface cannot > be disabled at kernel level anymore. But this can also be enforced by not > letting userspace create the device file /dev/ipmiX. > > Signed-off-by: Thomas Renninger > CC: miny...@acm.org > > Index: kernel_ipmi/drivers/char/ipmi/ipmi_devintf.c > === > --- kernel_ipmi.orig/drivers/char/ipmi/ipmi_devintf.c > +++ kernel_ipmi/drivers/char/ipmi/ipmi_devintf.c > @@ -39,6 +39,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -867,14 +868,6 @@ static const struct file_operations ipmi > > #define DEVICE_NAME "ipmidev" > > -static int ipmi_major; > -module_param(ipmi_major, int, 0); > -MODULE_PARM_DESC(ipmi_major, "Sets the major number of the IPMI device. > By" - " default, or if you set it to zero, it will choose the next" > - " available device. Setting it to -1 will disable the" > - " interface. Other values will set the major device number" > - " to that value."); > - > /* Keep track of the devices that are registered. */ > struct ipmi_reg_list { > dev_tdev; > @@ -887,7 +880,7 @@ static struct class *ipmi_class; > > static void ipmi_new_smi(int if_num, struct device *device) > { > - dev_t dev = MKDEV(ipmi_major, if_num); > + dev_t dev = MKDEV(IPMI_MAJOR, if_num); > struct ipmi_reg_list *entry; > > entry = kmalloc(sizeof(*entry), GFP_KERNEL); > @@ -906,7 +899,7 @@ static void ipmi_new_smi(int if_num, str > > static void ipmi_smi_gone(int if_num) > { > - dev_t dev = MKDEV(ipmi_major, if_num); > + dev_t dev = MKDEV(IPMI_MAJOR, if_num); > struct ipmi_reg_list *entry; > > mutex_lock(_list_mutex); > @@ -932,9 +925,6 @@ int __init init_ipmi_devintf(void) > { > int rv; > > - if (ipmi_major < 0) > - return -EINVAL; > - > printk(KERN_INFO "ipmi device interface\n"); > > ipmi_class = class_create(THIS_MODULE, "ipmi"); > @@ -943,20 +933,16 @@ int __init init_ipmi_devintf(void) > return PTR_ERR(ipmi_class); > } > > - rv = register_chrdev(ipmi_major, DEVICE_NAME, _fops); > + rv = register_chrdev(IPMI_MAJOR, DEVICE_NAME, _fops); > if (rv < 0) { > class_destroy(ipmi_class); > - printk(KERN_ERR "ipmi: can't get major %d\n", ipmi_major); > + printk(KERN_ERR "ipmi: can't get major %d\n", IPMI_MAJOR); > return rv; > } > > - if (ipmi_major == 0) { > - ipmi_major = rv; > - } > - > rv = ipmi_smi_watcher_register(_watcher); > if (rv) { > - unregister_chrdev(ipmi_major, DEVICE_NAME); > + unregister_chrdev(IPMI_MAJOR, DEVICE_NAME); > class_destroy(ipmi_class); > printk(KERN_WARNING "ipmi: can't register smi watcher\n"); > return rv; > @@ -977,5 +963,5 @@ void cleanup_ipmi_dev(void) > mutex_unlock(_list_mutex); > class_destroy(ipmi_class); > ipmi_smi_watcher_unregister(_watcher); > - unregister_chrdev(ipmi_major, DEVICE_NAME); > + unregister_chrdev(IPMI_MAJOR, DEVICE_NAME); > } > Index: kernel_ipmi/include/uapi/linu
Re: [patch 1/3] ipmi: Setup ipmi_devintf automatically if ipmi_msghandler gets loaded
Hi, On Tuesday, October 14, 2014 08:22:20 PM Corey Minyard wrote: > On 10/14/2014 09:40 AM, tr...@suse.de wrote: > > This removes the ipmi_devintf to be a module, but it will automatically > > compiled in if ipmi_msghandler is set. > > > > ipmi_msghandler module is renamed to ipmi_handler because of the name > > clash with the ipmi_msghandler.o object file (see Makefile for details). > > Alternatively ipmi_msghandler.c could be renamed to ipmi_handler.c, but > > not polluting git history should be more of an advantage than module > > renaming. > > > > cleanup_ipmi_dev() and init_ipmi_devintf() implemented in ipmi_devintf.c > > are are simply declared ipmi_msghandler.c without creating a separate > > header file which looks more reasonable to me. Hope that is ok. > > One minor style issue: the function definitions should really be in a .h > file. Ok. > Renaming the module is also probably a bad idea. If this was to go in, > it would certainly need to keep the ipmi_msghandler.ko name to avoid > complete breakage all over the place. ? What should break? It should never be needed to load ipmi_msghandler manually. Even if autoloading does not work, loading the low level serving driver (e.g. ipmi_si) is enough. This one will request ipmi_(msg)handler.ko automatically. I excpect this is the same for all other ipmi HW serving drivers, right? > In that vein, I am worried that this would just result in a lot of work > for all the distros that have set up this already. We also loaded the ipmi driver manually via the OpenIPMI init service which simply tried to load ipmi drivers. This results in boot load time, resource and code overhead. As ipmi autoload seem to work fine for recent HW these days, the userspace interface should load as well, otherwise the whole autoloading is more or less useless. > I'm trying to see the pros and cons of > this, and I can't see that the pros outweigh the cons. Do you have people > that have issues with this? Yes. In fact Fujitsu opened a bug for this. Unfortunately I cannot set it public as it is against SLES12. If I find a way, I'll reference it in the changelog: https://bugzilla.novell.com/show_bug.cgi?id=893181 Subject: ipmi_devintf is not available when IPMI device is detected I already added Martin to CC: of the patch changelog, but quilt may not have recogized it and not added him to CC? Beside Fujitsu, I'd like to have this fixed as well. We adjust BMC settings via after install script on our servers via ipmitool. This does not work because of this bug. One has to: modprobe ipmi_devintf This is overhead and makes the whole autoloading mechanism useless. And even worse, ipmi_devintf seem to load on all machines whether they have an ipmi controller or not. > > In fact there already was a kind of autoloading mechanism (gets deleted > > with this patch). I interpret this line that ipmi_devintf should get > > autoloaded if ipmi_si gets loaded?: > > -MODULE_ALIAS("platform:ipmi_si"); > > > > But: > > - It's wrong: There are other low lever drivers which can be used by > > > > ipmi_devintf > > > > - It does not work (anymore?) > > - There is no need to keep ipmi_devintf as a module (resource and load > > time > > > > overhead) > > This change is certainly a good idea. Whatever it was trying to do is > wrong. Thanks. I am convinced that the right way to go for is to integrate the ipmi_devintf into ipmi_msghandler. I see 3 options how to do this: - rename ipmi_msghandler.ko to ipmi_handler.ko As this only is an interface provider for other modules getting loaded/requested automatically and never needs to be loaded manually (pls correct me if I am wrong), I would like to go for this option. - rename ipmi_msghandler.c to ipmi_handler.c -> git history lost. But should still be seen via: git log --follow? - Someone finds another way how to modify the Makefile to achieve above. Thanks, Thomas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] ipmi: Setup ipmi_devintf automatically if ipmi_msghandler gets loaded
Hi, On Tuesday, October 14, 2014 08:22:20 PM Corey Minyard wrote: On 10/14/2014 09:40 AM, tr...@suse.de wrote: This removes the ipmi_devintf to be a module, but it will automatically compiled in if ipmi_msghandler is set. ipmi_msghandler module is renamed to ipmi_handler because of the name clash with the ipmi_msghandler.o object file (see Makefile for details). Alternatively ipmi_msghandler.c could be renamed to ipmi_handler.c, but not polluting git history should be more of an advantage than module renaming. cleanup_ipmi_dev() and init_ipmi_devintf() implemented in ipmi_devintf.c are are simply declared ipmi_msghandler.c without creating a separate header file which looks more reasonable to me. Hope that is ok. One minor style issue: the function definitions should really be in a .h file. Ok. Renaming the module is also probably a bad idea. If this was to go in, it would certainly need to keep the ipmi_msghandler.ko name to avoid complete breakage all over the place. ? What should break? It should never be needed to load ipmi_msghandler manually. Even if autoloading does not work, loading the low level serving driver (e.g. ipmi_si) is enough. This one will request ipmi_(msg)handler.ko automatically. I excpect this is the same for all other ipmi HW serving drivers, right? In that vein, I am worried that this would just result in a lot of work for all the distros that have set up this already. We also loaded the ipmi driver manually via the OpenIPMI init service which simply tried to load ipmi drivers. This results in boot load time, resource and code overhead. As ipmi autoload seem to work fine for recent HW these days, the userspace interface should load as well, otherwise the whole autoloading is more or less useless. I'm trying to see the pros and cons of this, and I can't see that the pros outweigh the cons. Do you have people that have issues with this? Yes. In fact Fujitsu opened a bug for this. Unfortunately I cannot set it public as it is against SLES12. If I find a way, I'll reference it in the changelog: https://bugzilla.novell.com/show_bug.cgi?id=893181 Subject: ipmi_devintf is not available when IPMI device is detected I already added Martin to CC: of the patch changelog, but quilt may not have recogized it and not added him to CC? Beside Fujitsu, I'd like to have this fixed as well. We adjust BMC settings via after install script on our servers via ipmitool. This does not work because of this bug. One has to: modprobe ipmi_devintf This is overhead and makes the whole autoloading mechanism useless. And even worse, ipmi_devintf seem to load on all machines whether they have an ipmi controller or not. In fact there already was a kind of autoloading mechanism (gets deleted with this patch). I interpret this line that ipmi_devintf should get autoloaded if ipmi_si gets loaded?: -MODULE_ALIAS(platform:ipmi_si); But: - It's wrong: There are other low lever drivers which can be used by ipmi_devintf - It does not work (anymore?) - There is no need to keep ipmi_devintf as a module (resource and load time overhead) This change is certainly a good idea. Whatever it was trying to do is wrong. Thanks. I am convinced that the right way to go for is to integrate the ipmi_devintf into ipmi_msghandler. I see 3 options how to do this: - rename ipmi_msghandler.ko to ipmi_handler.ko As this only is an interface provider for other modules getting loaded/requested automatically and never needs to be loaded manually (pls correct me if I am wrong), I would like to go for this option. - rename ipmi_msghandler.c to ipmi_handler.c - git history lost. But should still be seen via: git log --follow? - Someone finds another way how to modify the Makefile to achieve above. Thanks, Thomas -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 2/3] ipmi: Remove ipmi_major module parameter and define global IPMI_MAJOR
Hi, On Tuesday, October 14, 2014 08:26:26 PM Corey Minyard wrote: Sorry to top post on this, but you attached the file, so it's hard to reply inline. There's no way this can go in. There's not enough major device numbers for all the devices that exist, we have mechanisms to handle dynamically assigning numbers, and the IPMI driver just isn't important enough to warrant it's own major device number. Hm, I always get 248, on 2 different machines. Just luck? If 248 is still free (and according to major.h it is), why not? The particular mechanism to pass in the major number was just a temporary measure. It is no longer really necessary and could be removed. Ok, if I read above right, you mean: The module param to specify the major number can vanish. But I must not try to invent a new MAJOR_IPMI definition in major.h. Instead the whole thing can be dynamic (use MKDEV, remember the return value and use it later to freeing again). Thanks, Thomas -corey On 10/14/2014 09:40 AM, tr...@suse.de wrote: There should be no need to specify the major number of /dev/ipmiX via module parameter. Others do not need this as well. This is the way msr.c (/dev/cpu/X/msr) is doing it as well. Side effect of this patch will be that the userspace interface cannot be disabled at kernel level anymore. But this can also be enforced by not letting userspace create the device file /dev/ipmiX. Signed-off-by: Thomas Renninger tr...@suse.de CC: miny...@acm.org Index: kernel_ipmi/drivers/char/ipmi/ipmi_devintf.c === --- kernel_ipmi.orig/drivers/char/ipmi/ipmi_devintf.c +++ kernel_ipmi/drivers/char/ipmi/ipmi_devintf.c @@ -39,6 +39,7 @@ #include linux/spinlock.h #include linux/slab.h #include linux/ipmi.h +#include linux/major.h #include linux/mutex.h #include linux/init.h #include linux/device.h @@ -867,14 +868,6 @@ static const struct file_operations ipmi #define DEVICE_NAME ipmidev -static int ipmi_major; -module_param(ipmi_major, int, 0); -MODULE_PARM_DESC(ipmi_major, Sets the major number of the IPMI device. By - default, or if you set it to zero, it will choose the next - available device. Setting it to -1 will disable the - interface. Other values will set the major device number - to that value.); - /* Keep track of the devices that are registered. */ struct ipmi_reg_list { dev_tdev; @@ -887,7 +880,7 @@ static struct class *ipmi_class; static void ipmi_new_smi(int if_num, struct device *device) { - dev_t dev = MKDEV(ipmi_major, if_num); + dev_t dev = MKDEV(IPMI_MAJOR, if_num); struct ipmi_reg_list *entry; entry = kmalloc(sizeof(*entry), GFP_KERNEL); @@ -906,7 +899,7 @@ static void ipmi_new_smi(int if_num, str static void ipmi_smi_gone(int if_num) { - dev_t dev = MKDEV(ipmi_major, if_num); + dev_t dev = MKDEV(IPMI_MAJOR, if_num); struct ipmi_reg_list *entry; mutex_lock(reg_list_mutex); @@ -932,9 +925,6 @@ int __init init_ipmi_devintf(void) { int rv; - if (ipmi_major 0) - return -EINVAL; - printk(KERN_INFO ipmi device interface\n); ipmi_class = class_create(THIS_MODULE, ipmi); @@ -943,20 +933,16 @@ int __init init_ipmi_devintf(void) return PTR_ERR(ipmi_class); } - rv = register_chrdev(ipmi_major, DEVICE_NAME, ipmi_fops); + rv = register_chrdev(IPMI_MAJOR, DEVICE_NAME, ipmi_fops); if (rv 0) { class_destroy(ipmi_class); - printk(KERN_ERR ipmi: can't get major %d\n, ipmi_major); + printk(KERN_ERR ipmi: can't get major %d\n, IPMI_MAJOR); return rv; } - if (ipmi_major == 0) { - ipmi_major = rv; - } - rv = ipmi_smi_watcher_register(smi_watcher); if (rv) { - unregister_chrdev(ipmi_major, DEVICE_NAME); + unregister_chrdev(IPMI_MAJOR, DEVICE_NAME); class_destroy(ipmi_class); printk(KERN_WARNING ipmi: can't register smi watcher\n); return rv; @@ -977,5 +963,5 @@ void cleanup_ipmi_dev(void) mutex_unlock(reg_list_mutex); class_destroy(ipmi_class); ipmi_smi_watcher_unregister(smi_watcher); - unregister_chrdev(ipmi_major, DEVICE_NAME); + unregister_chrdev(IPMI_MAJOR, DEVICE_NAME); } Index: kernel_ipmi/include/uapi/linux/major.h === --- kernel_ipmi.orig/include/uapi/linux/major.h +++ kernel_ipmi/include/uapi/linux/major.h @@ -173,6 +173,8 @@ #define VIOTAPE_MAJOR230 +#define IPMI_MAJOR 248 + #define BLOCK_EXT_MAJOR 259 #define SCSI_OSD_MAJOR 260 /* open-osd's OSD scsi device */ -- To unsubscribe from this list: send the line
Re: [PATCH 2/4] ACPICA: Introduce new acpi_os_physical_table_add OS callback
On Wednesday, March 05, 2014 06:26:54 AM Zheng, Lv wrote: ... > > > After the table manager cleanups are tested and shipped in the ACPICA > > > repo, > > > the new facilities will automatically be rolled into Linux branches. > > > > I'd suggest to just wait for that. > > Best already try to integrate the ACPI table override/add part as you > > think > > it should work without additional changes in drivers/acpi/acpica. > > > > If this happened and things are submitted to get integrated into the Linux > > kernel, please add me to CC or point me to the patchset. > > Fortunately, there is a kernel bugzilla entry requires this series. > I've posted it on the kernel bugzilla. > Here is the patchset: > https://bugzilla.kernel.org/show_bug.cgi?id=69711 > You need to apply 9 patches: > attachment 128061 - attachment 128141 > The last patch has introduced an early boot API: > acpi_install_table. > This can be used to enhance this series. Ok, thanks. Be aware that I am super busy right now. This week is impossible, I cannot promise when I will have time for that. Thomas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/4] ACPICA: Introduce new acpi_os_physical_table_add OS callback
On Wednesday, March 05, 2014 06:26:54 AM Zheng, Lv wrote: ... After the table manager cleanups are tested and shipped in the ACPICA repo, the new facilities will automatically be rolled into Linux branches. I'd suggest to just wait for that. Best already try to integrate the ACPI table override/add part as you think it should work without additional changes in drivers/acpi/acpica. If this happened and things are submitted to get integrated into the Linux kernel, please add me to CC or point me to the patchset. Fortunately, there is a kernel bugzilla entry requires this series. I've posted it on the kernel bugzilla. Here is the patchset: https://bugzilla.kernel.org/show_bug.cgi?id=69711 You need to apply 9 patches: attachment 128061 - attachment 128141 The last patch has introduced an early boot API: acpi_install_table. This can be used to enhance this series. Ok, thanks. Be aware that I am super busy right now. This week is impossible, I cannot promise when I will have time for that. Thomas -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/4] ACPICA: Introduce new acpi_os_physical_table_add OS callback
On Tuesday, March 04, 2014 12:31:57 AM Zheng, Lv wrote: > Hi, Thomas > > > From: Thomas Renninger [mailto:tr...@suse.de] > > Sent: Monday, March 03, 2014 8:42 PM > > > > Hi Lv, > > > > On Monday, March 03, 2014 01:20:31 AM Zheng, Lv wrote: > > > Hi, Thomas > > > > > > I have a patch series that can cleanup the ACPICA table manager, and > > > change > > > > > the acpi_load_table into the following style: > > Ok. I suggest that: > > 1) If Thomas (Gleixner) or whoever wants to try out or needs it urgently, > > he> > >can (must) use a recent kernel with my patches applied. > > > > 2) You continue to get your changes into ACPICA. > > > >Eventually or best would be if you add whatever is needed to > >allow adding of tables as well (which will be there automatically if > >I understood the description of your changes correctly). > > > > 3) Either you give it a try yourself or give me short description for > > > >what I have to look out for and I can re-post the Linux patches > >based on your ACPICA changes, once they show up in the Linux kernel. > >Best give me a ping as soon as I should look at it. > > That sounds good. > > Or it can be more efficient for productions: > Linux can merge your patches and ACPICA just stop to take them. You mean add my stuff to drivers/acpi/acpica in Linux kernel without pushing them into the ACPICA repository? I cannot remember that this ever happened (beside small important fixes) and I doubt Rafael is willing to do that. If it would be super critical, but I cannot see that it is. > This will leave us divergences. Yes, that would be bad. > After the table manager cleanups are tested and shipped in the ACPICA repo, > the new facilities will automatically be rolled into Linux branches. I'd suggest to just wait for that. Best already try to integrate the ACPI table override/add part as you think it should work without additional changes in drivers/acpi/acpica. If this happened and things are submitted to get integrated into the Linux kernel, please add me to CC or point me to the patchset. > Then I > can help to reduce the divergences using the new ACPICA facilities. At that > time I may ask whoever that can test to offer help to review the cleanup > patch. I can then give your new patcheset some testing and try to get the Linux (drivers/acpi/osl.c) parts (re-)implemented based on your stuff. There might still be the one or other minor fix needed that has to go into acpica as well, but that should not be that hard to manage and might end up in acpica and Linux kernel in parallel without much extra overhead. Thomas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/4] ACPICA: Introduce new acpi_os_physical_table_add OS callback
On Tuesday, March 04, 2014 12:31:57 AM Zheng, Lv wrote: Hi, Thomas From: Thomas Renninger [mailto:tr...@suse.de] Sent: Monday, March 03, 2014 8:42 PM Hi Lv, On Monday, March 03, 2014 01:20:31 AM Zheng, Lv wrote: Hi, Thomas I have a patch series that can cleanup the ACPICA table manager, and change the acpi_load_table into the following style: Ok. I suggest that: 1) If Thomas (Gleixner) or whoever wants to try out or needs it urgently, he can (must) use a recent kernel with my patches applied. 2) You continue to get your changes into ACPICA. Eventually or best would be if you add whatever is needed to allow adding of tables as well (which will be there automatically if I understood the description of your changes correctly). 3) Either you give it a try yourself or give me short description for what I have to look out for and I can re-post the Linux patches based on your ACPICA changes, once they show up in the Linux kernel. Best give me a ping as soon as I should look at it. That sounds good. Or it can be more efficient for productions: Linux can merge your patches and ACPICA just stop to take them. You mean add my stuff to drivers/acpi/acpica in Linux kernel without pushing them into the ACPICA repository? I cannot remember that this ever happened (beside small important fixes) and I doubt Rafael is willing to do that. If it would be super critical, but I cannot see that it is. This will leave us divergences. Yes, that would be bad. After the table manager cleanups are tested and shipped in the ACPICA repo, the new facilities will automatically be rolled into Linux branches. I'd suggest to just wait for that. Best already try to integrate the ACPI table override/add part as you think it should work without additional changes in drivers/acpi/acpica. If this happened and things are submitted to get integrated into the Linux kernel, please add me to CC or point me to the patchset. Then I can help to reduce the divergences using the new ACPICA facilities. At that time I may ask whoever that can test to offer help to review the cleanup patch. I can then give your new patcheset some testing and try to get the Linux (drivers/acpi/osl.c) parts (re-)implemented based on your stuff. There might still be the one or other minor fix needed that has to go into acpica as well, but that should not be that hard to manage and might end up in acpica and Linux kernel in parallel without much extra overhead. Thomas -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/4] ACPICA: Introduce new acpi_os_physical_table_add OS callback
Hi Lv, On Monday, March 03, 2014 01:20:31 AM Zheng, Lv wrote: > Hi, Thomas > > I have a patch series that can cleanup the ACPICA table manager, and change > the acpi_load_table into the following style: Ok. I suggest that: 1) If Thomas (Gleixner) or whoever wants to try out or needs it urgently, he can (must) use a recent kernel with my patches applied. 2) You continue to get your changes into ACPICA. Eventually or best would be if you add whatever is needed to allow adding of tables as well (which will be there automatically if I understood the description of your changes correctly). 3) Either you give it a try yourself or give me short description for what I have to look out for and I can re-post the Linux patches based on your ACPICA changes, once they show up in the Linux kernel. Best give me a ping as soon as I should look at it. Thanks! Thomas PS: You might want to add the BGRT definition in ACPICA already, the tiny one line-liner... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/4] ACPICA: Introduce new acpi_os_physical_table_add OS callback
Hi Lv, On Monday, March 03, 2014 01:20:31 AM Zheng, Lv wrote: Hi, Thomas I have a patch series that can cleanup the ACPICA table manager, and change the acpi_load_table into the following style: Ok. I suggest that: 1) If Thomas (Gleixner) or whoever wants to try out or needs it urgently, he can (must) use a recent kernel with my patches applied. 2) You continue to get your changes into ACPICA. Eventually or best would be if you add whatever is needed to allow adding of tables as well (which will be there automatically if I understood the description of your changes correctly). 3) Either you give it a try yourself or give me short description for what I have to look out for and I can re-post the Linux patches based on your ACPICA changes, once they show up in the Linux kernel. Best give me a ping as soon as I should look at it. Thanks! Thomas PS: You might want to add the BGRT definition in ACPICA already, the tiny one line-liner... -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] ACPI: Provide support for ACPI table adding via OS
Latest changes are compile tested only! If this gets serialized/merged and accepted in acpica in some form with whatever other stuff currently added, please drop me a mail. I can then submit the Linux parts again to the kernel people with the documentation adjusted as well: Documentation/acpi/initrd_table_override.txt Thanks, Thomas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/4] ACPI: Add new table signatures that can be overridden/added.
Signed-off-by: Thomas Renninger CC: h...@zytor.com CC: t...@linutronix.de CC: c...@conrad-kostecki.de CC: linux-kernel@vger.kernel.org CC: x...@kernel.org CC: mi...@redhat.com CC: r...@rjwysocki.net CC: de...@acpica.org --- drivers/acpi/osl.c |4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c index 7f9a865..23ce5b1 100644 --- a/drivers/acpi/osl.c +++ b/drivers/acpi/osl.c @@ -38,6 +38,7 @@ #include #include #include +# define ACPI_UNDEFINED_TABLES 1 #include #include #include @@ -559,7 +560,8 @@ static const char * const table_sigs[] = { ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI, ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA, ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT, - ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT, + ACPI_SIG_WDRT, ACPI_SIG_BGRT, ACPI_SIG_ATKG, ACPI_SIG_GSCI, + ACPI_SIG_IEIT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL }; #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header) -- 1.7.6.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/4] ACPICA: Add BGRT signature to known signatures
In Linux there even exists a driver already making use of this table: drivers/acpi/bgrt.c:MODULE_DESCRIPTION("BGRT boot graphic support"); Signed-off-by: Thomas Renninger CC: h...@zytor.com CC: t...@linutronix.de CC: c...@conrad-kostecki.de CC: linux-kernel@vger.kernel.org CC: x...@kernel.org CC: mi...@redhat.com CC: r...@rjwysocki.net CC: de...@acpica.org --- include/acpi/actbl2.h |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h index 094a906..9ed1c20 100644 --- a/include/acpi/actbl2.h +++ b/include/acpi/actbl2.h @@ -83,6 +83,7 @@ #define ACPI_SIG_WDAT "WDAT" /* Watchdog Action Table */ #define ACPI_SIG_WDDT "WDDT" /* Watchdog Timer Description Table */ #define ACPI_SIG_WDRT "WDRT" /* Watchdog Resource Table */ +#define ACPI_SIG_BGRT "BGRT" /* Boot Graphic Support */ #ifdef ACPI_UNDEFINED_TABLES /* -- 1.7.6.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/4] ACPI: Provide support for ACPI table adding via OS
This is done the same way as the previous ACPI physical table override mechanism. How to override or add tables via initrd, please look up: Documentation/acpi/initrd_table_override.txt SSDTs can only be overridden, not added. Overriding only happens if the OEM id of the table header matches the one with the BIOS provided one. All table types (SSDTs are an exception), must only show up once. So either you: - Add a fresh new table for adding of which type (signature) none exists in the BIOS -> OS ACPI table adding happens. or - Add a table which already exists in BIOS, but the OEM id must match the one of the table of the same type (signature) that exists in BIOS already -> OS ACPI table overriding happens Typically one copies away the original ACPI table, disassembles, modifies (for example adding debug strings), compiles it and provides the table via initrd for overriding (will have the same OEM id). But this is not necessary, one could also come up with a selfmade table for overriding, by taking care that the signature and OEM id is the same as the one provided by BIOS In ACPI table overriding case you see in dmesg: ACPI: Override [DSDT- BXDSDT], this is unsafe: tainting kernel Disabling lock debugging due to kernel taint In ACPI table adding case you see in dmesg (BGRT table got added): ACPI: Add [BGRT-SLIC-WKS], this is unsafe: tainting kernel ACPI: BGRT 7fffd1ba 38 (v00 HPQOEM SLIC-WKS 01072009 INTL 20130823) Signed-off-by: Thomas Renninger CC: h...@zytor.com CC: t...@linutronix.de CC: c...@conrad-kostecki.de CC: linux-kernel@vger.kernel.org CC: x...@kernel.org CC: mi...@redhat.com CC: r...@rjwysocki.net CC: de...@acpica.org --- drivers/acpi/osl.c | 91 +++- 1 files changed, 90 insertions(+), 1 deletions(-) diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c index fc1aa79..7f9a865 100644 --- a/drivers/acpi/osl.c +++ b/drivers/acpi/osl.c @@ -566,6 +566,8 @@ static const char * const table_sigs[] = { #define ACPI_OVERRIDE_TABLES 64 static struct cpio_data __initdata acpi_initrd_files[ACPI_OVERRIDE_TABLES]; +/* Remember physical address of overriden or added tables */ +static acpi_physical_address acpi_table_overridden[ACPI_OVERRIDE_TABLES]; #define MAP_CHUNK_SIZE (NR_FIX_BTMAPS << PAGE_SHIFT) @@ -715,7 +717,7 @@ acpi_os_physical_table_override(struct acpi_table_header *existing_table, *address = 0; return AE_OK; #else - int table_offset = 0; + int no, table_offset = 0; struct acpi_table_header *table; *table_length = 0; @@ -759,6 +761,12 @@ acpi_os_physical_table_override(struct acpi_table_header *existing_table, *table_length = table->length; acpi_os_unmap_memory(table, ACPI_HEADER_SIZE); *address = acpi_tables_addr + table_offset; + for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) { + if (acpi_table_overridden[no] == 0) { + acpi_table_overridden[no] = *address; + break; + } + } break; } while (table_offset + ACPI_HEADER_SIZE < all_tables_size); @@ -768,6 +776,87 @@ acpi_os_physical_table_override(struct acpi_table_header *existing_table, #endif } +acpi_status +acpi_os_physical_table_add(acpi_physical_address *address, + u32 *table_length) +{ +#ifndef CONFIG_ACPI_INITRD_TABLE_OVERRIDE + *table_length = 0; + *address = 0; + return AE_OK; +#else + int no, table_offset = 0; + struct acpi_table_header *table; + + *table_length = 0; + *address = 0; + + if (!acpi_tables_addr) + return AE_OK; + + do { + if (table_offset + ACPI_HEADER_SIZE > all_tables_size) { + WARN_ON(1); + return AE_OK; + } + + table = acpi_os_map_memory(acpi_tables_addr + table_offset, + ACPI_HEADER_SIZE); + + if (table_offset + table->length > all_tables_size) { + acpi_os_unmap_memory(table, ACPI_HEADER_SIZE); + WARN_ON(1); + return AE_OK; + } + + table_offset += table->length; + + /* Do not add SSDTs, only if acpi_no_auto_ssdt boot parameter + got set. + */ + if (acpi_gbl_disable_ssdt_table_load == FALSE && + !memcmp("SSDT", table->signature, 4)) { + pr_info("Will not add SSDT [%8.8s], use acpi_no_auto_ssdt to force\n", table->oem_table_id); + acpi_os_unmap_memory(table, +
[PATCH 2/4] ACPICA: Introduce new acpi_os_physical_table_add OS callback
This one allows OS to add arbitrary ACPI tables. ToDo: It should get checked whether a table with the same signature already exists and if this is the case, adding should not happen. Signed-off-by: Thomas Renninger CC: h...@zytor.com CC: t...@linutronix.de CC: c...@conrad-kostecki.de CC: linux-kernel@vger.kernel.org CC: x...@kernel.org CC: mi...@redhat.com CC: r...@rjwysocki.net CC: de...@acpica.org --- drivers/acpi/acpica/tbutils.c | 36 include/acpi/acpiosxf.h |6 ++ 2 files changed, 42 insertions(+), 0 deletions(-) diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c index 6412d3c..a819d198 100644 --- a/drivers/acpi/acpica/tbutils.c +++ b/drivers/acpi/acpica/tbutils.c @@ -453,6 +453,8 @@ static acpi_status acpi_tb_validate_xsdt(acpi_physical_address xsdt_address) * **/ +#define ACPI_MAX_TABLE_ADD 64 + acpi_status __init acpi_tb_parse_root_table(acpi_physical_address rsdp_address) { struct acpi_table_rsdp *rsdp; @@ -623,5 +625,39 @@ acpi_status __init acpi_tb_parse_root_table(acpi_physical_address rsdp_address) } } + /* +* ACPI Table Add: +* Allow the OS to add additional tables to the global root table list +*/ + for (i = 0; i < ACPI_MAX_TABLE_ADD; i++) { + int tmp, k; + table_entry_size = 0; + address = 0; + status = acpi_os_physical_table_add(, + _entry_size); + if (status == AE_OK && table_entry_size && address) { + table = acpi_os_map_memory(address, table_entry_size); + for (k = 2; k < acpi_gbl_root_table_list.current_table_count; k++) { + /* + Always add SSDTs. Only allow adding of other + tables if none of such a signature already + exists. Use the override interface instead + in such a case. + */ + if (ACPI_COMPARE_NAME("SSDT", table->signature) && + !ACPI_COMPARE_NAME(_gbl_root_table_list.tables[i].signature, table->signature)) { + ACPI_INFO((AE_INFO, "OS table not added, signature already exists:")); + acpi_tb_print_table_header(address, table); + acpi_os_unmap_memory(table, table_entry_size); + } else { + ACPI_INFO((AE_INFO, "Add OS provided table:")); + acpi_tb_print_table_header(address, table); + status = acpi_tb_store_table(address, table, table_entry_size, +ACPI_TABLE_ORIGIN_MAPPED, ); + } + } + } else + break; + } return_ACPI_STATUS(AE_OK); } diff --git a/include/acpi/acpiosxf.h b/include/acpi/acpiosxf.h index 01e6c6d..70c00ed 100644 --- a/include/acpi/acpiosxf.h +++ b/include/acpi/acpiosxf.h @@ -111,6 +111,12 @@ acpi_os_physical_table_override(struct acpi_table_header *existing_table, u32 *new_table_length); #endif +#ifndef ACPI_USE_ALTERNATE_PROTOTYPE_acpi_os_physical_table_add +acpi_status +acpi_os_physical_table_add(acpi_physical_address * new_address, + u32 *new_table_length); +#endif + /* * Spinlock primitives */ -- 1.7.6.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/4] ACPICA: Introduce new acpi_os_physical_table_add OS callback
This one allows OS to add arbitrary ACPI tables. ToDo: It should get checked whether a table with the same signature already exists and if this is the case, adding should not happen. Signed-off-by: Thomas Renninger tr...@suse.de CC: h...@zytor.com CC: t...@linutronix.de CC: c...@conrad-kostecki.de CC: linux-kernel@vger.kernel.org CC: x...@kernel.org CC: mi...@redhat.com CC: r...@rjwysocki.net CC: de...@acpica.org --- drivers/acpi/acpica/tbutils.c | 36 include/acpi/acpiosxf.h |6 ++ 2 files changed, 42 insertions(+), 0 deletions(-) diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c index 6412d3c..a819d198 100644 --- a/drivers/acpi/acpica/tbutils.c +++ b/drivers/acpi/acpica/tbutils.c @@ -453,6 +453,8 @@ static acpi_status acpi_tb_validate_xsdt(acpi_physical_address xsdt_address) * **/ +#define ACPI_MAX_TABLE_ADD 64 + acpi_status __init acpi_tb_parse_root_table(acpi_physical_address rsdp_address) { struct acpi_table_rsdp *rsdp; @@ -623,5 +625,39 @@ acpi_status __init acpi_tb_parse_root_table(acpi_physical_address rsdp_address) } } + /* +* ACPI Table Add: +* Allow the OS to add additional tables to the global root table list +*/ + for (i = 0; i ACPI_MAX_TABLE_ADD; i++) { + int tmp, k; + table_entry_size = 0; + address = 0; + status = acpi_os_physical_table_add(address, + table_entry_size); + if (status == AE_OK table_entry_size address) { + table = acpi_os_map_memory(address, table_entry_size); + for (k = 2; k acpi_gbl_root_table_list.current_table_count; k++) { + /* + Always add SSDTs. Only allow adding of other + tables if none of such a signature already + exists. Use the override interface instead + in such a case. + */ + if (ACPI_COMPARE_NAME(SSDT, table-signature) + !ACPI_COMPARE_NAME(acpi_gbl_root_table_list.tables[i].signature, table-signature)) { + ACPI_INFO((AE_INFO, OS table not added, signature already exists:)); + acpi_tb_print_table_header(address, table); + acpi_os_unmap_memory(table, table_entry_size); + } else { + ACPI_INFO((AE_INFO, Add OS provided table:)); + acpi_tb_print_table_header(address, table); + status = acpi_tb_store_table(address, table, table_entry_size, +ACPI_TABLE_ORIGIN_MAPPED, tmp); + } + } + } else + break; + } return_ACPI_STATUS(AE_OK); } diff --git a/include/acpi/acpiosxf.h b/include/acpi/acpiosxf.h index 01e6c6d..70c00ed 100644 --- a/include/acpi/acpiosxf.h +++ b/include/acpi/acpiosxf.h @@ -111,6 +111,12 @@ acpi_os_physical_table_override(struct acpi_table_header *existing_table, u32 *new_table_length); #endif +#ifndef ACPI_USE_ALTERNATE_PROTOTYPE_acpi_os_physical_table_add +acpi_status +acpi_os_physical_table_add(acpi_physical_address * new_address, + u32 *new_table_length); +#endif + /* * Spinlock primitives */ -- 1.7.6.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/4] ACPI: Provide support for ACPI table adding via OS
This is done the same way as the previous ACPI physical table override mechanism. How to override or add tables via initrd, please look up: Documentation/acpi/initrd_table_override.txt SSDTs can only be overridden, not added. Overriding only happens if the OEM id of the table header matches the one with the BIOS provided one. All table types (SSDTs are an exception), must only show up once. So either you: - Add a fresh new table for adding of which type (signature) none exists in the BIOS - OS ACPI table adding happens. or - Add a table which already exists in BIOS, but the OEM id must match the one of the table of the same type (signature) that exists in BIOS already - OS ACPI table overriding happens Typically one copies away the original ACPI table, disassembles, modifies (for example adding debug strings), compiles it and provides the table via initrd for overriding (will have the same OEM id). But this is not necessary, one could also come up with a selfmade table for overriding, by taking care that the signature and OEM id is the same as the one provided by BIOS In ACPI table overriding case you see in dmesg: ACPI: Override [DSDT- BXDSDT], this is unsafe: tainting kernel Disabling lock debugging due to kernel taint In ACPI table adding case you see in dmesg (BGRT table got added): ACPI: Add [BGRT-SLIC-WKS], this is unsafe: tainting kernel ACPI: BGRT 7fffd1ba 38 (v00 HPQOEM SLIC-WKS 01072009 INTL 20130823) Signed-off-by: Thomas Renninger tr...@suse.de CC: h...@zytor.com CC: t...@linutronix.de CC: c...@conrad-kostecki.de CC: linux-kernel@vger.kernel.org CC: x...@kernel.org CC: mi...@redhat.com CC: r...@rjwysocki.net CC: de...@acpica.org --- drivers/acpi/osl.c | 91 +++- 1 files changed, 90 insertions(+), 1 deletions(-) diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c index fc1aa79..7f9a865 100644 --- a/drivers/acpi/osl.c +++ b/drivers/acpi/osl.c @@ -566,6 +566,8 @@ static const char * const table_sigs[] = { #define ACPI_OVERRIDE_TABLES 64 static struct cpio_data __initdata acpi_initrd_files[ACPI_OVERRIDE_TABLES]; +/* Remember physical address of overriden or added tables */ +static acpi_physical_address acpi_table_overridden[ACPI_OVERRIDE_TABLES]; #define MAP_CHUNK_SIZE (NR_FIX_BTMAPS PAGE_SHIFT) @@ -715,7 +717,7 @@ acpi_os_physical_table_override(struct acpi_table_header *existing_table, *address = 0; return AE_OK; #else - int table_offset = 0; + int no, table_offset = 0; struct acpi_table_header *table; *table_length = 0; @@ -759,6 +761,12 @@ acpi_os_physical_table_override(struct acpi_table_header *existing_table, *table_length = table-length; acpi_os_unmap_memory(table, ACPI_HEADER_SIZE); *address = acpi_tables_addr + table_offset; + for (no = 0; no ACPI_OVERRIDE_TABLES; no++) { + if (acpi_table_overridden[no] == 0) { + acpi_table_overridden[no] = *address; + break; + } + } break; } while (table_offset + ACPI_HEADER_SIZE all_tables_size); @@ -768,6 +776,87 @@ acpi_os_physical_table_override(struct acpi_table_header *existing_table, #endif } +acpi_status +acpi_os_physical_table_add(acpi_physical_address *address, + u32 *table_length) +{ +#ifndef CONFIG_ACPI_INITRD_TABLE_OVERRIDE + *table_length = 0; + *address = 0; + return AE_OK; +#else + int no, table_offset = 0; + struct acpi_table_header *table; + + *table_length = 0; + *address = 0; + + if (!acpi_tables_addr) + return AE_OK; + + do { + if (table_offset + ACPI_HEADER_SIZE all_tables_size) { + WARN_ON(1); + return AE_OK; + } + + table = acpi_os_map_memory(acpi_tables_addr + table_offset, + ACPI_HEADER_SIZE); + + if (table_offset + table-length all_tables_size) { + acpi_os_unmap_memory(table, ACPI_HEADER_SIZE); + WARN_ON(1); + return AE_OK; + } + + table_offset += table-length; + + /* Do not add SSDTs, only if acpi_no_auto_ssdt boot parameter + got set. + */ + if (acpi_gbl_disable_ssdt_table_load == FALSE + !memcmp(SSDT, table-signature, 4)) { + pr_info(Will not add SSDT [%8.8s], use acpi_no_auto_ssdt to force\n, table-oem_table_id); + acpi_os_unmap_memory(table, +ACPI_HEADER_SIZE); + continue
[PATCH 3/4] ACPICA: Add BGRT signature to known signatures
In Linux there even exists a driver already making use of this table: drivers/acpi/bgrt.c:MODULE_DESCRIPTION(BGRT boot graphic support); Signed-off-by: Thomas Renninger tr...@suse.de CC: h...@zytor.com CC: t...@linutronix.de CC: c...@conrad-kostecki.de CC: linux-kernel@vger.kernel.org CC: x...@kernel.org CC: mi...@redhat.com CC: r...@rjwysocki.net CC: de...@acpica.org --- include/acpi/actbl2.h |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h index 094a906..9ed1c20 100644 --- a/include/acpi/actbl2.h +++ b/include/acpi/actbl2.h @@ -83,6 +83,7 @@ #define ACPI_SIG_WDAT WDAT /* Watchdog Action Table */ #define ACPI_SIG_WDDT WDDT /* Watchdog Timer Description Table */ #define ACPI_SIG_WDRT WDRT /* Watchdog Resource Table */ +#define ACPI_SIG_BGRT BGRT /* Boot Graphic Support */ #ifdef ACPI_UNDEFINED_TABLES /* -- 1.7.6.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/4] ACPI: Add new table signatures that can be overridden/added.
Signed-off-by: Thomas Renninger tr...@suse.de CC: h...@zytor.com CC: t...@linutronix.de CC: c...@conrad-kostecki.de CC: linux-kernel@vger.kernel.org CC: x...@kernel.org CC: mi...@redhat.com CC: r...@rjwysocki.net CC: de...@acpica.org --- drivers/acpi/osl.c |4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c index 7f9a865..23ce5b1 100644 --- a/drivers/acpi/osl.c +++ b/drivers/acpi/osl.c @@ -38,6 +38,7 @@ #include linux/delay.h #include linux/workqueue.h #include linux/nmi.h +# define ACPI_UNDEFINED_TABLES 1 #include linux/acpi.h #include linux/efi.h #include linux/ioport.h @@ -559,7 +560,8 @@ static const char * const table_sigs[] = { ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI, ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA, ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT, - ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT, + ACPI_SIG_WDRT, ACPI_SIG_BGRT, ACPI_SIG_ATKG, ACPI_SIG_GSCI, + ACPI_SIG_IEIT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL }; #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header) -- 1.7.6.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/