from:"Thomas Renninger"

Re: [PATCH] cpupower: add Makefile dependencies for install targets

2021-01-07 Thread Thomas Renninger

Am Donnerstag, 7. Januar 2021, 22:15:16 CET schrieb Ivan Babrou:
> On Thu, Jan 7, 2021 at 12:59 PM Thomas Renninger  wrote:
> > Am Donnerstag, 7. Januar 2021, 18:42:25 CET schrieb Ivan Babrou:
> > > On Thu, Jan 7, 2021 at 2:07 AM Thomas Renninger  wrote:
> > > > Am Dienstag, 5. Januar 2021, 00:57:18 CET schrieb Ivan Babrou:
> > > > > This allows building cpupower in parallel rather than serially.
...
> > Can you please show the make calls, ideally with a timing to better
> > understand and also to reproduce the advantages this patch introduces.
> > From what I can see, it only helps if one calls "sub-install" targets
> > directly?
> 
> That's exactly what we do: make install directly:
> 
> /linux-5.10.5$ make -C ./tools/power/cpupower DESTDIR=/tmp/cpupower
> install -j $(nproc)

This makes sense then.
@Shuah @Rafael: Can this patch be queued up, please.

Thanks for the patch and the details,

   Thomas

Re: [PATCH] cpupower: add Makefile dependencies for install targets

2021-01-07 Thread Thomas Renninger

Am Donnerstag, 7. Januar 2021, 18:42:25 CET schrieb Ivan Babrou:
> On Thu, Jan 7, 2021 at 2:07 AM Thomas Renninger  wrote:
> > Am Dienstag, 5. Januar 2021, 00:57:18 CET schrieb Ivan Babrou:
> > > This allows building cpupower in parallel rather than serially.
> > 
> > cpupower is built serially:
> > 
> > [ make clean ]
> > 
> > time make
> > real0m3,742s
> > user0m3,330s
> > sys 0m1,105s
> > 
> > [ make clean ]
> > 
> > time make -j10
> > real0m1,045s
> > user0m3,153s
> > sys 0m1,037s
> > 
> > Only advantage I see is that you can call
> > make install-xy
> > targets without calling the corresponding build target
> > make xy
> > similar to the general install target:
> > install: all install-lib ...
> > 
> > Not sure anyone needs this and whether all targets
> > successfully work this way.
> > If you'd show a useful usecase example...
> 
> We build a bunch of kernel related tools (perf, cpupower, bpftool,
> etc.) from our own top level Makefile, propagating parallelism
> downwards like one should.
I still do not understand why you do not simply build:
Also if I call this from /tools directory I get a quick build:
make -j20 cpupower

Can you please show the make calls, ideally with a timing to better understand
and also to reproduce the advantages this patch introduces.
>From what I can see, it only helps if one calls "sub-install" targets 
directly?
And I still do not understand why things should be more parallel now.

> Without this patch we have to remove parallelism for cpupower,
Why?

> which doesn't seem like a very clean thing
> to do, especially if you consider that it's 3x faster with parallelism
> enabled in wall clock terms.
Sure, you want to build in parallel. I still do not understand how this
patch helps in this regard.

BTW, I recently had a bunch of userspace tools Makefile patches.
I'd like to add you to CC for a review if they are not submitted already.

Thomas

Re: [PATCH] cpupower: add Makefile dependencies for install targets

2021-01-07 Thread Thomas Renninger

Am Dienstag, 5. Januar 2021, 00:57:18 CET schrieb Ivan Babrou:
> This allows building cpupower in parallel rather than serially.

cpupower is built serially:

[ make clean ]

time make
real0m3,742s
user0m3,330s
sys 0m1,105s

[ make clean ]

time make -j10
real0m1,045s
user0m3,153s
sys 0m1,037s

Only advantage I see is that you can call
make install-xy
targets without calling the corresponding build target
make xy
similar to the general install target:
install: all install-lib ...

Not sure anyone needs this and whether all targets
successfully work this way.
If you'd show a useful usecase example...

   Thomas

> 
> Signed-off-by: Ivan Babrou 
> ---
>  tools/power/cpupower/Makefile   | 8 
>  tools/power/cpupower/bench/Makefile | 2 +-
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/power/cpupower/Makefile b/tools/power/cpupower/Makefile
> index c7bcddbd486d..3b1594447f29 100644
> --- a/tools/power/cpupower/Makefile
> +++ b/tools/power/cpupower/Makefile
> @@ -270,14 +270,14 @@ clean:
>   $(MAKE) -C bench O=$(OUTPUT) clean
> 
> 
> -install-lib:
> +install-lib: libcpupower
>   $(INSTALL) -d $(DESTDIR)${libdir}
>   $(CP) $(OUTPUT)libcpupower.so* $(DESTDIR)${libdir}/
>   $(INSTALL) -d $(DESTDIR)${includedir}
>   $(INSTALL_DATA) lib/cpufreq.h $(DESTDIR)${includedir}/cpufreq.h
>   $(INSTALL_DATA) lib/cpuidle.h $(DESTDIR)${includedir}/cpuidle.h
> 
> -install-tools:
> +install-tools: $(OUTPUT)cpupower
>   $(INSTALL) -d $(DESTDIR)${bindir}
>   $(INSTALL_PROGRAM) $(OUTPUT)cpupower $(DESTDIR)${bindir}
>   $(INSTALL) -d $(DESTDIR)${bash_completion_dir}
> @@ -293,14 +293,14 @@ install-man:
>   $(INSTALL_DATA) -D man/cpupower-info.1
> $(DESTDIR)${mandir}/man1/cpupower-info.1 $(INSTALL_DATA) -D
> man/cpupower-monitor.1 $(DESTDIR)${mandir}/man1/cpupower-monitor.1
> 
> -install-gmo:
> +install-gmo: create-gmo
>   $(INSTALL) -d $(DESTDIR)${localedir}
>   for HLANG in $(LANGUAGES); do \
>   echo '$(INSTALL_DATA) -D $(OUTPUT)po/$$HLANG.gmo
> $(DESTDIR)${localedir}/$$HLANG/LC_MESSAGES/cpupower.mo'; \ $(INSTALL_DATA)
> -D $(OUTPUT)po/$$HLANG.gmo
> $(DESTDIR)${localedir}/$$HLANG/LC_MESSAGES/cpupower.mo; \ done;
> 
> -install-bench:
> +install-bench: compile-bench
>   @#DESTDIR must be set from outside to survive
>   @sbindir=$(sbindir) bindir=$(bindir) docdir=$(docdir) confdir=$
(confdir)
> $(MAKE) -C bench O=$(OUTPUT) install
> 
> diff --git a/tools/power/cpupower/bench/Makefile
> b/tools/power/cpupower/bench/Makefile index f68b4bc55273..d9d9923af85c
> 100644
> --- a/tools/power/cpupower/bench/Makefile
> +++ b/tools/power/cpupower/bench/Makefile
> @@ -27,7 +27,7 @@ $(OUTPUT)cpufreq-bench: $(OBJS)
> 
>  all: $(OUTPUT)cpufreq-bench
> 
> -install:
> +install: $(OUTPUT)cpufreq-bench
>   mkdir -p $(DESTDIR)/$(sbindir)
>   mkdir -p $(DESTDIR)/$(bindir)
>   mkdir -p $(DESTDIR)/$(docdir)

Re: [PATCH] cpufreq: stats: Switch to ktime and msec instead of jiffies and usertime

2020-11-11 Thread Thomas Renninger

Am Mittwoch, 11. November 2020, 06:13:50 CET schrieb Viresh Kumar:
> On 10-11-20, 13:53, Thomas Renninger wrote:
> > Am Dienstag, 10. November 2020, 12:07:37 CET schrieb Viresh Kumar:
> > > The cpufreq and thermal core, both provide sysfs statistics to help
> > > userspace learn about the behavior of frequencies and cooling states.
> > > 
> > > This is how they look:
> > > /sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state:120 399
> > > 
> > > The results look like this after this commit:
> > > /sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state:120 3830
> > 
> > How would userspace know whether it's ms or 10ms?

Again:
How would userspace know whether it's ms or 10ms?

> > whatabout a new file with the same convention as cooling devices (adding 
> > ms):
> Keeping two files for same stuff is not great, and renaming the file
> breaks userspace ABI.

No exactly the other way around:
- Renaming, breaks the userspace ABI.
- Two files would be the super correct way to go:
  - Deprecate the old file and keep the 10ms around for some years
./Documentation/ABI/obsolete
  - Add the new interface and document it in:
   ./Documentation/ABI/testing

As this is about a minor cpufreq_stat debug file, it is enough if
you rename to:
> /sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state_ms
Ideally document it in ./Documentation/ABI/testing
But I guess for this one this is also not mandatory.

Then userspace can do:
MS_FILE="/sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state_ms"
10MS_FILE="/sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state"

if [ -r "$MS_FILE" ]; then
POLICY0_MS=$(<"$MS_FILE")
echo "Found ms stats file"
elif [ -r "$10MS_FILE" ]; then
echo "Found 10ms stats file, converting..."
POLICY0_MS=$(<"$10MS_FILE")
POLICY0_MS=$(echo "$POLICY0_MS 10 /" |dc)
else
echo "cpufreq stat not compiled in?"
fi

> I am not sure what's the right thing to do here.

Use a new *_ms name.
If you are unsure how many people this still might use, keep the old file and 
mark
it deprecated and remove it in some years.
You could also add:
pr_info("%s userspace process accessed deprecated sysfs file %s", task->comm, 
OLD_SYSFS_FILE_PATH);
To find other userspace apps making use of it.

...
 
> I already fixed this recently and stats don't appear empty for fast
> switch anymore.

Then cpufreq_stats could be a module again?

  Thomas

Re: [PATCH] cpufreq: stats: Switch to ktime and msec instead of jiffies and usertime

2020-11-10 Thread Thomas Renninger

Am Dienstag, 10. November 2020, 12:07:37 CET schrieb Viresh Kumar:
> The cpufreq and thermal core, both provide sysfs statistics to help
> userspace learn about the behavior of frequencies and cooling states.
> 
> This is how they look:
> /sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state:120 399
 
> The results look like this after this commit:
> /sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state:120 3830

How would userspace know whether it's ms or 10ms?

whatabout a new file with the same convention as cooling devices (adding ms):
 
> /sys/class/thermal/cooling_device0/stats/time_in_state_ms:state0 3888
> /sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state_ms:120 3830

Somewhat off-topic, some ideas:

I wonder how useful these stats still are.
CPU_FREQ_STAT is off on my system:

config CPU_FREQ_STAT
bool "CPU frequency transition statistics"
help
  Export CPU frequency statistics information through sysfs.

  If in doubt, say N.

Iirc this was a module at former times?

commit 1aefc75b2449eb68a6fc3ca932e2a4ee353b748d
Author: Rafael J. Wysocki 
Date:   Tue May 31 22:14:44 2016 +0200

cpufreq: stats: Make the stats code non-modular

outlined 2 problems with cpufreq_stats being non-modular, but
also seem to fix them up:
... and drop the notifiers from it
Make the stats sysfs attributes appear empty if fast frequency
switching is enabled...

   Thomas

Re: [PATCH 29/29] tools: Avoid comma separated statements

2020-08-26 Thread Thomas Renninger

Am Mittwoch, 26. August 2020, 16:45:24 CEST schrieb Joe Perches:
> On Wed, 2020-08-26 at 11:30 +0200, Thomas Renninger wrote:
> > Hi,
> > 

> Read the block immediately below that too:
> 
> "This does not apply if only one branch of a conditional statement is a
> single statement; in the latter case use braces in both branches:"

Then +1 from here.

> > Afaik there isn't a specific tag, but having:
> > cleanup only: No functional change
> > 
> > in the changelog would be nice for people looking for fixes to backport.
> 
> This is not a fix, so it's not for backporting.

Mentioning that in the changelog explicitly or even in the title - cleanup 
only
is nice imho, but should definitely not be the reason for re-posting.

Thanks for doing this,

   Thomas

Re: [PATCH 29/29] tools: Avoid comma separated statements

2020-08-26 Thread Thomas Renninger

Hi,

getting rid of lines with multiple instructions, separated by comma is
certainly a good idea.
One nit pick, though:

Am Dienstag, 25. August 2020, 06:56:26 CEST schrieb Joe Perches:
> Use semicolons and braces.
> 
> Signed-off-by: Joe Perches 
> ---
>  tools/lib/subcmd/help.c|  10 +-
>  tools/power/cpupower/utils/cpufreq-set.c   |  14 +-
>  tools/testing/selftests/vm/gup_benchmark.c |  18 +-
>  tools/testing/selftests/vm/userfaultfd.c   | 296 +
>  4 files changed, 210 insertions(+), 128 deletions(-)
> 
> diff --git a/tools/lib/subcmd/help.c b/tools/lib/subcmd/help.c
> index 2859f107abc8..bf02d62a3b2b 100644
> --- a/tools/lib/subcmd/help.c
> +++ b/tools/lib/subcmd/help.c
> @@ -65,12 +65,14 @@ void exclude_cmds(struct cmdnames *cmds, struct cmdnames
> *excludes) ci = cj = ei = 0;
>   while (ci < cmds->cnt && ei < excludes->cnt) {
>   cmp = strcmp(cmds->names[ci]->name, excludes->names[ei]->name);
> - if (cmp < 0)
> + if (cmp < 0) {
>   cmds->names[cj++] = cmds->names[ci++];
> - else if (cmp == 0)
> - ci++, ei++;
> - else if (cmp > 0)
> + } else if (cmp == 0) {
> + ci++;
>   ei++;
> + } else if (cmp > 0) {
> + ei++;
> + }
>   }

I can remember patches being rejected with one line statements in a condition,
surounded by braces.
I just read up Documentation/process/coding-style.rst, to be sure this still is 
up-to-date.
It's not a must, but line 180 says:
"Do not unnecessarily use braces where a single statement will do."

So if this is about coding style cleanup, IMO you should remove braces from 
single line
statements.

I haven't reviewed every line, but I expect you only split up comma separated 
instructions
into separate lines and added braces?

Afaik there isn't a specific tag, but having:
cleanup only: No functional change

in the changelog would be nice for people looking for fixes to backport.


Otherwise, I think this is a worthful cleanup.

 Thomas

Re: [PATCH 1/2] Modify cpupower to schedule itself on cores it is reading MSRs from

2019-10-10 Thread Thomas Renninger

On Monday, October 7, 2019 11:11:30 PM CEST Natarajan, Janakarajan wrote:
> On 10/5/2019 7:40 AM, Thomas Renninger wrote:
> 
...
> >>
> >> APERF/MPERF from CPL > 0) and avoid using the msr module (patch 2).
> > 
> > And this one only exists on latest AMD cpus, right?
> 
> Yes. The RDPRU instruction exists only on AMD cpus.
> >
> >> However, for systems that provide an instruction  to get register values
> >> from userspace, would a command-line parameter be acceptable?
> > 
> > Parameter sounds like a good idea. In fact, there already is such a
> > paramter.
 cpupower monitor --help
> > 
> > -c
> > 
> > Schedule  the  process  on every core before starting and
> > ending
> > 
> > measuring.  This could be needed for the Idle_Stats monitor when no other
> > MSR based monitor (has to be run on the core that is measured) is run in
> > parallel. This is to wake up the processors from deeper sleep states and
> > let the kernel reaccount its cpuidle (C-state) information before reading
> > the cpuidle timings from sysfs.
> >
> > Best is you exchange the order of your patches. The 2nd looks rather
> > straight forward and you can add my reviewed-by.
> 
> The RDPRU instruction reads the APERF/MPERF of the cpu on which it is 
> running. If we do not schedule it on each cpu specifically, it will read the 
> APERF/MPERF
> of the cpu in which it runs/might happen to run on, which will not be the 
> correct behavior.

Got it. And I also didn't fully read -c. I now remember.. For C-states 
accounting
you want to have each CPU woken up at measure start and end for accurate 
measuring.

It's a pity that the monitors do the per_cpu calls themselves.
So a general idle-monitor param is not possible or can only done by for example 
by
adding a flag to the cpuidle_monitor struct:

struct cpuidle_monitor

unsigned int needs_root:1
unsigned int per_cpu_schedule:1

not sure whether a:
struct {
  unsigned int needs_root:1
  unsigned int per_cpu_schedule:1
} flags

should/must be put around in a separate cleanup patch (and needs_root users 
adjusted).

You (and other monitors for which this might make sense) can then implement
the per_cpu_schedule flag. In AMD case you might want (you have to)
directly set it.

All around a -b/-u (--bind-measure-to-cpu, --unbind-measure-to-cpu)
parameter could be added at some point of time if it matters. And monitors
having this could bind or not.
This possibly could nuke out -c param. Or at least the idle state counter
monitor could do it itself. But don't mind about this.

What do you think?

And you should be able to re-use the bind_cpu function used in -c case?

   Thomas

Re: [PATCH 1/2] Modify cpupower to schedule itself on cores it is reading MSRs from

2019-10-05 Thread Thomas Renninger

Hi,

On Wednesday, October 2, 2019 4:45:03 PM CEST Natarajan, Janakarajan wrote:
> On 9/27/19 4:48 PM, Thomas Renninger wrote:
> 
> > On Friday, September 27, 2019 6:07:56 PM CEST  Natarajan, Janakarajan 
> > wrote:
> 
> >> On 9/18/2019 11:34 AM, Natarajan, Janakarajan wrote:
 
> On a 256 logical-cpu Rome system we see C0 value from cpupower output go 
> from 0.01 to ~(0.1 to 1.00)
> 
> for all cpus with the 1st patch.
> 
> However, this goes down to ~0.01 when we use the RDPRU instruction 
> (which can be used to get
> 
> APERF/MPERF from CPL > 0) and avoid using the msr module (patch 2).

And this one only exists on latest AMD cpus, right?

> However, for systems that provide an instruction  to get register values 
> from userspace, would a command-line parameter be acceptable?

Parameter sounds like a good idea. In fact, there already is such a paramter.
cpupower monitor --help
   -c
   Schedule  the  process  on every core before starting and ending 
measuring.  This could be needed for the Idle_Stats monitor when no other MSR 
based monitor (has to be run on the core that is measured) is run in parallel.  
This is to wake up the processors from deeper sleep states and let the kernel 
reaccount its cpuidle (C-state) information before reading the cpuidle timings 
from sysfs.

Best is you exchange the order of your patches. The 2nd looks rather straight
forward and you can add my reviewed-by.

If you still need adjustings with -c param, they can be discussed separately.
It would also be nice to mention in which case it makes sense to use it in the 
manpage or advantages/drawbacks if you don't.

Thanks!

Thomas

Re: [PATCH 1/2] Modify cpupower to schedule itself on cores it is reading MSRs from

2019-09-27 Thread Thomas Renninger

On Friday, September 27, 2019 6:07:56 PM CEST  Natarajan, Janakarajan  wrote:
> On 9/18/2019 11:34 AM, Natarajan, Janakarajan wrote:

> > This is advantageous because an IPI is not generated when a read_msr() is
> > executed on the local logical CPU thereby reducing the chance of having
> > APERF and MPERF being out of sync.
> > +   if (sched_setaffinity(getpid(), sizeof(set), ) == -1) {
> > +   dprint("Could not migrate to cpu: %d\n", cpu);
> > +   return 1;

On a 80 core cpu the process would be pushed around through the
system quite a lot. 
This might affect what you are measuring or the other measure values?
Otherwise it's the kernel's MSR read only, not the whole cpupower process,
right? No idea about the exact overhead, though. Others in CC list should
know.

Afaik msr reads through msr module should be avoided anyway?
Those which are worth it are abstracted through sysfs nowadays?

For aperf/mperf it might make sense to define a sysfs file where you
can read both, as this is what you always need?

It would take a while, but could be a longterm solution which is also
usable in secure boot or without msr module case.

  Thomas

Re: /dev/mem and secure boot

2019-09-12 Thread Thomas Renninger

On Monday, September 9, 2019 3:09:57 PM CEST Jean Delvare wrote:
> Hi Greg,

...

> > Sure, feel free to not register it at all if the mode is enabled.

> Now I feel sorry that I asked my question upstream when there's nothing
> to be done there. I'll go bother SUSE kernel folks instead, sorry for
> the noise. And thanks for the advice.

I also/still think /dev/mem should vanish in secure boot mode, also upstream.
There may have been strong reasons why it has been restricted to /dev/ioport
which I do not know.

Whatever the exact definition for kernel behaviour in secure boot mode in the
UEFI books is (if there is any), it should close quite some possible doors
for hijacking a machine or read sensible data and if anyhow possible secure
boot mode should head for this feature (IMHO): Get rid of /dev/mem.

Thanks for bringing this up,

Thomas

Re: [PATCH] cpupower : Handle set and info subcommands for powerpc

2019-09-12 Thread Thomas Renninger

On Thursday, September 12, 2019 11:43:40 AM CEST Abhishek wrote:
> Hi Shuah,
> 
> Thanks for the review. Few comments below.

...

> Since these two options are not being used by any other architecture
> except x86, I suggest these options should not even be shown for
> other architecture. So we can do something like this in cpupower.c :
> 
> static struct cmd_struct commands[] = {
>   .
> +#if defined (__x86_64__) || defined (__i386__)
>  { "set",cmd_set,1},
>  { "info",cmd_info,0},
> +#endif
>  ..
> 
> Is this Okay?

No, I expected you to add something meaningful for Power case...

Just kidding. If this works without any side-effects in not x86 case, this 
approach seem to be the best solution for now.

Thanks.

  Thomas

Re: [PATCH] cpupower : Handle set and info subcommands for powerpc

2019-09-12 Thread Thomas Renninger

Hi Abishek,

On Wednesday, September 11, 2019 11:54:24 AM CEST Abhishek Goel wrote:
> Cpupower tool has set and info options which are not being used by
> POWER machines. For powerpc, we will return directly for these two
> subcommands. This removes the ambiguous error message while using set
> option in case of power systems.
> 
> Signed-off-by: Abhishek Goel 
> ---
>  tools/power/cpupower/utils/cpupower-info.c | 5 +
>  tools/power/cpupower/utils/cpupower-set.c  | 5 +
>  2 files changed, 10 insertions(+)
> 
> diff --git a/tools/power/cpupower/utils/cpupower-info.c
> b/tools/power/cpupower/utils/cpupower-info.c index
> 4c9d342b70ff..674b707a76af 100644
> --- a/tools/power/cpupower/utils/cpupower-info.c
> +++ b/tools/power/cpupower/utils/cpupower-info.c
> @@ -39,6 +39,11 @@ int cmd_info(int argc, char **argv)
>   } params = {};
>   int ret = 0;
> 
> + #ifdef __powerpc__
> + printf(_("Cannot read info as system does not support performance bias
> setting\n")); +   return 0;
> + #endif
> +
Please do no do this.

cpupower info
is designed to show general information related to powersaving features of your 
CPU.

For examle there has been (see changelog):
cpupower: Remove mc and smt power aware scheduler info/settings
These kernel interfaces got removed by:

Unfortunately only -b (perf bias on Intel only) is left right now.

So if you cut this out for Power you do not see anything and the cmd is useless.
Which is a pity, but for now makes sense.
Ideally you provide some tag/option which makes sense on power (e.g. whether run
in OPAL mode and if provide some figures otherwise tell running in VM mode).

But if this is cut out something like this should do the same and is more 
flexible:
- Still allows additional cpupower info features for other CPUs later easily
- Should also cover AMD or other non-perf bias supporting CPUs to exclude 
perf_bias
  setting/info

If this one works for you, can you please re-submit with also handling the set 
cmd
similar. If it works or you only slightly adjust, feel free to already add:
Acked-by: Thomas Renninger 

Thanks!

   Thomas

--- tools/power/cpupower/utils/cpupower-info.c.orig 2019-09-12 
11:45:02.578568335 +0200
+++ tools/power/cpupower/utils/cpupower-info.c  2019-09-12 11:46:09.618571947 
+0200
@@ -55,8 +55,11 @@
}
};
 
-   if (!params.params)
+   if (!params.params) {
params.params = 0x7;
+   if !(cpupower_cpu_info.caps & CPUPOWER_CAP_PERF_BIAS)
+   params.perf_bias = 0;
+   }
 
/* Default is: show output of CPU 0 only */
if (bitmask_isallclear(cpus_chosen))

Re: [PATCH] cpupower : frequency-set -r option misses the last cpu in related cpu list

2019-05-29 Thread Thomas Renninger

Hi,

On Wednesday, May 29, 2019 2:12:34 PM CEST Gautham R Shenoy wrote:
> Hi Abhishek,
> 
> On Wed, May 29, 2019 at 3:02 PM Abhishek Goel

...

> > bitmask_setbit(cpus_chosen, cpus->cpu);
> > cpus = cpus->next;
> > 
> > }
> > 
> > +   /* Set the last cpu in related cpus list */
> > +   bitmask_setbit(cpus_chosen, cpus->cpu);
> 
> Perhaps you could convert the while() loop to a do ..  while(). That
> should will ensure
> that we terminate the loop after setting the last valid CPU.

It would do exactly the same, right?
IMHO it's not worth the extra hassle of resubmitting. Setting the last value 
outside a while loop is rather common.

I do not have a CPU with related cores at hand.
If you tested this it would be nice to see this pushed:

Reviewed-by: Thomas Renninger 

Thanks!

   Thomas

Re: [PATCH] cpufreq/intel_pstate: Load only on Intel hardware

2019-04-01 Thread Thomas Renninger

On Monday, April 1, 2019 5:03:45 PM CEST Borislav Petkov wrote:
> From: Borislav Petkov 
> 
> This driver is Intel-only so loading on anything which is not Intel is
> pointless. Prevent it from doing so.

Nice.
I wondered whether there are more of these to find by review, instead
of waiting for the next message to show up.

I ended up in the "not so straight forward" IOMMU init macros... and
continued with daily work again.

Anyway there are a lot files showing up when grepping the kernel for
intel files/drivers, maybe someone who is involved in the one or other comes
up with something similar...

   Thomas

FWIW:
Reviewed-by: Thomas Renninger 

 
> While at it, correct the "not supported" print statement to say CPU
> "model" which is what that test does.
> 
> Suggested-by: Erwan Velu 
> Signed-off-by: Borislav Petkov 
> Cc: Len Brown 
> Cc: linux...@vger.kernel.org
> Cc: Rafael J. Wysocki 
> CC: Srinivas Pandruvada 
> Cc: Viresh Kumar 
> ---
>  drivers/cpufreq/intel_pstate.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index b599c7318aab..2986119dd31f 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -2596,6 +2596,9 @@ static int __init intel_pstate_init(void)
>   const struct x86_cpu_id *id;
>   int rc;
> 
> + if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> + return -ENODEV;
> +
...

Re: [PATCH 1/2] PM / arch: x86: Rework the MSR_IA32_ENERGY_PERF_BIAS handling

2019-03-22 Thread Thomas Renninger

Thanks Rafael for your quick look at and all the time you spend for this!

A /sys userspace knob will certainly not be enough for us.
You'll need a tool installed fixing this.
powertop on laptops or tuned on servers or a well hidden bootup quirk or 
whatsoever.

The patch I sent with this part:

+   if (acpi_gbl_FADT.preferred_profile == PM_PERFORMANCE_SERVER ||
+   acpi_gbl_FADT.preferred_profile == PM_ENTERPRISE_SERVER)
+   return;

and not touching the EBP value then should at least match most of
our users and OEMs who want a "performance" setting out of the box and
set this on purpose.

Even nicer would be compile option to not touch this at all.

On Thursday, March 21, 2019 11:18:01 PM CET Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 

...

> + *
> + * Second, on many systems the initial EPB value coming from the platform
> + * firmware is 0 ('performance') and at least on some of them that is
> because + * the platform firmware does not initialize EPB

Why does the CPU not initialize this value to 6?
That would allow OEMs/BIOS to also suggest an init value for the system.
We should try to get microcode people or whoever is in charge to initialize
this value "properly" if Intel thinks 6 is the correct init value.

> at all with the
> assumption that + * the OS will do that anyway.  That sometimes is
> problematic, as it may cause + * the system battery to drain too fast, for
> example, so it is better to adjust + * it on CPU bring-up and if the
> initial EPB value for a given CPU is 0, the + * kernel changes it to 6
> ('normal').

I have an idea to let the kernel more decide about such policies.
It's a nice example that it makes sense to let the kernel set default values.
But not unconditionally, according to what the system is intended to do.

I wanted to do this for quite some time.., I hopefully find the time and
motivation now.

Thanks Rafael.
Sorry for the somewhat rude sounding previous mail, that was not on purpose.
You helped me quite a lot in the past and you obviously still do.

 Thomas

Re: [PATCH] [RESEND] Do not modify perf bias performance setting by default at boot

2019-03-20 Thread Thomas Renninger

Rafael,

I top post the general things and answer in only a few sentences embedded in 
context below:

I very much honour your work and your neutral opinions and reasoning and
I always have.

This patch is a resend and while I try to come up with alternative hacks, 
there still is no solution, not even a suggestion.

I sent the patch 3 years ago:
https://lkml.org/lkml/2016/2/26/675
And I found this when doing performance analysis with Mel (Gorman).

This time Hannes (Reinicke) stumbled over it, while he was working on 
performance tests on NVE over fabrics.

We need this fixed and I am going to repush this into our kernel(s) now.

On Monday, March 18, 2019 11:57:08 PM CET Rafael J. Wysocki wrote:
> On Mon, Mar 18, 2019 at 2:22 PM Thomas Renninger  wrote:
> > On Monday, March 18, 2019 12:40:46 PM CET Rafael J. Wysocki wrote:
> > > On Mon, Mar 18, 2019 at 12:15 PM Thomas Renninger  wrote:
> > ...

...

> > > > > It may not match every setup perfectly, but at least it
> > > > > is consistent.  Why exactly is it worse than whatever the BIOS has
> > > > > set?
> > > > 
> > > > Because there may be BIOS settings for the CPU which justify
> > > > initialization
> > > > of the Perf BIAS value by BIOS.
> > > 
> > > Well, the EPB is there for users to set it via the OS.  The BIOS
> > > setting is not guaranteed to work for all users anyway.
> > 
> > Who says that?
> 
> I do.

And this is the reason you do not see much patches from myself anymore over
the last years.
It's certainly not your fault. I had quite some discussions with Len about
specification and BIOS breakages.

Especially in the CPU powersave area, idle states and cpufreq drivers, Intel
was doing it differently all time long the last 5 years. Ignoring their own
specifications, ignoring possible BIOS settings and changing kernel and 
userspace interfaces all the time.

And now I have the discussion again...
While it is related to this patch, it gets off topic.
I guess there should be a more general thread on lkml:
"Do not change APIs every second day"
Up to userspace, but also to BIOS.

> And then we can get back to the initial setting discussion.

Let's stick to this topic in this thread.
There is no reason to not find a proper fix for this meanwhile.

Overriding the BIOS setting should IMO only take place:
- if lifetime of CPU could be affected as you mentioned. But in this case the
  affected CPUs should be matched
- if we expect that there are BIOSes which "want to set this value to 6,
  but may have forgotten to do so", as mentioned in the original patch
  from Len. BIOSes where this is the case should get a quirk to set it to 6.

Still, ideally the whole overriding and the message should vanish IMO.

Last time I sent this patch your answer was:
"I need to talk to Len about that,"
https://lkml.org/lkml/2016/3/4/371

But as this is x86 kernel core code, I guess this should be discussed and 
pushed by the general x86 maintainers anyway. Sigh.

 Thomas

Re: [PATCH] [RESEND] Do not modify perf bias performance setting by default at boot

2019-03-18 Thread Thomas Renninger

On Monday, March 18, 2019 12:40:46 PM CET Rafael J. Wysocki wrote:
> On Mon, Mar 18, 2019 at 12:15 PM Thomas Renninger  wrote:

...

> And who's BIOS, really?  I guess you mean the OEM?  Note, however,
> that the user and the OEM may not agree on that, but whatever.

I mean both!
The OEM.
And the user who might choose a "performance" BIOS setting.

> > > Yes, the kernel replaces whatever the original BIOS setting is with
> > > its own one.
> > 
> > No, it only replaces the "performance" (0) value with "normal" (6).
> > This does not makes sense and is broken.

Both know it better than this...

> OK, fair enough.
> 
> I guess it would have been better to set it to 6 unconditionally.

No, I guess keeping the original value is the only sane thing to do.
Sorry, Rafael, I have to insist on this.
There might be secrets Len and you cannot share with public, but you
should at least explain this privately then in a way that the current
code could somehow make sense. I absolutely cannot see that.

> What about the systems that will misbehave when it is left at 0?

Good question.
What exactly happens there? The CPU is faster in general?
And may consume a tiny bit more of power.
I had reports that Turbo Mode is not entered on some server CPUs when
perf BIAS is switched back to normal mode.
It is hard to identify what Intel implemented in microcode, but it's not
that much altogether. It would be nice if Intel would be a bit more verbose
about this and show some performance vs powersave figures.

> > > It may not match every setup perfectly, but at least it
> > > is consistent.  Why exactly is it worse than whatever the BIOS has
> > > set?
> > 
> > Because there may be BIOS settings for the CPU which justify
> > initialization
> > of the Perf BIAS value by BIOS.
> 
> Well, the EPB is there for users to set it via the OS.  The BIOS
> setting is not guaranteed to work for all users anyway.

Who says that? Is this documented?

I would say it is exactly the other way around. Energy Perf BIAS hint is
a MSR which must be changed in Ring 0 kernel environment.

x86_energy_perf_policy is a nice tool to play with and to try out what
this value is for. But for example in secure boot mode userspace must not
modify this HW setting in any way.
So it would not be possible for a secure booted server to switch this
setting back to performance mode.

> > What sense does it make to unconditionally set perf BIAS value from
> > performance to balanced?
> > Why is this done?
> 
> Basically, for HW health reasons AFAICS.
>  
> Apparently, on some systems EPB=0 is (was?) special and means (meant?)
> very aggressive use of turbo etc. which is not healthy in general.

Hmm, maybe you want to explain this privately.
Performance is a valid setting.
x86_energy_perf_policy tool has a "performance" string for this and I expect it 
is very used.
I would switch to it if I am constantly connected to AC.
man x86_energy_perf_policy
also does not tell the user anything about "be careful with performance setting"

And until today every CPU online/offline cycle or machine resume cycle switches
the value back to performance (if kernel tried to switch to 6).

...

> > There may be BIOSes initialzing it via BIOS options. And this is a very
> > valid thing to do.
> 
> Yes, and there may be BIOSes leaving it at 0 with the assumption that
> the OS will adjust it.  The kernel cannot know which is the case.

Correct. The kernel cannot know.

You know this better than anyone else:
- A specific Microsoft version is doing it wrong.
- We adopt.
- Microsoft is doing it correctly with the next version
- We are busted

I am ok with leaving the message as a hint, I would still check
ACPI prefered profile variable that we are not running on a server.
On these systems the message would be wrong rather sure and performance is 
intended.

...

> > No. You must not ignore BIOS settings. Even worse, you must not override
> > these without any sane reason.
> 
> While there are BIOS settings that better should not be overridden,
> the EPB is not one of them.

Again: Why or where is this documented?

> > Your assumption above might be right. But we want to do it better, right?
> > 
> > ...
> > 
> > > The system-wide resume part will still not be working properly after
> > > the reverts.
> > 
> > But it must never blindly (unconditionally) be set to specific value.
> > Correct?
> 
> Yes.  I've already said that.

How about below solution then for the initial boot up sequence?

> > You mean the kernel should store the pre-hibernation perf BIAS value
> > in NVRAM and write it back when waking up again?
> 
> Or in the image and yes, it should write it

Re: [PATCH] [RESEND] Do not modify perf bias performance setting by default at boot

2019-03-18 Thread Thomas Renninger

On Monday, March 18, 2019 11:26:10 AM CET Rafael J. Wysocki wrote:
> On Fri, Mar 15, 2019 at 4:36 PM Thomas Renninger  wrote:
> > Hi Rafael,
> >

...

> > On my workstation the BIOS initializes perf bias to:
> > cpupower info
> > analyzing CPU 0:
> > perf-bias: 7
> > 
> > I could grep through quite some dozens of machines..., but these are
> > mostly
> > servers and probably show either "zero"/"performance" or "6"/"normal"
> > because the Linux kernel overrides the INTENDED performance perf bias
> > value to 6.
> The perf bias Intended by whom?

The instance who should be in charge to set/init such a value:
BIOS?

> Yes, the kernel replaces whatever the original BIOS setting is with
> its own one.

No, it only replaces the "performance" (0) value with "normal" (6).
This does not makes sense and is broken.

> It may not match every setup perfectly, but at least it
> is consistent.  Why exactly is it worse than whatever the BIOS has
> set?

Because there may be BIOS settings for the CPU which justify initialization
of the Perf BIAS value by BIOS.

What sense does it make to unconditionally set perf BIAS value from
performance to balanced?
Why is this done?

> > So we (SUSE) are going with this patch forever.
> > 
> > Otherwise we would run into a similar support nightmare we ran into, when
> > Intel decided to ignore CPU idle states as exported by BIOS through ACPI.
> > BIOS documentation of all big server vendors mentioned "performance"
> > settings. With a kernel update these BIOS C states settings have been
> > ignored (some long latency once were not exported on purpose).
> > 
> > The list of breaking conventions and specifications is long...
> > People mostly blame the "bad BIOS developer". In this case things have
> > been
> > broke by the kernel.
> 
> I agree that the kernel should not modify the EPB on system-wide
> resume and on CPU online, but I don't see why changing the BIOS
> setting at init time is a problem really.

I would agree if we differ a tablet/laptop system and set the performance
value to normal/powersave.
And on a server we set it from normal/powersave to performance.

But we should not touch this value anyway.
Again: Why should the kernel touch it?

There may be BIOSes initialzing it via BIOS options. And this is a very valid 
thing to do.

> > It's now (with the resume patch) broken in way, that "performance" setting
> 
> So the point seems to be that the BIOS setting should be preserved or
> people are not able to configure the systems for performance through
> setting things in the BIOS.
> However, that only means that setting
> things in the BIOS is not sufficient to configure a system for
> performance and that has always been the case AFAICS, with or without
> the EPB.

?!?
If the kernel unconditionally, without documentation overrides such values...
(and in this case only because of a workaround of some buggy BIOSes not 
initialzing this value)...

> If you want to configure a system for performance, you need to do that
> not just in the BIOS, but also in the OS (and, quite arguably, I would
> expect the latter to be sufficient).

No. You must not ignore BIOS settings. Even worse, you must not override
these without any sane reason.

Your assumption above might be right. But we want to do it better, right?

...

> The system-wide resume part will still not be working properly after
> the reverts.

But it must never blindly (unconditionally) be set to specific value.
Correct?

You mean the kernel should store the pre-hibernation perf BIAS value
in NVRAM and write it back when waking up again?
This would make sense.

It would also mean perf BIAS never really worked, at least did not survive
suspend. On servers (no hibernation) it would works but is overridden
to a value you typically do not want to have on a server...
So the current situation is rather broken in the kernel.

  Thomas

Re: [PATCH] [RESEND] Do not modify perf bias performance setting by default at boot

2019-03-15 Thread Thomas Renninger

Hi Rafael,

On Thursday, March 14, 2019 11:08:03 PM CET Rafael J. Wysocki wrote:
> On Thu, Mar 14, 2019 at 3:42 PM Thomas Renninger  wrote:
> > This is a revert of mainline git commits:
> > commit b51ef52df71cb28e9d90cd1d48b79bf19f0bab06
> > commit 17edf2d79f1ea6dfdb4c444801d928953b9f98d6
> > commit abe48b108247e9b90b4c6739662a2e5c765ed114
> 
> I'm not quite convinced that reverting these is the right thing to do here.

Ok, I try harder.

> > It is about this kernel message showing up on quite a lot servers:
> > [0.072652] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
> > [0.076003] ENERGY_PERF_BIAS: View and update with
> > x86_energy_perf_policy(8)
> What happens on boot is a matter of convention and the convention by
> which the EPB is set to "neutral" at that point has been used for
> quite a while.  Changing it now may confuse some users or even worse.

SUSE has this patch included for quite a while (in all kind of
SLE 12 SP flavors), possibly other distributions as well.
Unfortunately the patch got removed recently, because my commit (this one)
was ignored by Len a while ago.

I also like to see more examples of BIOSes not initializing this value.
This message is rare (so the setting back does not happen often, probably
only wrongly with latest BIOSes overriding intended BIOS performance settings 
on servers).

On my workstation the BIOS initializes perf bias to:
cpupower info
analyzing CPU 0:
perf-bias: 7

I could grep through quite some dozens of machines..., but these are mostly
servers and probably show either "zero"/"performance" or "6"/"normal" because
the Linux kernel overrides the INTENDED performance perf bias value to 6.

So we (SUSE) are going with this patch forever.
Otherwise we would run into a similar support nightmare we ran into, when
Intel decided to ignore CPU idle states as exported by BIOS through ACPI.
BIOS documentation of all big server vendors mentioned "performance" settings.
With a kernel update these BIOS C states settings have been ignored (some long 
latency once were not exported on purpose).

The list of breaking conventions and specifications is long...
People mostly blame the "bad BIOS developer". In this case things have been
broke by the kernel.
It's now (with the resume patch) broken in way, that "performance" setting
in the mainline kernel is unusable for years already.

Can we please get this finally properly fixed!

> That is a bug.  There is no reason to change the EPB on CPU
> offline/online and I agree that this needs to be fixed.  The reverts
> would make this problem go away, but that's not the only possible way
> to fix it.

It is the only proper way to fix this.

> Moreover, what is done during system-wide resume appears to be a bug
> too.  Whatever EPB value is there during suspend should be saved and
> it should be restored during resume. Reverting the commits above will
> not fix that particular issue.

Someone could have a look what happens on a recent system
after suspend (with this patch on top).
I do not have laptops to test suspend/resume with latest kernels.
These broken patches have been added a bit too quickly.

I doubt much BIOSes "forgot to initialize this value" and I have my doubts
that resume let's the CPU fall back to "uninitialized" value.
I more expect BIOS resets performance value after resume and the kernel is
doing it all wrong.
Hm, in fact this could be proofed by modifying perf bias value, do a suspend
resume cyle. If the value is kept we should avoid another try of complicating
things and simply reverting touching perf bias value altogether, correct?

BIOS probably does not store perf BIAS value and re-applies it after
suspend resume. Some might re-set it to the intial value (performance..., 
D'oh).

Thomas

[PATCH] [RESEND] Do not modify perf bias performance setting by default at boot

2019-03-14 Thread Thomas Renninger

This is a revert of mainline git commits:
commit b51ef52df71cb28e9d90cd1d48b79bf19f0bab06
commit 17edf2d79f1ea6dfdb4c444801d928953b9f98d6
commit abe48b108247e9b90b4c6739662a2e5c765ed114

It is about this kernel message showing up on quite a lot servers:
[0.072652] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[0.076003] ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)

The background afaik were some BIOS which "forgot to set a default value".
Certainly some (one?) broken laptop BIOS was meant, because
the value zero (performance) is quite a reasonable default value on server 
systems
and this is how a lot CPUs typically come up there.

Another sever bug with above patches:
If you really set performance BIAS MSR value to 'performance'
as mentioned in kernel boot log, it is reset on next CPU offline/online cycle
(see below). Unfortunately one does not see this in the logs anymore, because
of the printk_once() usage:

dmesg |grep -i bias
[0.026642] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[0.028002] ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
liszt:~/:[0]# x86_energy_perf_policy 
cpu0: EPB 6
cpu1: EPB 6
cpu2: EPB 6
cpu3: EPB 6
liszt:~/:[0]# x86_energy_perf_policy 0
liszt:~/:[0]# x86_energy_perf_policy 
cpu0: EPB 0
cpu1: EPB 0
cpu2: EPB 0
cpu3: EPB 0
liszt:~/:[0]# echo offline > /sys/devices/system/cpu/cpu1/online
liszt:~/:[0]# echo online > /sys/devices/system/cpu/cpu1/online
liszt:~/:[0]# x86_energy_perf_policy 
cpu0: EPB 0
cpu1: EPB 6
cpu2: EPB 0
cpu3: EPB 0


Signed-off-by: Thomas Renninger 
Tested-by: Simon Schricker 

Index: do_not_modify_perf_bias/arch/x86/kernel/cpu/common.c
===
--- do_not_modify_perf_bias.orig/arch/x86/kernel/cpu/common.c   2019-03-13 
17:33:06.849890801 +0100
+++ do_not_modify_perf_bias/arch/x86/kernel/cpu/common.c2019-03-13 
18:01:54.781983906 +0100
@@ -18,7 +18,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
@@ -1864,23 +1863,6 @@ void cpu_init(void)
 }
 #endif
 
-static void bsp_resume(void)
-{
-   if (this_cpu->c_bsp_resume)
-   this_cpu->c_bsp_resume(_cpu_data);
-}
-
-static struct syscore_ops cpu_syscore_ops = {
-   .resume = bsp_resume,
-};
-
-static int __init init_cpu_syscore(void)
-{
-   register_syscore_ops(_syscore_ops);
-   return 0;
-}
-core_initcall(init_cpu_syscore);
-
 /*
  * The microcode loader calls this upon late microcode load to recheck 
features,
  * only when microcode has been updated. Caller holds microcode_mutex and CPU
Index: do_not_modify_perf_bias/arch/x86/kernel/cpu/cpu.h
===
--- do_not_modify_perf_bias.orig/arch/x86/kernel/cpu/cpu.h  2019-03-13 
17:33:06.849890801 +0100
+++ do_not_modify_perf_bias/arch/x86/kernel/cpu/cpu.h   2019-03-13 
18:01:53.973983863 +0100
@@ -14,7 +14,6 @@ struct cpu_dev {
void(*c_init)(struct cpuinfo_x86 *);
void(*c_identify)(struct cpuinfo_x86 *);
void(*c_detect_tlb)(struct cpuinfo_x86 *);
-   void(*c_bsp_resume)(struct cpuinfo_x86 *);
int c_x86_vendor;
 #ifdef CONFIG_X86_32
/* Optional vendor specific routine to obtain the cache size. */
Index: do_not_modify_perf_bias/arch/x86/kernel/cpu/intel.c
===
--- do_not_modify_perf_bias.orig/arch/x86/kernel/cpu/intel.c2019-03-13 
17:33:06.853890801 +0100
+++ do_not_modify_perf_bias/arch/x86/kernel/cpu/intel.c 2019-03-13 
18:01:52.789983799 +0100
@@ -596,36 +596,6 @@ detect_keyid_bits:
c->x86_phys_bits -= keyid_bits;
 }
 
-static void init_intel_energy_perf(struct cpuinfo_x86 *c)
-{
-   u64 epb;
-
-   /*
-* Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized.
-* (x86_energy_perf_policy(8) is available to change it at run-time.)
-*/
-   if (!cpu_has(c, X86_FEATURE_EPB))
-   return;
-
-   rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
-   if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE)
-   return;
-
-   pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
-   pr_warn_once("ENERGY_PERF_BIAS: View and update with 
x86_energy_perf_policy(8)\n");
-   epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL;
-   wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
-}
-
-static void intel_bsp_resume(struct cpuinfo_x86 *c)
-{
-   /*
-* MSR_IA32_ENERGY_PERF_BIAS is lost across suspend/resume,
-* so reinitialize it properly like during bootup:
-*/
-   init_intel_energy_perf(c);
-}
-
 static void init_cpuid_fault(struct cpuinfo_x86 *c)
 {
u64 msr;
@@ -763,8 +733,6 @@ static void init_intel(struct cpuinfo_x8
if (cpu_has(c, X86_FEATURE_TME))

Re: [PATCH] pstore: automatically dump and clean dmesg entries

2019-03-01 Thread Thomas Renninger

On Friday, March 1, 2019 3:53:25 PM CET Thomas Renninger wrote:
> From: Torsten Duwe 
> 
> Dump a previous oops or panic, which has made it to pstore,
> to the new syslog after reboot, optionally deleting it.
> This can happen automatically, without user land interaction.
> 
> Signed-off-by: Torsten Duwe 
> CC: Thomas Renninger 

This is a patch which is (was) part of the SUSE kernel for quite a while.
Iirc someone from Fujitsu requested this.

Anyway, it has not been sent upstream yet, but looks useful.
Please consider to merge this one if you like it.
Otherwise it will probably get dropped to Nirvana as we only keep upstream 
accepted (not SuSE specific) patches.

I adjusted the patch to latest linux-2.6 git HEAD state and gave it a test 
compile.

Thanks,

  Thomas

[PATCH] pstore: automatically dump and clean dmesg entries

2019-03-01 Thread Thomas Renninger

From: Torsten Duwe 

Dump a previous oops or panic, which has made it to pstore,
to the new syslog after reboot, optionally deleting it.
This can happen automatically, without user land interaction.

Signed-off-by: Torsten Duwe 
CC: Thomas Renninger 

---
 fs/pstore/inode.c|  6 +++---
 fs/pstore/internal.h | 11 +--
 fs/pstore/platform.c | 33 +++--
 3 files changed, 39 insertions(+), 11 deletions(-)

Index: test_pstore/fs/pstore/inode.c
===
--- test_pstore.orig/fs/pstore/inode.c  2019-03-01 13:55:50.561322896 +0100
+++ test_pstore/fs/pstore/inode.c   2019-03-01 13:56:05.349323693 +0100
@@ -374,7 +374,7 @@ fail:
  * when we are re-scanning the backing store looking to add new
  * error records.
  */
-void pstore_get_records(int quiet)
+void pstore_get_records(unsigned flags)
 {
struct pstore_info *psi = psinfo;
struct dentry *root;
@@ -385,7 +385,7 @@ void pstore_get_records(int quiet)
root = pstore_sb->s_root;
 
inode_lock(d_inode(root));
-   pstore_get_backend_records(psi, root, quiet);
+   pstore_get_backend_records(psi, root, flags);
inode_unlock(d_inode(root));
 }
 
@@ -415,7 +415,7 @@ static int pstore_fill_super(struct supe
if (!sb->s_root)
return -ENOMEM;
 
-   pstore_get_records(0);
+   pstore_get_records(PGR_VERBOSE|PGR_POPULATE);
 
return 0;
 }
Index: test_pstore/fs/pstore/internal.h
===
--- test_pstore.orig/fs/pstore/internal.h   2019-03-01 13:55:50.565322897 
+0100
+++ test_pstore/fs/pstore/internal.h2019-03-01 13:56:05.349323693 +0100
@@ -28,9 +28,16 @@ static inline void pstore_unregister_pms
 extern struct pstore_info *psinfo;
 
 extern voidpstore_set_kmsg_bytes(int);
-extern voidpstore_get_records(int);
+extern voidpstore_get_records(unsigned);
+/* Flags for the pstore iterator pstore_get_records() */
+#define PGR_QUIET  0
+#define PGR_VERBOSE1
+#define PGR_POPULATE   2
+#define PGR_SYSLOG 4
+#define PGR_CLEAR  8
+
 extern voidpstore_get_backend_records(struct pstore_info *psi,
-  struct dentry *root, int quiet);
+  struct dentry *root, unsigned flags);
 extern int pstore_mkfile(struct dentry *root,
  struct pstore_record *record);
 extern boolpstore_is_mounted(void);
Index: test_pstore/fs/pstore/platform.c
===
--- test_pstore.orig/fs/pstore/platform.c   2019-03-01 13:55:50.565322897 
+0100
+++ test_pstore/fs/pstore/platform.c2019-03-01 13:59:06.077333431 +0100
@@ -87,6 +87,11 @@ static DECLARE_WORK(pstore_work, pstore_
 static DEFINE_SPINLOCK(pstore_lock);
 struct pstore_info *psinfo;
 
+static int auto_action=0;
+module_param(auto_action, int, 0664);
+MODULE_PARM_DESC(auto_action, "action to take on backend "
+"registration: 0=nothing, 1=print, 2=print+clear");
+
 static char *backend;
 static char *compress =
 #ifdef CONFIG_PSTORE_COMPRESS_DEFAULT
@@ -108,6 +113,8 @@ static size_t big_oops_buf_sz;
 
 /* How much of the console log to snapshot */
 unsigned long kmsg_bytes = PSTORE_DEFAULT_KMSG_BYTES;
+module_param(kmsg_bytes, ulong, 0644);
+MODULE_PARM_DESC(kmsg_bytes, "maximum size to save of a crash dump");
 
 void pstore_set_kmsg_bytes(int bytes)
 {
@@ -606,7 +613,11 @@ int pstore_register(struct pstore_info *
allocate_buf_for_compression();
 
if (pstore_is_mounted())
-   pstore_get_records(0);
+   pstore_get_records(PGR_VERBOSE|PGR_POPULATE);
+
+   if (auto_action)
+   pstore_get_records(PGR_SYSLOG|
+  ((auto_action>1)?PGR_CLEAR:0));
 
if (psi->flags & PSTORE_FLAGS_DMESG)
pstore_register_kmsg();
@@ -723,7 +734,7 @@ static void decompress_record(struct pst
  * error records.
  */
 void pstore_get_backend_records(struct pstore_info *psi,
-   struct dentry *root, int quiet)
+   struct dentry *root, unsigned flags)
 {
int failed = 0;
unsigned int stop_loop = 65536;
@@ -742,7 +753,7 @@ void pstore_get_backend_records(struct p
 */
for (; stop_loop; stop_loop--) {
struct pstore_record *record;
-   int rc;
+   int rc = 0;
 
record = kzalloc(sizeof(*record), GFP_KERNEL);
if (!record) {
@@ -760,12 +771,23 @@ void pstore_get_backend_records(struct p
}
 
decompress_record(record);
-   rc = pstore_mkfile(root, record);
+   if (flags & PGR_POPULATE)
+   rc = pstore_mkfile(root, record);
+

Re: [PATCH] cpupower : Auto-completion for cpupower tool

2018-12-05 Thread Thomas Renninger

On Wednesday, December 5, 2018 12:31:04 PM CET Abhishek Goel wrote:
> This script adds support for auto-completion for cpupower tool.
> Added support for auto-completion of all the eight commands for
> cpupower tool and their all subsequent sub-commands, wherever
> possible.

>From what I see all changes I proposed in a previous review round
with a "not that broad audience" have been addressed.

I already integrated the patch (slightly different, without functional 
changes) into the latest cpupower sources of our build service and
things work out nice.

Shuah: Can you pick this up for next cpupower submission round, please.

Thanks,

   Thomas

Re: [PATCH] cpupower : Auto-completion for cpupower tool

2018-12-05 Thread Thomas Renninger

On Wednesday, December 5, 2018 12:31:04 PM CET Abhishek Goel wrote:
> This script adds support for auto-completion for cpupower tool.
> Added support for auto-completion of all the eight commands for
> cpupower tool and their all subsequent sub-commands, wherever
> possible.

>From what I see all changes I proposed in a previous review round
with a "not that broad audience" have been addressed.

I already integrated the patch (slightly different, without functional 
changes) into the latest cpupower sources of our build service and
things work out nice.

Shuah: Can you pick this up for next cpupower submission round, please.

Thanks,

   Thomas

Re: [PATCH 00/10] tools: Various build flags fixes

2018-10-17 Thread Thomas Renninger

On Tuesday, October 16, 2018 5:06:05 PM CEST Jiri Olsa wrote:
> hi,
> while hardening some of the tools rpm, we noticed we
> can't pass build flags to some of them.
> 
> Sending separate tools fixes for what we found. It's
> mostly override for CFLAGS and adding LDFLAGS to the
> build commands.

Looks fine for the cpupower patches 3/10 and 4/10.
(Feel free to add a Reviewed-by or Acked-by for them
if you have to resend).

Thanks,

   Thomas

Re: [PATCH 00/10] tools: Various build flags fixes

2018-10-17 Thread Thomas Renninger

On Tuesday, October 16, 2018 5:06:05 PM CEST Jiri Olsa wrote:
> hi,
> while hardening some of the tools rpm, we noticed we
> can't pass build flags to some of them.
> 
> Sending separate tools fixes for what we found. It's
> mostly override for CFLAGS and adding LDFLAGS to the
> build commands.

Looks fine for the cpupower patches 3/10 and 4/10.
(Feel free to add a Reviewed-by or Acked-by for them
if you have to resend).

Thanks,

   Thomas

Re: [PATCH] tools/power/cpupower: fix compilation with STATIC=true

2018-10-16 Thread Thomas Renninger

On Tuesday, October 16, 2018 10:56:26 AM CEST Konstantin Khlebnikov wrote:
> Rename duplicate sysfs_read_file into cpupower_read_sysfs and fix linking.
> 
> Signed-off-by: Konstantin Khlebnikov 

Acked-by: Thomas Renninger 

Thanks!

> -   $(OUTPUT)../lib/cpufreq.o $(OUTPUT)../lib/sysfs.o
> +   $(OUTPUT)../lib/cpufreq.o $(OUTPUT)../lib/cpupower.o

I also looked at this and would only have fixed half of it (overseeing above
old leftover from cpupower lib introduction).
I guess, this patch from Konstantin is better and the correct solution.

Re: [PATCH] tools/power/cpupower: fix compilation with STATIC=true

2018-10-16 Thread Thomas Renninger

On Tuesday, October 16, 2018 10:56:26 AM CEST Konstantin Khlebnikov wrote:
> Rename duplicate sysfs_read_file into cpupower_read_sysfs and fix linking.
> 
> Signed-off-by: Konstantin Khlebnikov 

Acked-by: Thomas Renninger 

Thanks!

> -   $(OUTPUT)../lib/cpufreq.o $(OUTPUT)../lib/sysfs.o
> +   $(OUTPUT)../lib/cpufreq.o $(OUTPUT)../lib/cpupower.o

I also looked at this and would only have fixed half of it (overseeing above
old leftover from cpupower lib introduction).
I guess, this patch from Konstantin is better and the correct solution.

Re: [PATCH] MAINTAINERS: add maintainer for tools/power/cpupower

2017-11-03 Thread Thomas Renninger

Hi Shuah,


On Thursday, November 02, 2017 01:19:47 PM Shuah Khan wrote:
> Based on discussions with Rafael J. Wysocki, cpupower is need of an
> active maintainer. I decided to on take the task of maintaining this
> tool.

Thanks. It is very much appreciated!
I had a motorcycle accident and had to stop typing at all for several 
weeks/months.

My plan is to be more active again on lists, do reviews again and submit some
patches I have lying around for a long time, etc.,...
Do not hesitate to contact me directly, I try to answer in time, may still 
take a while on more complex things...

Good luck and have fun...,

  Thomas

 
> Patches will flow through the pm sub-systems to the mainline.
> 
> Suggested-by: Rafael J. Wysocki <raf...@kernel.org>
> Signed-off-by: Shuah Khan <shua...@osg.samsung.com>

Acked-by: Thomas Renninger <tr...@suse.com>

> ---
>  MAINTAINERS | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index af0cb69f6a3e..9fd3ce23095a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3636,6 +3636,8 @@ F:  drivers/cpufreq/arm_big_little_dt.c
>  
>  CPU POWER MONITORING SUBSYSTEM
>  M:   Thomas Renninger <tr...@suse.com>
> +M:   Shuah Khan <shua...@osg.samsung.com>
> +M:   Shuah Khan <sh...@kernel.org>
>  L:   linux...@vger.kernel.org
>  S:   Maintained
>  F:   tools/power/cpupower/
>

Re: [PATCH] MAINTAINERS: add maintainer for tools/power/cpupower

2017-11-03 Thread Thomas Renninger

Hi Shuah,

On Thursday, November 02, 2017 01:19:47 PM Shuah Khan wrote:
> Based on discussions with Rafael J. Wysocki, cpupower is need of an
> active maintainer. I decided to on take the task of maintaining this
> tool.

Thanks. It is very much appreciated!
I had a motorcycle accident and had to stop typing at all for several 
weeks/months.

My plan is to be more active again on lists, do reviews again and submit some
patches I have lying around for a long time, etc.,...
Do not hesitate to contact me directly, I try to answer in time, may still 
take a while on more complex things...

Good luck and have fun...,

  Thomas

> Patches will flow through the pm sub-systems to the mainline.
> 
> Suggested-by: Rafael J. Wysocki 
> Signed-off-by: Shuah Khan 

Acked-by: Thomas Renninger 

> ---
>  MAINTAINERS | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index af0cb69f6a3e..9fd3ce23095a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3636,6 +3636,8 @@ F:  drivers/cpufreq/arm_big_little_dt.c
>  
>  CPU POWER MONITORING SUBSYSTEM
>  M:   Thomas Renninger 
> +M:   Shuah Khan 
> +M:   Shuah Khan 
>  L:   linux...@vger.kernel.org
>  S:   Maintained
>  F:   tools/power/cpupower/
>

Re: [RFC PATCH 2/3] Implement sysfs based cpuinfo for x86 cpus.

2017-06-09 Thread Thomas Renninger

On Thursday, June 08, 2017 08:24:01 PM Greg KH wrote:
> On Thu, Jun 08, 2017 at 06:56:14PM +0200, Felix Schnizlein wrote:
> > ---
> >  arch/x86/kernel/Makefile|   1 +
> >  arch/x86/kernel/cpuinfo_sysfs.c | 166 

> >  drivers/base/cpuinfo.c  |   4 -
> >  3 files changed, 167 insertions(+), 4 deletions(-)
> >  create mode 100644 arch/x86/kernel/cpuinfo_sysfs.c
> > 
> 
> When you add new sysfs entries, you have to also add new
> Documentation/ABI/ entries.

This one seem to be rather unmaintained?
There even is:
Documentation/ABI/stable/sysfs-devices-system-cpu
this patchset would have to add:
Documentation/ABI/stable/sysfs-devices-system-cpu-cpuinfo
then.

But:
Documentation/ABI/stable/sysfs-devices-system-cpu
describes
/sys/devices/system/cpu/dscr_default
and
/sys/devices/system/cpu/cpu[0-9]+/dscr

which I have never seen.

Much more important values in there like:
/sys/devices/system/cpu/topology
/sys/devices/system/cpu/microcode
/sys/devices/system/cpu/cpuidle
/sys/devices/system/cpu/cpufreq
/sys/devices/system/cpu/{online,offline}

are not described there at all.

I wonder whether .../cpu/dscr_default still exists in the
kernel, a quick grep did not reveal anything.
A
Source:
tag would be a nice non-optional addition in the README.

We can also send definitions for topology/microcode/cache/..
to catch up a bit there again, not sure that makes much sense.

Be aware that the output of /proc/cpuinfo is rather arbitrary
depending on architecure and some build options.

  Thomas

Re: [RFC PATCH 2/3] Implement sysfs based cpuinfo for x86 cpus.

2017-06-09 Thread Thomas Renninger

On Thursday, June 08, 2017 08:24:01 PM Greg KH wrote:
> On Thu, Jun 08, 2017 at 06:56:14PM +0200, Felix Schnizlein wrote:
> > ---
> >  arch/x86/kernel/Makefile|   1 +
> >  arch/x86/kernel/cpuinfo_sysfs.c | 166 

> >  drivers/base/cpuinfo.c  |   4 -
> >  3 files changed, 167 insertions(+), 4 deletions(-)
> >  create mode 100644 arch/x86/kernel/cpuinfo_sysfs.c
> > 
> 
> When you add new sysfs entries, you have to also add new
> Documentation/ABI/ entries.

This one seem to be rather unmaintained?
There even is:
Documentation/ABI/stable/sysfs-devices-system-cpu
this patchset would have to add:
Documentation/ABI/stable/sysfs-devices-system-cpu-cpuinfo
then.

But:
Documentation/ABI/stable/sysfs-devices-system-cpu
describes
/sys/devices/system/cpu/dscr_default
and
/sys/devices/system/cpu/cpu[0-9]+/dscr

which I have never seen.

Much more important values in there like:
/sys/devices/system/cpu/topology
/sys/devices/system/cpu/microcode
/sys/devices/system/cpu/cpuidle
/sys/devices/system/cpu/cpufreq
/sys/devices/system/cpu/{online,offline}

are not described there at all.

I wonder whether .../cpu/dscr_default still exists in the
kernel, a quick grep did not reveal anything.
A
Source:
tag would be a nice non-optional addition in the README.

We can also send definitions for topology/microcode/cache/..
to catch up a bit there again, not sure that makes much sense.

Be aware that the output of /proc/cpuinfo is rather arbitrary
depending on architecure and some build options.

  Thomas

Re: [RESEND][PATCHv2] cpupower: Correct return type of cpu_power_is_cpu_online in cpufreq

2016-10-24 Thread Thomas Renninger

On Thursday, October 20, 2016 01:42:49 PM Shilpasri G Bhat wrote:
> On 10/20/2016 04:23 AM, Laura Abbott wrote:
> > When converting to a shared library in ac5a181d065d ("cpupower: Add
> > cpuidle parts into library"), cpu_freq_cpu_exists was converted to
> > cpupower_is_cpu_online. cpu_req_cpu_exists returned 0 on success and
> > -ENOSYS on failure whereas cpupower_is_cpu_online returns 1 on success.
> > Check for the correct return value in cpufreq-set.
> > 
> > See https://bugzilla.redhat.com/show_bug.cgi?id=1374212
> > 
> > Fixes: ac5a181d065d ("cpupower: Add cpuidle parts into library")
> > Reported-by: Julian Seward <jsew...@acm.org>
> > Signed-off-by: Laura Abbott <labb...@redhat.com>
> > ---
> > v2: Drop another redundant cpupower_is_cpu_online
> 
> Reviewed-by:Shilpasri G Bhat <shilpa.b...@linux.vnet.ibm.com>

Acked-by: Thomas Renninger <tr...@suse.com>

I wanted to go through all recently sent patches and adjust rebase them to
my lates changes.
But I guess the fact that I push this from day to day because of other
stuff should not hold off such important fixes.

I collect and review all others as well the next day(s)..

  Thomas

Re: [RESEND][PATCHv2] cpupower: Correct return type of cpu_power_is_cpu_online in cpufreq

2016-10-24 Thread Thomas Renninger

On Thursday, October 20, 2016 01:42:49 PM Shilpasri G Bhat wrote:
> On 10/20/2016 04:23 AM, Laura Abbott wrote:
> > When converting to a shared library in ac5a181d065d ("cpupower: Add
> > cpuidle parts into library"), cpu_freq_cpu_exists was converted to
> > cpupower_is_cpu_online. cpu_req_cpu_exists returned 0 on success and
> > -ENOSYS on failure whereas cpupower_is_cpu_online returns 1 on success.
> > Check for the correct return value in cpufreq-set.
> > 
> > See https://bugzilla.redhat.com/show_bug.cgi?id=1374212
> > 
> > Fixes: ac5a181d065d ("cpupower: Add cpuidle parts into library")
> > Reported-by: Julian Seward 
> > Signed-off-by: Laura Abbott 
> > ---
> > v2: Drop another redundant cpupower_is_cpu_online
> 
> Reviewed-by:Shilpasri G Bhat 

Acked-by: Thomas Renninger 

I wanted to go through all recently sent patches and adjust rebase them to
my lates changes.
But I guess the fact that I push this from day to day because of other
stuff should not hold off such important fixes.

I collect and review all others as well the next day(s)..

  Thomas

Re: Fail to build tools/all

2016-05-02 Thread Thomas Renninger

On Sunday, May 01, 2016 01:11:33 PM Sean Fu wrote:
> Hi guys:
> I encountered a build error when running "make V=1 tools/all".
> Shall we write a patch to fix it?

This is not a bug.

> The following is error log.
> 
>  start ==
> commit 05cf8077e54b20dddb756eaa26f3aeb5c38dd3cf
> Merge: cf78031 db5dd0d
> Author: Linus Torvalds 
> Date:   Fri Apr 1 20:03:33 2016 -0500
> = end  =
> 
> == start ===
> mkdir -p
> /home/sean/work/source/upstream/kernel.linus/linux/tools/power/cpupower
> && make O=/home/sean/work/source/upstream/kernel.linus/linux
> subdir=tools/power/cpupower --no-print-directory -C power/cpupower
> gcc  -DVERSION=\"4.6.rc1.190.g05cf80\" -DPACKAGE=\"cpupower\"
> -DPACKAGE_BUGREPORT=\"linux...@vger.kernel.org\" -D_GNU_SOURCE -pipe
> -DNLS -Wall -Wchar-subscripts -Wpointer-arith -Wsign-compare
> -Wno-pointer-sign -Wdeclaration-after-statement -Wshadow -O1 -g -DDEBUG
> -I./lib -I ./utils -o
> /home/sean/work/source/upstream/kernel.linus/linux/utils/helpers/amd.o
> -c utils/helpers/amd.c
> utils/helpers/amd.c:7:21: fatal error: pci/pci.h: No such file or
> directory
>  #include 

You need pciutils header and library:
ftp://ftp.kernel.org/pub/software/utils/pciutils

Thomas

Re: Fail to build tools/all

2016-05-02 Thread Thomas Renninger

On Sunday, May 01, 2016 01:11:33 PM Sean Fu wrote:
> Hi guys:
> I encountered a build error when running "make V=1 tools/all".
> Shall we write a patch to fix it?

This is not a bug.

> The following is error log.
> 
>  start ==
> commit 05cf8077e54b20dddb756eaa26f3aeb5c38dd3cf
> Merge: cf78031 db5dd0d
> Author: Linus Torvalds 
> Date:   Fri Apr 1 20:03:33 2016 -0500
> = end  =
> 
> == start ===
> mkdir -p
> /home/sean/work/source/upstream/kernel.linus/linux/tools/power/cpupower
> && make O=/home/sean/work/source/upstream/kernel.linus/linux
> subdir=tools/power/cpupower --no-print-directory -C power/cpupower
> gcc  -DVERSION=\"4.6.rc1.190.g05cf80\" -DPACKAGE=\"cpupower\"
> -DPACKAGE_BUGREPORT=\"linux...@vger.kernel.org\" -D_GNU_SOURCE -pipe
> -DNLS -Wall -Wchar-subscripts -Wpointer-arith -Wsign-compare
> -Wno-pointer-sign -Wdeclaration-after-statement -Wshadow -O1 -g -DDEBUG
> -I./lib -I ./utils -o
> /home/sean/work/source/upstream/kernel.linus/linux/utils/helpers/amd.o
> -c utils/helpers/amd.c
> utils/helpers/amd.c:7:21: fatal error: pci/pci.h: No such file or
> directory
>  #include 

You need pciutils header and library:
ftp://ftp.kernel.org/pub/software/utils/pciutils

Thomas

Re: [PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel

2016-03-08 Thread Thomas Renninger

On Monday, March 07, 2016 07:50:57 PM Len Brown wrote:
> > But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo
> > modes if this value is not set to performance
> 
> BDX-EP supports HWP.
> Are these failing machines running in HWP mode?
> 
> (On BDX-EP, and only on BDX-EP, EPB acts to set the BIAS for HWP,
>  because that processor doesn't yet have EPP.)

I am not sure whether I can publish platform info. I asked for and added you
to CC of the private bug a while ago.

This kernel is run: SLES12 SP1, 3.12.49-4-default
I grepped the whole supportconfig for -i hwp and couldn't find anything, so I 
very much expect this machine is/was not run in HWP mode, right?

I still question the usefulness of the "initialize perf bias to normal" hack.
For our distro I am pretty sure, we do not want to have this one and
we prefer the CPU or BIOS initialized value, even (or especially) if it is
set to performance.

Are there any known platforms where this would really be an issue
and how sever would it be?

I already removed the "set perf bias to normal" years ago for SLE11 without
getting any regression or negative reports.

Now finding the workaround on the hack to run a suspend hook to
adjust perf bias value on each suspend cycle... This is going into
a wrong direction.

I agree that if this needs resetting after each suspend cycle, userspace 
should not need to care about it. I could imagine a sysfs variable here:
/sys/devices/system/cpu/intel_pstate/perf_bias

but the static setting from 0 to 6 in the x86 core code and
the suspend callback should get reverted, right?

Thomas

Re: [PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel

2016-03-08 Thread Thomas Renninger

On Monday, March 07, 2016 07:50:57 PM Len Brown wrote:
> > But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo
> > modes if this value is not set to performance
> 
> BDX-EP supports HWP.
> Are these failing machines running in HWP mode?
> 
> (On BDX-EP, and only on BDX-EP, EPB acts to set the BIAS for HWP,
>  because that processor doesn't yet have EPP.)

I am not sure whether I can publish platform info. I asked for and added you
to CC of the private bug a while ago.

This kernel is run: SLES12 SP1, 3.12.49-4-default
I grepped the whole supportconfig for -i hwp and couldn't find anything, so I 
very much expect this machine is/was not run in HWP mode, right?

I still question the usefulness of the "initialize perf bias to normal" hack.
For our distro I am pretty sure, we do not want to have this one and
we prefer the CPU or BIOS initialized value, even (or especially) if it is
set to performance.

Are there any known platforms where this would really be an issue
and how sever would it be?

I already removed the "set perf bias to normal" years ago for SLE11 without
getting any regression or negative reports.

Now finding the workaround on the hack to run a suspend hook to
adjust perf bias value on each suspend cycle... This is going into
a wrong direction.

I agree that if this needs resetting after each suspend cycle, userspace 
should not need to care about it. I could imagine a sysfs variable here:
/sys/devices/system/cpu/intel_pstate/perf_bias

but the static setting from 0 to 6 in the x86 core code and
the suspend callback should get reverted, right?

Thomas

Re: [PATCH] Do not modify perf bias performance setting by default at boot

2016-03-07 Thread Thomas Renninger

On Monday, March 07, 2016 02:24:54 PM Thomas Renninger wrote:
> It came out that on certain CPUs perf bias MSR has to be set to performance,
> so that the CPU enters turbo states at all.
> 
> Better do not try to wrongly adjust perf bias value,
> its value probably is intended by BIOS.

This whole perf bias CPU feature and trying to adjust the value
at boot time gets more and more a mess.

I try to sum this up, try to collect some missing bits and get them
published through mailing lists. Hopefully some root causes of what
went wrong are identified and passed on to the right people to get
things fixed.

1. It started with:
commit abe48b108247e9b90b4c6739662a2e5c765ed114
Author: Len Brown <len.br...@intel.com>
Date:   Thu Jul 14 00:53:24 2011 -0400

x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS

...
However, the typical BIOS fails to initialize the MSR, presumably
because this is handled by high-volume shrink-wrap operating systems...

Linux distros, on the other hand, do not yet invoke 
x86_energy_perf_policy(8).
As a result, WSM-EP, SNB, and later hardware from Intel will run in its
default hardware power-on state (performance), which assumes that users
care for performance at all costs and not for energy efficiency.
While that is fine for performance benchmarks, the hardware's intended 
default
operating point is "normal" mode...

Initialize the MSR to the "normal" by default during kernel boot.
...

2. Four years later it has been identified that the boot processor's perf bias
switches back to 0 on suspend.

I wonder whether this happens generally. Or do BIOS hooks also reset the BIAS
value on suspend?

3. It has now be identified that specific CPUs can only enter Turbo modes, if
the perf BIAS value is set to 0 (performance). But the performance value cannot
be set anymore by BIOS because of above patches.

My questions:

1. If this is true:
"While that is fine for performance benchmarks, the hardware's intended default
operating point is "normal" mode..."
Then the HW (CPU) must initialize to 5 (normal).
It's not the BIOS' fault which forgets to set something,
this may always be intended.
commit b51ef52df71cb28e9d90cd1d48b79bf19f0bab06
commit 17edf2d79f1ea6dfdb4c444801d928953b9f98d6
commit abe48b108247e9b90b4c6739662a2e5c765ed114

Intel must tell their microcode writers to initialize perf BIAS with 5, if this
is what Intel wants.

2. Why is the value for the boot processor reset to 0 on suspend?
Is this a platform/BIOS issue only?
Or could this be another major misdesign in perf BIAS CPU implementation.

3. What exactly was the problem?
The argumentation that the perf BIAS "should" be 5 was rather vague.
What exactly are the concequences? Getting a dirty workaround into the x86 core
code only by this information without any values or references is bad.

  Thomas

PS: I will submit the patch (revert of commit b51ef52df71cb2, 17edf2d79f1ea6d 
and
abe48b108247e9) now to our SLE code base.

Re: [PATCH] Do not modify perf bias performance setting by default at boot

2016-03-07 Thread Thomas Renninger

On Monday, March 07, 2016 02:24:54 PM Thomas Renninger wrote:
> It came out that on certain CPUs perf bias MSR has to be set to performance,
> so that the CPU enters turbo states at all.
> 
> Better do not try to wrongly adjust perf bias value,
> its value probably is intended by BIOS.

This whole perf bias CPU feature and trying to adjust the value
at boot time gets more and more a mess.

I try to sum this up, try to collect some missing bits and get them
published through mailing lists. Hopefully some root causes of what
went wrong are identified and passed on to the right people to get
things fixed.

1. It started with:
commit abe48b108247e9b90b4c6739662a2e5c765ed114
Author: Len Brown 
Date:   Thu Jul 14 00:53:24 2011 -0400

x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS

...
However, the typical BIOS fails to initialize the MSR, presumably
because this is handled by high-volume shrink-wrap operating systems...

Linux distros, on the other hand, do not yet invoke 
x86_energy_perf_policy(8).
As a result, WSM-EP, SNB, and later hardware from Intel will run in its
default hardware power-on state (performance), which assumes that users
care for performance at all costs and not for energy efficiency.
While that is fine for performance benchmarks, the hardware's intended 
default
operating point is "normal" mode...

Initialize the MSR to the "normal" by default during kernel boot.
...

2. Four years later it has been identified that the boot processor's perf bias
switches back to 0 on suspend.

I wonder whether this happens generally. Or do BIOS hooks also reset the BIAS
value on suspend?

3. It has now be identified that specific CPUs can only enter Turbo modes, if
the perf BIAS value is set to 0 (performance). But the performance value cannot
be set anymore by BIOS because of above patches.

My questions:

1. If this is true:
"While that is fine for performance benchmarks, the hardware's intended default
operating point is "normal" mode..."
Then the HW (CPU) must initialize to 5 (normal).
It's not the BIOS' fault which forgets to set something,
this may always be intended.
commit b51ef52df71cb28e9d90cd1d48b79bf19f0bab06
commit 17edf2d79f1ea6dfdb4c444801d928953b9f98d6
commit abe48b108247e9b90b4c6739662a2e5c765ed114

Intel must tell their microcode writers to initialize perf BIAS with 5, if this
is what Intel wants.

2. Why is the value for the boot processor reset to 0 on suspend?
Is this a platform/BIOS issue only?
Or could this be another major misdesign in perf BIAS CPU implementation.

3. What exactly was the problem?
The argumentation that the perf BIAS "should" be 5 was rather vague.
What exactly are the concequences? Getting a dirty workaround into the x86 core
code only by this information without any values or references is bad.

  Thomas

PS: I will submit the patch (revert of commit b51ef52df71cb2, 17edf2d79f1ea6d 
and
abe48b108247e9) now to our SLE code base.

[PATCH] Do not modify perf bias performance setting by default at boot

2016-03-07 Thread Thomas Renninger

It came out that on certain CPUs perf bias MSR has to be set to performance,
so that the CPU enters turbo states at all.

Better do not try to wrongly adjust perf bias value,
its value probably is intended by BIOS.

This is a revert of mainline git commits:
commit b51ef52df71cb28e9d90cd1d48b79bf19f0bab06
commit 17edf2d79f1ea6dfdb4c444801d928953b9f98d6
commit abe48b108247e9b90b4c6739662a2e5c765ed114

Signed-off-by: Thomas Renninger <tr...@suse.de>

---
 arch/x86/kernel/cpu/common.c |   18 --
 arch/x86/kernel/cpu/cpu.h|1 -
 arch/x86/kernel/cpu/intel.c  |   32 
 3 files changed, 51 deletions(-)

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -13,7 +13,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
@@ -1489,20 +1488,3 @@ inline bool __static_cpu_has_safe(u16 bi
return boot_cpu_has(bit);
 }
 EXPORT_SYMBOL_GPL(__static_cpu_has_safe);
-
-static void bsp_resume(void)
-{
-   if (this_cpu->c_bsp_resume)
-   this_cpu->c_bsp_resume(_cpu_data);
-}
-
-static struct syscore_ops cpu_syscore_ops = {
-   .resume = bsp_resume,
-};
-
-static int __init init_cpu_syscore(void)
-{
-   register_syscore_ops(_syscore_ops);
-   return 0;
-}
-core_initcall(init_cpu_syscore);
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -13,7 +13,6 @@ struct cpu_dev {
void(*c_init)(struct cpuinfo_x86 *);
void(*c_identify)(struct cpuinfo_x86 *);
void(*c_detect_tlb)(struct cpuinfo_x86 *);
-   void(*c_bsp_resume)(struct cpuinfo_x86 *);
int c_x86_vendor;
 #ifdef CONFIG_X86_32
/* Optional vendor specific routine to obtain the cache size. */
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -372,36 +372,6 @@ static void detect_vmx_virtcap(struct cp
}
 }
 
-static void init_intel_energy_perf(struct cpuinfo_x86 *c)
-{
-   u64 epb;
-
-   /*
-* Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized.
-* (x86_energy_perf_policy(8) is available to change it at run-time.)
-*/
-   if (!cpu_has(c, X86_FEATURE_EPB))
-   return;
-
-   rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
-   if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE)
-   return;
-
-   pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
-   pr_warn_once("ENERGY_PERF_BIAS: View and update with 
x86_energy_perf_policy(8)\n");
-   epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL;
-   wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
-}
-
-static void intel_bsp_resume(struct cpuinfo_x86 *c)
-{
-   /*
-* MSR_IA32_ENERGY_PERF_BIAS is lost across suspend/resume,
-* so reinitialize it properly like during bootup:
-*/
-   init_intel_energy_perf(c);
-}
-
 static void init_intel(struct cpuinfo_x86 *c)
 {
unsigned int l2 = 0;
@@ -509,7 +479,6 @@ static void init_intel(struct cpuinfo_x8
if (cpu_has(c, X86_FEATURE_VMX))
detect_vmx_virtcap(c);
 
-   init_intel_energy_perf(c);
 }
 
 #ifdef CONFIG_X86_32
@@ -764,7 +733,6 @@ static const struct cpu_dev intel_cpu_de
.c_detect_tlb   = intel_detect_tlb,
.c_early_init   = early_init_intel,
.c_init = init_intel,
-   .c_bsp_resume   = intel_bsp_resume,
.c_x86_vendor   = X86_VENDOR_INTEL,
 };

[PATCH] Do not modify perf bias performance setting by default at boot

2016-03-07 Thread Thomas Renninger

It came out that on certain CPUs perf bias MSR has to be set to performance,
so that the CPU enters turbo states at all.

Better do not try to wrongly adjust perf bias value,
its value probably is intended by BIOS.

This is a revert of mainline git commits:
commit b51ef52df71cb28e9d90cd1d48b79bf19f0bab06
commit 17edf2d79f1ea6dfdb4c444801d928953b9f98d6
commit abe48b108247e9b90b4c6739662a2e5c765ed114

Signed-off-by: Thomas Renninger 

---
 arch/x86/kernel/cpu/common.c |   18 --
 arch/x86/kernel/cpu/cpu.h|1 -
 arch/x86/kernel/cpu/intel.c  |   32 
 3 files changed, 51 deletions(-)

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -13,7 +13,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
@@ -1489,20 +1488,3 @@ inline bool __static_cpu_has_safe(u16 bi
return boot_cpu_has(bit);
 }
 EXPORT_SYMBOL_GPL(__static_cpu_has_safe);
-
-static void bsp_resume(void)
-{
-   if (this_cpu->c_bsp_resume)
-   this_cpu->c_bsp_resume(_cpu_data);
-}
-
-static struct syscore_ops cpu_syscore_ops = {
-   .resume = bsp_resume,
-};
-
-static int __init init_cpu_syscore(void)
-{
-   register_syscore_ops(_syscore_ops);
-   return 0;
-}
-core_initcall(init_cpu_syscore);
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -13,7 +13,6 @@ struct cpu_dev {
void(*c_init)(struct cpuinfo_x86 *);
void(*c_identify)(struct cpuinfo_x86 *);
void(*c_detect_tlb)(struct cpuinfo_x86 *);
-   void(*c_bsp_resume)(struct cpuinfo_x86 *);
int c_x86_vendor;
 #ifdef CONFIG_X86_32
/* Optional vendor specific routine to obtain the cache size. */
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -372,36 +372,6 @@ static void detect_vmx_virtcap(struct cp
}
 }
 
-static void init_intel_energy_perf(struct cpuinfo_x86 *c)
-{
-   u64 epb;
-
-   /*
-* Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized.
-* (x86_energy_perf_policy(8) is available to change it at run-time.)
-*/
-   if (!cpu_has(c, X86_FEATURE_EPB))
-   return;
-
-   rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
-   if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE)
-   return;
-
-   pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
-   pr_warn_once("ENERGY_PERF_BIAS: View and update with 
x86_energy_perf_policy(8)\n");
-   epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL;
-   wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
-}
-
-static void intel_bsp_resume(struct cpuinfo_x86 *c)
-{
-   /*
-* MSR_IA32_ENERGY_PERF_BIAS is lost across suspend/resume,
-* so reinitialize it properly like during bootup:
-*/
-   init_intel_energy_perf(c);
-}
-
 static void init_intel(struct cpuinfo_x86 *c)
 {
unsigned int l2 = 0;
@@ -509,7 +479,6 @@ static void init_intel(struct cpuinfo_x8
if (cpu_has(c, X86_FEATURE_VMX))
detect_vmx_virtcap(c);
 
-   init_intel_energy_perf(c);
 }
 
 #ifdef CONFIG_X86_32
@@ -764,7 +733,6 @@ static const struct cpu_dev intel_cpu_de
.c_detect_tlb   = intel_detect_tlb,
.c_early_init   = early_init_intel,
.c_init = init_intel,
-   .c_bsp_resume   = intel_bsp_resume,
.c_x86_vendor   = X86_VENDOR_INTEL,
 };

Re: [PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel

2016-03-04 Thread Thomas Renninger

On Wednesday, March 02, 2016 01:26:18 AM Rafael J. Wysocki wrote:
> On Tuesday, March 01, 2016 01:17:37 PM Thomas Renninger wrote:
> > > > if (!cpu_has(c, X86_FEATURE_EPB))z
> > > > 
> > > > return;
> > > > 
> > > > @@ -387,10 +391,8 @@ static void init_intel_energy_perf(struc
> > > > 
> > > > if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE)
> > > > 
> > > > return;
> > > > 
> > > > -   pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was
> > 
> > 'performance'\n");
> > 
> > > > -   pr_warn_once("ENERGY_PERF_BIAS: View and update with
> > > > x86_energy_perf_policy(8)\n"); -epb = (epb & ~0xF) |
> > > > ENERGY_PERF_BIAS_NORMAL;
> > > > -   wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
> > > > +   pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n");
> > > > +   pr_info_once("ENERGY_PERF_BIAS: Update with cpupower-
set(8)\n");
> > > 
> > > This doesn't need to be cpupower-set IMO.
> > 
> > You mean why switch the message from:
> > x86_energy_perf_policy to cpupower-set
> > ?
> > 
> > IMO x86_energy_perf_policy should not exist. It has been introduce before
> > cpupower set -b.
> > Having an extra tool/binary for this functionality is an unneeded
> > packaging
> > overhead for distros.
> > Also having more and more of such CPU specific tools is not userfriendly.
> > cpupower supports all power relevant features of your CPU and on all
> > architectures (or at least it should). People should know this one better
> > than "x86_energy_perf_policy" and theoretically intuitively find it, even
> > without a message.
> > 
> > So it would be nice to get the message fixed as well.
> 
> My point is that since "cpupower set -b" is not the only way to set this,
> it doesn't seem appropriate to refer to it explicitly from a kernel message.
> 
> I actually don't think the second message is necessary at all.

Hmm, thinking a bit more about this, I think the whole 
init_intel_energy_perf() function check should vanish.

The check should get moved into the powertop userspace tool.
This one is used to optimize platform for power saving features.

This would also keep the kernel core code clean...

If you agree I will send the patch.

  Thomas

Re: [PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel

2016-03-04 Thread Thomas Renninger

On Wednesday, March 02, 2016 01:26:18 AM Rafael J. Wysocki wrote:
> On Tuesday, March 01, 2016 01:17:37 PM Thomas Renninger wrote:
> > > > if (!cpu_has(c, X86_FEATURE_EPB))z
> > > > 
> > > > return;
> > > > 
> > > > @@ -387,10 +391,8 @@ static void init_intel_energy_perf(struc
> > > > 
> > > > if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE)
> > > > 
> > > > return;
> > > > 
> > > > -   pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was
> > 
> > 'performance'\n");
> > 
> > > > -   pr_warn_once("ENERGY_PERF_BIAS: View and update with
> > > > x86_energy_perf_policy(8)\n"); -epb = (epb & ~0xF) |
> > > > ENERGY_PERF_BIAS_NORMAL;
> > > > -   wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
> > > > +   pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n");
> > > > +   pr_info_once("ENERGY_PERF_BIAS: Update with cpupower-
set(8)\n");
> > > 
> > > This doesn't need to be cpupower-set IMO.
> > 
> > You mean why switch the message from:
> > x86_energy_perf_policy to cpupower-set
> > ?
> > 
> > IMO x86_energy_perf_policy should not exist. It has been introduce before
> > cpupower set -b.
> > Having an extra tool/binary for this functionality is an unneeded
> > packaging
> > overhead for distros.
> > Also having more and more of such CPU specific tools is not userfriendly.
> > cpupower supports all power relevant features of your CPU and on all
> > architectures (or at least it should). People should know this one better
> > than "x86_energy_perf_policy" and theoretically intuitively find it, even
> > without a message.
> > 
> > So it would be nice to get the message fixed as well.
> 
> My point is that since "cpupower set -b" is not the only way to set this,
> it doesn't seem appropriate to refer to it explicitly from a kernel message.
> 
> I actually don't think the second message is necessary at all.

Hmm, thinking a bit more about this, I think the whole 
init_intel_energy_perf() function check should vanish.

The check should get moved into the powertop userspace tool.
This one is used to optimize platform for power saving features.

This would also keep the kernel core code clean...

If you agree I will send the patch.

  Thomas

Re: [PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel

2016-03-01 Thread Thomas Renninger

On Saturday, February 27, 2016 12:15:47 AM Rafael J. Wysocki wrote:
> On Friday, February 26, 2016 05:38:00 PM Thomas Renninger wrote:
> > The assumption that BIOSes never want to have this register being set to
> > full performance (zero) is wrong.
> > 
> > While wrongly overruling this BIOS setting and set it from performance
> > to normal did not hurt that much, because nobody really knew the effects
> > inside Intel processors.
> > 
> > But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo
> > modes if this value is not set to performance.
> > 
> > So switch logic to tell the user in a friendly way (info) that the CPU is
> > in performance mode and how to switch via userspace if this is not
> > intended.
> > 
> > But otherwise trust that the BIOS has set the correct value here and do
> > not
> > blindly overrule.
> > 
> > How this has been found: SLE11 had this patch, SLE12 it slipped through.
> > It took quite some time to nail down that this patch missing is the reason
> > for not entering turbo modes with this specific processor.
> > 
> > Signed-off-by: Thomas Renninger <tr...@suse.com>
> > 
> > --- a/arch/x86/kernel/cpu/intel.c   2016-02-26 17:19:55.731042972 +0100
> > +++ b/arch/x86/kernel/cpu/intel.c   2016-02-26 17:20:48.598020581 +0100
> > @@ -377,8 +377,12 @@ static void init_intel_energy_perf(struc
> > 
> > u64 epb;
> > 
> > /*
> > 
> > -* Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized.
> > -* (x86_energy_perf_policy(8) is available to change it at run-time.)
> > +* On server platforms energy bias typically is set to
> > +* performance on purpose.
> > +* On other platforms it may happen that MSR_IA32_ENERGY_PERF_BIAS
> > +* did not get initialized properly by BIOS.
> > +* Best is to to keep BIOS settings and give the user a hint whether
> > +* to change it via cpupower-set(8) userspace tool at runtime.
> > 
> >  */
> > 
> > if (!cpu_has(c, X86_FEATURE_EPB))
> > 
> > return;
> > 
> > @@ -387,10 +391,8 @@ static void init_intel_energy_perf(struc
> > 
> > if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE)
> > 
> > return;
> > 
> > -   pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 
'performance'\n");
> > -   pr_warn_once("ENERGY_PERF_BIAS: View and update with
> > x86_energy_perf_policy(8)\n"); -epb = (epb & ~0xF) |
> > ENERGY_PERF_BIAS_NORMAL;
> > -   wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
> > +   pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n");
> > +   pr_info_once("ENERGY_PERF_BIAS: Update with cpupower-set(8)\n");
> 
> This doesn't need to be cpupower-set IMO.

You mean why switch the message from:
x86_energy_perf_policy to cpupower-set
?

IMO x86_energy_perf_policy should not exist. It has been introduce before
cpupower set -b.
Having an extra tool/binary for this functionality is an unneeded packaging 
overhead for distros.
Also having more and more of such CPU specific tools is not userfriendly.
cpupower supports all power relevant features of your CPU and on all 
architectures (or at least it should). People should know this one better
than "x86_energy_perf_policy" and theoretically intuitively find it, even 
without a message.

So it would be nice to get the message fixed as well.

  Thomas

Re: [PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel

2016-03-01 Thread Thomas Renninger

On Saturday, February 27, 2016 12:15:47 AM Rafael J. Wysocki wrote:
> On Friday, February 26, 2016 05:38:00 PM Thomas Renninger wrote:
> > The assumption that BIOSes never want to have this register being set to
> > full performance (zero) is wrong.
> > 
> > While wrongly overruling this BIOS setting and set it from performance
> > to normal did not hurt that much, because nobody really knew the effects
> > inside Intel processors.
> > 
> > But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo
> > modes if this value is not set to performance.
> > 
> > So switch logic to tell the user in a friendly way (info) that the CPU is
> > in performance mode and how to switch via userspace if this is not
> > intended.
> > 
> > But otherwise trust that the BIOS has set the correct value here and do
> > not
> > blindly overrule.
> > 
> > How this has been found: SLE11 had this patch, SLE12 it slipped through.
> > It took quite some time to nail down that this patch missing is the reason
> > for not entering turbo modes with this specific processor.
> > 
> > Signed-off-by: Thomas Renninger 
> > 
> > --- a/arch/x86/kernel/cpu/intel.c   2016-02-26 17:19:55.731042972 +0100
> > +++ b/arch/x86/kernel/cpu/intel.c   2016-02-26 17:20:48.598020581 +0100
> > @@ -377,8 +377,12 @@ static void init_intel_energy_perf(struc
> > 
> > u64 epb;
> > 
> > /*
> > 
> > -* Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized.
> > -* (x86_energy_perf_policy(8) is available to change it at run-time.)
> > +* On server platforms energy bias typically is set to
> > +* performance on purpose.
> > +* On other platforms it may happen that MSR_IA32_ENERGY_PERF_BIAS
> > +* did not get initialized properly by BIOS.
> > +* Best is to to keep BIOS settings and give the user a hint whether
> > +* to change it via cpupower-set(8) userspace tool at runtime.
> > 
> >  */
> > 
> > if (!cpu_has(c, X86_FEATURE_EPB))
> > 
> > return;
> > 
> > @@ -387,10 +391,8 @@ static void init_intel_energy_perf(struc
> > 
> > if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE)
> > 
> > return;
> > 
> > -   pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 
'performance'\n");
> > -   pr_warn_once("ENERGY_PERF_BIAS: View and update with
> > x86_energy_perf_policy(8)\n"); -epb = (epb & ~0xF) |
> > ENERGY_PERF_BIAS_NORMAL;
> > -   wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
> > +   pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n");
> > +   pr_info_once("ENERGY_PERF_BIAS: Update with cpupower-set(8)\n");
> 
> This doesn't need to be cpupower-set IMO.

You mean why switch the message from:
x86_energy_perf_policy to cpupower-set
?

IMO x86_energy_perf_policy should not exist. It has been introduce before
cpupower set -b.
Having an extra tool/binary for this functionality is an unneeded packaging 
overhead for distros.
Also having more and more of such CPU specific tools is not userfriendly.
cpupower supports all power relevant features of your CPU and on all 
architectures (or at least it should). People should know this one better
than "x86_energy_perf_policy" and theoretically intuitively find it, even 
without a message.

So it would be nice to get the message fixed as well.

  Thomas

Re: [PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel

2016-02-26 Thread Thomas Renninger

This in fact is a re-send, including x86 maintainers.
Even this is a PM (Power Management) issue, the code is in the
x86 architecture paths.

>From last submit:
> > Patch is against latest linux-pm kernel.
> > Rafael: Can you queue this one up, please.

> Well, I'm not an x86 arch maintainer.
> Can you at least CC them, please?

Can someone, best would be a x86 maintainer?, take the patch please if
it is considered ok.

Thanks,

   Thomas

Re: [PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel

2016-02-26 Thread Thomas Renninger

This in fact is a re-send, including x86 maintainers.
Even this is a PM (Power Management) issue, the code is in the
x86 architecture paths.

>From last submit:
> > Patch is against latest linux-pm kernel.
> > Rafael: Can you queue this one up, please.

> Well, I'm not an x86 arch maintainer.
> Can you at least CC them, please?

Can someone, best would be a x86 maintainer?, take the patch please if
it is considered ok.

Thanks,

   Thomas

[PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel

2016-02-26 Thread Thomas Renninger

The assumption that BIOSes never want to have this register being set to
full performance (zero) is wrong.

While wrongly overruling this BIOS setting and set it from performance
to normal did not hurt that much, because nobody really knew the effects inside
Intel processors.

But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo modes
if this value is not set to performance.

So switch logic to tell the user in a friendly way (info) that the CPU is in
performance mode and how to switch via userspace if this is not intended.

But otherwise trust that the BIOS has set the correct value here and do not
blindly overrule.

How this has been found: SLE11 had this patch, SLE12 it slipped through.
It took quite some time to nail down that this patch missing is the reason
for not entering turbo modes with this specific processor.

Signed-off-by: Thomas Renninger <tr...@suse.com>

--- a/arch/x86/kernel/cpu/intel.c   2016-02-26 17:19:55.731042972 +0100
+++ b/arch/x86/kernel/cpu/intel.c   2016-02-26 17:20:48.598020581 +0100
@@ -377,8 +377,12 @@ static void init_intel_energy_perf(struc
u64 epb;
 
/*
-* Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized.
-* (x86_energy_perf_policy(8) is available to change it at run-time.)
+* On server platforms energy bias typically is set to
+* performance on purpose.
+* On other platforms it may happen that MSR_IA32_ENERGY_PERF_BIAS
+* did not get initialized properly by BIOS.
+* Best is to to keep BIOS settings and give the user a hint whether
+* to change it via cpupower-set(8) userspace tool at runtime.
 */
if (!cpu_has(c, X86_FEATURE_EPB))
return;
@@ -387,10 +391,8 @@ static void init_intel_energy_perf(struc
if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE)
return;
 
-   pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
-   pr_warn_once("ENERGY_PERF_BIAS: View and update with 
x86_energy_perf_policy(8)\n");
-   epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL;
-   wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
+   pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n");
+   pr_info_once("ENERGY_PERF_BIAS: Update with cpupower-set(8)\n");
 }
 
 static void intel_bsp_resume(struct cpuinfo_x86 *c)

[PATCH] Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel

2016-02-26 Thread Thomas Renninger

The assumption that BIOSes never want to have this register being set to
full performance (zero) is wrong.

While wrongly overruling this BIOS setting and set it from performance
to normal did not hurt that much, because nobody really knew the effects inside
Intel processors.

But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo modes
if this value is not set to performance.

So switch logic to tell the user in a friendly way (info) that the CPU is in
performance mode and how to switch via userspace if this is not intended.

But otherwise trust that the BIOS has set the correct value here and do not
blindly overrule.

How this has been found: SLE11 had this patch, SLE12 it slipped through.
It took quite some time to nail down that this patch missing is the reason
for not entering turbo modes with this specific processor.

Signed-off-by: Thomas Renninger 

--- a/arch/x86/kernel/cpu/intel.c   2016-02-26 17:19:55.731042972 +0100
+++ b/arch/x86/kernel/cpu/intel.c   2016-02-26 17:20:48.598020581 +0100
@@ -377,8 +377,12 @@ static void init_intel_energy_perf(struc
u64 epb;
 
/*
-* Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized.
-* (x86_energy_perf_policy(8) is available to change it at run-time.)
+* On server platforms energy bias typically is set to
+* performance on purpose.
+* On other platforms it may happen that MSR_IA32_ENERGY_PERF_BIAS
+* did not get initialized properly by BIOS.
+* Best is to to keep BIOS settings and give the user a hint whether
+* to change it via cpupower-set(8) userspace tool at runtime.
 */
if (!cpu_has(c, X86_FEATURE_EPB))
return;
@@ -387,10 +391,8 @@ static void init_intel_energy_perf(struc
if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE)
return;
 
-   pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
-   pr_warn_once("ENERGY_PERF_BIAS: View and update with 
x86_energy_perf_policy(8)\n");
-   epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL;
-   wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
+   pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n");
+   pr_info_once("ENERGY_PERF_BIAS: Update with cpupower-set(8)\n");
 }
 
 static void intel_bsp_resume(struct cpuinfo_x86 *c)

Re: [PATCH] cpupower: Fix build error in cpufreq-info

2016-02-17 Thread Thomas Renninger

On Monday, January 18, 2016 08:44:43 PM Shreyas B. Prabhu wrote:
> Fix the following build error by including limits.h -
> 
> utils/cpufreq-info.c: In function ‘get_latency’:
> utils/cpufreq-info.c:437:29: error: ‘UINT_MAX’ undeclared (first use in
> this function)
>   if (!latency || latency == UINT_MAX) {
>  ^
> Signed-off-by: Shreyas B. Prabhu <shre...@linux.vnet.ibm.com>

Signed-off-by: Thomas Renninger <tr...@suse.com>

Builds for me by luck, thanks!


   Thomas

Re: [PATCH] cpupower: Fix build error in cpufreq-info

2016-02-17 Thread Thomas Renninger

On Monday, January 18, 2016 08:44:43 PM Shreyas B. Prabhu wrote:
> Fix the following build error by including limits.h -
> 
> utils/cpufreq-info.c: In function ‘get_latency’:
> utils/cpufreq-info.c:437:29: error: ‘UINT_MAX’ undeclared (first use in
> this function)
>   if (!latency || latency == UINT_MAX) {
>  ^
> Signed-off-by: Shreyas B. Prabhu 

Signed-off-by: Thomas Renninger 

Builds for me by luck, thanks!


   Thomas

Re: [PATCH] acpi/ec: Deny write access unless requested by module param

2016-02-17 Thread Thomas Renninger

On Saturday, February 06, 2016 02:08:08 AM gr...@linuxhacker.ru wrote:
> From: Oleg Drokin <gr...@linuxhacker.ru>
> 
> In debugfs it's not enough to just set file mode to read-only to
> deny write access to a file, instead just don't provide
> the write method unless write access is really requested.
> 
> Signed-off-by: Oleg Drokin <gr...@linuxhacker.ru>
Signed-off-by: Thomas Renninger <tr...@suse.com>

Thanks!

> ---
> I assume allowing run-time changes via /sys/module is preferrable,
> opposed to forced module unload and reload to change this option,
> but I can submit another patch to only depend on the module parameter
> too, please let me know.
> 
>  drivers/acpi/ec_sys.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/acpi/ec_sys.c b/drivers/acpi/ec_sys.c
> index bea8e42..6c7dd7a 100644
> --- a/drivers/acpi/ec_sys.c
> +++ b/drivers/acpi/ec_sys.c
> @@ -73,6 +73,9 @@ static ssize_t acpi_ec_write_io(struct file *f, const char
> __user *buf, loff_t init_off = *off;
>   int err = 0;
> 
> + if (!write_support)
> + return -EINVAL;
> +
>   if (*off >= EC_SPACE_SIZE)
>   return 0;
>   if (*off + count >= EC_SPACE_SIZE) {

Re: [PATCH] acpi/ec: Deny write access unless requested by module param

2016-02-17 Thread Thomas Renninger

On Saturday, February 06, 2016 02:08:08 AM gr...@linuxhacker.ru wrote:
> From: Oleg Drokin 
> 
> In debugfs it's not enough to just set file mode to read-only to
> deny write access to a file, instead just don't provide
> the write method unless write access is really requested.
> 
> Signed-off-by: Oleg Drokin 
Signed-off-by: Thomas Renninger 

Thanks!

> ---
> I assume allowing run-time changes via /sys/module is preferrable,
> opposed to forced module unload and reload to change this option,
> but I can submit another patch to only depend on the module parameter
> too, please let me know.
> 
>  drivers/acpi/ec_sys.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/acpi/ec_sys.c b/drivers/acpi/ec_sys.c
> index bea8e42..6c7dd7a 100644
> --- a/drivers/acpi/ec_sys.c
> +++ b/drivers/acpi/ec_sys.c
> @@ -73,6 +73,9 @@ static ssize_t acpi_ec_write_io(struct file *f, const char
> __user *buf, loff_t init_off = *off;
>   int err = 0;
> 
> + if (!write_support)
> + return -EINVAL;
> +
>   if (*off >= EC_SPACE_SIZE)
>   return 0;
>   if (*off + count >= EC_SPACE_SIZE) {

Re: Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel

2016-01-22 Thread Thomas Renninger

On Friday, January 22, 2016 01:09:15 PM Thomas Renninger wrote:
> The assumption that BIOSes never want to have this register being set to
> full performance (zero) is wrong.

Patch is against latest linux-pm kernel.
Rafael: Can you queue this one up, please.

Thanks,

Thomas

Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel

2016-01-22 Thread Thomas Renninger

The assumption that BIOSes never want to have this register being set to
full performance (zero) is wrong.

While wrongly overruling this BIOS setting and set it to from performance
to normal did not hurt that much, because nobody really knew the effects inside
Intel processors.

But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo modes
if this value is not set to performance.

So switch logic to tell the user in a friendly way (info) that the CPU is in
performance mode and how to switch via userspace if this is not intended.

But otherwise trust that the BIOS has set the correct value here and do not
blindly overrule.

How this has been found: SLE11 had this patch, SLE12 it slipped through.
It took quite some time to nail down that this patch missing is the reason
for not entering turbo modes with this specific processor.

Signed-off-by: Thomas Renninger 

--- arch/x86/kernel/cpu/intel.c.orig2016-01-22 12:49:03.347906007 +0100
+++ arch/x86/kernel/cpu/intel.c 2016-01-22 12:53:33.043234963 +0100
@@ -377,8 +377,14 @@
u64 epb;
 
/*
-* Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized.
-* (x86_energy_perf_policy(8) is available to change it at run-time.)
+* On server platforms energy bias typically is set to
+* performance on purpose.
+* On other platforms it may happen that MSR_IA32_ENERGY_PERF_BIAS
+* did not get initialized properly by BIOS.
+* An additional check could be to look at pm_profile and check whether
+* this is a performance oriented platform.
+* But best is to to keep BIOS settings and give the user a hint whether
+* to change it via cpupower-set(8) userspace tool at runtime.
 */
if (!cpu_has(c, X86_FEATURE_EPB))
return;
@@ -387,10 +393,8 @@
if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE)
return;
 
-   pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
-   pr_warn_once("ENERGY_PERF_BIAS: View and update with 
x86_energy_perf_policy(8)\n");
-   epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL;
-   wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
+   pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n");
+   pr_info_once("ENERGY_PERF_BIAS: Update with cpupower-set(8)\n");
 }
 
 static void intel_bsp_resume(struct cpuinfo_x86 *c)

Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel

2016-01-22 Thread Thomas Renninger

The assumption that BIOSes never want to have this register being set to
full performance (zero) is wrong.

While wrongly overruling this BIOS setting and set it to from performance
to normal did not hurt that much, because nobody really knew the effects inside
Intel processors.

But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo modes
if this value is not set to performance.

So switch logic to tell the user in a friendly way (info) that the CPU is in
performance mode and how to switch via userspace if this is not intended.

But otherwise trust that the BIOS has set the correct value here and do not
blindly overrule.

How this has been found: SLE11 had this patch, SLE12 it slipped through.
It took quite some time to nail down that this patch missing is the reason
for not entering turbo modes with this specific processor.

Signed-off-by: Thomas Renninger <tr...@suse.com>

--- arch/x86/kernel/cpu/intel.c.orig2016-01-22 12:49:03.347906007 +0100
+++ arch/x86/kernel/cpu/intel.c 2016-01-22 12:53:33.043234963 +0100
@@ -377,8 +377,14 @@
u64 epb;
 
/*
-* Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized.
-* (x86_energy_perf_policy(8) is available to change it at run-time.)
+* On server platforms energy bias typically is set to
+* performance on purpose.
+* On other platforms it may happen that MSR_IA32_ENERGY_PERF_BIAS
+* did not get initialized properly by BIOS.
+* An additional check could be to look at pm_profile and check whether
+* this is a performance oriented platform.
+* But best is to to keep BIOS settings and give the user a hint whether
+* to change it via cpupower-set(8) userspace tool at runtime.
 */
if (!cpu_has(c, X86_FEATURE_EPB))
return;
@@ -387,10 +393,8 @@
if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE)
return;
 
-   pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
-   pr_warn_once("ENERGY_PERF_BIAS: View and update with 
x86_energy_perf_policy(8)\n");
-   epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL;
-   wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
+   pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n");
+   pr_info_once("ENERGY_PERF_BIAS: Update with cpupower-set(8)\n");
 }
 
 static void intel_bsp_resume(struct cpuinfo_x86 *c)

Re: Do not modify MSR_IA32_ENERGY_PERF_BIAS in kernel

2016-01-22 Thread Thomas Renninger

On Friday, January 22, 2016 01:09:15 PM Thomas Renninger wrote:
> The assumption that BIOSes never want to have this register being set to
> full performance (zero) is wrong.

Patch is against latest linux-pm kernel.
Rafael: Can you queue this one up, please.

Thanks,

Thomas

Re: [PATCH 1/2] cpu hotplug, add CONFIG_PERMANENT_CPU_TOPOLOGY and keep topology directory for lifetime of CPU [v2]

2015-10-27 Thread Thomas Renninger

On Wednesday, October 21, 2015 12:06:37 PM Prarit Bhargava wrote:

...

> +config PERMANENT_CPU_TOPOLOGY
> + bool "Permanent CPU Topology"
> + depends on HOTPLUG_CPU
> + default 1 if X86
> + default 0
> + help
> +   This option configures CPU topology to be permanent for the lifetime
> +   of the CPU (until it is physically removed).

I cannot see where you differ soft offlined and physically removed?
The topology file is simply never removed or do I oversee something?
Hm, maybe this one is even correct, but could get re-phrased a bit.

Puhh, not sure, maybe this:
> +   This option configures CPU topology to be permanent.
should be enough?
It is obvious that topology info is still correct when the CPU is software
offlined via:
echo 0 >/sys/devices/system/cpu/cpuX/online

But if someone really hit the button to eject a Numa node or
so, this info might not be up-to-date (but still avail now, because we
cannot differ software vs hardware events, at least not that easy).

But it is up-to-date, once the newly plugged in Numa node or CPU gets
added and ramped up, so this change should be very fine.

Maybe it's worth to phrase above into the changelog at least?

I think this is a sane change and works!
If physically removed, the topology info could be outdated, but userspace
knows, that the core is offlined right now. Also the info will likely
be the same when something gets re-plugged. If not, topology info will be
valid once the re-plugged core gets initialized, so everything should be fine.

I put these two patches onto our latest kernel builds, enabled the .config
option for i386 and x86_64, disabled it for other archs and things still built
nicely:
https://build.opensuse.org/project/show/home:trenn:kernel

One issue:
Why do you move topology.c into cpu.c?
It is hard to see the real diff now.
If moving makes sense, it would be nice to first submit a patch
showing the changes for this feature and another patch saying:
"Eleminating topology.c, only code move, no functional change"
in the changelog.

Feel free to add a Reviewed-by: Thomas Renninger 
if you plan to re-submit.

Thanks!

Thomas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] cpu hotplug, add CONFIG_PERMANENT_CPU_TOPOLOGY and keep topology directory for lifetime of CPU [v2]

2015-10-27 Thread Thomas Renninger

On Wednesday, October 21, 2015 12:06:37 PM Prarit Bhargava wrote:

...

> +config PERMANENT_CPU_TOPOLOGY
> + bool "Permanent CPU Topology"
> + depends on HOTPLUG_CPU
> + default 1 if X86
> + default 0
> + help
> +   This option configures CPU topology to be permanent for the lifetime
> +   of the CPU (until it is physically removed).

I cannot see where you differ soft offlined and physically removed?
The topology file is simply never removed or do I oversee something?
Hm, maybe this one is even correct, but could get re-phrased a bit.

Puhh, not sure, maybe this:
> +   This option configures CPU topology to be permanent.
should be enough?
It is obvious that topology info is still correct when the CPU is software
offlined via:
echo 0 >/sys/devices/system/cpu/cpuX/online

But if someone really hit the button to eject a Numa node or
so, this info might not be up-to-date (but still avail now, because we
cannot differ software vs hardware events, at least not that easy).

But it is up-to-date, once the newly plugged in Numa node or CPU gets
added and ramped up, so this change should be very fine.

Maybe it's worth to phrase above into the changelog at least?

I think this is a sane change and works!
If physically removed, the topology info could be outdated, but userspace
knows, that the core is offlined right now. Also the info will likely
be the same when something gets re-plugged. If not, topology info will be
valid once the re-plugged core gets initialized, so everything should be fine.

I put these two patches onto our latest kernel builds, enabled the .config
option for i386 and x86_64, disabled it for other archs and things still built
nicely:
https://build.opensuse.org/project/show/home:trenn:kernel

One issue:
Why do you move topology.c into cpu.c?
It is hard to see the real diff now.
If moving makes sense, it would be nice to first submit a patch
showing the changes for this feature and another patch saying:
"Eleminating topology.c, only code move, no functional change"
in the changelog.

Feel free to add a Reviewed-by: Thomas Renninger <tr...@suse.com>
if you plan to re-submit.

Thanks!

Thomas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] cpupower tools: Fix error when running cpupower monitor

2015-08-10 Thread Thomas Renninger

On Monday, August 03, 2015 11:46:00 AM Shreyas B. Prabhu wrote:
> get_cpu_topology() tries to get topology info from all cpus by reading
> files in the topology sysfs dir. If a cpu is offlined, since it doesn't
> have topology dir, this function fails and returns -1. This causes
> functions relying on get_cpu_topology() to fail. For example-
> 
> $ cpupower monitor
> Cannot read number of available processors
> 
> Fix this by skipping fetching topology info for offline cpus.

Looks fine.

Thanks!

Acked-by: Thomas Renninger 


> 
> Signed-off-by: Shreyas B. Prabhu 
> Reported-by: Pavaman Subramaniyam 
> ---
>  tools/power/cpupower/utils/helpers/topology.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tools/power/cpupower/utils/helpers/topology.c
> b/tools/power/cpupower/utils/helpers/topology.c index
> c13120af519b..cea398c176e7 100644
> --- a/tools/power/cpupower/utils/helpers/topology.c
> +++ b/tools/power/cpupower/utils/helpers/topology.c
> @@ -73,6 +73,8 @@ int get_cpu_topology(struct cpupower_topology *cpu_top)
>   for (cpu = 0; cpu < cpus; cpu++) {
>   cpu_top->core_info[cpu].cpu = cpu;
>   cpu_top->core_info[cpu].is_online = sysfs_is_cpu_online(cpu);
> + if (!cpu_top->core_info[cpu].is_online)
> + continue;
>   if(sysfs_topology_read_file(
>   cpu,
>   "physical_package_id",

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] cpupower tools: Fix error when running cpupower monitor

2015-08-10 Thread Thomas Renninger

On Monday, August 03, 2015 11:46:00 AM Shreyas B. Prabhu wrote:
 get_cpu_topology() tries to get topology info from all cpus by reading
 files in the topology sysfs dir. If a cpu is offlined, since it doesn't
 have topology dir, this function fails and returns -1. This causes
 functions relying on get_cpu_topology() to fail. For example-
 
 $ cpupower monitor
 Cannot read number of available processors
 
 Fix this by skipping fetching topology info for offline cpus.

Looks fine.

Thanks!

Acked-by: Thomas Renninger tr...@suse.de


 
 Signed-off-by: Shreyas B. Prabhu shre...@linux.vnet.ibm.com
 Reported-by: Pavaman Subramaniyam pavsu...@linux.vnet.ibm.com
 ---
  tools/power/cpupower/utils/helpers/topology.c | 2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/tools/power/cpupower/utils/helpers/topology.c
 b/tools/power/cpupower/utils/helpers/topology.c index
 c13120af519b..cea398c176e7 100644
 --- a/tools/power/cpupower/utils/helpers/topology.c
 +++ b/tools/power/cpupower/utils/helpers/topology.c
 @@ -73,6 +73,8 @@ int get_cpu_topology(struct cpupower_topology *cpu_top)
   for (cpu = 0; cpu  cpus; cpu++) {
   cpu_top-core_info[cpu].cpu = cpu;
   cpu_top-core_info[cpu].is_online = sysfs_is_cpu_online(cpu);
 + if (!cpu_top-core_info[cpu].is_online)
 + continue;
   if(sysfs_topology_read_file(
   cpu,
   physical_package_id,

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] cpupower: Do not change the frequency of offline cpu

2015-07-22 Thread Thomas Renninger

On Wednesday, July 22, 2015 01:50:49 PM Shilpasri G Bhat wrote:
> Check if the cpu is online before changing the frequency/governor of
> the cpu.
> 
> Signed-off-by: Shilpasri G Bhat 
> ---
>  tools/power/cpupower/utils/cpufreq-set.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/tools/power/cpupower/utils/cpufreq-set.c
> b/tools/power/cpupower/utils/cpufreq-set.c index f656e58..4e21357 100644
> --- a/tools/power/cpupower/utils/cpufreq-set.c
> +++ b/tools/power/cpupower/utils/cpufreq-set.c
> @@ -17,6 +17,7 @@
> 
>  #include "cpufreq.h"
>  #include "helpers/helpers.h"
> +#include "helpers/sysfs.h"
> 
>  #define NORM_FREQ_LEN 32
> 
> @@ -318,6 +319,9 @@ int cmd_freq_set(int argc, char **argv)
>   cpufreq_cpu_exists(cpu))
>   continue;
> 
> + if (sysfs_is_cpu_online(cpu) != 1)
> + continue;
> +
>   printf(_("Setting cpu: %d\n"), cpu);
>   ret = do_one_cpu(cpu, _pol, freq, policychange);
>   if (ret) {

Thanks for catching this one and sending a patch!

Acked-by: Thomas Renninger 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] cpupower: Do not change the frequency of offline cpu

2015-07-22 Thread Thomas Renninger

On Wednesday, July 22, 2015 01:50:49 PM Shilpasri G Bhat wrote:
 Check if the cpu is online before changing the frequency/governor of
 the cpu.
 
 Signed-off-by: Shilpasri G Bhat shilpa.b...@linux.vnet.ibm.com
 ---
  tools/power/cpupower/utils/cpufreq-set.c | 4 
  1 file changed, 4 insertions(+)
 
 diff --git a/tools/power/cpupower/utils/cpufreq-set.c
 b/tools/power/cpupower/utils/cpufreq-set.c index f656e58..4e21357 100644
 --- a/tools/power/cpupower/utils/cpufreq-set.c
 +++ b/tools/power/cpupower/utils/cpufreq-set.c
 @@ -17,6 +17,7 @@
 
  #include cpufreq.h
  #include helpers/helpers.h
 +#include helpers/sysfs.h
 
  #define NORM_FREQ_LEN 32
 
 @@ -318,6 +319,9 @@ int cmd_freq_set(int argc, char **argv)
   cpufreq_cpu_exists(cpu))
   continue;
 
 + if (sysfs_is_cpu_online(cpu) != 1)
 + continue;
 +
   printf(_(Setting cpu: %d\n), cpu);
   ret = do_one_cpu(cpu, new_pol, freq, policychange);
   if (ret) {

Thanks for catching this one and sending a patch!

Acked-by: Thomas Renninger tr...@suse.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] cpupower: mperf monitor: fix output in MAX_FREQ_SYSFS mode

2015-05-27 Thread Thomas Renninger

Hi,

and sorry for the late answer, this had not highest prio...

On Wednesday, May 20, 2015 02:26:47 AM Rafael J. Wysocki wrote:
> On Monday, May 11, 2015 12:40:22 PM Herton R. Krzesinski wrote:
> > On Mon, May 11, 2015 at 11:35:35AM -0300, Herton R. Krzesinski wrote:
> > > There is clearly wrong output when mperf monitor runs in MAX_FREQ_SYSFS
> > > mode: average frequency shows in kHz unit (despite the intended output
> > > to be in MHz), and percentages for C state information are all wrong
> > > (including high/negative values shown).
> > > 
> > > The problem is that the max_frequency read on initialization isn't used
> > > where it should have been used on mperf_get_count_percent (to estimate
> > > the number of ticks in the given time period), and the value we read
> > > from sysfs is in kHz, so we must divide it to get the MHz value to use
> > > in current calculations.
> > > 
> > > While at it, also I fixed another small issues in the debug output of
> > > max_frequency value in mperf_get_count_freq.
> > > 
> > > Signed-off-by: Herton R. Krzesinski 
> > > ---
> > > 
> > >  tools/power/cpupower/utils/idle_monitor/mperf_monitor.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > Actually please consider v2 patch below, just a minor change in the debug
> > output, which isn't a percentage...
> 
> Thomas, any comments?

I tried to find fitting HW without much success.
The MAX_FREQ_SYSFS is used only on rare HW and it may never got a proper test.

Patch looks sane. Even I cannot add a Tested-by:
you may add:

Acked-by: Thomas Renninger 

and add it.

Thanks,

   Thomas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] cpupower: mperf monitor: fix output in MAX_FREQ_SYSFS mode

2015-05-27 Thread Thomas Renninger

Hi,

and sorry for the late answer, this had not highest prio...

On Wednesday, May 20, 2015 02:26:47 AM Rafael J. Wysocki wrote:
 On Monday, May 11, 2015 12:40:22 PM Herton R. Krzesinski wrote:
  On Mon, May 11, 2015 at 11:35:35AM -0300, Herton R. Krzesinski wrote:
   There is clearly wrong output when mperf monitor runs in MAX_FREQ_SYSFS
   mode: average frequency shows in kHz unit (despite the intended output
   to be in MHz), and percentages for C state information are all wrong
   (including high/negative values shown).
   
   The problem is that the max_frequency read on initialization isn't used
   where it should have been used on mperf_get_count_percent (to estimate
   the number of ticks in the given time period), and the value we read
   from sysfs is in kHz, so we must divide it to get the MHz value to use
   in current calculations.
   
   While at it, also I fixed another small issues in the debug output of
   max_frequency value in mperf_get_count_freq.
   
   Signed-off-by: Herton R. Krzesinski her...@redhat.com
   ---
   
tools/power/cpupower/utils/idle_monitor/mperf_monitor.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
  
  Actually please consider v2 patch below, just a minor change in the debug
  output, which isn't a percentage...
 
 Thomas, any comments?

I tried to find fitting HW without much success.
The MAX_FREQ_SYSFS is used only on rare HW and it may never got a proper test.

Patch looks sane. Even I cannot add a Tested-by:
you may add:

Acked-by: Thomas Renninger tr...@suse.de

and add it.

Thanks,

   Thomas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] cpupower: fix breakage from libpci API change

2015-04-14 Thread Thomas Renninger

Hi,

On Monday, April 13, 2015 10:24:01 PM Lucas Stach wrote:
> libpci 3.3.0 introduced an additional member in the pci_filter struct
> which needs to be initialized to -1 to get the same behavior as before
> the API change.
Sounds not that clever, but there probably is a reason for this...

I am not that familiar with the pci lib and its recent changes, but
below patch looks reasonable.

Acked-by: Thomas Renninger 


> The libpci internal helpers got updated accordingly,
> but as the cpupower pci helpers initialized the struct themselves the
> behavior changed.
> 
> Use the libpci helper pci_filter_init() to fix this and guard against
> similar breakages in the future.
> 
> This fixes probing of the AMD fam12h/14h cpuidle monitor on systems
> with libpci >= 3.3.0.
> 
> Signed-off-by: Lucas Stach 
> ---
>  tools/power/cpupower/utils/helpers/pci.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/power/cpupower/utils/helpers/pci.c
> b/tools/power/cpupower/utils/helpers/pci.c index 9690798..8b27898 100644
> --- a/tools/power/cpupower/utils/helpers/pci.c
> +++ b/tools/power/cpupower/utils/helpers/pci.c
> @@ -25,14 +25,21 @@
>  struct pci_dev *pci_acc_init(struct pci_access **pacc, int domain, int bus,
> int slot, int func, int vendor, int dev)
>  {
> - struct pci_filter filter_nb_link = { domain, bus, slot, func,
> -  vendor, dev };
> + struct pci_filter filter_nb_link;
>   struct pci_dev *device;
> 
>   *pacc = pci_alloc();
>   if (*pacc == NULL)
>   return NULL;
> 
> + pci_filter_init(*pacc, _nb_link);
> + filter_nb_link.domain   = domain;
> + filter_nb_link.bus  = bus;
> + filter_nb_link.slot = slot;
> + filter_nb_link.func = func;
> + filter_nb_link.vendor   = vendor;
> + filter_nb_link.device   = dev;
> +
>   pci_init(*pacc);
>   pci_scan_bus(*pacc);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] cpupower: fix breakage from libpci API change

2015-04-14 Thread Thomas Renninger

Hi,

On Monday, April 13, 2015 10:24:01 PM Lucas Stach wrote:
 libpci 3.3.0 introduced an additional member in the pci_filter struct
 which needs to be initialized to -1 to get the same behavior as before
 the API change.
Sounds not that clever, but there probably is a reason for this...

I am not that familiar with the pci lib and its recent changes, but
below patch looks reasonable.

Acked-by: Thomas Renninger tr...@suse.de


 The libpci internal helpers got updated accordingly,
 but as the cpupower pci helpers initialized the struct themselves the
 behavior changed.
 
 Use the libpci helper pci_filter_init() to fix this and guard against
 similar breakages in the future.
 
 This fixes probing of the AMD fam12h/14h cpuidle monitor on systems
 with libpci = 3.3.0.
 
 Signed-off-by: Lucas Stach d...@lynxeye.de
 ---
  tools/power/cpupower/utils/helpers/pci.c | 11 +--
  1 file changed, 9 insertions(+), 2 deletions(-)
 
 diff --git a/tools/power/cpupower/utils/helpers/pci.c
 b/tools/power/cpupower/utils/helpers/pci.c index 9690798..8b27898 100644
 --- a/tools/power/cpupower/utils/helpers/pci.c
 +++ b/tools/power/cpupower/utils/helpers/pci.c
 @@ -25,14 +25,21 @@
  struct pci_dev *pci_acc_init(struct pci_access **pacc, int domain, int bus,
 int slot, int func, int vendor, int dev)
  {
 - struct pci_filter filter_nb_link = { domain, bus, slot, func,
 -  vendor, dev };
 + struct pci_filter filter_nb_link;
   struct pci_dev *device;
 
   *pacc = pci_alloc();
   if (*pacc == NULL)
   return NULL;
 
 + pci_filter_init(*pacc, filter_nb_link);
 + filter_nb_link.domain   = domain;
 + filter_nb_link.bus  = bus;
 + filter_nb_link.slot = slot;
 + filter_nb_link.func = func;
 + filter_nb_link.vendor   = vendor;
 + filter_nb_link.device   = dev;
 +
   pci_init(*pacc);
   pci_scan_bus(*pacc);

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Bad rpath in cpupower with 4.0-rcX

2015-03-11 Thread Thomas Renninger

On Tuesday, March 10, 2015 11:30:44 PM Rafael J. Wysocki wrote:
> On Tuesday, March 10, 2015 08:33:54 AM Josh Boyer wrote:
> > On Fri, Mar 6, 2015 at 8:47 AM, Josh Boyer  
wrote:
> > > Hi All,
> > > 
> > > Commit 5c1de006e8e66 (cpupower Makefile change to help run the tool
> > > without 'make install') added an rpath to the cpupower binary.  From
> > > what I can understand, this is to make it easier to run cpupower from
> > > the local build directory without having to run make install.  It does
> > > accomplish that, but it also leaves the binary with the rpath in it
> > > which is considered bad practice.  It also causes cpupower to fail in
> > > rpmbuild with the following error:
> > > 
> > > ERROR   0004: file '/usr/bin/cpupower' contains an insecure rpath './'
> > > in [./]
> > > error: Bad exit status from /var/tmp/rpm-tmp.A6u26r (%install)
> > > 
> > > Bad exit status from /var/tmp/rpm-tmp.A6u26r (%install)
> > > 
> > > I understand the want for eased development, but couldn't people just
> > > set LD_LIBRARY_PATH instead?
> > 
> > No comments on this?  Should I just send a revert patch instead?

I agree adding . as library path was not a good idea.
This probably is also a potential security issue.

What I do is building the libraries statically into the binary when
testing. I will send a Makefile change introducing a static = true/false
variable to do that easily.

> You can do that.

Thanks for sending the revert, it's the right thing to do.

 Thomas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Bad rpath in cpupower with 4.0-rcX

2015-03-11 Thread Thomas Renninger

On Tuesday, March 10, 2015 11:30:44 PM Rafael J. Wysocki wrote:
 On Tuesday, March 10, 2015 08:33:54 AM Josh Boyer wrote:
  On Fri, Mar 6, 2015 at 8:47 AM, Josh Boyer jwbo...@fedoraproject.org 
wrote:
   Hi All,
   
   Commit 5c1de006e8e66 (cpupower Makefile change to help run the tool
   without 'make install') added an rpath to the cpupower binary.  From
   what I can understand, this is to make it easier to run cpupower from
   the local build directory without having to run make install.  It does
   accomplish that, but it also leaves the binary with the rpath in it
   which is considered bad practice.  It also causes cpupower to fail in
   rpmbuild with the following error:
   
   ERROR   0004: file '/usr/bin/cpupower' contains an insecure rpath './'
   in [./]
   error: Bad exit status from /var/tmp/rpm-tmp.A6u26r (%install)
   
   Bad exit status from /var/tmp/rpm-tmp.A6u26r (%install)
   
   I understand the want for eased development, but couldn't people just
   set LD_LIBRARY_PATH instead?
  
  No comments on this?  Should I just send a revert patch instead?

I agree adding . as library path was not a good idea.
This probably is also a potential security issue.

What I do is building the libraries statically into the binary when
testing. I will send a Makefile change introducing a static = true/false
variable to do that easily.

 You can do that.

Thanks for sending the revert, it's the right thing to do.

 Thomas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] Fix no idle state information return value

2014-12-17 Thread Thomas Renninger

On Sunday, December 14, 2014 09:06:38 AM Prarit Bhargava wrote:
> sysfs_get_idlestate_count() returns an unsigned int.  Returning -ENODEV
> is not the right thing to do here, and in any case is handled the same
> way as if there are no states found.
Yep, returning zero states instead of an error code makes a lot sense
here.

> 
> Cc: Thomas Renninger 
> Cc: Rafael J. Wysocki 
> Signed-off-by: Prarit Bhargava 
Acked-by: Thomas Renninger 

Thanks!

Thomas

> ---
>  tools/power/cpupower/utils/helpers/sysfs.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/power/cpupower/utils/helpers/sysfs.c
> b/tools/power/cpupower/utils/helpers/sysfs.c index 09afe5d..4e8fe2c 100644
> --- a/tools/power/cpupower/utils/helpers/sysfs.c
> +++ b/tools/power/cpupower/utils/helpers/sysfs.c
> @@ -361,7 +361,7 @@ unsigned int sysfs_get_idlestate_count(unsigned int cpu)
> 
>   snprintf(file, SYSFS_PATH_MAX, PATH_TO_CPU "cpuidle");
>   if (stat(file, ) != 0 || !S_ISDIR(statbuf.st_mode))
> - return -ENODEV;
> + return 0;
> 
>   snprintf(file, SYSFS_PATH_MAX, PATH_TO_CPU "cpu%u/cpuidle/state0", 
cpu);
>   if (stat(file, ) != 0 || !S_ISDIR(statbuf.st_mode))

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] cpupower: Correctly detect if running as root

2014-12-17 Thread Thomas Renninger

On Sunday, December 14, 2014 10:29:24 PM Rafael J. Wysocki wrote:
> On Sunday, December 14, 2014 01:36:52 PM Michal Privoznik wrote:
> > Some operations, like frequency-set, need root privileges. However,
> > the way that this is detected is not correct. The getuid() is called,
> > while in fact geteuid() should be. This way we can allow
> > distributions or users to set SETUID flags on the cpupower binary if
> > they want to and let regular users change the cpu frequency governor.
> > 
> > Signed-off-by: Michal Privoznik 
> 
> An ACK from Thomas is needed.
Patch is correct, Thanks!

Acked-by: Thomas Renninger 

 Thomas
> 
> > ---
> > 
> >  tools/power/cpupower/utils/cpupower.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tools/power/cpupower/utils/cpupower.c
> > b/tools/power/cpupower/utils/cpupower.c index 7cdcf88..9ea9143 100644
> > --- a/tools/power/cpupower/utils/cpupower.c
> > +++ b/tools/power/cpupower/utils/cpupower.c
> > @@ -199,7 +199,7 @@ int main(int argc, const char *argv[])
> > 
> > }
> > 
> > get_cpu_info(0, _cpu_info);
> > 
> > -   run_as_root = !getuid();
> > +   run_as_root = !geteuid();
> > 
> > if (run_as_root) {
> > 
> > ret = uname();
> > if (!ret && !strcmp(uts.machine, "x86_64") &&

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] cpupower: Correctly detect if running as root

2014-12-17 Thread Thomas Renninger

On Sunday, December 14, 2014 10:29:24 PM Rafael J. Wysocki wrote:
 On Sunday, December 14, 2014 01:36:52 PM Michal Privoznik wrote:
  Some operations, like frequency-set, need root privileges. However,
  the way that this is detected is not correct. The getuid() is called,
  while in fact geteuid() should be. This way we can allow
  distributions or users to set SETUID flags on the cpupower binary if
  they want to and let regular users change the cpu frequency governor.
  
  Signed-off-by: Michal Privoznik mpriv...@redhat.com
 
 An ACK from Thomas is needed.
Patch is correct, Thanks!

Acked-by: Thomas Renninger tr...@suse.de

 Thomas
 
  ---
  
   tools/power/cpupower/utils/cpupower.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
  
  diff --git a/tools/power/cpupower/utils/cpupower.c
  b/tools/power/cpupower/utils/cpupower.c index 7cdcf88..9ea9143 100644
  --- a/tools/power/cpupower/utils/cpupower.c
  +++ b/tools/power/cpupower/utils/cpupower.c
  @@ -199,7 +199,7 @@ int main(int argc, const char *argv[])
  
  }
  
  get_cpu_info(0, cpupower_cpu_info);
  
  -   run_as_root = !getuid();
  +   run_as_root = !geteuid();
  
  if (run_as_root) {
  
  ret = uname(uts);
  if (!ret  !strcmp(uts.machine, x86_64) 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] Fix no idle state information return value

2014-12-17 Thread Thomas Renninger

On Sunday, December 14, 2014 09:06:38 AM Prarit Bhargava wrote:
 sysfs_get_idlestate_count() returns an unsigned int.  Returning -ENODEV
 is not the right thing to do here, and in any case is handled the same
 way as if there are no states found.
Yep, returning zero states instead of an error code makes a lot sense
here.

 Cc: Thomas Renninger tr...@suse.de
 Cc: Rafael J. Wysocki rafael.j.wyso...@intel.com
 Signed-off-by: Prarit Bhargava pra...@redhat.com
Acked-by: Thomas Renninger tr...@suse.de

Thanks!

Thomas

 ---
  tools/power/cpupower/utils/helpers/sysfs.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/tools/power/cpupower/utils/helpers/sysfs.c
 b/tools/power/cpupower/utils/helpers/sysfs.c index 09afe5d..4e8fe2c 100644
 --- a/tools/power/cpupower/utils/helpers/sysfs.c
 +++ b/tools/power/cpupower/utils/helpers/sysfs.c
 @@ -361,7 +361,7 @@ unsigned int sysfs_get_idlestate_count(unsigned int cpu)

   snprintf(file, SYSFS_PATH_MAX, PATH_TO_CPU cpuidle);
   if (stat(file, statbuf) != 0 || !S_ISDIR(statbuf.st_mode))
 - return -ENODEV;
 + return 0;

   snprintf(file, SYSFS_PATH_MAX, PATH_TO_CPU cpu%u/cpuidle/state0, 
cpu);
   if (stat(file, statbuf) != 0 || !S_ISDIR(statbuf.st_mode))

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] tools, cpupower, fix return checks for sysfs_get_idlestate_count()

2014-12-03 Thread Thomas Renninger

On Monday, December 01, 2014 09:39:22 AM Prarit Bhargava wrote:
> Red Hat and Fedora use a bug reporting tool that gathers data about
> "broken" systems called sosreport.  Among other things, it includes the
> output of 'cpupower idle-info'.  Executing 'cpupower idle-info' on a
> system that has cpuidle disabled via 'cpuidle.off=1' results in a 300
> second hang in the cpupower application.
> 
> ie)
> [root@intel-brickland-05]# cpupower idle-info
> Could not determine cpuidle driver
> 
> Analyzing CPU 0:
> Number of idle states: -19
> [hang]
> 
> The problem is that the cpupower code only checks for a zero return from
> sysfs_get_idlestate_count().  The function can return -ENODEV (-19) as
> above.  This patch fixes callers to sysfs_get_idlestate_count() to check
> the right return values.
> 
> Cc: Thomas Renninger 
> Cc: linux...@vger.kernel.org
> Signed-off-by: Prarit Bhargava 

Signed-off-by: Thomas Renninger 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] tools, cpupower, fix return checks for sysfs_get_idlestate_count()

2014-12-03 Thread Thomas Renninger

On Monday, December 01, 2014 09:39:22 AM Prarit Bhargava wrote:
 Red Hat and Fedora use a bug reporting tool that gathers data about
 broken systems called sosreport.  Among other things, it includes the
 output of 'cpupower idle-info'.  Executing 'cpupower idle-info' on a
 system that has cpuidle disabled via 'cpuidle.off=1' results in a 300
 second hang in the cpupower application.
 
 ie)
 [root@intel-brickland-05]# cpupower idle-info
 Could not determine cpuidle driver
 
 Analyzing CPU 0:
 Number of idle states: -19
 [hang]
 
 The problem is that the cpupower code only checks for a zero return from
 sysfs_get_idlestate_count().  The function can return -ENODEV (-19) as
 above.  This patch fixes callers to sysfs_get_idlestate_count() to check
 the right return values.
 
 Cc: Thomas Renninger tr...@suse.de
 Cc: linux...@vger.kernel.org
 Signed-off-by: Prarit Bhargava pra...@redhat.com

Signed-off-by: Thomas Renninger tr...@suse.de

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 1/3] ipmi: Setup ipmi_devintf automatically if ipmi_msghandler gets loaded

2014-10-28 Thread Thomas Renninger

On Friday, October 24, 2014 12:40:56 PM Wilck, Martin wrote:
> Thomas,
> 
> 
> > > I vote for MODULE_SOFTDEP for upstream, and modalias for distros that
> > > don't support MODULE_SOFTDEP yet.
> > 
> > I vote for ipmi_msghandler.c renaming into ipmi_handler.c
> 
> 
> sorry, I am not with you here. This is a small problem that should be
> handled with small effort. Why apply a 3-part patch set for a problem
> that can be solved with 2 modalias lines in user space?

Because it is wrong.
In fact this is about a 10 line change.
The file move is no code change at all.
With a modalias added in userspace this will show up in different
distributions again and again.
I agree that a modalias makes sense for SLE and this is probably
what we will go for.
But it would be nice to have this fixed in mainline properly.

Thanks,

Thomas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 1/3] ipmi: Setup ipmi_devintf automatically if ipmi_msghandler gets loaded

2014-10-28 Thread Thomas Renninger

On Friday, October 24, 2014 12:40:56 PM Wilck, Martin wrote:
 Thomas,
 
 
   I vote for MODULE_SOFTDEP for upstream, and modalias for distros that
   don't support MODULE_SOFTDEP yet.
  
  I vote for ipmi_msghandler.c renaming into ipmi_handler.c
 
 
 sorry, I am not with you here. This is a small problem that should be
 handled with small effort. Why apply a 3-part patch set for a problem
 that can be solved with 2 modalias lines in user space?

Because it is wrong.
In fact this is about a 10 line change.
The file move is no code change at all.
With a modalias added in userspace this will show up in different
distributions again and again.
I agree that a modalias makes sense for SLE and this is probably
what we will go for.
But it would be nice to have this fixed in mainline properly.

Thanks,

Thomas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 1/3] ipmi: Setup ipmi_devintf automatically if ipmi_msghandler gets loaded

2014-10-23 Thread Thomas Renninger

On Monday, October 20, 2014 10:28:53 AM Wilck, Martin wrote:
> On Fri, 2014-10-17 at 18:14 +0200, Corey Minyard wrote:
> 
> 
> > How about this.  I did a little research, and there's something called
> > soft module dependencies.  Apparently you can add:
> > 
> > MODULE_SOFTDEP("post: ipmi_devintf")
> > 
> > to ipmi_msghandler.c and modprobe would load ipmi_devintf after loading
> > ipmi_msghandler if it was available.

I do not like this approach for 2 reasons:
1) Same as Martin mentioned: Not avail on all distros yet (not double checked,
   but I trust you).
2) It's still wrong or say, far away from being as it should be:
   Instead of autoloading via userspace (this is what I expect happens here),
   ipmi_devintf should directly be compiled into ipmi_msghandler module
   and be available as soon as ipmi_msghandler is available.
   This saves the overhead of an unneeded additional module
   (memory and loading time overhead -> going through userspace, etc.).

> This is nice, but not commonly available in distro kernels so far.
Sorry, it's not that nice...
> AFAICS, out of the distros Fujitsu supports, only RHEL7 supports it.
Bad.
 
> I vote for MODULE_SOFTDEP for upstream, and modalias for distros that
> don't support MODULE_SOFTDEP yet.
I vote for ipmi_msghandler.c renaming into ipmi_handler.c
I looked around a bit. Renaming is not bad and happens often.
Documentation/laptops/hpfall.c
got renamed into:
Documentation/laptops/freefall.c
history is preserved by using:
git log --follow Documentation/laptops/freefall.c
git blame Documentation/laptops/freefall.c
works as well, showing modifications before the renaming.

I'll send a patchset.

Thanks,

   Thomas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 1/3] ipmi: Setup ipmi_devintf automatically if ipmi_msghandler gets loaded

2014-10-23 Thread Thomas Renninger

On Monday, October 20, 2014 10:28:53 AM Wilck, Martin wrote:
 On Fri, 2014-10-17 at 18:14 +0200, Corey Minyard wrote:
 
 
  How about this.  I did a little research, and there's something called
  soft module dependencies.  Apparently you can add:
  
  MODULE_SOFTDEP(post: ipmi_devintf)
  
  to ipmi_msghandler.c and modprobe would load ipmi_devintf after loading
  ipmi_msghandler if it was available.

I do not like this approach for 2 reasons:
1) Same as Martin mentioned: Not avail on all distros yet (not double checked,
   but I trust you).
2) It's still wrong or say, far away from being as it should be:
   Instead of autoloading via userspace (this is what I expect happens here),
   ipmi_devintf should directly be compiled into ipmi_msghandler module
   and be available as soon as ipmi_msghandler is available.
   This saves the overhead of an unneeded additional module
   (memory and loading time overhead - going through userspace, etc.).

 This is nice, but not commonly available in distro kernels so far.
Sorry, it's not that nice...
 AFAICS, out of the distros Fujitsu supports, only RHEL7 supports it.
Bad.
 
 I vote for MODULE_SOFTDEP for upstream, and modalias for distros that
 don't support MODULE_SOFTDEP yet.
I vote for ipmi_msghandler.c renaming into ipmi_handler.c
I looked around a bit. Renaming is not bad and happens often.
Documentation/laptops/hpfall.c
got renamed into:
Documentation/laptops/freefall.c
history is preserved by using:
git log --follow Documentation/laptops/freefall.c
git blame Documentation/laptops/freefall.c
works as well, showing modifications before the renaming.

I'll send a patchset.

Thanks,

   Thomas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 2/3] ipmi: Remove ipmi_major module parameter and define global IPMI_MAJOR

2014-10-17 Thread Thomas Renninger

Hi,

On Tuesday, October 14, 2014 08:26:26 PM Corey Minyard wrote:
> Sorry to top post on this, but you attached the file, so it's hard to reply
> inline.
> 
> There's no way this can go in.  There's not enough major device numbers
> for all the devices that exist, we have mechanisms to handle dynamically
> assigning numbers, and the IPMI driver just isn't important enough to
> warrant it's own major device number.
Hm, I always get 248, on 2 different machines. Just luck?
If 248 is still free (and according to major.h it is), why not?

> The particular mechanism to pass in the major number was just a temporary
> measure.  It is no longer really necessary and could be removed.
Ok, if I read above right, you mean:
The module param to specify the major number can vanish.
But I must not try to invent a new MAJOR_IPMI definition in major.h.
Instead the whole thing can be dynamic (use MKDEV, remember the return
value and use it later to freeing again).

Thanks,

 Thomas

 
> -corey
> 
> On 10/14/2014 09:40 AM, tr...@suse.de wrote:
> 
> There should be no need to specify the major number of /dev/ipmiX via module
> parameter. Others do not need this as well.
> This is the way msr.c (/dev/cpu/X/msr) is doing it as well.
> 
> Side effect of this patch will be that the userspace interface cannot
> be disabled at kernel level anymore. But this can also be enforced by not
> letting userspace create the device file /dev/ipmiX.
> 
> Signed-off-by: Thomas Renninger 
> CC: miny...@acm.org
> 
> Index: kernel_ipmi/drivers/char/ipmi/ipmi_devintf.c
> ===
> --- kernel_ipmi.orig/drivers/char/ipmi/ipmi_devintf.c
> +++ kernel_ipmi/drivers/char/ipmi/ipmi_devintf.c
> @@ -39,6 +39,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -867,14 +868,6 @@ static const struct file_operations ipmi
> 
>  #define DEVICE_NAME "ipmidev"
> 
> -static int ipmi_major;
> -module_param(ipmi_major, int, 0);
> -MODULE_PARM_DESC(ipmi_major, "Sets the major number of the IPMI device. 
> By" -  " default, or if you set it to zero, it will choose the next"
> -  " available device.  Setting it to -1 will disable the"
> -  " interface.  Other values will set the major device number"
> -  " to that value.");
> -
>  /* Keep track of the devices that are registered. */
>  struct ipmi_reg_list {
>   dev_tdev;
> @@ -887,7 +880,7 @@ static struct class *ipmi_class;
> 
>  static void ipmi_new_smi(int if_num, struct device *device)
>  {
> - dev_t dev = MKDEV(ipmi_major, if_num);
> + dev_t dev = MKDEV(IPMI_MAJOR, if_num);
>   struct ipmi_reg_list *entry;
> 
>   entry = kmalloc(sizeof(*entry), GFP_KERNEL);
> @@ -906,7 +899,7 @@ static void ipmi_new_smi(int if_num, str
> 
>  static void ipmi_smi_gone(int if_num)
>  {
> - dev_t dev = MKDEV(ipmi_major, if_num);
> + dev_t dev = MKDEV(IPMI_MAJOR, if_num);
>   struct ipmi_reg_list *entry;
> 
>   mutex_lock(_list_mutex);
> @@ -932,9 +925,6 @@ int __init init_ipmi_devintf(void)
>  {
>   int rv;
> 
> - if (ipmi_major < 0)
> - return -EINVAL;
> -
>   printk(KERN_INFO "ipmi device interface\n");
> 
>   ipmi_class = class_create(THIS_MODULE, "ipmi");
> @@ -943,20 +933,16 @@ int __init init_ipmi_devintf(void)
>   return PTR_ERR(ipmi_class);
>   }
> 
> - rv = register_chrdev(ipmi_major, DEVICE_NAME, _fops);
> + rv = register_chrdev(IPMI_MAJOR, DEVICE_NAME, _fops);
>   if (rv < 0) {
>   class_destroy(ipmi_class);
> - printk(KERN_ERR "ipmi: can't get major %d\n", ipmi_major);
> + printk(KERN_ERR "ipmi: can't get major %d\n", IPMI_MAJOR);
>   return rv;
>   }
> 
> - if (ipmi_major == 0) {
> - ipmi_major = rv;
> - }
> -
>   rv = ipmi_smi_watcher_register(_watcher);
>   if (rv) {
> - unregister_chrdev(ipmi_major, DEVICE_NAME);
> + unregister_chrdev(IPMI_MAJOR, DEVICE_NAME);
>   class_destroy(ipmi_class);
>   printk(KERN_WARNING "ipmi: can't register smi watcher\n");
>   return rv;
> @@ -977,5 +963,5 @@ void cleanup_ipmi_dev(void)
>   mutex_unlock(_list_mutex);
>   class_destroy(ipmi_class);
>   ipmi_smi_watcher_unregister(_watcher);
> - unregister_chrdev(ipmi_major, DEVICE_NAME);
> + unregister_chrdev(IPMI_MAJOR, DEVICE_NAME);
>  }
> Index: kernel_ipmi/include/uapi/linu

Re: [patch 1/3] ipmi: Setup ipmi_devintf automatically if ipmi_msghandler gets loaded

2014-10-17 Thread Thomas Renninger

Hi,

On Tuesday, October 14, 2014 08:22:20 PM Corey Minyard wrote:
> On 10/14/2014 09:40 AM, tr...@suse.de wrote:
> > This removes the ipmi_devintf to be a module, but it will automatically
> > compiled in if ipmi_msghandler is set.
> > 
> > ipmi_msghandler module is renamed to ipmi_handler because of the name
> > clash with the ipmi_msghandler.o object file (see Makefile for details).
> > Alternatively ipmi_msghandler.c could be renamed to ipmi_handler.c, but
> > not polluting git history should be more of an advantage than module
> > renaming.
> > 
> > cleanup_ipmi_dev() and init_ipmi_devintf() implemented in ipmi_devintf.c
> > are are simply declared ipmi_msghandler.c without creating a separate
> > header file which looks more reasonable to me. Hope that is ok.
> 
> One minor style issue: the function definitions should really be in a .h
> file.
Ok.

> Renaming the module is also probably a bad idea.  If this was to go in,
> it would certainly need to keep the ipmi_msghandler.ko name to avoid
> complete breakage all over the place.
? What should break?
It should never be needed to load ipmi_msghandler manually.
Even if autoloading does not work, loading the low level
serving driver (e.g. ipmi_si) is enough. This one will request
ipmi_(msg)handler.ko automatically.
I excpect this is the same for all other ipmi HW serving drivers, right?

> In that vein, I am worried that this would just result in a lot of work
> for all the distros that have set up this already.
We also loaded the ipmi driver manually via the OpenIPMI init service
which simply tried to load ipmi drivers.
This results in boot load time, resource and code overhead.
As ipmi autoload seem to work fine for recent HW these days, the userspace
interface should load as well, otherwise the whole autoloading is more
or less useless.

> I'm trying to see the pros and cons of
> this, and I can't see that the pros outweigh the cons.  Do you have people
> that have issues with this?

Yes. In fact Fujitsu opened a bug for this. Unfortunately I cannot set it
public as it is against SLES12. If I find a way, I'll reference it in
the changelog:
https://bugzilla.novell.com/show_bug.cgi?id=893181
Subject:
ipmi_devintf is not available when IPMI device is detected

I already added Martin to CC: of the patch changelog, but quilt may
not have recogized it and not added him to CC?

Beside Fujitsu, I'd like to have this fixed as well.
We adjust BMC settings via after install script on our servers via
ipmitool. This does not work because of this bug.
One has to:
modprobe ipmi_devintf
This is overhead and makes the whole autoloading mechanism useless.
And even worse, ipmi_devintf seem to load on all machines whether they
have an ipmi controller or not.


> > In fact there already was a kind of autoloading mechanism (gets deleted
> > with this patch). I interpret this line that ipmi_devintf should get
> > autoloaded if ipmi_si gets loaded?:
> > -MODULE_ALIAS("platform:ipmi_si");
> > 
> > But:
> >   - It's wrong: There are other low lever drivers which can be used by
> >   
> > ipmi_devintf
> >   
> >   - It does not work (anymore?)
> >   - There is no need to keep ipmi_devintf as a module (resource and load
> >   time
> >   
> > overhead)
> 
> This change is certainly a good idea.  Whatever it was trying to do is
> wrong.

Thanks. I am convinced that the right way to go for is to integrate the 
ipmi_devintf into ipmi_msghandler.

I see 3 options how to do this:
  - rename ipmi_msghandler.ko to ipmi_handler.ko
As this only is an interface provider for other modules getting
loaded/requested automatically and never needs to be loaded manually
(pls correct me if I am wrong), I would like to go for this option.

  - rename ipmi_msghandler.c to ipmi_handler.c
-> git history lost. But should still be seen via:
git log --follow?

  - Someone finds another way how to modify the Makefile to achieve above.

Thanks,

 Thomas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 1/3] ipmi: Setup ipmi_devintf automatically if ipmi_msghandler gets loaded

2014-10-17 Thread Thomas Renninger

Hi,

On Tuesday, October 14, 2014 08:22:20 PM Corey Minyard wrote:
 On 10/14/2014 09:40 AM, tr...@suse.de wrote:
  This removes the ipmi_devintf to be a module, but it will automatically
  compiled in if ipmi_msghandler is set.
  
  ipmi_msghandler module is renamed to ipmi_handler because of the name
  clash with the ipmi_msghandler.o object file (see Makefile for details).
  Alternatively ipmi_msghandler.c could be renamed to ipmi_handler.c, but
  not polluting git history should be more of an advantage than module
  renaming.
  
  cleanup_ipmi_dev() and init_ipmi_devintf() implemented in ipmi_devintf.c
  are are simply declared ipmi_msghandler.c without creating a separate
  header file which looks more reasonable to me. Hope that is ok.
 
 One minor style issue: the function definitions should really be in a .h
 file.
Ok.

 Renaming the module is also probably a bad idea.  If this was to go in,
 it would certainly need to keep the ipmi_msghandler.ko name to avoid
 complete breakage all over the place.
? What should break?
It should never be needed to load ipmi_msghandler manually.
Even if autoloading does not work, loading the low level
serving driver (e.g. ipmi_si) is enough. This one will request
ipmi_(msg)handler.ko automatically.
I excpect this is the same for all other ipmi HW serving drivers, right?

 In that vein, I am worried that this would just result in a lot of work
 for all the distros that have set up this already.
We also loaded the ipmi driver manually via the OpenIPMI init service
which simply tried to load ipmi drivers.
This results in boot load time, resource and code overhead.
As ipmi autoload seem to work fine for recent HW these days, the userspace
interface should load as well, otherwise the whole autoloading is more
or less useless.

 I'm trying to see the pros and cons of
 this, and I can't see that the pros outweigh the cons.  Do you have people
 that have issues with this?

Yes. In fact Fujitsu opened a bug for this. Unfortunately I cannot set it
public as it is against SLES12. If I find a way, I'll reference it in
the changelog:
https://bugzilla.novell.com/show_bug.cgi?id=893181
Subject:
ipmi_devintf is not available when IPMI device is detected

I already added Martin to CC: of the patch changelog, but quilt may
not have recogized it and not added him to CC?

Beside Fujitsu, I'd like to have this fixed as well.
We adjust BMC settings via after install script on our servers via
ipmitool. This does not work because of this bug.
One has to:
modprobe ipmi_devintf
This is overhead and makes the whole autoloading mechanism useless.
And even worse, ipmi_devintf seem to load on all machines whether they
have an ipmi controller or not.


  In fact there already was a kind of autoloading mechanism (gets deleted
  with this patch). I interpret this line that ipmi_devintf should get
  autoloaded if ipmi_si gets loaded?:
  -MODULE_ALIAS(platform:ipmi_si);
  
  But:
- It's wrong: There are other low lever drivers which can be used by

  ipmi_devintf

- It does not work (anymore?)
- There is no need to keep ipmi_devintf as a module (resource and load
time

  overhead)
 
 This change is certainly a good idea.  Whatever it was trying to do is
 wrong.

Thanks. I am convinced that the right way to go for is to integrate the 
ipmi_devintf into ipmi_msghandler.

I see 3 options how to do this:
  - rename ipmi_msghandler.ko to ipmi_handler.ko
As this only is an interface provider for other modules getting
loaded/requested automatically and never needs to be loaded manually
(pls correct me if I am wrong), I would like to go for this option.

  - rename ipmi_msghandler.c to ipmi_handler.c
- git history lost. But should still be seen via:
git log --follow?

  - Someone finds another way how to modify the Makefile to achieve above.

Thanks,

 Thomas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 2/3] ipmi: Remove ipmi_major module parameter and define global IPMI_MAJOR

2014-10-17 Thread Thomas Renninger

Hi,

On Tuesday, October 14, 2014 08:26:26 PM Corey Minyard wrote:
 Sorry to top post on this, but you attached the file, so it's hard to reply
 inline.
 
 There's no way this can go in.  There's not enough major device numbers
 for all the devices that exist, we have mechanisms to handle dynamically
 assigning numbers, and the IPMI driver just isn't important enough to
 warrant it's own major device number.
Hm, I always get 248, on 2 different machines. Just luck?
If 248 is still free (and according to major.h it is), why not?

 The particular mechanism to pass in the major number was just a temporary
 measure.  It is no longer really necessary and could be removed.
Ok, if I read above right, you mean:
The module param to specify the major number can vanish.
But I must not try to invent a new MAJOR_IPMI definition in major.h.
Instead the whole thing can be dynamic (use MKDEV, remember the return
value and use it later to freeing again).

Thanks,

 Thomas

 
 -corey
 
 On 10/14/2014 09:40 AM, tr...@suse.de wrote:
 
 There should be no need to specify the major number of /dev/ipmiX via module
 parameter. Others do not need this as well.
 This is the way msr.c (/dev/cpu/X/msr) is doing it as well.
 
 Side effect of this patch will be that the userspace interface cannot
 be disabled at kernel level anymore. But this can also be enforced by not
 letting userspace create the device file /dev/ipmiX.
 
 Signed-off-by: Thomas Renninger tr...@suse.de
 CC: miny...@acm.org
 
 Index: kernel_ipmi/drivers/char/ipmi/ipmi_devintf.c
 ===
 --- kernel_ipmi.orig/drivers/char/ipmi/ipmi_devintf.c
 +++ kernel_ipmi/drivers/char/ipmi/ipmi_devintf.c
 @@ -39,6 +39,7 @@
  #include linux/spinlock.h
  #include linux/slab.h
  #include linux/ipmi.h
 +#include linux/major.h
  #include linux/mutex.h
  #include linux/init.h
  #include linux/device.h
 @@ -867,14 +868,6 @@ static const struct file_operations ipmi
 
  #define DEVICE_NAME ipmidev
 
 -static int ipmi_major;
 -module_param(ipmi_major, int, 0);
 -MODULE_PARM_DESC(ipmi_major, Sets the major number of the IPMI device. 
 By -   default, or if you set it to zero, it will choose the next
 -   available device.  Setting it to -1 will disable the
 -   interface.  Other values will set the major device number
 -   to that value.);
 -
  /* Keep track of the devices that are registered. */
  struct ipmi_reg_list {
   dev_tdev;
 @@ -887,7 +880,7 @@ static struct class *ipmi_class;
 
  static void ipmi_new_smi(int if_num, struct device *device)
  {
 - dev_t dev = MKDEV(ipmi_major, if_num);
 + dev_t dev = MKDEV(IPMI_MAJOR, if_num);
   struct ipmi_reg_list *entry;
 
   entry = kmalloc(sizeof(*entry), GFP_KERNEL);
 @@ -906,7 +899,7 @@ static void ipmi_new_smi(int if_num, str
 
  static void ipmi_smi_gone(int if_num)
  {
 - dev_t dev = MKDEV(ipmi_major, if_num);
 + dev_t dev = MKDEV(IPMI_MAJOR, if_num);
   struct ipmi_reg_list *entry;
 
   mutex_lock(reg_list_mutex);
 @@ -932,9 +925,6 @@ int __init init_ipmi_devintf(void)
  {
   int rv;
 
 - if (ipmi_major  0)
 - return -EINVAL;
 -
   printk(KERN_INFO ipmi device interface\n);
 
   ipmi_class = class_create(THIS_MODULE, ipmi);
 @@ -943,20 +933,16 @@ int __init init_ipmi_devintf(void)
   return PTR_ERR(ipmi_class);
   }
 
 - rv = register_chrdev(ipmi_major, DEVICE_NAME, ipmi_fops);
 + rv = register_chrdev(IPMI_MAJOR, DEVICE_NAME, ipmi_fops);
   if (rv  0) {
   class_destroy(ipmi_class);
 - printk(KERN_ERR ipmi: can't get major %d\n, ipmi_major);
 + printk(KERN_ERR ipmi: can't get major %d\n, IPMI_MAJOR);
   return rv;
   }
 
 - if (ipmi_major == 0) {
 - ipmi_major = rv;
 - }
 -
   rv = ipmi_smi_watcher_register(smi_watcher);
   if (rv) {
 - unregister_chrdev(ipmi_major, DEVICE_NAME);
 + unregister_chrdev(IPMI_MAJOR, DEVICE_NAME);
   class_destroy(ipmi_class);
   printk(KERN_WARNING ipmi: can't register smi watcher\n);
   return rv;
 @@ -977,5 +963,5 @@ void cleanup_ipmi_dev(void)
   mutex_unlock(reg_list_mutex);
   class_destroy(ipmi_class);
   ipmi_smi_watcher_unregister(smi_watcher);
 - unregister_chrdev(ipmi_major, DEVICE_NAME);
 + unregister_chrdev(IPMI_MAJOR, DEVICE_NAME);
  }
 Index: kernel_ipmi/include/uapi/linux/major.h
 ===
 --- kernel_ipmi.orig/include/uapi/linux/major.h
 +++ kernel_ipmi/include/uapi/linux/major.h
 @@ -173,6 +173,8 @@
 
  #define VIOTAPE_MAJOR230
 
 +#define IPMI_MAJOR  248
 +
  #define BLOCK_EXT_MAJOR  259
  #define SCSI_OSD_MAJOR   260 /* open-osd's OSD scsi device 
*/

--
To unsubscribe from this list: send the line

Re: [PATCH 2/4] ACPICA: Introduce new acpi_os_physical_table_add OS callback

2014-03-05 Thread Thomas Renninger

On Wednesday, March 05, 2014 06:26:54 AM Zheng, Lv wrote:
... 
> > > After the table manager cleanups are tested and shipped in the ACPICA
> > > repo,
> > > the new facilities will automatically be rolled into Linux branches.
> > 
> > I'd suggest to just wait for that.
> > Best already try to integrate the ACPI table override/add part as you
> > think
> > it should work without additional changes in drivers/acpi/acpica.
> > 
> > If this happened and things are submitted to get integrated into the Linux
> > kernel, please add me to CC or point me to the patchset.
> 
> Fortunately, there is a kernel bugzilla entry requires this series.
> I've posted it on the kernel bugzilla.
> Here is the patchset:
> https://bugzilla.kernel.org/show_bug.cgi?id=69711
> You need to apply 9 patches:
> attachment 128061 - attachment 128141
> The last patch has introduced an early boot API:
> acpi_install_table.
> This can be used to enhance this series.

Ok, thanks.
Be aware that I am super busy right now. This week is impossible,
I cannot promise when I will have time for that.

  Thomas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/4] ACPICA: Introduce new acpi_os_physical_table_add OS callback

2014-03-05 Thread Thomas Renninger

On Wednesday, March 05, 2014 06:26:54 AM Zheng, Lv wrote:
... 
   After the table manager cleanups are tested and shipped in the ACPICA
   repo,
   the new facilities will automatically be rolled into Linux branches.
  
  I'd suggest to just wait for that.
  Best already try to integrate the ACPI table override/add part as you
  think
  it should work without additional changes in drivers/acpi/acpica.
  
  If this happened and things are submitted to get integrated into the Linux
  kernel, please add me to CC or point me to the patchset.
 
 Fortunately, there is a kernel bugzilla entry requires this series.
 I've posted it on the kernel bugzilla.
 Here is the patchset:
 https://bugzilla.kernel.org/show_bug.cgi?id=69711
 You need to apply 9 patches:
 attachment 128061 - attachment 128141
 The last patch has introduced an early boot API:
 acpi_install_table.
 This can be used to enhance this series.

Ok, thanks.
Be aware that I am super busy right now. This week is impossible,
I cannot promise when I will have time for that.

  Thomas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/4] ACPICA: Introduce new acpi_os_physical_table_add OS callback

2014-03-04 Thread Thomas Renninger

On Tuesday, March 04, 2014 12:31:57 AM Zheng, Lv wrote:
> Hi,  Thomas
> 
> > From: Thomas Renninger [mailto:tr...@suse.de]
> > Sent: Monday, March 03, 2014 8:42 PM
> > 
> > Hi Lv,
> > 
> > On Monday, March 03, 2014 01:20:31 AM Zheng, Lv wrote:
> > > Hi, Thomas
> > > 
> > > I have a patch series that can cleanup the ACPICA table manager, and
> > > change
> > 
> > > the acpi_load_table into the following style:
> > Ok. I suggest that:
> > 1) If Thomas (Gleixner) or whoever wants to try out or needs it urgently,
> > he> 
> >can (must) use a recent kernel with my patches applied.
> > 
> > 2) You continue to get your changes into ACPICA.
> > 
> >Eventually or best would be if you add whatever is needed to
> >allow adding of tables as well (which will be there automatically if
> >I understood the description of your changes correctly).
> > 
> > 3) Either you give it a try yourself or give me short description for
> > 
> >what I have to look out for and I can re-post the Linux patches
> >based on your ACPICA changes, once they show up in the Linux kernel.
> >Best give me a ping as soon as I should look at it.
> 
> That sounds good.
> 
> Or it can be more efficient for productions:
> Linux can merge your patches and ACPICA just stop to take them.
You mean add my stuff to drivers/acpi/acpica in Linux kernel without
pushing them into the ACPICA repository?

I cannot remember that this ever happened (beside small important fixes)
and I doubt Rafael is willing to do that.
If it would be super critical, but I cannot see that it is.

> This will leave us divergences.
Yes, that would be bad.

> After the table manager cleanups are tested and shipped in the ACPICA repo,
> the new facilities will automatically be rolled into Linux branches.
I'd suggest to just wait for that.
Best already try to integrate the ACPI table override/add part as you think
it should work without additional changes in drivers/acpi/acpica.


If this happened and things are submitted to get integrated into the Linux
kernel, please add me to CC or point me to the patchset.

> Then I
> can help to reduce the divergences using the new ACPICA facilities. At that
> time I may ask whoever that can test to offer help to review the cleanup
> patch.

I can then give your new patcheset some testing and try to get the Linux
(drivers/acpi/osl.c) parts (re-)implemented based on your stuff.
There might still be the one or other minor fix needed that has to go
into acpica as well, but that should not be that hard to manage and
might end up in acpica and Linux kernel in parallel without much
extra overhead.

 Thomas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/4] ACPICA: Introduce new acpi_os_physical_table_add OS callback

2014-03-04 Thread Thomas Renninger

On Tuesday, March 04, 2014 12:31:57 AM Zheng, Lv wrote:
 Hi,  Thomas
 
  From: Thomas Renninger [mailto:tr...@suse.de]
  Sent: Monday, March 03, 2014 8:42 PM
  
  Hi Lv,
  
  On Monday, March 03, 2014 01:20:31 AM Zheng, Lv wrote:
   Hi, Thomas
   
   I have a patch series that can cleanup the ACPICA table manager, and
   change
  
   the acpi_load_table into the following style:
  Ok. I suggest that:
  1) If Thomas (Gleixner) or whoever wants to try out or needs it urgently,
  he 
 can (must) use a recent kernel with my patches applied.
  
  2) You continue to get your changes into ACPICA.
  
 Eventually or best would be if you add whatever is needed to
 allow adding of tables as well (which will be there automatically if
 I understood the description of your changes correctly).
  
  3) Either you give it a try yourself or give me short description for
  
 what I have to look out for and I can re-post the Linux patches
 based on your ACPICA changes, once they show up in the Linux kernel.
 Best give me a ping as soon as I should look at it.
 
 That sounds good.
 
 Or it can be more efficient for productions:
 Linux can merge your patches and ACPICA just stop to take them.
You mean add my stuff to drivers/acpi/acpica in Linux kernel without
pushing them into the ACPICA repository?

I cannot remember that this ever happened (beside small important fixes)
and I doubt Rafael is willing to do that.
If it would be super critical, but I cannot see that it is.

 This will leave us divergences.
Yes, that would be bad.

 After the table manager cleanups are tested and shipped in the ACPICA repo,
 the new facilities will automatically be rolled into Linux branches.
I'd suggest to just wait for that.
Best already try to integrate the ACPI table override/add part as you think
it should work without additional changes in drivers/acpi/acpica.


If this happened and things are submitted to get integrated into the Linux
kernel, please add me to CC or point me to the patchset.

 Then I
 can help to reduce the divergences using the new ACPICA facilities. At that
 time I may ask whoever that can test to offer help to review the cleanup
 patch.

I can then give your new patcheset some testing and try to get the Linux
(drivers/acpi/osl.c) parts (re-)implemented based on your stuff.
There might still be the one or other minor fix needed that has to go
into acpica as well, but that should not be that hard to manage and
might end up in acpica and Linux kernel in parallel without much
extra overhead.

 Thomas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/4] ACPICA: Introduce new acpi_os_physical_table_add OS callback

2014-03-03 Thread Thomas Renninger

Hi Lv,

On Monday, March 03, 2014 01:20:31 AM Zheng, Lv wrote:
> Hi, Thomas
> 
> I have a patch series that can cleanup the ACPICA table manager, and change
> the acpi_load_table into the following style:

Ok. I suggest that:
1) If Thomas (Gleixner) or whoever wants to try out or needs it urgently, he
   can (must) use a recent kernel with my patches applied.
2) You continue to get your changes into ACPICA.
   Eventually or best would be if you add whatever is needed to
   allow adding of tables as well (which will be there automatically if
   I understood the description of your changes correctly).
3) Either you give it a try yourself or give me short description for 
   what I have to look out for and I can re-post the Linux patches
   based on your ACPICA changes, once they show up in the Linux kernel.
   Best give me a ping as soon as I should look at it.

Thanks!

  Thomas

PS: You might want to add the BGRT definition in ACPICA already, the
tiny one line-liner...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/4] ACPICA: Introduce new acpi_os_physical_table_add OS callback

2014-03-03 Thread Thomas Renninger

Hi Lv,

On Monday, March 03, 2014 01:20:31 AM Zheng, Lv wrote:
 Hi, Thomas
 
 I have a patch series that can cleanup the ACPICA table manager, and change
 the acpi_load_table into the following style:

Ok. I suggest that:
1) If Thomas (Gleixner) or whoever wants to try out or needs it urgently, he
   can (must) use a recent kernel with my patches applied.
2) You continue to get your changes into ACPICA.
   Eventually or best would be if you add whatever is needed to
   allow adding of tables as well (which will be there automatically if
   I understood the description of your changes correctly).
3) Either you give it a try yourself or give me short description for 
   what I have to look out for and I can re-post the Linux patches
   based on your ACPICA changes, once they show up in the Linux kernel.
   Best give me a ping as soon as I should look at it.

Thanks!

  Thomas

PS: You might want to add the BGRT definition in ACPICA already, the
tiny one line-liner...
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/4] ACPI: Provide support for ACPI table adding via OS

2014-02-28 Thread Thomas Renninger

Latest changes are compile tested only!

If this gets serialized/merged and accepted in acpica in
some form with whatever other stuff currently added,
please drop me a mail.

I can then submit the Linux parts again to the kernel people
with the documentation adjusted as well:
Documentation/acpi/initrd_table_override.txt

Thanks,

 Thomas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 4/4] ACPI: Add new table signatures that can be overridden/added.

2014-02-28 Thread Thomas Renninger

Signed-off-by: Thomas Renninger 
CC: h...@zytor.com
CC: t...@linutronix.de
CC: c...@conrad-kostecki.de
CC: linux-kernel@vger.kernel.org
CC: x...@kernel.org
CC: mi...@redhat.com
CC: r...@rjwysocki.net
CC: de...@acpica.org
---
 drivers/acpi/osl.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 7f9a865..23ce5b1 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+# define ACPI_UNDEFINED_TABLES 1
 #include 
 #include 
 #include 
@@ -559,7 +560,8 @@ static const char * const table_sigs[] = {
ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
-   ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
+   ACPI_SIG_WDRT, ACPI_SIG_BGRT, ACPI_SIG_ATKG, ACPI_SIG_GSCI,
+   ACPI_SIG_IEIT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
-- 
1.7.6.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 3/4] ACPICA: Add BGRT signature to known signatures

2014-02-28 Thread Thomas Renninger

In Linux there even exists a driver already making use of this table:
drivers/acpi/bgrt.c:MODULE_DESCRIPTION("BGRT boot graphic support");

Signed-off-by: Thomas Renninger 
CC: h...@zytor.com
CC: t...@linutronix.de
CC: c...@conrad-kostecki.de
CC: linux-kernel@vger.kernel.org
CC: x...@kernel.org
CC: mi...@redhat.com
CC: r...@rjwysocki.net
CC: de...@acpica.org
---
 include/acpi/actbl2.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
index 094a906..9ed1c20 100644
--- a/include/acpi/actbl2.h
+++ b/include/acpi/actbl2.h
@@ -83,6 +83,7 @@
 #define ACPI_SIG_WDAT   "WDAT" /* Watchdog Action Table */
 #define ACPI_SIG_WDDT   "WDDT" /* Watchdog Timer Description Table */
 #define ACPI_SIG_WDRT   "WDRT" /* Watchdog Resource Table */
+#define ACPI_SIG_BGRT   "BGRT"  /* Boot Graphic Support */
 
 #ifdef ACPI_UNDEFINED_TABLES
 /*
-- 
1.7.6.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/4] ACPI: Provide support for ACPI table adding via OS

2014-02-28 Thread Thomas Renninger

This is done the same way as the previous ACPI physical table override
mechanism.
How to override or add tables via initrd, please look up:
Documentation/acpi/initrd_table_override.txt

SSDTs can only be overridden, not added.

Overriding only happens if the OEM id of the table header matches the one
with the BIOS provided one.
All table types (SSDTs are an exception), must only show up once.
So either you:
   - Add a fresh new table for adding of which type (signature) none exists
 in the BIOS  -> OS ACPI table adding happens.
or
   - Add a table which already exists in BIOS, but the OEM id must match the
 one of the table of the same type (signature) that exists in BIOS already
 -> OS ACPI table overriding happens
 Typically one copies away the original ACPI table, disassembles,
 modifies (for example adding debug strings), compiles it and provides
 the table via initrd for overriding (will have the same OEM id).
 But this is not necessary, one could also come up with a selfmade
 table for overriding, by taking care that the signature and OEM id is
 the same as the one provided by BIOS

In ACPI table overriding case you see in dmesg:
   ACPI: Override [DSDT-  BXDSDT], this is unsafe: tainting kernel
   Disabling lock debugging due to kernel taint

In ACPI table adding case you see in dmesg (BGRT table got added):
   ACPI: Add [BGRT-SLIC-WKS], this is unsafe: tainting kernel
   ACPI: BGRT 7fffd1ba 38 (v00 HPQOEM SLIC-WKS 01072009
 INTL 20130823)

Signed-off-by: Thomas Renninger 
CC: h...@zytor.com
CC: t...@linutronix.de
CC: c...@conrad-kostecki.de
CC: linux-kernel@vger.kernel.org
CC: x...@kernel.org
CC: mi...@redhat.com
CC: r...@rjwysocki.net
CC: de...@acpica.org
---
 drivers/acpi/osl.c |   91 +++-
 1 files changed, 90 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index fc1aa79..7f9a865 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -566,6 +566,8 @@ static const char * const table_sigs[] = {
 
 #define ACPI_OVERRIDE_TABLES 64
 static struct cpio_data __initdata acpi_initrd_files[ACPI_OVERRIDE_TABLES];
+/* Remember physical address of overriden or added tables */
+static acpi_physical_address acpi_table_overridden[ACPI_OVERRIDE_TABLES];
 
 #define MAP_CHUNK_SIZE   (NR_FIX_BTMAPS << PAGE_SHIFT)
 
@@ -715,7 +717,7 @@ acpi_os_physical_table_override(struct acpi_table_header 
*existing_table,
*address = 0;
return AE_OK;
 #else
-   int table_offset = 0;
+   int no, table_offset = 0;
struct acpi_table_header *table;
 
*table_length = 0;
@@ -759,6 +761,12 @@ acpi_os_physical_table_override(struct acpi_table_header 
*existing_table,
*table_length = table->length;
acpi_os_unmap_memory(table, ACPI_HEADER_SIZE);
*address = acpi_tables_addr + table_offset;
+   for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) {
+   if (acpi_table_overridden[no] == 0) {
+   acpi_table_overridden[no] = *address;
+   break;
+   }
+   }
break;
} while (table_offset + ACPI_HEADER_SIZE < all_tables_size);
 
@@ -768,6 +776,87 @@ acpi_os_physical_table_override(struct acpi_table_header 
*existing_table,
 #endif
 }
 
+acpi_status
+acpi_os_physical_table_add(acpi_physical_address *address,
+  u32 *table_length)
+{
+#ifndef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+   *table_length = 0;
+   *address = 0;
+   return AE_OK;
+#else
+   int no, table_offset = 0;
+   struct acpi_table_header *table;
+
+   *table_length = 0;
+   *address = 0;
+
+   if (!acpi_tables_addr)
+   return AE_OK;
+
+   do {
+   if (table_offset + ACPI_HEADER_SIZE > all_tables_size) {
+   WARN_ON(1);
+   return AE_OK;
+   }
+
+   table = acpi_os_map_memory(acpi_tables_addr + table_offset,
+  ACPI_HEADER_SIZE);
+
+   if (table_offset + table->length > all_tables_size) {
+   acpi_os_unmap_memory(table, ACPI_HEADER_SIZE);
+   WARN_ON(1);
+   return AE_OK;
+   }
+
+   table_offset += table->length;
+
+   /* Do not add SSDTs, only if acpi_no_auto_ssdt boot parameter
+  got set.
+   */
+   if (acpi_gbl_disable_ssdt_table_load == FALSE &&
+   !memcmp("SSDT", table->signature, 4)) {
+   pr_info("Will not add SSDT [%8.8s], use 
acpi_no_auto_ssdt to force\n", table->oem_table_id);
+   acpi_os_unmap_memory(table,
+

[PATCH 2/4] ACPICA: Introduce new acpi_os_physical_table_add OS callback

2014-02-28 Thread Thomas Renninger

This one allows OS to add arbitrary ACPI tables.

ToDo: It should get checked whether a table with the same signature already
exists and if this is the case, adding should not happen.

Signed-off-by: Thomas Renninger 
CC: h...@zytor.com
CC: t...@linutronix.de
CC: c...@conrad-kostecki.de
CC: linux-kernel@vger.kernel.org
CC: x...@kernel.org
CC: mi...@redhat.com
CC: r...@rjwysocki.net
CC: de...@acpica.org
---
 drivers/acpi/acpica/tbutils.c |   36 
 include/acpi/acpiosxf.h   |6 ++
 2 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
index 6412d3c..a819d198 100644
--- a/drivers/acpi/acpica/tbutils.c
+++ b/drivers/acpi/acpica/tbutils.c
@@ -453,6 +453,8 @@ static acpi_status 
acpi_tb_validate_xsdt(acpi_physical_address xsdt_address)
  *
  
**/
 
+#define ACPI_MAX_TABLE_ADD 64
+
 acpi_status __init acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
 {
struct acpi_table_rsdp *rsdp;
@@ -623,5 +625,39 @@ acpi_status __init 
acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
}
}
 
+   /*
+* ACPI Table Add:
+* Allow the OS to add additional tables to the global root table list
+*/
+   for (i = 0; i < ACPI_MAX_TABLE_ADD; i++) {
+   int tmp, k;
+   table_entry_size = 0;
+   address = 0;
+   status = acpi_os_physical_table_add(,
+   _entry_size);
+   if (status == AE_OK && table_entry_size && address) {
+   table = acpi_os_map_memory(address, table_entry_size);
+   for (k = 2; k < 
acpi_gbl_root_table_list.current_table_count; k++) {
+   /*
+ Always add SSDTs. Only allow adding of other
+ tables if none of such a signature already
+ exists. Use the override interface instead
+ in such a case.
+   */
+   if (ACPI_COMPARE_NAME("SSDT", table->signature) 
&&
+   
!ACPI_COMPARE_NAME(_gbl_root_table_list.tables[i].signature, 
table->signature)) {
+   ACPI_INFO((AE_INFO, "OS table not added, signature already exists:"));
+   acpi_tb_print_table_header(address, table);
+   acpi_os_unmap_memory(table, table_entry_size);
+   } else {
+   ACPI_INFO((AE_INFO, "Add OS provided table:"));
+   acpi_tb_print_table_header(address, table);
+   status = acpi_tb_store_table(address, table, table_entry_size,
+ACPI_TABLE_ORIGIN_MAPPED, );
+   }
+   }
+   } else
+   break;
+   }
return_ACPI_STATUS(AE_OK);
 }
diff --git a/include/acpi/acpiosxf.h b/include/acpi/acpiosxf.h
index 01e6c6d..70c00ed 100644
--- a/include/acpi/acpiosxf.h
+++ b/include/acpi/acpiosxf.h
@@ -111,6 +111,12 @@ acpi_os_physical_table_override(struct acpi_table_header 
*existing_table,
u32 *new_table_length);
 #endif
 
+#ifndef ACPI_USE_ALTERNATE_PROTOTYPE_acpi_os_physical_table_add
+acpi_status
+acpi_os_physical_table_add(acpi_physical_address * new_address,
+  u32 *new_table_length);
+#endif
+
 /*
  * Spinlock primitives
  */
-- 
1.7.6.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/4] ACPICA: Introduce new acpi_os_physical_table_add OS callback

2014-02-28 Thread Thomas Renninger

This one allows OS to add arbitrary ACPI tables.

ToDo: It should get checked whether a table with the same signature already
exists and if this is the case, adding should not happen.

Signed-off-by: Thomas Renninger tr...@suse.de
CC: h...@zytor.com
CC: t...@linutronix.de
CC: c...@conrad-kostecki.de
CC: linux-kernel@vger.kernel.org
CC: x...@kernel.org
CC: mi...@redhat.com
CC: r...@rjwysocki.net
CC: de...@acpica.org
---
 drivers/acpi/acpica/tbutils.c |   36 
 include/acpi/acpiosxf.h   |6 ++
 2 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
index 6412d3c..a819d198 100644
--- a/drivers/acpi/acpica/tbutils.c
+++ b/drivers/acpi/acpica/tbutils.c
@@ -453,6 +453,8 @@ static acpi_status 
acpi_tb_validate_xsdt(acpi_physical_address xsdt_address)
  *
  
**/
 
+#define ACPI_MAX_TABLE_ADD 64
+
 acpi_status __init acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
 {
struct acpi_table_rsdp *rsdp;
@@ -623,5 +625,39 @@ acpi_status __init 
acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
}
}
 
+   /*
+* ACPI Table Add:
+* Allow the OS to add additional tables to the global root table list
+*/
+   for (i = 0; i  ACPI_MAX_TABLE_ADD; i++) {
+   int tmp, k;
+   table_entry_size = 0;
+   address = 0;
+   status = acpi_os_physical_table_add(address,
+   table_entry_size);
+   if (status == AE_OK  table_entry_size  address) {
+   table = acpi_os_map_memory(address, table_entry_size);
+   for (k = 2; k  
acpi_gbl_root_table_list.current_table_count; k++) {
+   /*
+ Always add SSDTs. Only allow adding of other
+ tables if none of such a signature already
+ exists. Use the override interface instead
+ in such a case.
+   */
+   if (ACPI_COMPARE_NAME(SSDT, table-signature) 

+   
!ACPI_COMPARE_NAME(acpi_gbl_root_table_list.tables[i].signature, 
table-signature)) {
+   ACPI_INFO((AE_INFO, OS table not added, signature already exists:));
+   acpi_tb_print_table_header(address, table);
+   acpi_os_unmap_memory(table, table_entry_size);
+   } else {
+   ACPI_INFO((AE_INFO, Add OS provided table:));
+   acpi_tb_print_table_header(address, table);
+   status = acpi_tb_store_table(address, table, table_entry_size,
+ACPI_TABLE_ORIGIN_MAPPED, tmp);
+   }
+   }
+   } else
+   break;
+   }
return_ACPI_STATUS(AE_OK);
 }
diff --git a/include/acpi/acpiosxf.h b/include/acpi/acpiosxf.h
index 01e6c6d..70c00ed 100644
--- a/include/acpi/acpiosxf.h
+++ b/include/acpi/acpiosxf.h
@@ -111,6 +111,12 @@ acpi_os_physical_table_override(struct acpi_table_header 
*existing_table,
u32 *new_table_length);
 #endif
 
+#ifndef ACPI_USE_ALTERNATE_PROTOTYPE_acpi_os_physical_table_add
+acpi_status
+acpi_os_physical_table_add(acpi_physical_address * new_address,
+  u32 *new_table_length);
+#endif
+
 /*
  * Spinlock primitives
  */
-- 
1.7.6.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/4] ACPI: Provide support for ACPI table adding via OS

2014-02-28 Thread Thomas Renninger

This is done the same way as the previous ACPI physical table override
mechanism.
How to override or add tables via initrd, please look up:
Documentation/acpi/initrd_table_override.txt

SSDTs can only be overridden, not added.

Overriding only happens if the OEM id of the table header matches the one
with the BIOS provided one.
All table types (SSDTs are an exception), must only show up once.
So either you:
   - Add a fresh new table for adding of which type (signature) none exists
 in the BIOS  - OS ACPI table adding happens.
or
   - Add a table which already exists in BIOS, but the OEM id must match the
 one of the table of the same type (signature) that exists in BIOS already
 - OS ACPI table overriding happens
 Typically one copies away the original ACPI table, disassembles,
 modifies (for example adding debug strings), compiles it and provides
 the table via initrd for overriding (will have the same OEM id).
 But this is not necessary, one could also come up with a selfmade
 table for overriding, by taking care that the signature and OEM id is
 the same as the one provided by BIOS

In ACPI table overriding case you see in dmesg:
   ACPI: Override [DSDT-  BXDSDT], this is unsafe: tainting kernel
   Disabling lock debugging due to kernel taint

In ACPI table adding case you see in dmesg (BGRT table got added):
   ACPI: Add [BGRT-SLIC-WKS], this is unsafe: tainting kernel
   ACPI: BGRT 7fffd1ba 38 (v00 HPQOEM SLIC-WKS 01072009
 INTL 20130823)

Signed-off-by: Thomas Renninger tr...@suse.de
CC: h...@zytor.com
CC: t...@linutronix.de
CC: c...@conrad-kostecki.de
CC: linux-kernel@vger.kernel.org
CC: x...@kernel.org
CC: mi...@redhat.com
CC: r...@rjwysocki.net
CC: de...@acpica.org
---
 drivers/acpi/osl.c |   91 +++-
 1 files changed, 90 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index fc1aa79..7f9a865 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -566,6 +566,8 @@ static const char * const table_sigs[] = {
 
 #define ACPI_OVERRIDE_TABLES 64
 static struct cpio_data __initdata acpi_initrd_files[ACPI_OVERRIDE_TABLES];
+/* Remember physical address of overriden or added tables */
+static acpi_physical_address acpi_table_overridden[ACPI_OVERRIDE_TABLES];
 
 #define MAP_CHUNK_SIZE   (NR_FIX_BTMAPS  PAGE_SHIFT)
 
@@ -715,7 +717,7 @@ acpi_os_physical_table_override(struct acpi_table_header 
*existing_table,
*address = 0;
return AE_OK;
 #else
-   int table_offset = 0;
+   int no, table_offset = 0;
struct acpi_table_header *table;
 
*table_length = 0;
@@ -759,6 +761,12 @@ acpi_os_physical_table_override(struct acpi_table_header 
*existing_table,
*table_length = table-length;
acpi_os_unmap_memory(table, ACPI_HEADER_SIZE);
*address = acpi_tables_addr + table_offset;
+   for (no = 0; no  ACPI_OVERRIDE_TABLES; no++) {
+   if (acpi_table_overridden[no] == 0) {
+   acpi_table_overridden[no] = *address;
+   break;
+   }
+   }
break;
} while (table_offset + ACPI_HEADER_SIZE  all_tables_size);
 
@@ -768,6 +776,87 @@ acpi_os_physical_table_override(struct acpi_table_header 
*existing_table,
 #endif
 }
 
+acpi_status
+acpi_os_physical_table_add(acpi_physical_address *address,
+  u32 *table_length)
+{
+#ifndef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+   *table_length = 0;
+   *address = 0;
+   return AE_OK;
+#else
+   int no, table_offset = 0;
+   struct acpi_table_header *table;
+
+   *table_length = 0;
+   *address = 0;
+
+   if (!acpi_tables_addr)
+   return AE_OK;
+
+   do {
+   if (table_offset + ACPI_HEADER_SIZE  all_tables_size) {
+   WARN_ON(1);
+   return AE_OK;
+   }
+
+   table = acpi_os_map_memory(acpi_tables_addr + table_offset,
+  ACPI_HEADER_SIZE);
+
+   if (table_offset + table-length  all_tables_size) {
+   acpi_os_unmap_memory(table, ACPI_HEADER_SIZE);
+   WARN_ON(1);
+   return AE_OK;
+   }
+
+   table_offset += table-length;
+
+   /* Do not add SSDTs, only if acpi_no_auto_ssdt boot parameter
+  got set.
+   */
+   if (acpi_gbl_disable_ssdt_table_load == FALSE 
+   !memcmp(SSDT, table-signature, 4)) {
+   pr_info(Will not add SSDT [%8.8s], use 
acpi_no_auto_ssdt to force\n, table-oem_table_id);
+   acpi_os_unmap_memory(table,
+ACPI_HEADER_SIZE);
+   continue

[PATCH 3/4] ACPICA: Add BGRT signature to known signatures

2014-02-28 Thread Thomas Renninger

In Linux there even exists a driver already making use of this table:
drivers/acpi/bgrt.c:MODULE_DESCRIPTION(BGRT boot graphic support);

Signed-off-by: Thomas Renninger tr...@suse.de
CC: h...@zytor.com
CC: t...@linutronix.de
CC: c...@conrad-kostecki.de
CC: linux-kernel@vger.kernel.org
CC: x...@kernel.org
CC: mi...@redhat.com
CC: r...@rjwysocki.net
CC: de...@acpica.org
---
 include/acpi/actbl2.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
index 094a906..9ed1c20 100644
--- a/include/acpi/actbl2.h
+++ b/include/acpi/actbl2.h
@@ -83,6 +83,7 @@
 #define ACPI_SIG_WDAT   WDAT /* Watchdog Action Table */
 #define ACPI_SIG_WDDT   WDDT /* Watchdog Timer Description Table */
 #define ACPI_SIG_WDRT   WDRT /* Watchdog Resource Table */
+#define ACPI_SIG_BGRT   BGRT  /* Boot Graphic Support */
 
 #ifdef ACPI_UNDEFINED_TABLES
 /*
-- 
1.7.6.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 4/4] ACPI: Add new table signatures that can be overridden/added.

2014-02-28 Thread Thomas Renninger

Signed-off-by: Thomas Renninger tr...@suse.de
CC: h...@zytor.com
CC: t...@linutronix.de
CC: c...@conrad-kostecki.de
CC: linux-kernel@vger.kernel.org
CC: x...@kernel.org
CC: mi...@redhat.com
CC: r...@rjwysocki.net
CC: de...@acpica.org
---
 drivers/acpi/osl.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 7f9a865..23ce5b1 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -38,6 +38,7 @@
 #include linux/delay.h
 #include linux/workqueue.h
 #include linux/nmi.h
+# define ACPI_UNDEFINED_TABLES 1
 #include linux/acpi.h
 #include linux/efi.h
 #include linux/ioport.h
@@ -559,7 +560,8 @@ static const char * const table_sigs[] = {
ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
-   ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
+   ACPI_SIG_WDRT, ACPI_SIG_BGRT, ACPI_SIG_ATKG, ACPI_SIG_GSCI,
+   ACPI_SIG_IEIT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
-- 
1.7.6.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 3 4 5 6 >

1 - 100 of 549 matches

Mail list logo