date:20190220

Re: [for-next][PATCH 13/29] tracing: No need to free iter->trace in fail path of tracing_open_pipe()

2019-02-20 Thread Steven Rostedt

On Wed, 20 Feb 2019 13:37:50 -0500
Steven Rostedt  wrote:

> From: "zhangyi (F)" 
> 
> Commit d716ff71dd12 ("tracing: Remove taking of trace_types_lock in
> pipe files") uses the current tracer instead of the copy in
> tracing_open_pipe(), but it forgot to remove the kfree() call in
> the error path.
> 
> [ Note, this is harmless because kfree(NULL) is allowed and iter is
>   allocated with kzalloc() making iter->trace = NULL -- S. Rostedt ]

Bah, I forgot to update this. I haven't pushed to linux-next yet.

As Zhangyi replied, this is a real issue. I just wish the real issue
had been explained in the change log.

I'm going to rebase this to update the change log (no code changes, so
no need to run the tests again), and also, I'll add a Cc stable. No
point in sending this out as a separate patch either, because the merge
window is going to open soon.

-- Steve


> 
> Link: http://lkml.kernel.org/r/1550060946-45984-1-git-send-email-yi.zh...@huawei.com
> 
> Fixes: d716ff71dd12 ("tracing: Remove taking of trace_types_lock in pipe files")
> Signed-off-by: zhangyi (F) 
> Signed-off-by: Steven Rostedt (VMware) 
> ---
>  kernel/trace/trace.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index c521b7347482..b583ff7656bb 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -5624,7 +5624,6 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
>  	return ret;
>  
>  fail:
> -	kfree(iter->trace);
>  	kfree(iter);
>  	__trace_array_put(tr);
>  	mutex_unlock(&trace_types_lock);
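
The ownership rule behind this one-line removal can be sketched in user space. This is only a model, assuming (as the change log says) that after d716ff71dd12 iter->trace points at the current tracer rather than at a private copy. The struct and function names below are hypothetical stand-ins, not the kernel's definitions; the point is just that an error path frees what it allocated and nothing it merely borrows.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures. */
struct tracer_model {
	const char *name;
};

struct iter_model {
	struct tracer_model *trace;	/* borrowed, not owned by the iterator */
};

/*
 * Model of an open routine whose setup can fail after the iterator has
 * been allocated. Returns 0 on success, -1 on failure.
 */
static int open_pipe_model(struct tracer_model *current_tracer, int fail_setup,
			   struct iter_model **out)
{
	struct iter_model *iter;

	assert(current_tracer != NULL);

	iter = calloc(1, sizeof(*iter));	/* zeroed, like kzalloc() */
	if (!iter)
		return -1;

	iter->trace = current_tracer;		/* points at shared state */

	if (fail_setup)
		goto fail;

	*out = iter;
	return 0;

fail:
	/* Free only what this function allocated; iter->trace is shared. */
	free(iter);
	return -1;
}
```

In this model, freeing iter->trace in the fail path would release an object that other users still reference, which is why the sketch leaves it alone.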

Re: [RFC][PATCH 00/16] sched: Core scheduling

2019-02-20 Thread Subhra Mazumdar




On 2/20/19 1:42 AM, Peter Zijlstra wrote:
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing in e-mail?
> 
> On Tue, Feb 19, 2019 at 02:07:01PM -0800, Greg Kerr wrote:
> > Thanks for posting this patchset Peter. Based on the patch titled, "sched: A
> > quick and dirty cgroup tagging interface," I believe cgroups are used to
> > define co-scheduling groups in this implementation.
> > 
> > Chrome OS engineers (kerr...@google.com, mpden...@google.com, and
> > pal...@google.com) are considering an interface that is usable by unprivileged
> > userspace apps. cgroups are a global resource that require privileged access.
> > Have you considered an interface that is akin to namespaces? Consider the
> > following strawperson API proposal (I understand prctl() is generally
> > used for process specific actions, so we aren't married to using prctl()):
> 
> I don't think we're anywhere near the point where I care about
> interfaces with this stuff.
> 
> Interfaces are a trivial but tedious matter once the rest works to
> satisfaction.
> 
> As it happens; there is actually a bug in that very cgroup patch that
> can cause undesired scheduling. Try spotting and fixing that.
> 
> Another question is if we want to be L1TF complete (and how strict) or
> not, and if so, build the missing pieces (for instance we currently
> don't kick siblings on IRQ/trap/exception entry -- and yes that's nasty
> and horrible code and missing for that reason).

I remember asking Paul about this and he mentioned he has an Address Space
Isolation proposal to cover this. So it seems this is out of scope of
core scheduling?

> So first; does this provide what we need? If that's sorted we can
> bike-shed on uapi/abi.

Re: [PATCH v15 15/15] tracing: Add hist trigger action 'expected fail' test case

2019-02-20 Thread Tom Zanussi

On Wed, 2019-02-20 at 13:33 -0500, Steven Rostedt wrote:
> On Wed, 20 Feb 2019 12:10:31 -0600
> Tom Zanussi  wrote:
> 
> > 
> > As far as I understand it (there's no other case of an xfail test in
> > the testsuite, so nothing similar to compare it to), the test output is
> > correct - here we get the expected fail, XFAIL, and not a FAIL as any
> > test, xfail or normal, that failed would produce:
> 
> Yeah, I've been staring at the code, and commit:
> 
> 915de2adb584a ftracetest: Add POSIX.3 standard and XFAIL result codes
> 
> 
> > 
> > tools/testing/selftests/ftrace# ./ftracetest test.d/trigger/
> > === Ftrace unit tests ===
> > [1] event trigger - test inter-event histogram trigger expected fail actions	[XFAIL]
> > [2] event trigger - test extended error support	[PASS]
> > 
> > And here the summary shows none failed, while we did have one expected
> > xfail, but that's what was expected, and not a failure:
> > 
> > # of passed:  31
> > # of failed:  0
> > # of unresolved:  0
> > # of untested:  0
> > # of unsupported:  0
> > # of xfailed:  1
> 
> Yeah, but it's marked as RED, which is why I thought it was a
> failure.
> 
> > # of undefined(test bug):  0
> > 
> > If that's not correct, I'll fix it but at this point I'm not sure what
> > the output should be if not that.
> 
> OK, so this has nothing to do with your patch set. I've tested
> everything else, and I'm ready to finally push my tree to linux-next.
> 
> I'm thinking that we should get rid of xfail, as it's really confusing,
> and I don't understand its purpose. But that shouldn't stop pushing
> your patches.
> 

OK, I'm fine with removing it, if it's too confusing.  IIRC Masami
suggested it to highlight that not all actions and handlers can be used
together, so I guess I'll hold off on a patch removing it until he can
chime in...

Thanks,

Tom 
 
> Thanks,
> 
> -- Steve

Re: [PATCH] kasan: turn off asan-stack for clang-8 and earlier

2019-02-20 Thread Mark Brown

On Wed, Feb 20, 2019 at 10:07:36AM -0800, Nick Desaulniers wrote:

> I like Evgenii's idea:
> https://bugs.llvm.org/show_bug.cgi?id=38809#c10

That's a suggestion to tune the inlining heuristics.

> While I myself share Arnd's goal of driving compiler warnings to zero,
> in general I'd prefer not to disable warning-producing-features or
> disable warnings outright for cases where we have some ideas of
> changes we can make to the compiler.  There's probably a list now of
> false warnings produced by old versions of Clang from bugs in Clang
> that we fixed.  I'm not interested in additionally trying to work
> around those somehow in kernel sources.

We do have infrastructure in the kernel for managing warnings based on
compiler version (Arnd was looking at some improvements to that IIRC).
If we've got a kernel that builds with a given compiler, it's worth
looking at tuning what we do with that compiler.  If newer versions of
the compiler work better or have new options we can turn things on for
them.

> Qian previously pointed out that most drivers don't produce this
> warning under KASAN+Clang.  While 114 is a lot, what are the chances
> that someone NEEDS a KASAN+Clang build to compile warning free and
> happen to include one of these problematic drivers?  And if there is a
> chance they do observe the warning, are we doing a disservice by
> disabling the feature (-asan-stack=1) outright for the whole kernel,
> or disabling the warning (`-Wstack-frame-larger-than=`) which can flag
> issues unrelated to KASAN?

People doing treewide work and subsystem maintainers are a reasonably
important target for this sort of thing - for example people looking at
the kernelci output.  It's a lot easier to pay attention to problems if
you don't have to wade through large numbers of false positives.


[for-next][PATCH 04/29] tracing: Add comment to predicate_parse() about "&&" or "||"

2019-02-20 Thread Steven Rostedt

From: "Steven Rostedt (VMware)" 

As the predicate_parse() code is rather complex, commenting subtleties is
important. The switch case statement should be commented to describe that it
is only looking for two '&' or '|' together, which is why the fall through
to an error is after the check.

Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_events_filter.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index eb694756c4bb..f052ecb085e9 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -491,6 +491,7 @@ predicate_parse(const char *str, int nr_parens, int nr_preds,
 				break;
 			case '&':
 			case '|':
+				/* accepting only "&&" or "||" */
 				if (next[1] == next[0]) {
 					ptr++;
 					break;
-- 
2.20.1
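
The check that the new comment documents can be illustrated in isolation. The helper below is a hypothetical stand-alone sketch, not code from trace_events_filter.c; it only shows the "same character twice" test and why a lone or mixed operator drops through to the error case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true only when `next` starts with "&&" or "||".
 * A lone '&' or '|', or a mixed pair such as "&|", is rejected.
 */
static bool is_doubled_logical_op(const char *next)
{
	assert(next != NULL);

	switch (next[0]) {
	case '&':
	case '|':
		/* accepting only "&&" or "||" */
		return next[1] == next[0];
	default:
		return false;
	}
}
```

Comparing next[1] against next[0] covers both operators with one test, and a string that ends after a single '&' is safe because next[1] is then the terminating NUL.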

[for-next][PATCH 01/29] function_graph: Support displaying relative timestamp

2019-02-20 Thread Steven Rostedt

From: Changbin Du 

When function_graph is used for latency tracers, relative timestamp
is more straightforward than absolute timestamp as function trace
does. This change adds relative timestamp support to function_graph
and applies to latency tracers (wakeup and irqsoff).

Instead of:

 # tracer: irqsoff
 #
 # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test
 # --------------------------------------------------------------------
 # latency: 521 us, #1125/1125, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8)
 #    -----------------
 #    | task: swapper/2-0 (uid:0 nice:0 policy:0 rt_prio:0)
 #    -----------------
 #  => started at: __schedule
 #  => ended at:   _raw_spin_unlock_irq
 #
 #
 #                                       _-----=> irqs-off
 #                                      / _----=> need-resched
 #                                     | / _---=> hardirq/softirq
 #                                     || / _--=> preempt-depth
 #                                     ||| /
 #     TIME        CPU  TASK/PID       ||||  DURATION                  FUNCTION CALLS
 #      |          |     |    |        ||||   |   |                     |   |   |   |
   124.974306 |   2)  systemd-693   |  d..1  0.000 us    |  __schedule();
   124.974307 |   2)  systemd-693   |  d..1              |    rcu_note_context_switch() {
   124.974308 |   2)  systemd-693   |  d..1  0.487 us    |      rcu_preempt_deferred_qs();
   124.974309 |   2)  systemd-693   |  d..1  0.451 us    |      rcu_qs();
   124.974310 |   2)  systemd-693   |  d..1  2.301 us    |    }
 [..]
   124.974826 |   2)    <idle>-0    |  d..2              |  finish_task_switch() {
   124.974826 |   2)    <idle>-0    |  d..2              |    _raw_spin_unlock_irq() {
   124.974827 |   2)    <idle>-0    |  d..2  0.000 us    |      _raw_spin_unlock_irq();
   124.974828 |   2)    <idle>-0    |  d..2  0.000 us    |      tracer_hardirqs_on();
   <idle>-0       2d..2  552us : <stack trace>
  => __schedule
  => schedule_idle
  => do_idle
  => cpu_startup_entry
  => start_secondary
  => secondary_startup_64

Show:

 # tracer: irqsoff
 #
 # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test+
 # --------------------------------------------------------------------
 # latency: 511 us, #1053/1053, CPU#7 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8)
 #    -----------------
 #    | task: swapper/7-0 (uid:0 nice:0 policy:0 rt_prio:0)
 #    -----------------
 #  => started at: __schedule
 #  => ended at:   _raw_spin_unlock_irq
 #
 #
 #                                       _-----=> irqs-off
 #                                      / _----=> need-resched
 #                                     | / _---=> hardirq/softirq
 #                                     || / _--=> preempt-depth
 #                                     ||| /
 #   REL TIME      CPU  TASK/PID       ||||  DURATION                  FUNCTION CALLS
 #      |          |     |    |        ||||   |   |                     |   |   |   |
         0 us |   7)   sshd-1704    |  d..1  0.000 us    |  __schedule();
         1 us |   7)   sshd-1704    |  d..1              |    rcu_note_context_switch() {
         1 us |   7)   sshd-1704    |  d..1  0.611 us    |      rcu_preempt_deferred_qs();
         2 us |   7)   sshd-1704    |  d..1  0.484 us    |      rcu_qs();
         3 us |   7)   sshd-1704    |  d..1  2.599 us    |    }
 [..]
       509 us |   7)    <idle>-0    |  d..2              |  finish_task_switch() {
       510 us |   7)    <idle>-0    |  d..2              |    _raw_spin_unlock_irq() {
       510 us |   7)    <idle>-0    |  d..2  0.000 us    |      _raw_spin_unlock_irq();
       512 us |   7)    <idle>-0    |  d..2  0.000 us    |      tracer_hardirqs_on();
   <idle>-0       7d..2  543us : <stack trace>
  => __schedule
  => schedule_idle
  => do_idle
  => cpu_startup_entry
  => start_secondary
  => secondary_startup_64

Link: http://lkml.kernel.org/r/20190101154614.8887-2-changbin...@gmail.com

Signed-off-by: Changbin Du 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace.h |  9 +
 kernel/trace/trace_functions_graph.c | 25 +
 kernel/trace/trace_irqsoff.c |  2 +-
 kernel/trace/trace_sched_wakeup.c|  2 +-
 4 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 08900828d282..a34fa5e76abb 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -855,10 +855,11 @@ static __always_inline bool ftrace_hash_empty(struct ftrace_hash *hash)
 #define TRACE_GRAPH_PRINT_PROC  0x8
 #define TRACE_GRAPH_PRINT_DURATION  0x10
 #define TRACE_GRAPH_PRINT_ABS_TIME  0x20
-#define TRACE_GRAPH_PRINT_IRQS  0x40
-#define TRACE_GRAPH_PRINT_TAIL  0x80
-#define TRACE_GRAPH_SLEEP_TIME 0x100
-#define TRACE_GRAPH_GRAPH_TIME 0x200
+#define TRACE_GRAPH_PRINT_REL_TIME  0x40
+#define TRACE_GRAPH_PRINT_IRQS  0x80
+#define TRACE_GRAPH_PRINT_TAIL  0x100
+#define TRACE_GRAPH_SLEEP_TIME  0x200
+#define TRACE_GRAPH_GRAPH_TIME  0x400
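
Because the new REL_TIME flag takes over 0x40, every later flag moves up one bit. A small sketch of the invariant that renumbering has to preserve (each flag a single bit, no two flags sharing one) follows. The helper names are hypothetical; only the flag values come from the hunk above, and flags outside the hunk are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values as they read once the hunk above is applied. */
#define TRACE_GRAPH_PRINT_PROC		0x8
#define TRACE_GRAPH_PRINT_DURATION	0x10
#define TRACE_GRAPH_PRINT_ABS_TIME	0x20
#define TRACE_GRAPH_PRINT_REL_TIME	0x40
#define TRACE_GRAPH_PRINT_IRQS		0x80
#define TRACE_GRAPH_PRINT_TAIL		0x100
#define TRACE_GRAPH_SLEEP_TIME		0x200
#define TRACE_GRAPH_GRAPH_TIME		0x400

static const unsigned int graph_flags[] = {
	TRACE_GRAPH_PRINT_PROC, TRACE_GRAPH_PRINT_DURATION,
	TRACE_GRAPH_PRINT_ABS_TIME, TRACE_GRAPH_PRINT_REL_TIME,
	TRACE_GRAPH_PRINT_IRQS, TRACE_GRAPH_PRINT_TAIL,
	TRACE_GRAPH_SLEEP_TIME, TRACE_GRAPH_GRAPH_TIME,
};

/* True when exactly one bit is set in v. */
static bool is_single_bit(unsigned int v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* True when every listed flag is a single bit and no bit is reused. */
static bool graph_flags_are_distinct_bits(void)
{
	unsigned int seen = 0;
	size_t i;

	for (i = 0; i < sizeof(graph_flags) / sizeof(graph_flags[0]); i++) {
		if (!is_single_bit(graph_flags[i]) || (seen & graph_flags[i]))
			return false;
		seen |= graph_flags[i];
	}
	return true;
}

/* Test one flag in a mask; the flag itself must be a single bit. */
static bool graph_flag_is_set(unsigned int flags, unsigned int flag)
{
	assert(is_single_bit(flag));
	return (flags & flag) != 0;
}
```

One consequence worth noting: a mask built with the old numbering, where 0x40 meant PRINT_IRQS, now selects PRINT_REL_TIME instead, so the symbolic names have to be used consistently.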

Re: [PATCH] s390/jump_label: Correct asm contraint

2019-02-20 Thread Laura Abbott


On 2/20/19 12:58 AM, Heiko Carstens wrote:

On Sat, Feb 09, 2019 at 12:34:20PM -0800, Laura Abbott wrote:

On 2/5/19 12:43 PM, Heiko Carstens wrote:

On Tue, Jan 29, 2019 at 08:25:58AM +0100, Laura Abbott wrote:

On 1/23/19 5:24 AM, Heiko Carstens wrote:

On Wed, Jan 23, 2019 at 01:55:13PM +0100, Laura Abbott wrote:

There's a build failure with gcc9:

  ./arch/s390/include/asm/jump_label.h: Assembler messages:
  ./arch/s390/include/asm/jump_label.h:23: Error: bad expression
  ./arch/s390/include/asm/jump_label.h:23: Error: junk at end of line, first unrecognized character is `r'
  make[1]: *** [scripts/Makefile.build:277: init/main.o] Error 1

...

I've had to turn off s390 in Fedora until this gets fixed :(


Laura, the patch below should fix this (temporarily). If possible,
could you give it a try? It seems to work for me.

Subject: [PATCH] s390: disable section anchors

Tested-by: Laura Abbott <


The patch won't be used. In the meantime Ilya provided a gcc 9 and
kernel patch which should fix this. The kernel patch is available here
https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=146448524bddbf6dfc62de31957e428de001cbda
and will go upstream during the next merge window.

Note: this obviously also requires to update the gcc 9 version in
Fedora, so it contains Ilya's patch, to be able to compile the kernel.

Thanks,
Heiko



Thanks. I'll keep an eye out for that during the next merge
window.

[for-next][PATCH 00/29] tracing: Updates for 5.1

2019-02-20 Thread Steven Rostedt

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
for-next

Head SHA1: 5308e9705d9a017f0e732610ac0a7cab52fb01f7


Changbin Du (6):
  function_graph: Support displaying relative timestamp
  tracing: Show more info for funcgraph wakeup tracers
  tracing: Put a margin between flags and duration for wakeup tracers
  tracing/doc: Add latency tracer funcgraph example
  tracing: Show stacktrace for wakeup tracers
  tracing: Change the function format to display function names by perf

Elena Reshetova (1):
  uprobes: convert uprobe.ref to refcount_t

Mathieu Malaterre (2):
  tracing: Annotate implicit fall through in parse_probe_arg()
  tracing: Annotate implicit fall through in predicate_parse()

Miroslav Benes (1):
  ring-buffer: Remove unused function ring_buffer_page_len()

Steven Rostedt (VMware) (3):
  tracing: Add comment to predicate_parse() about "&&" or "||"
  ftrace: Allow enabling of filters via index of available_filter_functions
  tracing: Comment why cond_snapshot is checked outside of max_lock protection

Tom Zanussi (15):
  tracing: Refactor hist trigger action code
  tracing: Make hist trigger Documentation better reflect actions/handlers
  tracing: Split up onmatch action data
  tracing: Generalize hist trigger onmax and save action
  tracing: Add conditional snapshot
  tracing: Add hist trigger snapshot() action
  tracing: Add hist trigger snapshot() action Documentation
  tracing: Add hist trigger onchange() handler
  tracing: Add hist trigger onchange() handler Documentation
  tracing: Add alternative synthetic event trace action syntax
  tracing: Add SPDX license GPL-2.0 license identifier to inter-event testcases
  tracing: Add hist trigger snapshot() action test case
  tracing: Add hist trigger onchange() handler test case
  tracing: Add alternative synthetic event trace action test case
  tracing: Add hist trigger action 'expected fail' test case

zhangyi (F) (1):
  tracing: No need to free iter->trace in fail path of tracing_open_pipe()


 Documentation/trace/ftrace.rst |   89 ++
 Documentation/trace/histogram.rst  |  316 +-
 include/linux/ring_buffer.h|2 -
 kernel/events/uprobes.c|8 +-
 kernel/trace/ftrace.c  |   30 +
 kernel/trace/ring_buffer.c |   14 -
 kernel/trace/trace.c   |  217 +++-
 kernel/trace/trace.h   |   66 +-
 kernel/trace/trace_entries.h   |   41 +-
 kernel/trace/trace_events_filter.c |7 +
 kernel/trace/trace_events_hist.c   | 1048 ++--
 kernel/trace/trace_functions_graph.c   |   30 +-
 kernel/trace/trace_irqsoff.c   |2 +-
 kernel/trace/trace_probe.c |1 +
 kernel/trace/trace_sched_wakeup.c  |   11 +-
 .../inter-event/trigger-action-hist-xfail.tc   |   30 +
 .../inter-event/trigger-extended-error-support.tc  |1 +
 .../inter-event/trigger-field-variable-support.tc  |1 +
 .../trigger-inter-event-combined-hist.tc   |1 +
 .../inter-event/trigger-multi-actions-accept.tc|1 +
 .../inter-event/trigger-onchange-action-hist.tc|   28 +
 .../inter-event/trigger-onmatch-action-hist.tc |1 +
 .../trigger-onmatch-onmax-action-hist.tc   |1 +
 .../inter-event/trigger-onmax-action-hist.tc   |1 +
 .../inter-event/trigger-snapshot-action-hist.tc|   43 +
 .../trigger-synthetic-event-createremove.tc|1 +
 .../inter-event/trigger-trace-action-hist.tc   |   42 +
 27 files changed, 1659 insertions(+), 374 deletions(-)
 create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-action-hist-xfail.tc
 create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onchange-action-hist.tc
 create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-snapshot-action-hist.tc
 create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-trace-action-hist.tc

[for-next][PATCH 05/29] tracing: Show more info for funcgraph wakeup tracers

2019-02-20 Thread Steven Rostedt

From: Changbin Du 

Add these info fields to funcgraph wakeup tracers:
  o Show CPU info since the waker could be on a different CPU.
  o Show function duration and overhead.
  o Show IRQ markers.

Link: http://lkml.kernel.org/r/20190101154614.8887-3-changbin...@gmail.com

Signed-off-by: Changbin Du 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_sched_wakeup.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index b6c5fa10347e..da5b6e012840 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -180,8 +180,11 @@ static void wakeup_trace_close(struct trace_iterator *iter)
 }
 
 #define GRAPH_TRACER_FLAGS (TRACE_GRAPH_PRINT_PROC | \
+   TRACE_GRAPH_PRINT_CPU |  \
TRACE_GRAPH_PRINT_REL_TIME | \
-   TRACE_GRAPH_PRINT_DURATION)
+   TRACE_GRAPH_PRINT_DURATION | \
+   TRACE_GRAPH_PRINT_OVERHEAD | \
+   TRACE_GRAPH_PRINT_IRQS)
 
 static enum print_line_t wakeup_print_line(struct trace_iterator *iter)
 {
-- 
2.20.1

[for-next][PATCH 12/29] uprobes: convert uprobe.ref to refcount_t

2019-02-20 Thread Steven Rostedt

From: Elena Reshetova 

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations and be exploitable.

The variable uprobe.ref is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

**Important note for maintainers:

Some functions from refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.
The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
in state to be merged to the documentation tree.
Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.
Please double check that you don't have some undocumented
memory guarantees for this variable usage.

For the uprobe.ref it might make a difference
in following places:
 - put_uprobe(): decrement in refcount_dec_and_test() only
   provides RELEASE ordering and control dependency on success
   vs. fully ordered atomic counterpart

Link: http://lkml.kernel.org/r/1547637627-29526-1-git-send-email-elena.reshet...@intel.com

Suggested-by: Kees Cook 
Acked-by: Oleg Nesterov 
Reviewed-by: David Windsor 
Reviewed-by: Hans Liljestrand 
Reviewed-by: Srikar Dronamraju 
Signed-off-by: Elena Reshetova 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/events/uprobes.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 8aef47ee7bfa..0a8bf7a4fc5e 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -66,7 +66,7 @@ static struct percpu_rw_semaphore dup_mmap_sem;
 
 struct uprobe {
 	struct rb_node		rb_node;	/* node in the rb tree */
-	atomic_t		ref;
+	refcount_t		ref;
 	struct rw_semaphore	register_rwsem;
 	struct rw_semaphore	consumer_rwsem;
 	struct list_head	pending_list;
@@ -560,13 +560,13 @@ set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long v
 
 static struct uprobe *get_uprobe(struct uprobe *uprobe)
 {
-	atomic_inc(&uprobe->ref);
+	refcount_inc(&uprobe->ref);
 	return uprobe;
 }
 
 static void put_uprobe(struct uprobe *uprobe)
 {
-	if (atomic_dec_and_test(&uprobe->ref)) {
+	if (refcount_dec_and_test(&uprobe->ref)) {
 		/*
 		 * If application munmap(exec_vma) before uprobe_unregister()
 		 * gets called, we don't get a chance to remove uprobe from
@@ -657,7 +657,7 @@ static struct uprobe *__insert_uprobe(struct uprobe *uprobe)
 	rb_link_node(&uprobe->rb_node, parent, p);
 	rb_insert_color(&uprobe->rb_node, &uprobes_tree);
 	/* get access + creation ref */
-	atomic_set(&uprobe->ref, 2);
+	refcount_set(&uprobe->ref, 2);
 
 	return u;
 }
-- 
2.20.1
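
The counter properties listed in the change log (freed at zero, no increments after zero, no wrap-around) can be modelled in user space with C11 atomics. This is a simplified, hypothetical model and not the kernel's refcount_t: the names are invented, it reports misuse through a return value or an assertion where the kernel reports it differently, and it uses the default sequentially consistent ordering, so it says nothing about the RELEASE ordering discussed above.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel's refcount_t. */
struct refcount_model {
	atomic_uint refs;
};

static void refcount_model_set(struct refcount_model *r, unsigned int n)
{
	atomic_store(&r->refs, n);
}

/*
 * Take a reference. Refuses (returns false) when the count is already
 * zero, since the object may have been freed, or when it would overflow.
 */
static bool refcount_model_inc(struct refcount_model *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0 || old == UINT_MAX)
			return false;
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));

	return true;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool refcount_model_dec_and_test(struct refcount_model *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		assert(old != 0);	/* underflow would be a bug */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));

	return old == 1;
}
```

Starting from 2, as __insert_uprobe() does for the access and creation references, the model only reports "last reference" on the drop that brings the count to zero, and it refuses to revive the counter afterwards.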

[for-next][PATCH 15/29] tracing: Make hist trigger Documentation better reflect actions/handlers

2019-02-20 Thread Steven Rostedt

From: Tom Zanussi 

The action/handler code refactoring didn't change the action/handler
syntax, but did generalize it - the Documentation should reflect that.

Link: http://lkml.kernel.org/r/c2fe4144678829c70cad67aaa847dca27d57cb83.1550100284.git.tom.zanu...@linux.intel.com

Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 Documentation/trace/histogram.rst | 56 ---
 1 file changed, 43 insertions(+), 13 deletions(-)

diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst
index 7dda76503127..63e522107e59 100644
--- a/Documentation/trace/histogram.rst
+++ b/Documentation/trace/histogram.rst
@@ -25,7 +25,7 @@ Documentation written by Tom Zanussi
 
   hist:keys=<field1[,field2,...]>[:values=<field1[,field2,...]>]
     [:sort=<field1[,field2,...]>][:size=#entries][:pause][:continue]
-    [:clear][:name=histname1] [if <filter>]
+    [:clear][:name=histname1][:<handler>.<action>] [if <filter>]
 
   When a matching event is hit, an entry is added to a hash table
   using the key(s) and value(s) named.  Keys and values correspond to
@@ -1831,21 +1831,51 @@ and looks and behaves just like any other event::
 Like any other event, once a histogram is enabled for the event, the
 output can be displayed by reading the event's 'hist' file.
 
-2.2.3 Hist trigger 'actions'
-----------------------------
+2.2.3 Hist trigger 'handlers' and 'actions'
+-------------------------------------------
 
-A hist trigger 'action' is a function that's executed whenever a
-histogram entry is added or updated.
+A hist trigger 'action' is a function that's executed (in most cases
+conditionally) whenever a histogram entry is added or updated.
 
-The default 'action' if no special function is explicitly specified is
-as it always has been, to simply update the set of values associated
-with an entry.  Some applications, however, may want to perform
-additional actions at that point, such as generate another event, or
-compare and save a maximum.
+When a histogram entry is added or updated, a hist trigger 'handler'
+is what decides whether the corresponding action is actually invoked
+or not.
 
-The following additional actions are available.  To specify an action
-for a given event, simply specify the action between colons in the
-hist trigger specification.
+Hist trigger handlers and actions are paired together in the general
+form:
+
+  <handler>.<action>
+
+To specify a handler.action pair for a given event, simply specify
+that handler.action pair between colons in the hist trigger
+specification.
+
+In theory, any handler can be combined with any action, but in
+practice, not every handler.action combination is currently supported;
+if a given handler.action combination isn't supported, the hist
+trigger will fail with -EINVAL;
+
+The default 'handler.action' if none is explicity specified is as it
+always has been, to simply update the set of values associated with an
+entry.  Some applications, however, may want to perform additional
+actions at that point, such as generate another event, or compare and
+save a maximum.
+
+The supported handlers and actions are listed below, and each is
+described in more detail in the following paragraphs, in the context
+of descriptions of some common and useful handler.action combinations.
+
+The available handlers are:
+
+  - onmatch(matching.event)        - invoke action on any addition or update
+  - onmax(var)                     - invoke action if var exceeds current max
+
+The available actions are:
+
+  - <synthetic_event_name>(param list)         - generate synthetic event
+  - save(field,...)                            - save current event fields
+
+The following commonly-used handler.action pairs are available:
 
   - onmatch(matching.event).<synthetic_event_name>(param list)
 
-- 
2.20.1

[for-next][PATCH 02/29] tracing: Annotate implicit fall through in parse_probe_arg()

2019-02-20 Thread Steven Rostedt

From: Mathieu Malaterre 

There is a plan to build the kernel with -Wimplicit-fallthrough and
this place in the code produced a warning (W=1).

This commit removes the following warning:

  kernel/trace/trace_probe.c:302:6: warning: this statement may fall through [-Wimplicit-fallthrough=]

Link: http://lkml.kernel.org/r/20190114203039.16535-1-ma...@debian.org

Signed-off-by: Mathieu Malaterre 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_probe.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 9962cb5da8ac..89da34b326e3 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -300,6 +300,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 
 	case '+':	/* deref memory */
 		arg++;	/* Skip '+', because kstrtol() rejects it. */
+		/* fall through */
 	case '-':
 		tmp = strchr(arg, '(');
 		if (!tmp)
-- 
2.20.1
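
The annotated pattern, where one case does a small fix-up and then deliberately continues into the next, can be sketched outside the kernel. The function below is a hypothetical user-space illustration using strtol(); it is not the parse_probe_arg() logic, only the same '+' then '-' shape with the fall-through marked for the compiler.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the signed offset in front of '(' in strings such as "+8(...)"
 * or "-16(...)". Returns 0 and stores the offset, or -EINVAL.
 */
static int parse_deref_offset(const char *arg, long *offset)
{
	const char *paren;
	char *end;
	long val;

	assert(arg != NULL && offset != NULL);

	switch (arg[0]) {
	case '+':
		arg++;	/* skip the explicit plus sign */
		/* fall through */
	case '-':
		paren = strchr(arg, '(');
		if (!paren)
			return -EINVAL;
		val = strtol(arg, &end, 0);
		/* require at least one digit, ending exactly at '(' */
		if (end == arg || end != paren)
			return -EINVAL;
		*offset = val;
		return 0;
	default:
		return -EINVAL;
	}
}
```

Without the comment, a compiler run with -Wimplicit-fallthrough would flag the '+' case, since it cannot tell an intended continuation from a forgotten break.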

[for-next][PATCH 07/29] tracing/doc: Add latency tracer funcgraph example

2019-02-20 Thread Steven Rostedt

From: Changbin Du 

This adds an example of how to use funcgraph with latency tracers.

Link: http://lkml.kernel.org/r/20190101154614.8887-6-changbin...@gmail.com

Signed-off-by: Changbin Du 
Signed-off-by: Steven Rostedt (VMware) 
---
 Documentation/trace/ftrace.rst | 51 ++
 1 file changed, 51 insertions(+)

diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst
index 0131df7f5968..6ce2763a2a3e 100644
--- a/Documentation/trace/ftrace.rst
+++ b/Documentation/trace/ftrace.rst
@@ -1396,6 +1396,57 @@ enabling function tracing, we incur an added overhead. This
 overhead may extend the latency times. But nevertheless, this
 trace has provided some very helpful debugging information.
 
+If we prefer function graph output instead of function, we can set
+display-graph option::
+ with echo 1 > options/display-graph
+
+  # tracer: irqsoff
+  #
+  # irqsoff latency trace v1.1.5 on 4.20.0-rc6+
+  # --------------------------------------------------------------------
+  # latency: 3751 us, #274/274, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
+  #    -----------------
+  #    | task: bash-1507 (uid:0 nice:0 policy:0 rt_prio:0)
+  #    -----------------
+  #  => started at: free_debug_processing
+  #  => ended at:   return_to_handler
+  #
+  #
+  #                                       _-----=> irqs-off
+  #                                      / _----=> need-resched
+  #                                     | / _---=> hardirq/softirq
+  #                                     || / _--=> preempt-depth
+  #                                     ||| /
+  #   REL TIME      CPU  TASK/PID       ||||     DURATION                  FUNCTION CALLS
+  #      |          |     |    |        ||||      |   |                     |   |   |   |
+          0 us |   0)   bash-1507    |  d... |   0.000 us    |  _raw_spin_lock_irqsave();
+          0 us |   0)   bash-1507    |  d..1 |   0.378 us    |    do_raw_spin_trylock();
+          1 us |   0)   bash-1507    |  d..2 |               |    set_track() {
+          2 us |   0)   bash-1507    |  d..2 |               |      save_stack_trace() {
+          2 us |   0)   bash-1507    |  d..2 |               |        __save_stack_trace() {
+          3 us |   0)   bash-1507    |  d..2 |               |          __unwind_start() {
+          3 us |   0)   bash-1507    |  d..2 |               |            get_stack_info() {
+          3 us |   0)   bash-1507    |  d..2 |   0.351 us    |              in_task_stack();
+          4 us |   0)   bash-1507    |  d..2 |   1.107 us    |            }
+  [...]
+       3750 us |   0)   bash-1507    |  d..1 |   0.516 us    |      do_raw_spin_unlock();
+       3750 us |   0)   bash-1507    |  d..1 |   0.000 us    |  _raw_spin_unlock_irqrestore();
+       3764 us |   0)   bash-1507    |  d..1 |   0.000 us    |  tracer_hardirqs_on();
+      bash-1507    0d..1 3792us : <stack trace>
+   => free_debug_processing
+   => __slab_free
+   => kmem_cache_free
+   => vm_area_free
+   => remove_vma
+   => exit_mmap
+   => mmput
+   => flush_old_exec
+   => load_elf_binary
+   => search_binary_handler
+   => __do_execve_file.isra.32
+   => __x64_sys_execve
+   => do_syscall_64
+   => entry_SYSCALL_64_after_hwframe
 
 preemptoff
 ----------
-- 
2.20.1

[for-next][PATCH 06/29] tracing: Put a margin between flags and duration for wakeup tracers

2019-02-20 Thread Steven Rostedt

From: Changbin Du 

Don't mix context flags with function duration info.

Instead of this:

 # tracer: wakeup_rt
 #
 # wakeup_rt latency trace v1.1.5 on 5.0.0-rc1-test+
 # --------------------------------------------------------------------
 # latency: 177 us, #545/545, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8)
 #    -----------------
 #    | task: migration/0-11 (uid:0 nice:0 policy:1 rt_prio:99)
 #    -----------------
 #
 #                                       _-----=> irqs-off
 #                                      / _----=> need-resched
 #                                     | / _---=> hardirq/softirq
 #                                     || / _--=> preempt-depth
 #                                     ||| /
 #   REL TIME      CPU  TASK/PID       ||||  DURATION                  FUNCTION CALLS
 #      |          |     |    |        ||||   |   |                     |   |   |   |
         0 us |   0)    <idle>-0    |  dNh5              |  /*      0:120:R   + [000]    11:  0:R migration/0 */
         2 us |   0)    <idle>-0    |  dNh5  0.000 us    |    (null)();
         4 us |   0)    <idle>-0    |  dNh4              |  _raw_spin_unlock() {
         4 us |   0)    <idle>-0    |  dNh4  0.304 us    |    preempt_count_sub();
         5 us |   0)    <idle>-0    |  dNh3  1.063 us    |  }
         5 us |   0)    <idle>-0    |  dNh3  0.266 us    |  ttwu_stat();
         6 us |   0)    <idle>-0    |  dNh3              |  _raw_spin_unlock_irqrestore() {
         6 us |   0)    <idle>-0    |  dNh3  0.273 us    |    preempt_count_sub();
         6 us |   0)    <idle>-0    |  dNh2  0.818 us    |  }

Show this:

 # tracer: wakeup
 #
 # wakeup latency trace v1.1.5 on 4.20.0+
 # --------------------------------------------------------------------
 # latency: 593 us, #674/674, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
 #    -----------------
 #    | task: kworker/0:1H-339 (uid:0 nice:-20 policy:0 rt_prio:0)
 #    -----------------
 #
 #                                      _-----=> irqs-off
 #                                     / _----=> need-resched
 #                                    | / _---=> hardirq/softirq
 #                                    || / _--=> preempt-depth
 #                                    ||| /
 #  REL TIME      CPU  TASK/PID       ||||     DURATION                  FUNCTION CALLS
 #     |          |     |    |        ||||      |   |                     |   |   |   |
        0 us |   0)    <idle>-0    |  dNs. |               |  /*      0:120:R   + [000]   339:100:R kworker/0:1H */
        3 us |   0)    <idle>-0    |  dNs. |   0.000 us    |    (null)();
       67 us |   0)    <idle>-0    |  dNs. |   0.721 us    |  ttwu_stat();
       69 us |   0)    <idle>-0    |  dNs. |   0.607 us    |  _raw_spin_unlock_irqrestore();
       71 us |   0)    <idle>-0    |  .Ns. |   0.598 us    |  _raw_spin_lock_irq();
       72 us |   0)    <idle>-0    |  .Ns. |   0.584 us    |  _raw_spin_lock_irq();
       73 us |   0)    <idle>-0    |  dNs. | + 11.118 us   |  __next_timer_interrupt();
       75 us |   0)    <idle>-0    |  dNs. |               |  call_timer_fn() {
       76 us |   0)    <idle>-0    |  dNs. |               |    delayed_work_timer_fn() {
       76 us |   0)    <idle>-0    |  dNs. |               |      __queue_work() {
   ...

Link: http://lkml.kernel.org/r/20190101154614.8887-4-changbin...@gmail.com

Signed-off-by: Changbin Du 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_functions_graph.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 16ebbdd7b22e..69ebf3c2f1b5 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -380,6 +380,7 @@ static void print_graph_lat_fmt(struct trace_seq *s, struct trace_entry *entry)
 {
trace_seq_putc(s, ' ');
trace_print_lat_fmt(s, entry);
+   trace_seq_puts(s, " | ");
 }
 
 /* If the pid changed since the last trace, output this event */
@@ -1153,7 +1154,7 @@ static void __print_graph_headers_flags(struct trace_array *tr,
if (flags & TRACE_GRAPH_PRINT_PROC)
seq_puts(s, "  TASK/PID   ");
if (lat)
-   seq_puts(s, "");
+   seq_puts(s, "   ");
if (flags & TRACE_GRAPH_PRINT_DURATION)
seq_puts(s, "  DURATION   ");
seq_puts(s, "   FUNCTION CALLS\n");
@@ -1169,7 +1170,7 @@ static void __print_graph_headers_flags(struct trace_array *tr,
if (flags & TRACE_GRAPH_PRINT_PROC)
seq_puts(s, "   ||");
if (lat)
-   seq_puts(s, "");
+   seq_puts(s, "   ");
if (flags & TRACE_GRAPH_PRINT_DURATION)
seq_puts(s, "   |   |  ");
seq_puts(s, "   |   |   |   |\n");
-- 
2.20.1

[for-next][PATCH 08/29] tracing: Show stacktrace for wakeup tracers

2019-02-20 Thread Steven Rostedt

From: Changbin Du 

This aligns the behavior of the wakeup tracers with the irqsoff latency
tracer: a stacktrace is now recorded at the beginning and at the end of
a wakeup. The stacktrace shows what is happening in the kernel.

Link: http://lkml.kernel.org/r/20190116160249.7554-1-changbin...@gmail.com

Signed-off-by: Changbin Du 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_sched_wakeup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index da5b6e012840..f4fe7d1781e9 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -475,6 +475,7 @@ probe_wakeup_sched_switch(void *ignore, bool preempt,
 
__trace_function(wakeup_trace, CALLER_ADDR0, CALLER_ADDR1, flags, pc);
tracing_sched_switch_trace(wakeup_trace, prev, next, flags, pc);
+   __trace_stack(wakeup_trace, flags, 0, pc);
 
T0 = data->preempt_timestamp;
T1 = ftrace_now(cpu);
@@ -586,6 +587,7 @@ probe_wakeup(void *ignore, struct task_struct *p)
data = per_cpu_ptr(wakeup_trace->trace_buffer.data, wakeup_cpu);
data->preempt_timestamp = ftrace_now(cpu);
tracing_sched_wakeup_trace(wakeup_trace, p, current, flags, pc);
+   __trace_stack(wakeup_trace, flags, 0, pc);
 
/*
 * We must be careful in using CALLER_ADDR2. But since wake_up
-- 
2.20.1

[for-next][PATCH 20/29] tracing: Add hist trigger snapshot() action Documentation

2019-02-20 Thread Steven Rostedt

From: Tom Zanussi 

Add Documentation for the hist:handlerXXX($var).snapshot() action.

Link: http://lkml.kernel.org/r/445861d7822cd4b6aeaea1cecfcdbda466502148.1550100284.git.tom.zanu...@linux.intel.com

Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 Documentation/trace/histogram.rst | 110 ++
 1 file changed, 110 insertions(+)

diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst
index 63e522107e59..353317bc3825 100644
--- a/Documentation/trace/histogram.rst
+++ b/Documentation/trace/histogram.rst
@@ -1874,6 +1874,7 @@ The available actions are:
 
   - (param list) - generate synthetic event
   - save(field,...)- save current event fields
+  - snapshot() - snapshot the trace buffer
 
 The following commonly-used handler.action pairs are available:
 
@@ -2030,6 +2031,115 @@ The following commonly-used handler.action pairs are available:
 Entries: 2
 Dropped: 0
 
+  - onmax(var).snapshot()
+
+The 'onmax(var).snapshot()' hist trigger action is invoked
+whenever the value of 'var' associated with a histogram entry
+exceeds the current maximum contained in that variable.
+
+The end result is that a global snapshot of the trace buffer will
+be saved in the tracing/snapshot file if 'var' exceeds the current
+maximum for any hist trigger entry.
+
+Note that in this case the maximum is a global maximum for the
+current trace instance, which is the maximum across all buckets of
+the histogram.  The key of the specific trace event that caused
+the global maximum and the global maximum itself are displayed,
+along with a message stating that a snapshot has been taken and
+where to find it.  The user can use the key information displayed
+to locate the corresponding bucket in the histogram for even more
+detail.
+
+As an example the below defines a couple of hist triggers, one for
+sched_waking and another for sched_switch, keyed on pid.  Whenever
+a sched_waking event occurs, the timestamp is saved in the entry
+corresponding to the current pid, and when the scheduler switches
+back to that pid, the timestamp difference is calculated.  If the
+resulting latency, stored in wakeup_lat, exceeds the current
+maximum latency, a snapshot is taken.  As part of the setup, all
+the scheduler events are also enabled, which are the events that
+will show up in the snapshot when it is taken at some point:
+
+# echo 1 > /sys/kernel/debug/tracing/events/sched/enable
+
+# echo 'hist:keys=pid:ts0=common_timestamp.usecs \
+if comm=="cyclictest"' >> \
+/sys/kernel/debug/tracing/events/sched/sched_waking/trigger
+
+# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \
+onmax($wakeup_lat).save(next_prio,next_comm,prev_pid,prev_prio, \
+   prev_comm):onmax($wakeup_lat).snapshot() \
+   if next_comm=="cyclictest"' >> \
+   /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
+
+When the histogram is displayed, for each bucket the max value
+and the saved values corresponding to the max are displayed
+following the rest of the fields.
+
+If a snapshot was taken, there is also a message indicating that,
+along with the value and event that triggered the global maximum:
+
+# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
+  { next_pid:   2101 } hitcount:200
+   max: 52  next_prio:120  next_comm: cyclictest \
+prev_pid:  0  prev_prio:120  prev_comm: swapper/6
+
+  { next_pid:   2103 } hitcount:   1326
+   max:572  next_prio: 19  next_comm: cyclictest \
+prev_pid:  0  prev_prio:120  prev_comm: swapper/1
+
+  { next_pid:   2102 } hitcount:   1982 \
+   max: 74  next_prio: 19  next_comm: cyclictest \
+prev_pid:  0  prev_prio:120  prev_comm: swapper/5
+
+Snapshot taken (see tracing/snapshot).  Details:
+   triggering value { onmax($wakeup_lat) }:572 \
+   triggered by event with key: { next_pid:   2103 }
+
+Totals:
+Hits: 3508
+Entries: 3
+Dropped: 0
+
+In the above case, the event that triggered the global maximum has
+the key with next_pid == 2103.  If you look at the bucket that has
+2103 as the key, you'll find the additional values save()'d along
+with the local maximum for that bucket, which should be the same
+as the global maximum (since that was the same value that
+triggered the global snapshot).
+
+And finally, looking at the snapshot data should show at or near
+the end the event that triggered the snapshot (in this case you
+can verify the timestamps between the sched_waking and
+

[for-next][PATCH 13/29] tracing: No need to free iter->trace in fail path of tracing_open_pipe()

2019-02-20 Thread Steven Rostedt

From: "zhangyi (F)" 

Commit d716ff71dd12 ("tracing: Remove taking of trace_types_lock in
pipe files") uses the current tracer instead of a copy in
tracing_open_pipe(), but it forgot to remove the corresponding kfree()
call from the error path.

[ Note, this is harmless because kfree(NULL) is allowed and iter is
  allocated with kzalloc() making iter->trace = NULL -- S. Rostedt ]

Link: http://lkml.kernel.org/r/1550060946-45984-1-git-send-email-yi.zh...@huawei.com

Fixes: d716ff71dd12 ("tracing: Remove taking of trace_types_lock in pipe files")
Signed-off-by: zhangyi (F) 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c521b7347482..b583ff7656bb 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5624,7 +5624,6 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
return ret;
 
 fail:
-   kfree(iter->trace);
kfree(iter);
__trace_array_put(tr);
mutex_unlock(&trace_types_lock);
-- 
2.20.1

[for-next][PATCH 18/29] tracing: Add conditional snapshot

2019-02-20 Thread Steven Rostedt

From: Tom Zanussi 

Currently, tracing snapshots are context-free - they capture the ring
buffer contents at the time the tracing_snapshot() function was
invoked, and nothing else.  Additionally, they're always taken
unconditionally - the calling code can decide whether or not to take a
snapshot, but the data used to make that decision is kept separately
from the snapshot itself.

This change adds the ability to associate with each trace instance
some user data, along with an 'update' function that can use that data
to determine whether or not to actually take a snapshot.  The update
function can then update that data along with any other state (as part
of the data presumably), if warranted.

Because snapshots are 'global' per-instance, only one user can enable
and use a conditional snapshot for any given trace instance.  To
enable a conditional snapshot (see details in the function and data
structure comments), the user calls tracing_snapshot_cond_enable().
Similarly, to disable a conditional snapshot and free it up for other
users, tracing_snapshot_cond_disable() should be called.

To actually initiate a conditional snapshot, tracing_snapshot_cond()
should be called.  tracing_snapshot_cond() will invoke the update()
callback, allowing the user to decide whether or not to actually take
the snapshot and update the user-defined data associated with the
snapshot.  If the callback returns 'true', tracing_snapshot_cond()
will then actually take the snapshot and return.

This scheme allows for flexibility in snapshot implementations - for
example, by implementing slightly different update() callbacks,
snapshots can be taken in situations where the user is only interested
in taking a snapshot when a new maximum is hit versus when a value
changes in any way at all.  Future patches will demonstrate both
cases.
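
To make the scheme above concrete, here is a rough userspace sketch of
the enable / update / disable flow.  It is not the kernel code: every
name in it (struct instance, cond_enable(), snapshot_cond(),
update_on_max(), update_on_change()) is hypothetical, and the callback
receives the saved data directly rather than a trace_array.  It only
illustrates the single-owner rule and how two slightly different
update() callbacks give "new maximum" versus "any change" behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are the kernel's. */
typedef bool (*update_fn)(void *saved, const void *candidate);

struct instance {
	update_fn update;		/* NULL when nothing is enabled */
	void *cond_data;		/* owned by whoever enabled it */
	unsigned int snapshots_taken;	/* stands in for the real snapshot */
};

/* Only one user per instance: refuse if a callback is already set. */
static int cond_enable(struct instance *inst, void *cond_data,
		       update_fn update)
{
	assert(inst != NULL);
	if (!update)
		return -1;
	if (inst->update)
		return -2;
	inst->cond_data = cond_data;
	inst->update = update;
	return 0;
}

/* Free the instance up for another user. */
static void cond_disable(struct instance *inst)
{
	inst->update = NULL;
	inst->cond_data = NULL;
}

/* "Take" a snapshot only if update() accepts the candidate value. */
static bool snapshot_cond(struct instance *inst, const void *candidate)
{
	if (!inst->update || !inst->update(inst->cond_data, candidate))
		return false;
	inst->snapshots_taken++;
	return true;
}

/* Accept only a new maximum, and remember it. */
static bool update_on_max(void *saved, const void *candidate)
{
	uint64_t *max = saved;
	uint64_t val = *(const uint64_t *)candidate;

	if (val <= *max)
		return false;
	*max = val;
	return true;
}

/* Accept any value that differs from the last one seen. */
static bool update_on_change(void *saved, const void *candidate)
{
	uint64_t *last = saved;
	uint64_t val = *(const uint64_t *)candidate;

	if (val == *last)
		return false;
	*last = val;
	return true;
}
```

In this model, a second cond_enable() on the same instance fails until
cond_disable() has been called, mirroring the one-user-per-instance
rule described above.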

Link: http://lkml.kernel.org/r/1bea07828d5fd6864a585f83b1eed47ce097eb45.1550100284.git.tom.zanu...@linux.intel.com

Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace.c  | 192 +-
 kernel/trace/trace.h  |  56 -
 kernel/trace/trace_sched_wakeup.c |   2 +-
 3 files changed, 244 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index b477926ac3bc..9f4d56f74b46 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -894,7 +894,7 @@ int __trace_bputs(unsigned long ip, const char *str)
 EXPORT_SYMBOL_GPL(__trace_bputs);
 
 #ifdef CONFIG_TRACER_SNAPSHOT
-void tracing_snapshot_instance(struct trace_array *tr)
+void tracing_snapshot_instance_cond(struct trace_array *tr, void *cond_data)
 {
struct tracer *tracer = tr->current_trace;
unsigned long flags;
@@ -920,10 +920,15 @@ void tracing_snapshot_instance(struct trace_array *tr)
}
 
local_irq_save(flags);
-   update_max_tr(tr, current, smp_processor_id());
+   update_max_tr(tr, current, smp_processor_id(), cond_data);
local_irq_restore(flags);
 }
 
+void tracing_snapshot_instance(struct trace_array *tr)
+{
+   tracing_snapshot_instance_cond(tr, NULL);
+}
+
 /**
  * tracing_snapshot - take a snapshot of the current buffer.
  *
@@ -946,6 +951,54 @@ void tracing_snapshot(void)
 }
 EXPORT_SYMBOL_GPL(tracing_snapshot);
 
+/**
+ * tracing_snapshot_cond - conditionally take a snapshot of the current buffer.
+ * @tr:The tracing instance to snapshot
+ * @cond_data: The data to be tested conditionally, and possibly saved
+ *
+ * This is the same as tracing_snapshot() except that the snapshot is
+ * conditional - the snapshot will only happen if the
+ * cond_snapshot.update() implementation receiving the cond_data
+ * returns true, which means that the trace array's cond_snapshot
+ * update() operation used the cond_data to determine whether the
+ * snapshot should be taken, and if it was, presumably saved it along
+ * with the snapshot.
+ */
+void tracing_snapshot_cond(struct trace_array *tr, void *cond_data)
+{
+   tracing_snapshot_instance_cond(tr, cond_data);
+}
+EXPORT_SYMBOL_GPL(tracing_snapshot_cond);
+
+/**
+ * tracing_cond_snapshot_data - get the user data associated with a snapshot
+ * @tr:The tracing instance
+ *
+ * When the user enables a conditional snapshot using
+ * tracing_snapshot_cond_enable(), the user-defined cond_data is saved
+ * with the snapshot.  This accessor is used to retrieve it.
+ *
+ * Should not be called from cond_snapshot.update(), since it takes
+ * the tr->max_lock lock, which the code calling
+ * cond_snapshot.update() has already done.
+ *
+ * Returns the cond_data associated with the trace array's snapshot.
+ */
+void *tracing_cond_snapshot_data(struct trace_array *tr)
+{
+   void *cond_data = NULL;
+
+   arch_spin_lock(&tr->max_lock);
+
+   if (tr->cond_snapshot)
+   cond_data = tr->cond_snapshot->cond_data;
+
+   arch_spin_unlock(&tr->max_lock);
+
+   return cond_data;
+}

[for-next][PATCH 21/29] tracing: Add hist trigger onchange() handler

2019-02-20 Thread Steven Rostedt

From: Tom Zanussi 

Add support for a hist:onchange($var) handler, similar to the onmax()
handler but triggering whenever there's any change in $var, not just a
max.
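
As a rough illustration of the difference between the two handlers,
the userspace sketch below feeds the same sequence of values through a
"max" check and a "changed" check and counts how often each would
fire.  The handler ids come from this series; check_max(),
check_changed(), check_for_handler() and count_fires() are
hypothetical stand-ins, and starting the tracked value at 0 is an
assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum handler_id { HANDLER_ONMATCH = 1, HANDLER_ONMAX, HANDLER_ONCHANGE };

typedef bool (*check_fn)(uint64_t track_val, uint64_t var_val);

/* onmax: fire only when the new value exceeds the tracked one. */
static bool check_max(uint64_t track_val, uint64_t var_val)
{
	return var_val > track_val;
}

/* onchange: fire whenever the new value differs from the tracked one. */
static bool check_changed(uint64_t track_val, uint64_t var_val)
{
	return var_val != track_val;
}

/* Returns NULL for handlers that do not track a value. */
static check_fn check_for_handler(enum handler_id handler)
{
	switch (handler) {
	case HANDLER_ONMAX:
		return check_max;
	case HANDLER_ONCHANGE:
		return check_changed;
	default:
		return NULL;
	}
}

/* Feed a sequence of values; count how often the handler would fire. */
static unsigned int count_fires(enum handler_id handler,
				const uint64_t *vals, size_t n)
{
	check_fn check = check_for_handler(handler);
	uint64_t track_val = 0;	/* assumed initial tracked value */
	unsigned int fires = 0;
	size_t i;

	assert(vals != NULL || n == 0);
	if (!check)
		return 0;

	for (i = 0; i < n; i++) {
		if (check(track_val, vals[i])) {
			track_val = vals[i];	/* remember what fired */
			fires++;
		}
	}
	return fires;
}
```

For a value that rises and then falls back, the "changed" check also
fires on the way down, which the "max" check never does.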

Link: http://lkml.kernel.org/r/dfbc7e4ada242603e9ec3f049b5ad076a07dfd03.1550100284.git.tom.zanu...@linux.intel.com

Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace.c |  3 +-
 kernel/trace/trace_events_hist.c | 58 +++-
 2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index dd60c14a0fb0..be6779f963c6 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4915,7 +4915,8 @@ static const char readme_msg[] =
"\t.\n\n"
"\tThe available handlers are:\n\n"
"\tonmatch(matching.event)  - invoke on addition or update\n"
-   "\tonmax(var)   - invoke if var exceeds current max\n\n"
+   "\tonmax(var)   - invoke if var exceeds current max\n"
+   "\tonchange(var)- invoke action if var changes\n\n"
"\tThe available actions are:\n\n"
"\t(param list)- generate synthetic event\n"
"\tsave(field,...)  - save current event fields\n"
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 571937a268a3..2f3323ca9d24 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -391,6 +391,7 @@ typedef bool (*check_track_val_fn_t) (u64 track_val, u64 var_val);
 enum handler_id {
HANDLER_ONMATCH = 1,
HANDLER_ONMAX,
+   HANDLER_ONCHANGE,
 };
 
 enum action_id {
@@ -1989,7 +1990,8 @@ static int parse_action(char *str, struct hist_trigger_attrs *attrs)
return ret;
 
if ((str_has_prefix(str, "onmatch(")) ||
-   (str_has_prefix(str, "onmax("))) {
+   (str_has_prefix(str, "onmax(")) ||
+   (str_has_prefix(str, "onchange("))) {
attrs->action_str[attrs->n_actions] = kstrdup(str, GFP_KERNEL);
if (!attrs->action_str[attrs->n_actions]) {
ret = -ENOMEM;
@@ -3481,6 +3483,14 @@ static bool check_track_val_max(u64 track_val, u64 var_val)
return true;
 }
 
+static bool check_track_val_changed(u64 track_val, u64 var_val)
+{
+   if (var_val == track_val)
+   return false;
+
+   return true;
+}
+
 static u64 get_track_val(struct hist_trigger_data *hist_data,
 struct tracing_map_elt *elt,
 struct action_data *data)
@@ -3640,6 +3650,8 @@ static void track_data_print(struct seq_file *m,
 
if (data->handler == HANDLER_ONMAX)
seq_printf(m, "\n\tmax: %10llu", track_val);
+   else if (data->handler == HANDLER_ONCHANGE)
+   seq_printf(m, "\n\tchanged: %10llu", track_val);
 
if (data->action == ACTION_SNAPSHOT)
return;
@@ -3727,14 +3739,14 @@ static int track_data_create(struct hist_trigger_data *hist_data,
 
track_data_var_str = data->track_data.var_str;
if (track_data_var_str[0] != '$') {
-   hist_err("For onmax(x), x must be a variable: ", track_data_var_str);
+   hist_err("For onmax(x) or onchange(x), x must be a variable: ", track_data_var_str);
return -EINVAL;
}
track_data_var_str++;
 
var_field = find_target_event_var(hist_data, NULL, NULL, track_data_var_str);
if (!var_field) {
-   hist_err("Couldn't find onmax variable: ", track_data_var_str);
+   hist_err("Couldn't find onmax or onchange variable: ", track_data_var_str);
return -EINVAL;
}
 
@@ -3751,6 +3763,14 @@ static int track_data_create(struct hist_trigger_data *hist_data,
ret = PTR_ERR(track_var);
goto out;
}
+
+   if (data->handler == HANDLER_ONCHANGE)
+   track_var = create_var(hist_data, file, "__change", sizeof(u64), "u64");
+   if (IS_ERR(track_var)) {
+   hist_err("Couldn't create onchange variable: ", "__change");
+   ret = PTR_ERR(track_var);
+   goto out;
+   }
data->track_data.track_var = track_var;
 
ret = action_create(hist_data, data);
@@ -3830,6 +3850,8 @@ static int action_parse(char *str, struct action_data *data,
 
if (handler == HANDLER_ONMAX)
data->track_data.check_val = check_track_val_max;
+   else if (handler == HANDLER_ONCHANGE)
+   data->track_data.check_val = check_track_val_changed;
else {
hist_err("action parsing: Handler doesn't support action: ", action_name);
ret = -EINVAL;
@@ -3850,6 +3872,8 @@ static int action_parse(char *str, struct action_data

[for-next][PATCH 19/29] tracing: Add hist trigger snapshot() action

2019-02-20 Thread Steven Rostedt

From: Tom Zanussi 

Add support for hist:handlerXXX($var).snapshot(), which will take a
snapshot of the current trace buffer whenever handlerXXX is hit.

As a first user, this also adds snapshot() action support for the
onmax() handler i.e. hist:onmax($var).snapshot().

Also, the hist trigger key printing is moved into a separate function
so the snapshot() action can print a histogram key outside the
histogram display - add and use hist_trigger_print_key() for that
purpose.

Link: http://lkml.kernel.org/r/2f1a952c0dcd8aca8702ce81269581a692396d45.1550100284.git.tom.zanu...@linux.intel.com

Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace.c |   3 +
 kernel/trace/trace_events_hist.c | 266 +--
 2 files changed, 259 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 9f4d56f74b46..dd60c14a0fb0 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4919,6 +4919,9 @@ static const char readme_msg[] =
"\tThe available actions are:\n\n"
"\t(param list)- generate synthetic event\n"
"\tsave(field,...)  - save current event fields\n"
+#ifdef CONFIG_TRACER_SNAPSHOT
+   "\tsnapshot()   - snapshot the trace buffer\n"
+#endif
 #endif
 ;
 
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 0515229e5f95..571937a268a3 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -396,6 +396,7 @@ enum handler_id {
 enum action_id {
ACTION_SAVE = 1,
ACTION_TRACE,
+   ACTION_SNAPSHOT,
 };
 
 struct action_data {
@@ -454,6 +455,83 @@ struct action_data {
};
 };
 
+struct track_data {
+   u64 track_val;
+   boolupdated;
+
+   unsigned intkey_len;
+   void*key;
+   struct tracing_map_elt  elt;
+
+   struct action_data  *action_data;
+   struct hist_trigger_data*hist_data;
+};
+
+struct hist_elt_data {
+   char *comm;
+   u64 *var_ref_vals;
+   char *field_var_str[SYNTH_FIELDS_MAX];
+};
+
+struct snapshot_context {
+   struct tracing_map_elt  *elt;
+   void*key;
+};
+
+static void track_data_free(struct track_data *track_data)
+{
+   struct hist_elt_data *elt_data;
+
+   if (!track_data)
+   return;
+
+   kfree(track_data->key);
+
+   elt_data = track_data->elt.private_data;
+   if (elt_data) {
+   kfree(elt_data->comm);
+   kfree(elt_data);
+   }
+
+   kfree(track_data);
+}
+
+static struct track_data *track_data_alloc(unsigned int key_len,
+  struct action_data *action_data,
+  struct hist_trigger_data *hist_data)
+{
+   struct track_data *data = kzalloc(sizeof(*data), GFP_KERNEL);
+   struct hist_elt_data *elt_data;
+
+   if (!data)
+   return ERR_PTR(-ENOMEM);
+
+   data->key = kzalloc(key_len, GFP_KERNEL);
+   if (!data->key) {
+   track_data_free(data);
+   return ERR_PTR(-ENOMEM);
+   }
+
+   data->key_len = key_len;
+   data->action_data = action_data;
+   data->hist_data = hist_data;
+
+   elt_data = kzalloc(sizeof(*elt_data), GFP_KERNEL);
+   if (!elt_data) {
+   track_data_free(data);
+   return ERR_PTR(-ENOMEM);
+   }
+   data->elt.private_data = elt_data;
+
+   elt_data->comm = kzalloc(TASK_COMM_LEN, GFP_KERNEL);
+   if (!elt_data->comm) {
+   track_data_free(data);
+   return ERR_PTR(-ENOMEM);
+   }
+
+   return data;
+}
+
 static char last_hist_cmd[MAX_FILTER_STR_VAL];
 static char hist_err_str[MAX_FILTER_STR_VAL];
 
@@ -1726,12 +1804,6 @@ static struct hist_field *find_event_var(struct hist_trigger_data *hist_data,
return hist_field;
 }
 
-struct hist_elt_data {
-   char *comm;
-   u64 *var_ref_vals;
-   char *field_var_str[SYNTH_FIELDS_MAX];
-};
-
 static u64 hist_field_var_ref(struct hist_field *hist_field,
  struct tracing_map_elt *elt,
  struct ring_buffer_event *rbe,
@@ -3452,6 +3524,112 @@ static bool check_track_val(struct tracing_map_elt *elt,
return data->track_data.check_val(track_val, var_val);
 }
 
+#ifdef CONFIG_TRACER_SNAPSHOT
+static bool cond_snapshot_update(struct trace_array *tr, void *cond_data)
+{
+   /* called with tr->max_lock held */
+   struct track_data *track_data = tr->cond_snapshot->cond_data;
+   struct hist_elt_data *elt_data, *track_elt_data;
+   struct snapshot_context *context = cond_data;
+   u64 track_val;
+
+   if (!track_data)
+   return

[for-next][PATCH 09/29] ring-buffer: Remove unused function ring_buffer_page_len()

2019-02-20 Thread Steven Rostedt

From: Miroslav Benes 

Commit 6b7e633fe9c2 ("tracing: Remove extra zeroing out of the ring
buffer page") removed the only caller of ring_buffer_page_len(). The
function is now unused and may be removed.

Link: http://lkml.kernel.org/r/20181228133847.106177-1-mbe...@suse.cz

Signed-off-by: Miroslav Benes 
Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ring_buffer.h |  2 --
 kernel/trace/ring_buffer.c  | 14 --
 2 files changed, 16 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 5b9ae62272bb..f1429675f252 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -187,8 +187,6 @@ void ring_buffer_set_clock(struct ring_buffer *buffer,
 void ring_buffer_set_time_stamp_abs(struct ring_buffer *buffer, bool abs);
 bool ring_buffer_time_stamp_abs(struct ring_buffer *buffer);
 
-size_t ring_buffer_page_len(void *page);
-
 size_t ring_buffer_nr_pages(struct ring_buffer *buffer, int cpu);
 size_t ring_buffer_nr_dirty_pages(struct ring_buffer *buffer, int cpu);
 
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 06e864a334bb..9a91479bbbfe 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -353,20 +353,6 @@ static void rb_init_page(struct buffer_data_page *bpage)
local_set(&bpage->commit, 0);
 }
 
-/**
- * ring_buffer_page_len - the size of data on the page.
- * @page: The page to read
- *
- * Returns the amount of data on the page, including buffer page header.
- */
-size_t ring_buffer_page_len(void *page)
-{
-   struct buffer_data_page *bpage = page;
-
-   return (local_read(&bpage->commit) & ~RB_MISSED_FLAGS)
-   + BUF_PAGE_HDR_SIZE;
-}
-
 /*
  * Also stolen from mm/slob.c. Thanks to Mathieu Desnoyers for pointing
  * this issue out.
-- 
2.20.1

[for-next][PATCH 26/29] tracing: Add hist trigger onchange() handler test case

2019-02-20 Thread Steven Rostedt

From: Tom Zanussi 

Add a test case verifying the basic functionality of the
hist:onchange($var) handler.

Link: http://lkml.kernel.org/r/bec87aa8ed7d81794510b3d465096a750c71fce7.1550100284.git.tom.zanu...@linux.intel.com

Signed-off-by: Tom Zanussi 
Acked-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 .../trigger-onchange-action-hist.tc   | 28 +++
 1 file changed, 28 insertions(+)
 create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onchange-action-hist.tc

diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onchange-action-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onchange-action-hist.tc
new file mode 100644
index ..064a284e4e75
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onchange-action-hist.tc
@@ -0,0 +1,28 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: event trigger - test inter-event histogram trigger onchange action
+
+fail() { #msg
+echo $1
+exit_fail
+}
+
+if [ ! -f set_event ]; then
+echo "event tracing is not supported"
+exit_unsupported
+fi
+
+grep -q "onchange(var)" README || exit_unsupported # version issue
+
+echo "Test onchange action"
+
+echo 'hist:keys=comm:newprio=prio:onchange($newprio).save(comm,prio) if comm=="ping"' >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
+
+ping $LOCALHOST -c 3
+nice -n 1 ping $LOCALHOST -c 3
+
+if ! grep -q "changed:" events/sched/sched_waking/hist; then
+fail "Failed to create onchange action inter-event histogram"
+fi
+
+exit 0
-- 
2.20.1

[for-next][PATCH 14/29] tracing: Refactor hist trigger action code

2019-02-20 Thread Steven Rostedt

From: Tom Zanussi 

The hist trigger action code currently implements two essentially
hard-coded pairs of 'actions' - onmax(), which tracks a variable and
saves some event fields when a max is hit, and onmatch(), which is
hard-coded to generate a synthetic event.

These hardcoded pairs (track max/save fields and detect match/generate
synthetic event) should really be decoupled into separate components
that can then be arbitrarily combined.  The first component of each
pair (track max/detect match) is called a 'handler' in the new code,
while the second component (save fields/generate synthetic event) is
called an 'action' in this scheme.

This change refactors the action code to reflect this split by adding
two handlers, HANDLER_ONMATCH and HANDLER_ONMAX, along with two
actions, ACTION_SAVE and ACTION_TRACE.

The new code combines them to produce the existing ONMATCH/TRACE and
ONMAX/SAVE functionality, but doesn't implement the other combinations
now possible.  Future patches will expand these to further useful
cases, such as ONMAX/TRACE, as well as add additional handlers and
actions such as ONCHANGE and SNAPSHOT.
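
The handler/action split can be pictured with a small userspace
sketch.  It assumes the 'handler(args).action(args)' string form used
by hist triggers and treats any action name other than save() as a
synthetic event name; parse_pair(), pair_implemented() and the *_NONE
enum values are hypothetical and are not the parser in
trace_events_hist.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum handler_id { HANDLER_NONE = 0, HANDLER_ONMATCH, HANDLER_ONMAX };
enum action_id { ACTION_NONE = 0, ACTION_SAVE, ACTION_TRACE };

struct parsed {
	enum handler_id handler;
	enum action_id action;
};

static bool has_prefix(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Split "handler(...).action(...)" into a handler id and an action id. */
static bool parse_pair(const char *str, struct parsed *out)
{
	const char *sep;

	assert(str != NULL && out != NULL);
	out->handler = HANDLER_NONE;
	out->action = ACTION_NONE;

	if (has_prefix(str, "onmatch("))
		out->handler = HANDLER_ONMATCH;
	else if (has_prefix(str, "onmax("))
		out->handler = HANDLER_ONMAX;
	else
		return false;

	/* The action starts after the ")." that closes the handler. */
	sep = strstr(str, ").");
	if (!sep)
		return false;
	sep += 2;

	if (has_prefix(sep, "save("))
		out->action = ACTION_SAVE;
	else if (*sep != '\0')
		out->action = ACTION_TRACE;	/* assumed: synthetic event */
	else
		return false;

	return true;
}

/* Only the two pre-existing combinations are wired up at this point. */
static bool pair_implemented(const struct parsed *p)
{
	return (p->handler == HANDLER_ONMATCH && p->action == ACTION_TRACE) ||
	       (p->handler == HANDLER_ONMAX && p->action == ACTION_SAVE);
}
```

Keeping the "does this parse" question separate from the "is this
combination implemented" question is what lets later patches enable
more pairs without touching the parsing step.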

Also, add abbreviated documentation for handlers and actions to
README.

Link: http://lkml.kernel.org/r/98bfdd48c1b4ff29fc5766442f99f5bc3c34b76b.1550100284.git.tom.zanu...@linux.intel.com

Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_events_hist.c | 407 ++-
 1 file changed, 238 insertions(+), 169 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 449d90cfa151..dfaaad582797 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -313,9 +313,9 @@ struct hist_trigger_data {
struct field_var_hist   *field_var_hists[SYNTH_FIELDS_MAX];
unsigned intn_field_var_hists;
 
-   struct field_var*max_vars[SYNTH_FIELDS_MAX];
-   unsigned intn_max_vars;
-   unsigned intn_max_var_str;
+   struct field_var*save_vars[SYNTH_FIELDS_MAX];
+   unsigned intn_save_vars;
+   unsigned intn_save_var_str;
 };
 
 static int synth_event_create(int argc, const char **argv);
@@ -383,11 +383,25 @@ struct action_data;
 
 typedef void (*action_fn_t) (struct hist_trigger_data *hist_data,
 struct tracing_map_elt *elt, void *rec,
-struct ring_buffer_event *rbe,
+struct ring_buffer_event *rbe, void *key,
 struct action_data *data, u64 *var_ref_vals);
 
+enum handler_id {
+   HANDLER_ONMATCH = 1,
+   HANDLER_ONMAX,
+};
+
+enum action_id {
+   ACTION_SAVE = 1,
+   ACTION_TRACE,
+};
+
 struct action_data {
+   enum handler_id handler;
+   enum action_id  action;
+   char*action_name;
action_fn_t fn;
+
unsigned intn_params;
char*params[SYNTH_FIELDS_MAX];
 
@@ -404,13 +418,11 @@ struct action_data {
unsigned intvar_ref_idx;
char*match_event;
char*match_event_system;
-   char*synth_event_name;
struct synth_event  *synth_event;
} onmatch;
 
struct {
char*var_str;
-   char*fn_name;
unsigned intmax_var_ref_idx;
struct hist_field   *max_var;
struct hist_field   *var;
@@ -1078,7 +1090,7 @@ static struct synth_event *alloc_synth_event(const char *name, int n_fields,
 
 static void action_trace(struct hist_trigger_data *hist_data,
 struct tracing_map_elt *elt, void *rec,
-struct ring_buffer_event *rbe,
+struct ring_buffer_event *rbe, void *key,
 struct action_data *data, u64 *var_ref_vals)
 {
struct synth_event *event = data->onmatch.synth_event;
@@ -1644,7 +1656,7 @@ find_match_var(struct hist_trigger_data *hist_data, char *var_name)
for (i = 0; i < hist_data->n_actions; i++) {
struct action_data *data = hist_data->actions[i];
 
-   if (data->fn == action_trace) {
+   if (data->handler == HANDLER_ONMATCH) {
char *system = data->onmatch.match_event_system;
char *event_name = data->onmatch.match_event;
 
@@ -2076,7 +2088,7 @@ static int hist_trigger_elt_data_alloc(struct tracing_map_elt *elt)
}
}
 
-   n_str = hist_data->n_field_var_str +

[for-next][PATCH 16/29] tracing: Split up onmatch action data

2019-02-20 Thread Steven Rostedt

From: Tom Zanussi 

Currently, the onmatch action data binds the onmatch action to data
related to synthetic event generation.  Since we want to allow the
onmatch handler to potentially invoke a different action, and because
we expect other handlers to generate synthetic events, we need to
separate the data related to these two functions.

Also rename the onmatch data to something more descriptive, and create
and use a common action data destroy function.
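
A userspace sketch of the resulting layout, for illustration only:
data needed for synthetic event generation sits outside the union so
any handler can use it, the union holds only handler-specific fields,
and a single destroy function frees everything.  struct action_sketch,
dup_str(), action_new() and action_destroy() are hypothetical, and the
synthetic event is reduced to a name string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum handler_id { HANDLER_ONMATCH = 1, HANDLER_ONMAX };

struct action_sketch {
	enum handler_id handler;

	/* Shared by every handler that may generate a synthetic event. */
	unsigned int var_ref_idx;
	char *synth_event_name;

	/* Handler-specific data only. */
	union {
		struct {
			char *event;
			char *event_system;
		} match_data;
		struct {
			char *var_str;
		} track_data;
	};
};

/* Small strdup() replacement so the sketch sticks to standard C. */
static char *dup_str(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

static struct action_sketch *action_new(enum handler_id handler,
					const char *synth_event_name)
{
	struct action_sketch *data;

	assert(synth_event_name != NULL);
	data = calloc(1, sizeof(*data));	/* zeroed: pointers start NULL */
	if (!data)
		return NULL;

	data->handler = handler;
	data->synth_event_name = dup_str(synth_event_name);
	if (!data->synth_event_name) {
		free(data);
		return NULL;
	}
	return data;
}

/* One destroy path for all handlers; free(NULL) is a no-op. */
static void action_destroy(struct action_sketch *data)
{
	if (!data)
		return;

	if (data->handler == HANDLER_ONMATCH) {
		free(data->match_data.event);
		free(data->match_data.event_system);
	} else if (data->handler == HANDLER_ONMAX) {
		free(data->track_data.var_str);
	}

	free(data->synth_event_name);
	free(data);
}
```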

Link: http://lkml.kernel.org/r/b9abbf9aae69fe3920cdc8ddbcaad544dd258d78.1550100284.git.tom.zanu...@linux.intel.com

Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace.c |  12 +++-
 kernel/trace/trace_events_hist.c | 103 ---
 2 files changed, 63 insertions(+), 52 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index b583ff7656bb..b477926ac3bc 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4700,6 +4700,7 @@ static const char readme_msg[] =
"\t[:size=#entries]\n"
"\t[:pause][:continue][:clear]\n"
"\t[:name=histname1]\n"
+   "\t[:.]\n"
"\t[if ]\n\n"
"\tWhen a matching event is hit, an entry is added to a hash\n"
"\ttable using the key(s) and value(s) named, and the value of a\n"
@@ -4741,7 +4742,16 @@ static const char readme_msg[] =
"\tThe enable_hist and disable_hist triggers can be used to\n"
"\thave one event conditionally start and stop another event's\n"
"\talready-attached hist trigger.  The syntax is analagous to\n"
-   "\tthe enable_event and disable_event triggers.\n"
+   "\tthe enable_event and disable_event triggers.\n\n"
+   "\tHist trigger handlers and actions are executed whenever a\n"
+   "\ta histogram entry is added or updated.  They take the form:\n\n"
+   "\t.\n\n"
+   "\tThe available handlers are:\n\n"
+   "\tonmatch(matching.event)  - invoke on addition or update\n"
+   "\tonmax(var)   - invoke if var exceeds current max\n\n"
+   "\tThe available actions are:\n\n"
+   "\t(param list)- generate synthetic event\n"
+   "\tsave(field,...)  - save current event fields\n"
 #endif
 ;
 
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index dfaaad582797..0b843ecef547 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -405,21 +405,22 @@ struct action_data {
unsigned intn_params;
char*params[SYNTH_FIELDS_MAX];
 
+   /*
+* When a histogram trigger is hit, the values of any
+* references to variables, including variables being passed
+* as parameters to synthetic events, are collected into a
+* var_ref_vals array.  This var_ref_idx is the index of the
+* first param in the array to be passed to the synthetic
+* event invocation.
+*/
+   unsigned intvar_ref_idx;
+   struct synth_event  *synth_event;
+
union {
struct {
-   /*
-* When a histogram trigger is hit, the values of any
-* references to variables, including variables being 
passed
-* as parameters to synthetic events, are collected 
into a
-* var_ref_vals array.  This var_ref_idx is the index 
of the
-* first param in the array to be passed to the 
synthetic
-* event invocation.
-*/
-   unsigned intvar_ref_idx;
-   char*match_event;
-   char*match_event_system;
-   struct synth_event  *synth_event;
-   } onmatch;
+   char*event;
+   char*event_system;
+   } match_data;
 
struct {
char*var_str;
@@ -1093,9 +1094,9 @@ static void action_trace(struct hist_trigger_data 
*hist_data,
 struct ring_buffer_event *rbe, void *key,
 struct action_data *data, u64 *var_ref_vals)
 {
-   struct synth_event *event = data->onmatch.synth_event;
+   struct synth_event *event = data->synth_event;
 
-   trace_synth(event, var_ref_vals, data->onmatch.var_ref_idx);
+   trace_synth(event, var_ref_vals, data->var_ref_idx);
 }
 
 struct hist_var_data {
@@ -1657,8 +1658,8 @@ find_match_var(struct hist_trigger_data *hist_data, char 
*var_name)
struct action_data *data = hist_data->actions[i];
 
if

[for-next][PATCH 17/29] tracing: Generalize hist trigger onmax and save action

2019-02-20 Thread Steven Rostedt

From: Tom Zanussi 

The action refactor code allowed actions and handlers to be separated,
but the existing onmax handler and save action code is still not
flexible enough to handle arbitrary coupling.  This change generalizes
them and in the process makes additional handlers and actions easier
to implement.

The onmax action can be broken up and thought of as two separate
components - a variable to be tracked (the parameter given to the
onmax($var_to_track) function) and an invisible variable created to
save the ongoing result of doing something with that variable, such as
saving the max value of that variable so far seen.

Separating it out like this and renaming it appropriately allows us to
use the same code for similar tracking functions such as
onchange($var_to_track), which would just track the last value seen
rather than the max seen so far, which is useful in some situations.

Additionally, because different handlers and actions may want to save
and access data differently e.g. save and retrieve tracking values as
local variables vs something more global, save_val() and get_val()
interface functions are introduced and max-specific implementations
are used instead.

The same goes for the code that checks whether a maximum has been hit
- a generic check_val() interface and max-checking implementation is
used instead, which allows future patches to make use of the same code
using their own implementations of similar functionality.
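
As a rough illustration of the check_val() idea described above, here is
a minimal user-space sketch. It is not the kernel code: struct tracker,
tracker_update() and the two check functions are hypothetical names, and
the 'fired' counter merely stands in for invoking an action such as
save() or snapshot().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical analogue of the patch's check_track_val_fn_t typedef. */
typedef bool (*check_val_fn)(uint64_t track_val, uint64_t var_val);

struct tracker {
	uint64_t track_val;     /* the 'invisible' tracking variable */
	check_val_fn check_val; /* decides whether the handler fires */
	unsigned int fired;     /* stand-in for running the action */
};

/* onmax-style check: fire when the new value exceeds the saved max. */
bool check_max(uint64_t track_val, uint64_t var_val)
{
	return var_val > track_val;
}

/* onchange-style check: fire when the new value differs from the last. */
bool check_change(uint64_t track_val, uint64_t var_val)
{
	return var_val != track_val;
}

/* Generic update path: the same code serves every handler. */
bool tracker_update(struct tracker *t, uint64_t var_val)
{
	if (!t->check_val(t->track_val, var_val))
		return false;

	t->track_val = var_val; /* plays the role of save_val() */
	t->fired++;             /* plays the role of the action */
	return true;
}
```

Only the function pointer differs between the two handlers, which is
the property the change log is after: a new handler supplies its own
check and reuses the rest.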

Link: 
http://lkml.kernel.org/r/980ea73dd8e3f36db3d646f99652f8fed42b77d4.1550100284.git.tom.zanu...@linux.intel.com

Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_events_hist.c | 236 +--
 1 file changed, 160 insertions(+), 76 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 0b843ecef547..0515229e5f95 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -386,6 +386,8 @@ typedef void (*action_fn_t) (struct hist_trigger_data 
*hist_data,
 struct ring_buffer_event *rbe, void *key,
 struct action_data *data, u64 *var_ref_vals);
 
+typedef bool (*check_track_val_fn_t) (u64 track_val, u64 var_val);
+
 enum handler_id {
HANDLER_ONMATCH = 1,
HANDLER_ONMAX,
@@ -423,15 +425,35 @@ struct action_data {
} match_data;
 
struct {
+   /*
+* var_str contains the $-unstripped variable
+* name referenced by var_ref, and used when
+* printing the action.  Because var_ref
+* creation is deferred to create_actions(),
+* we need a per-action way to save it until
+* then, thus var_str.
+*/
char*var_str;
-   unsigned intmax_var_ref_idx;
-   struct hist_field   *max_var;
-   struct hist_field   *var;
-   } onmax;
+
+   /*
+* var_ref refers to the variable being
+* tracked e.g onmax($var).
+*/
+   struct hist_field   *var_ref;
+
+   /*
+* track_var contains the 'invisible' tracking
+* variable created to keep the current
+* e.g. max value.
+*/
+   struct hist_field   *track_var;
+
+   check_track_val_fn_tcheck_val;
+   action_fn_t save_data;
+   } track_data;
};
 };
 
-
 static char last_hist_cmd[MAX_FILTER_STR_VAL];
 static char hist_err_str[MAX_FILTER_STR_VAL];
 
@@ -3238,10 +3260,10 @@ static void update_field_vars(struct hist_trigger_data 
*hist_data,
hist_data->n_field_vars, 0);
 }
 
-static void update_max_vars(struct hist_trigger_data *hist_data,
-   struct tracing_map_elt *elt,
-   struct ring_buffer_event *rbe,
-   void *rec)
+static void save_track_data_vars(struct hist_trigger_data *hist_data,
+struct tracing_map_elt *elt, void *rec,
+struct ring_buffer_event *rbe, void *key,
+struct action_data *data, u64 *var_ref_vals)
 {
__update_field_vars(elt, rbe, rec, hist_data->save_vars,
hist_data->n_save_vars, hist_data->n_field_var_str);
@@ -3379,14 +3401,67 @@ create_target_field_var(struct hist_trigger_data 
*target_hist_data,
return create_field_var(target_hist_data, file, var_name);
 }
 
-static void onmax_print(struct seq_file *m,
-

[for-next][PATCH 27/29] tracing: Add alternative synthetic event trace action test case

2019-02-20 Thread Steven Rostedt

From: Tom Zanussi 

Add a test case for the alternative trace(synthetic_event_name, params)
syntax for generating synthetic events.

Link: 
http://lkml.kernel.org/r/0616d18423ab1dfdbf333bce9c92ac4fa0779207.1550100284.git.tom.zanu...@linux.intel.com

Acked-by: Masami Hiramatsu 
Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 .../inter-event/trigger-trace-action-hist.tc  | 42 +++
 1 file changed, 42 insertions(+)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-trace-action-hist.tc

diff --git 
a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-trace-action-hist.tc
 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-trace-action-hist.tc
new file mode 100644
index ..8021d60aafec
--- /dev/null
+++ 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-trace-action-hist.tc
@@ -0,0 +1,42 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: event trigger - test inter-event histogram trigger trace action
+
+fail() { #msg
+echo $1
+exit_fail
+}
+
+if [ ! -f set_event ]; then
+echo "event tracing is not supported"
+exit_unsupported
+fi
+
+if [ ! -f synthetic_events ]; then
+echo "synthetic event is not supported"
+exit_unsupported
+fi
+
+grep -q "trace(" README || exit_unsupported # version issue
+
+echo "Test create synthetic event"
+
+echo 'wakeup_latency  u64 lat pid_t pid char comm[16]' > synthetic_events
+if [ ! -d events/synthetic/wakeup_latency ]; then
+fail "Failed to create wakeup_latency synthetic event"
+fi
+
+echo "Test create histogram for synthetic event using trace action"
+echo "Test histogram variables,simple expression support and trace action"
+
+echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="ping"' > 
events/sched/sched_wakeup/trigger
+echo 
'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_wakeup).trace(wakeup_latency,$wakeup_lat,next_pid,next_comm)
 if next_comm=="ping"' > events/sched/sched_switch/trigger
+echo 'hist:keys=comm,pid,lat:wakeup_lat=lat:sort=lat' > 
events/synthetic/wakeup_latency/trigger
+
+ping $LOCALHOST -c 5
+
+if ! grep -q "ping" events/synthetic/wakeup_latency/hist; then
+fail "Failed to create trace action inter-event histogram"
+fi
+
+exit 0
-- 
2.20.1

[for-next][PATCH 23/29] tracing: Add alternative synthetic event trace action syntax

2019-02-20 Thread Steven Rostedt

From: Tom Zanussi 

Add a 'trace(synthetic_event_name, params)' alternative to
synthetic_event_name(params).

Currently, the syntax used for generating synthetic events is to
invoke synthetic_event_name(params) i.e. use the synthetic event name
as a function call.

Users requested a new form that more explicitly shows that the
synthetic event is in effect being traced.  In this version, a new
'trace()' keyword is used, and the synthetic event name is passed in
as the first argument.

In addition, for the sake of consistency with other actions, change
the documentation to emphasize the trace() form over the function-call
form, which remains documented as equivalent.
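
To make the equivalence of the two forms concrete, the sketch below
extracts the synthetic event name from either spelling. This is only an
illustration under stated assumptions: synth_event_name() is a
hypothetical helper, not the parser in trace_events_hist.c, and it only
looks at the text up to the first ',' or ')'.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the synthetic event name out of an action
 * string written in either form:
 *
 *   trace(wakeup_latency,$lat,pid)   - 'trace' keyword form
 *   wakeup_latency($lat,pid)         - function-call form
 *
 * Returns 0 on success, -1 if the string is malformed or 'out' is too
 * small.
 */
int synth_event_name(const char *action, char *out, size_t outlen)
{
	const char *open = strchr(action, '(');
	const char *start, *end;
	size_t len;

	if (!open)
		return -1;

	if ((size_t)(open - action) == 5 && strncmp(action, "trace", 5) == 0) {
		/* trace() form: the name is the first argument. */
		start = open + 1;
		end = start + strcspn(start, ",)");
	} else {
		/* Function-call form: the name precedes the '('. */
		start = action;
		end = open;
	}

	len = (size_t)(end - start);
	if (len == 0 || len >= outlen)
		return -1;

	memcpy(out, start, len);
	out[len] = '\0';
	return 0;
}
```

Both spellings resolve to the same event name, which is why the two
forms can remain documented as equivalent.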

Link: 
http://lkml.kernel.org/r/d082773e50232a001480cf837679a1e01c1a2eb7.1550100284.git.tom.zanu...@linux.intel.com

Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 Documentation/trace/histogram.rst | 54 +--
 kernel/trace/trace.c  |  2 +-
 kernel/trace/trace_events_hist.c  | 42 +---
 3 files changed, 76 insertions(+), 22 deletions(-)

diff --git a/Documentation/trace/histogram.rst 
b/Documentation/trace/histogram.rst
index 79476c906b1a..0ea59d45aef1 100644
--- a/Documentation/trace/histogram.rst
+++ b/Documentation/trace/histogram.rst
@@ -1873,31 +1873,45 @@ The available handlers are:
 
 The available actions are:
 
-  - (param list) - generate synthetic event
+  - trace(,param list)   - generate synthetic event
   - save(field,...)- save current event fields
   - snapshot() - snapshot the trace buffer
 
 The following commonly-used handler.action pairs are available:
 
-  - onmatch(matching.event).(param list)
+  - onmatch(matching.event).trace(,param list)
 
-The 'onmatch(matching.event).(params)' hist
-trigger action is invoked whenever an event matches and the
-histogram entry would be added or updated.  It causes the named
-synthetic event to be generated with the values given in the
+The 'onmatch(matching.event).trace(,param
+list)' hist trigger action is invoked whenever an event matches
+and the histogram entry would be added or updated.  It causes the
+named synthetic event to be generated with the values given in the
 'param list'.  The result is the generation of a synthetic event
 that consists of the values contained in those variables at the
-time the invoking event was hit.
-
-The 'param list' consists of one or more parameters which may be
-either variables or fields defined on either the 'matching.event'
-or the target event.  The variables or fields specified in the
-param list may be either fully-qualified or unqualified.  If a
-variable is specified as unqualified, it must be unique between
-the two events.  A field name used as a param can be unqualified
-if it refers to the target event, but must be fully qualified if
-it refers to the matching event.  A fully-qualified name is of the
-form 'system.event_name.$var_name' or 'system.event_name.field'.
+time the invoking event was hit.  For example, if the synthetic
+event name is 'wakeup_latency', a wakeup_latency event is
+generated using onmatch(event).trace(wakeup_latency,arg1,arg2).
+
+There is also an equivalent alternative form available for
+generating synthetic events.  In this form, the synthetic event
+name is used as if it were a function name.  For example, using
+the 'wakeup_latency' synthetic event name again, the
+wakeup_latency event would be generated by invoking it as if it
+were a function call, with the event field values passed in as
+arguments: onmatch(event).wakeup_latency(arg1,arg2).  The syntax
+for this form is:
+
+  onmatch(matching.event).(param list)
+
+In either case, the 'param list' consists of one or more
+parameters which may be either variables or fields defined on
+either the 'matching.event' or the target event.  The variables or
+fields specified in the param list may be either fully-qualified
+or unqualified.  If a variable is specified as unqualified, it
+must be unique between the two events.  A field name used as a
+param can be unqualified if it refers to the target event, but
+must be fully qualified if it refers to the matching event.  A
+fully-qualified name is of the form 'system.event_name.$var_name'
+or 'system.event_name.field'.
 
 The 'matching.event' specification is simply the fully qualified
 event name of the event that matches the target event for the
@@ -1928,6 +1942,12 @@ The following commonly-used handler.action pairs are 
available:
   wakeup_new_test($testpid) if comm=="cyclictest"' >> \
   /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger
 
+Or, equivalently, using the 'trace' keyword syntax:
+
+# echo

[for-next][PATCH 28/29] tracing: Add hist trigger action expected fail test case

2019-02-20 Thread Steven Rostedt

From: Tom Zanussi 

Add a test case verifying that basic action combinations fail as
expected.

Link: 
http://lkml.kernel.org/r/1790bf93e01dbdfa1b4af945f42147d92bd565aa.1550100284.git.tom.zanu...@linux.intel.com

Acked-by: Masami Hiramatsu 
Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 .../inter-event/trigger-action-hist-xfail.tc  | 30 +++
 1 file changed, 30 insertions(+)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-action-hist-xfail.tc

diff --git 
a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-action-hist-xfail.tc
 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-action-hist-xfail.tc
new file mode 100644
index ..1221240f8cf6
--- /dev/null
+++ 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-action-hist-xfail.tc
@@ -0,0 +1,30 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: event trigger - test inter-event histogram trigger expected 
fail actions
+
+fail() { #msg
+echo $1
+exit_fail
+}
+
+if [ ! -f set_event ]; then
+echo "event tracing is not supported"
+exit_unsupported
+fi
+
+if [ ! -f snapshot ]; then
+echo "snapshot is not supported"
+exit_unsupported
+fi
+
+grep -q "snapshot()" README || exit_unsupported # version issue
+
+echo "Test expected snapshot action failure"
+
+echo 'hist:keys=comm:onmatch(sched.sched_wakeup).snapshot()' >> 
/sys/kernel/debug/tracing/events/sched/sched_waking/trigger && exit_fail
+
+echo "Test expected save action failure"
+
+echo 'hist:keys=comm:onmatch(sched.sched_wakeup).save(comm,prio)' >> 
/sys/kernel/debug/tracing/events/sched/sched_waking/trigger && exit_fail
+
+exit_xfail
-- 
2.20.1

[for-next][PATCH 29/29] tracing: Comment why cond_snapshot is checked outside of max_lock protection

2019-02-20 Thread Steven Rostedt

From: "Steven Rostedt (VMware)" 

tr->cond_snapshot must be NULL before it can be set to a new value.
It can go to NULL when a trace event hist trigger is created or removed, and
can only be modified under the max_lock spin lock. But because it can only
be set to something other than NULL under both the max_lock spin lock as
well as the trace_types_lock, we can perform the check if it is not NULL
only under the trace_types_lock and fail out without having to grab the
max_lock spin lock.

This is very subtle, and deserves a comment.
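
The locking rule can be sketched in user space as follows. This is an
analogy only, under loud assumptions: pthread mutexes stand in for
trace_types_lock and the max_lock spin lock, the function names are
made up, and a C11 atomic pointer is used so the lone read is well
defined in portable C (the kernel code does a plain read).

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-ins for trace_types_lock (outer) and max_lock (inner). */
static pthread_mutex_t types_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t max_lock = PTHREAD_MUTEX_INITIALIZER;

/* NULL means "no conditional snapshot attached". */
static _Atomic(void *) cond_snapshot;

/* Setting to non-NULL needs BOTH locks. */
int snapshot_enable(void *data)
{
	int ret = 0;

	pthread_mutex_lock(&types_lock);

	/*
	 * Only a types_lock holder can make the pointer non-NULL, so
	 * checking here without max_lock is enough: a concurrent
	 * disable can only turn it into NULL behind our back.
	 */
	if (atomic_load(&cond_snapshot)) {
		ret = -EBUSY;
		goto out;
	}

	pthread_mutex_lock(&max_lock);
	atomic_store(&cond_snapshot, data);
	pthread_mutex_unlock(&max_lock);
out:
	pthread_mutex_unlock(&types_lock);
	return ret;
}

/* Clearing to NULL needs only the inner lock. */
void snapshot_disable(void)
{
	pthread_mutex_lock(&max_lock);
	atomic_store(&cond_snapshot, NULL);
	pthread_mutex_unlock(&max_lock);
}
```

A second snapshot_enable() fails with -EBUSY until snapshot_disable()
has run, and the busy check never has to take the inner lock.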

Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 0460cc0f28fd..2cf3c747a357 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1116,6 +1116,14 @@ int tracing_snapshot_cond_enable(struct trace_array *tr, 
void *cond_data,
goto fail_unlock;
}
 
+   /*
+* The cond_snapshot can only change to NULL without the
+* trace_types_lock. We don't care if we race with it going
+* to NULL, but we want to make sure that it's not set to
+* something other than NULL when we get here, which we can
+* do safely with only holding the trace_types_lock and not
+* having to take the max_lock.
+*/
if (tr->cond_snapshot) {
ret = -EBUSY;
goto fail_unlock;
-- 
2.20.1

[for-next][PATCH 24/29] tracing: Add SPDX license GPL-2.0 license identifier to inter-event testcases

2019-02-20 Thread Steven Rostedt

From: Tom Zanussi 

Apparently this directory was missed in the license cleanup process -
add the missing identifiers to the trigger/inter-event test cases.

Link: 
http://lkml.kernel.org/r/6f9828c2cfb0b378ebd217a39a1b44f063fc17fb.1550100284.git.tom.zanu...@linux.intel.com

Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 .../test.d/trigger/inter-event/trigger-extended-error-support.tc | 1 +
 .../test.d/trigger/inter-event/trigger-field-variable-support.tc | 1 +
 .../trigger/inter-event/trigger-inter-event-combined-hist.tc | 1 +
 .../test.d/trigger/inter-event/trigger-multi-actions-accept.tc   | 1 +
 .../test.d/trigger/inter-event/trigger-onmatch-action-hist.tc| 1 +
 .../trigger/inter-event/trigger-onmatch-onmax-action-hist.tc | 1 +
 .../test.d/trigger/inter-event/trigger-onmax-action-hist.tc  | 1 +
 .../trigger/inter-event/trigger-synthetic-event-createremove.tc  | 1 +
 8 files changed, 8 insertions(+)

diff --git 
a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-extended-error-support.tc
 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-extended-error-support.tc
index 401104344593..9912616a8672 100644
--- 
a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-extended-error-support.tc
+++ 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-extended-error-support.tc
@@ -1,4 +1,5 @@
 #!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
 # description: event trigger - test extended error support
 
 
diff --git 
a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-field-variable-support.tc
 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-field-variable-support.tc
index f59b2a9a1f22..77be6e1f6e7b 100644
--- 
a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-field-variable-support.tc
+++ 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-field-variable-support.tc
@@ -1,4 +1,5 @@
 #!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
 # description: event trigger - test field variable support
 
 fail() { #msg
diff --git 
a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc
 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc
index 524d9ce361e2..f3eb8aacec0e 100644
--- 
a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc
+++ 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc
@@ -1,4 +1,5 @@
 #!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
 # description: event trigger - test inter-event combined histogram trigger
 
 fail() { #msg
diff --git 
a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-multi-actions-accept.tc
 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-multi-actions-accept.tc
index 4ddc546771b5..d281f056f980 100644
--- 
a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-multi-actions-accept.tc
+++ 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-multi-actions-accept.tc
@@ -1,4 +1,5 @@
 #!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
 # description: event trigger - test multiple actions on hist trigger
 
 fail() { #msg
diff --git 
a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-action-hist.tc
 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-action-hist.tc
index 39fb65b0cd9f..a708f0e7858a 100644
--- 
a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-action-hist.tc
+++ 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-action-hist.tc
@@ -1,4 +1,5 @@
 #!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
 # description: event trigger - test inter-event histogram trigger onmatch 
action
 
 fail() { #msg
diff --git 
a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-onmax-action-hist.tc
 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-onmax-action-hist.tc
index 81ab3939c96a..dfce6932d8be 100644
--- 
a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-onmax-action-hist.tc
+++ 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-onmax-action-hist.tc
@@ -1,4 +1,5 @@
 #!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
 # description: event trigger - test inter-event histogram trigger 
onmatch-onmax action
 
 fail() { #msg
diff --git 
a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmax-action-hist.tc
 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmax-action-hist.tc
index 1180ab5f0845..0035995c2194 100644
--- 
a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmax-action-hist.tc
+++ 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmax-action-hist.tc
@@ -1,4 +1,5 @@
 #!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
 #

[for-next][PATCH 25/29] tracing: Add hist trigger snapshot() action test case

2019-02-20 Thread Steven Rostedt

From: Tom Zanussi 

Add a test case verifying the basic functionality of the
hist:snapshot() action.

Link: 
http://lkml.kernel.org/r/c0555f462cbfe56dadfec6e63e531e109bd72930.1550100284.git.tom.zanu...@linux.intel.com

Signed-off-by: Tom Zanussi 
Acked-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 .../trigger-snapshot-action-hist.tc   | 43 +++
 1 file changed, 43 insertions(+)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-snapshot-action-hist.tc

diff --git 
a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-snapshot-action-hist.tc
 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-snapshot-action-hist.tc
new file mode 100644
index ..18fff69fc433
--- /dev/null
+++ 
b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-snapshot-action-hist.tc
@@ -0,0 +1,43 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: event trigger - test inter-event histogram trigger snapshot 
action
+
+fail() { #msg
+echo $1
+exit_fail
+}
+
+if [ ! -f set_event ]; then
+echo "event tracing is not supported"
+exit_unsupported
+fi
+
+if [ ! -f snapshot ]; then
+echo "snapshot is not supported"
+exit_unsupported
+fi
+
+grep -q "onchange(var)" README || exit_unsupported # version issue
+
+grep -q "snapshot()" README || exit_unsupported # version issue
+
+echo "Test snapshot action"
+
+echo 1 > /sys/kernel/debug/tracing/events/sched/enable
+
+echo 
'hist:keys=comm:newprio=prio:onchange($newprio).save(comm,prio):onchange($newprio).snapshot()
 if comm=="ping"' >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
+
+ping $LOCALHOST -c 3
+nice -n 1 ping $LOCALHOST -c 3
+
+echo 0 > tracing_on
+
+if ! grep -q "changed:" events/sched/sched_waking/hist; then
+fail "Failed to create onchange action inter-event histogram"
+fi
+
+if ! grep -q "comm=ping" snapshot; then
+fail "Failed to create snapshot action inter-event histogram"
+fi
+
+exit 0
-- 
2.20.1

[for-next][PATCH 22/29] tracing: Add hist trigger onchange() handler Documentation

2019-02-20 Thread Steven Rostedt

From: Tom Zanussi 

Add Documentation for the hist:onchange($var) handler.

Link: 
http://lkml.kernel.org/r/ab54b7383b265609fda52648a8fbfbd2631a640f.1550100284.git.tom.zanu...@linux.intel.com

Signed-off-by: Tom Zanussi 
Signed-off-by: Steven Rostedt (VMware) 
---
 Documentation/trace/histogram.rst | 98 +++
 1 file changed, 98 insertions(+)

diff --git a/Documentation/trace/histogram.rst 
b/Documentation/trace/histogram.rst
index 353317bc3825..79476c906b1a 100644
--- a/Documentation/trace/histogram.rst
+++ b/Documentation/trace/histogram.rst
@@ -1869,6 +1869,7 @@ The available handlers are:
 
   - onmatch(matching.event)- invoke action on any addition or update
   - onmax(var) - invoke action if var exceeds current max
+  - onchange(var)  - invoke action if var changes
 
 The available actions are:
 
@@ -2140,6 +2141,103 @@ The following commonly-used handler.action pairs are 
available:
   <...>-2102  [005] d..4   309.875185: sched_wake_idle_without_ipi: cpu=1
  -0 [001] d..3   309.875200: sched_switch: prev_comm=swapper/1 
prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=cyclictest next_pid=2103 
next_prio=19
 
+  - onchange(var).save(field,...)
+
+The 'onchange(var).save(field,...)' hist trigger action is invoked
+whenever the value of 'var' associated with a histogram entry
+changes.
+
+The end result is that the trace event fields specified as the
+onchange.save() params will be saved if 'var' changes for that
+hist trigger entry.  This allows context from the event that
+changed the value to be saved for later reference.  When the
+histogram is displayed, additional fields displaying the saved
+values will be printed.
+
+  - onchange(var).snapshot()
+
+The 'onchange(var).snapshot()' hist trigger action is invoked
+whenever the value of 'var' associated with a histogram entry
+changes.
+
+The end result is that a global snapshot of the trace buffer will
+be saved in the tracing/snapshot file if 'var' changes for any
+hist trigger entry.
+
+Note that in this case the changed value is a global variable
+associated withe current trace instance.  The key of the specific
+trace event that caused the value to change and the global value
+itself are displayed, along with a message stating that a snapshot
+has been taken and where to find it.  The user can use the key
+information displayed to locate the corresponding bucket in the
+histogram for even more detail.
+
+As an example the below defines a hist trigger on the tcp_probe
+event, keyed on dport.  Whenever a tcp_probe event occurs, the
+cwnd field is checked against the current value stored in the
+$cwnd variable.  If the value has changed, a snapshot is taken.
+As part of the setup, all the scheduler and tcp events are also
+enabled, which are the events that will show up in the snapshot
+when it is taken at some point:
+
+# echo 1 > /sys/kernel/debug/tracing/events/sched/enable
+# echo 1 > /sys/kernel/debug/tracing/events/tcp/enable
+
+# echo 'hist:keys=dport:cwnd=snd_cwnd: \
+onchange($cwnd).save(snd_wnd,srtt,rcv_wnd): \
+   onchange($cwnd).snapshot()' >> \
+   /sys/kernel/debug/tracing/events/tcp/tcp_probe/trigger
+
+When the histogram is displayed, for each bucket the tracked value
+and the saved values corresponding to that value are displayed
+following the rest of the fields.
+
+If a snaphot was taken, there is also a message indicating that,
+along with the value and event that triggered the snapshot:
+
+# cat /sys/kernel/debug/tracing/events/tcp/tcp_probe/hist
+  { dport:   1521 } hitcount:  8
+   changed: 10  snd_wnd:  35456  srtt: 154262  rcv_wnd:
  42112
+
+  { dport: 80 } hitcount: 23
+   changed: 10  snd_wnd:  28960  srtt:  19604  rcv_wnd:
  29312
+
+  { dport:   9001 } hitcount:172
+   changed: 10  snd_wnd:  48384  srtt: 260444  rcv_wnd:
  55168
+
+  { dport:443 } hitcount:211
+   changed: 10  snd_wnd:  26960  srtt:  17379  rcv_wnd:
  28800
+
+Snapshot taken (see tracing/snapshot).  Details:
+triggering value { onchange($cwnd) }: 10
+triggered by event with key: { dport: 80 }
+
+Totals:
+Hits: 414
+Entries: 4
+Dropped: 0
+
+In the above case, the event that triggered the snapshot has the
+key with dport == 80.  If you look at the bucket that has 80 as
+the key, you'll find the additional values save()'d along with the
+changed value for that bucket, which should be the same as the
+global changed value (since that was the same value that triggered
+the global snapshot).
+
+And finally, looking at the snapshot data should

[for-next][PATCH 10/29] tracing: Change the function format to display function names by perf

2019-02-20 Thread Steven Rostedt

From: Changbin Du 

Here is an example for this change.

$ sudo perf record -e 'ftrace:function' --filter='ip==schedule'
$ sudo perf report

The output of perf before this patch:

\# Samples: 100  of event 'ftrace:function'
\# Event count (approx.): 100
\#
\# Overhead  Trace output
\#   ..
\#
51.00%   81f6aaa0 <-- 81158e8d
29.00%   81f6aaa0 <-- 8116ccb2
 8.00%   81f6aaa0 <-- 81f6f2ed
 4.00%   81f6aaa0 <-- 811628db
 4.00%   81f6aaa0 <-- 81f6ec5b
 2.00%   81f6aaa0 <-- 81f6f21a
 1.00%   81f6aaa0 <-- 811b04af
 1.00%   81f6aaa0 <-- 8143ce17

After this patch:

\# Samples: 36  of event 'ftrace:function'
\# Event count (approx.): 36
\#
\# Overhead  Trace output
\#   
\#
38.89%   schedule <-- schedule_hrtimeout_range_clock
27.78%   schedule <-- worker_thread
13.89%   schedule <-- schedule_timeout
11.11%   schedule <-- smpboot_thread_fn
 5.56%   schedule <-- rcu_gp_kthread
 2.78%   schedule <-- exit_to_usermode_loop

Link: http://lkml.kernel.org/r/20190209161919.32350-1-changbin...@gmail.com

Signed-off-by: Changbin Du 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_entries.h | 41 +---
 1 file changed, 19 insertions(+), 22 deletions(-)

diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index 06bb2fd9a56c..fc8e97328e54 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -65,7 +65,8 @@ FTRACE_ENTRY_REG(function, ftrace_entry,
__field(unsigned long,  parent_ip   )
),
 
-   F_printk(" %lx <-- %lx", __entry->ip, __entry->parent_ip),
+   F_printk(" %ps <-- %ps",
+(void *)__entry->ip, (void *)__entry->parent_ip),
 
FILTER_TRACE_FN,
 
@@ -83,7 +84,7 @@ FTRACE_ENTRY_PACKED(funcgraph_entry, ftrace_graph_ent_entry,
__field_desc(   int,graph_ent,  depth   
)
),
 
-   F_printk("--> %lx (%d)", __entry->func, __entry->depth),
+   F_printk("--> %ps (%d)", (void *)__entry->func, __entry->depth),
 
FILTER_OTHER
 );
@@ -102,8 +103,8 @@ FTRACE_ENTRY_PACKED(funcgraph_exit, ftrace_graph_ret_entry,
__field_desc(   int,ret,depth   )
),
 
-   F_printk("<-- %lx (%d) (start: %llx  end: %llx) over: %d",
-__entry->func, __entry->depth,
+   F_printk("<-- %ps (%d) (start: %llx  end: %llx) over: %d",
+(void *)__entry->func, __entry->depth,
 __entry->calltime, __entry->rettime,
 __entry->depth),
 
@@ -167,12 +168,6 @@ FTRACE_ENTRY_DUP(wakeup, ctx_switch_entry,
 
 #define FTRACE_STACK_ENTRIES   8
 
-#ifndef CONFIG_64BIT
-# define IP_FMT "%08lx"
-#else
-# define IP_FMT "%016lx"
-#endif
-
 FTRACE_ENTRY(kernel_stack, stack_entry,
 
TRACE_STACK,
@@ -182,12 +177,13 @@ FTRACE_ENTRY(kernel_stack, stack_entry,
__dynamic_array(unsigned long,  caller  )
),
 
-   F_printk("\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n"
-"\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n"
-"\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n",
-__entry->caller[0], __entry->caller[1], __entry->caller[2],
-__entry->caller[3], __entry->caller[4], __entry->caller[5],
-__entry->caller[6], __entry->caller[7]),
+   F_printk("\t=> %ps\n\t=> %ps\n\t=> %ps\n"
+"\t=> %ps\n\t=> %ps\n\t=> %ps\n"
+"\t=> %ps\n\t=> %ps\n",
+(void *)__entry->caller[0], (void *)__entry->caller[1],
+(void *)__entry->caller[2], (void *)__entry->caller[3],
+(void *)__entry->caller[4], (void *)__entry->caller[5],
+(void *)__entry->caller[6], (void *)__entry->caller[7]),
 
FILTER_OTHER
 );
@@ -201,12 +197,13 @@ FTRACE_ENTRY(user_stack, userstack_entry,
__array(unsigned long,  caller, FTRACE_STACK_ENTRIES
)
),
 
-   F_printk("\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n"
-"\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n"
-"\t=> (" IP_FMT ")\n\t=> (" IP_FMT ")\n",
-__entry->caller[0], __entry->caller[1], __entry->caller[2],
-__entry->caller[3], __entry->caller[4], __entry->caller[5],
-__entry->caller[6], __entry->caller[7]),
+   F_printk("\t=> %ps\n\t=> %ps\n\t=> %ps\n"
+"\t=> %ps\n\t=> %ps\n\t=> %ps\n"
+"\t=> %ps\n\t=> %ps\n",
+(void *)__entry->caller[0], (void *)__entry->caller[1],
+(void *)__entry->caller[2], (void *)__entry->caller[3],
+(void

[for-next][PATCH 03/29] tracing: Annotate implicit fall through in predicate_parse()

2019-02-20 Thread Steven Rostedt

From: Mathieu Malaterre 

There is a plan to build the kernel with -Wimplicit-fallthrough and
this place in the code produced a warning (W=1).

This commit removes the following warning:

  kernel/trace/trace_events_filter.c:494:8: warning: this statement may fall 
through [-Wimplicit-fallthrough=]

Link: http://lkml.kernel.org/r/20190114203039.16535-2-ma...@debian.org

Signed-off-by: Mathieu Malaterre 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_events_filter.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/trace/trace_events_filter.c 
b/kernel/trace/trace_events_filter.c
index 27821480105e..eb694756c4bb 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -495,6 +495,7 @@ predicate_parse(const char *str, int nr_parens, int 
nr_preds,
ptr++;
break;
}
+   /* fall through */
default:
parse_error(pe, FILT_ERR_TOO_MANY_PREDS,
next - str);
-- 
2.20.1

[for-next][PATCH 11/29] ftrace: Allow enabling of filters via index of available_filter_functions

2019-02-20 Thread Steven Rostedt

From: "Steven Rostedt (VMware)" 

Enabling a large number of functions by echoing in a large subset of the
functions in available_filter_functions can take a very long time. The
process requires testing all functions registered by the function tracer
(which is in the 10s of thousands), and doing a kallsyms lookup to convert
the ip address into a name, then comparing that name with the string passed
in.

When a function causes the function tracer to crash the system, a binary
bisect of the available_filter_functions can be done to find the culprit.
But this requires passing in half of the functions in
available_filter_functions over and over again, which makes it basically an
O(n^2) operation. With 40,000 functions, that ends up being 1,600,000,000
operations! And enabling this can take over 20 minutes.

As a quick speed up, if a number is passed into one of the filter files,
instead of doing a search, it just enables the function at the corresponding
line of the available_filter_functions file. That is:

 # echo 50 > set_ftrace_filter
 # cat set_ftrace_filter
 x86_pmu_commit_txn

 # head -50 available_filter_functions | tail -1
 x86_pmu_commit_txn

This allows setting of half the available_filter_functions to take place in
less than a second!

 # time seq 2 > set_ftrace_filter
 real    0m0.042s
 user    0m0.005s
 sys     0m0.015s

 # wc -l set_ftrace_filter
 2 set_ftrace_filter
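
The index lookup described above can be sketched in userspace C. This is only an illustrative model, not the kernel code: `struct rec_page` and `lookup_by_index()` are hypothetical names standing in for the ftrace page list and its records. It shows the idea of walking the pages and subtracting each page's record count until the 1-based index lands inside a page, with no string comparison involved.

```c
#include <stddef.h>

/* Hypothetical stand-in for one page of function records. */
struct rec_page {
	const char **names;	/* records stored on this page */
	long count;		/* number of records in use on this page */
	struct rec_page *next;	/* next page, or NULL at the end */
};

/*
 * Return the record at 1-based position idx, or NULL if idx is out of
 * range. Each page that lies entirely before the target is skipped by
 * subtracting its record count from the remaining index.
 */
static const char *lookup_by_index(struct rec_page *pg, long idx)
{
	/* The index starts at 1; convert it to 0-based. */
	if (--idx < 0)
		return NULL;

	for (; pg; pg = pg->next) {
		if (pg->count <= idx) {
			idx -= pg->count;
			continue;
		}
		return pg->names[idx];
	}
	return NULL;
}
```

With two pages holding two and three records, index 3 resolves to the first record of the second page, and index 6 is out of range.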

Signed-off-by: Steven Rostedt (VMware) 
---
 Documentation/trace/ftrace.rst | 38 ++
 kernel/trace/ftrace.c  | 30 +++
 kernel/trace/trace.h   |  1 +
 kernel/trace/trace_events_filter.c |  5 
 4 files changed, 74 insertions(+)

diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst
index 6ce2763a2a3e..7c5e6d6ab5d1 100644
--- a/Documentation/trace/ftrace.rst
+++ b/Documentation/trace/ftrace.rst
@@ -233,6 +233,12 @@ of ftrace. Here is a list of some of the key files:
This interface also allows for commands to be used. See the
"Filter commands" section for more details.
 
+   As a speed up, since processing strings can be quite expensive
+   and requires a check of all functions registered to tracing, instead
+   an index can be written into this file. A number (starting with "1")
+   written will instead select the function at the corresponding line
+   position of the "available_filter_functions" file.
+
   set_ftrace_notrace:
 
This has an effect opposite to that of
@@ -2835,6 +2841,38 @@ Produces::
 
 We can see that there's no more lock or preempt tracing.
 
+Selecting function filters via index
+------------------------------------
+
+Because processing of strings is expensive (the address of the function
+needs to be looked up before comparing to the string being passed in),
+an index can be used as well to enable functions. This is useful in the
+case of setting thousands of specific functions at a time. By passing
+in a list of numbers, no string processing will occur. Instead, the function
+at the specific location in the internal array (which corresponds to the
+functions in the "available_filter_functions" file), is selected.
+
+::
+
+  # echo 1 > set_ftrace_filter
+
+Will select the first function listed in "available_filter_functions"
+
+::
+
+  # head -1 available_filter_functions
+  trace_initcall_finish_cb
+
+  # cat set_ftrace_filter
+  trace_initcall_finish_cb
+
+  # head -50 available_filter_functions | tail -1
+  x86_pmu_commit_txn
+
+  # echo 1 50 > set_ftrace_filter
+  # cat set_ftrace_filter
+  trace_initcall_finish_cb
+  x86_pmu_commit_txn
 
 Dynamic ftrace with the function graph tracer
 -
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index aac7847c0214..fa79323331b2 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3701,6 +3701,31 @@ enter_record(struct ftrace_hash *hash, struct dyn_ftrace 
*rec, int clear_filter)
return ret;
 }
 
+static int
+add_rec_by_index(struct ftrace_hash *hash, struct ftrace_glob *func_g,
+int clear_filter)
+{
+   long index = simple_strtoul(func_g->search, NULL, 0);
+   struct ftrace_page *pg;
+   struct dyn_ftrace *rec;
+
+   /* The index starts at 1 */
+   if (--index < 0)
+   return 0;
+
+   do_for_each_ftrace_rec(pg, rec) {
+   if (pg->index <= index) {
+   index -= pg->index;
+   /* this is a double loop, break goes to the next page */
+   break;
+   }
+   rec = &pg->records[index];
+   enter_record(hash, rec, clear_filter);
+   return 1;
+   } while_for_each_ftrace_rec();
+   return 0;
+}
+
 static int
 ftrace_match_record(struct dyn_ftrace *rec, struct ftrace_glob *func_g,
struct ftrace_glob *mod_g, int exclude_mod)
@@ -3769,6 +3794,11 @@ match_records(struct

Re: [RFC PATCH net-next v3 13/21] ethtool: provide timestamping information in GET_INFO request

2019-02-20 Thread Jakub Kicinski

On Wed, 20 Feb 2019 14:00:07 +0100, Michal Kubecek wrote:
> On Tue, Feb 19, 2019 at 07:00:48PM -0800, Jakub Kicinski wrote:
> > On Mon, 18 Feb 2019 19:22:29 +0100 (CET), Michal Kubecek wrote:  
> > > Add timestamping information as provided by ETHTOOL_GET_TS_INFO ioctl
> > > command in GET_INFO reply if ETH_INFO_IM_TSINFO flag is set in the 
> > > request.
> > > 
> > > Add constants for counts of HWTSTAMP_TX_* and HWTSTAMP_FILTER_* constants
> > > and provide symbolic names for timestamping related values so that they 
> > > can
> > > be retrieved in GET_STRSET and GET_INFO requests.  
> > 
> > What's the reason for providing the symbolic names?  
> 
> One of the goals I had was to reduce the need to keep the lists of
> possible values in sync between kernel and userspace ethtool and other
> users of the interface so that when a new value is added, we don't have
> to update all userspace tools to be able to use or present it.
> 
> This already works in ethtool for some newer commands (e.g. features)
> and obviously for those where the list of available options depends on
> the device (e.g. private flags or statistics). I would like to extend
> the principle also to older commands and new ones which do not work like
> this (e.g. device reset).

Let me try to argue that's the wrong direction.  People should learn to
update their user space tooling if they want access to new features.  
In my (limited) experience trying to solve forward compatibility leads
to short term gains, and long term warts in the APIs and increased
maintenance burden in the kernel.

Re: [PATCH] iio: cros_ec_accel_legacy: Mark expected switch fall-throughs

2019-02-20 Thread Jonathan Cameron

On Wed, 20 Feb 2019 10:20:39 -0800
Kees Cook  wrote:

> On Mon, Oct 8, 2018 at 10:24 AM Gustavo A. R. Silva
>  wrote:
> >
> > In preparation to enabling -Wimplicit-fallthrough, mark switch cases
> > where we are expecting to fall through.
> >
> > Addresses-Coverity-ID: 1397962 ("Missing break in switch")
> > Signed-off-by: Gustavo A. R. Silva 
> > ---
> >  drivers/iio/accel/cros_ec_accel_legacy.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/iio/accel/cros_ec_accel_legacy.c 
> > b/drivers/iio/accel/cros_ec_accel_legacy.c
> > index 063e89e..d609654 100644
> > --- a/drivers/iio/accel/cros_ec_accel_legacy.c
> > +++ b/drivers/iio/accel/cros_ec_accel_legacy.c
> > @@ -385,8 +385,10 @@ static int cros_ec_accel_legacy_probe(struct 
> > platform_device *pdev)
> > switch (i) {
> > case X:
> > ec_accel_channels[X].scan_index = Y;
> > +   /* fall through */
> > case Y:
> > ec_accel_channels[Y].scan_index = X;
> > +   /* fall through */
> > case Z:
> > ec_accel_channels[Z].scan_index = Z;
> > }  
> 
> Shouldn't these actually be "break;"s ? It seems like the loop is
> stepping through X, Y, and Z. The _result_ is accidentally the same:
> 
> X: set X, Y, and Z
> Y: set Y and Z
> Z: set Z
> 
> result: X, Y, and Z are set correctly. But the code is technically wrong.
> 

Agreed, it's 'novel'.  Waiting for Gwendal or someone else to come
back and check it wasn't meant to be doing something else.

Jonathan

>

Re: [PATCH v15 15/15] tracing: Add hist trigger action 'expected fail' test case

2019-02-20 Thread Steven Rostedt

On Wed, 20 Feb 2019 12:10:31 -0600
Tom Zanussi  wrote:

> 
> As far as I understand it (there's no other case of an xfail test in
> the testsuite, so nothing similar to compare it to), the test output is
>  correct - here we get the expected fail, XFAIL, and not a FAIL as any
> test, xfail or normal, that failed would produce:

Yeah, I've been staring at the code, and commit:

915de2adb584a ftracetest: Add POSIX.3 standard and XFAIL result codes


> 
> tools/testing/selftests/ftrace# ./ftracetest test.d/trigger/
> === Ftrace unit tests ===
> [1] event trigger - test inter-event histogram trigger expected fail actions
> [XFAIL]
> [2] event trigger - test extended error support
> [PASS]
> 
> And here the summary shows none failed, while we did have one expected
> xfail, but that's what was expected, and not a failure:
> 
> # of passed:  31
> # of failed:  0
> # of unresolved:  0
> # of untested:  0
> # of unsupported:  0
> # of xfailed:  1

Yeah, but it's marked as RED, which is why I thought it was a failure.

> # of undefined(test bug):  0
> 
> If that's not correct, I'll fix it but at this point I'm not sure what
> the output should be if not that.

OK, so this has nothing to do with your patch set. I've tested
everything else, and I'm ready to finally push my tree to linux-next.

I'm thinking that we should get rid of xfail, as it's really confusing,
and I don't understand its purpose. But that shouldn't stop pushing
your patches.

Thanks,

-- Steve

Re: [RFC 0/5] RCU fixes for rcu_assign_pointer usage

2019-02-20 Thread Paul E. McKenney

On Wed, Feb 20, 2019 at 01:09:52PM -0500, Joel Fernandes wrote:
> On Wed, Feb 20, 2019 at 08:42:43AM -0800, Paul E. McKenney wrote:
> > On Tue, Feb 19, 2019 at 08:11:36PM -0800, Joel Fernandes wrote:
> > > On Tue, Feb 19, 2019 at 8:08 PM Joel Fernandes (Google)
> > >  wrote:
> > > >
> > > > These patches fix various RCU API usage issues found due to sparse 
> > > > errors as a
> > > > result of the recent check to add rcu_check_sparse() to 
> > > > rcu_assign_pointer().
> > > >
> > > > This is very early RFC stage, and is only build tested. I am also only 
> > > > sending
> > > > to the RCU group for initial review before sending to LKML. Thanks for 
> > > > any feedback!
> > > >
> > > > There are still more usages that cause errors such as rbtree which I am
> > > > looking into.
> > > 
> > > Looks like it got sent to LKML anyway, ;-) That's Ok since it is
> > > prefixed as RFC.
> > 
> > As is only right and proper.  ;-)
> > 
> > I don't see an immediate problem with them, but it would be good to get
> > the relevant developers and maintainers on CC for the next version.  I
> > cannot claim to know that code very well.
> 
> Definitely will CC them next time, sorry about that. I'll stop being so shy
> but I have some scars that are still healing ;-)

If you don't get at least a few scars, you aren't trying hard enough!  ;-)

Thanx, Paul

Re: [PATCHv6 00/10] Heterogenous memory node attributes

2019-02-20 Thread Keith Busch

On Thu, Feb 14, 2019 at 10:10:07AM -0700, Keith Busch wrote:
> Platforms may provide multiple types of cpu attached system memory. The
> memory ranges for each type may have different characteristics that
> applications may wish to know about when considering what node they want
> their memory allocated from. 
> 
> It had previously been difficult to describe these setups as memory
> ranges were generally lumped into the NUMA node of the CPUs. New
> platform attributes have been created and are in use today that describe
> the more complex memory hierarchies that can be created.
> 
> This series' objective is to provide the attributes from such systems
> that are useful for applications to know about, and readily usable with
> existing tools and libraries. Those applications may query performance
> attributes relative to a particular CPU they're running on in order to
> make more informed choices for where they want to allocate hot and cold
> data. This works with mbind() or the numactl library.

Hi all,

So this seems very calm at this point. Unless there are any late concerns
or suggestions, could we open consideration for queueing in a staging
tree for a future merge window?

Thanks,
Keith

 
> Keith Busch (10):
>   acpi: Create subtable parsing infrastructure
>   acpi: Add HMAT to generic parsing tables
>   acpi/hmat: Parse and report heterogeneous memory
>   node: Link memory nodes to their compute nodes
>   node: Add heterogenous memory access attributes
>   node: Add memory-side caching attributes
>   acpi/hmat: Register processor domain to its memory
>   acpi/hmat: Register performance attributes
>   acpi/hmat: Register memory side cache attributes
>   doc/mm: New documentation for memory performance
> 
>  Documentation/ABI/stable/sysfs-devices-node   |  89 +++-
>  Documentation/admin-guide/mm/numaperf.rst | 164 +++
>  arch/arm64/kernel/acpi_numa.c |   2 +-
>  arch/arm64/kernel/smp.c   |   4 +-
>  arch/ia64/kernel/acpi.c   |  12 +-
>  arch/x86/kernel/acpi/boot.c   |  36 +-
>  drivers/acpi/Kconfig  |   1 +
>  drivers/acpi/Makefile |   1 +
>  drivers/acpi/hmat/Kconfig |   9 +
>  drivers/acpi/hmat/Makefile|   1 +
>  drivers/acpi/hmat/hmat.c  | 677 
> ++
>  drivers/acpi/numa.c   |  16 +-
>  drivers/acpi/scan.c   |   4 +-
>  drivers/acpi/tables.c |  76 ++-
>  drivers/base/Kconfig  |   8 +
>  drivers/base/node.c   | 351 -
>  drivers/irqchip/irq-gic-v2m.c |   2 +-
>  drivers/irqchip/irq-gic-v3-its-pci-msi.c  |   2 +-
>  drivers/irqchip/irq-gic-v3-its-platform-msi.c |   2 +-
>  drivers/irqchip/irq-gic-v3-its.c  |   6 +-
>  drivers/irqchip/irq-gic-v3.c  |  10 +-
>  drivers/irqchip/irq-gic.c |   4 +-
>  drivers/mailbox/pcc.c |   2 +-
>  include/linux/acpi.h  |   6 +-
>  include/linux/node.h  |  60 ++-
>  25 files changed, 1480 insertions(+), 65 deletions(-)
>  create mode 100644 Documentation/admin-guide/mm/numaperf.rst
>  create mode 100644 drivers/acpi/hmat/Kconfig
>  create mode 100644 drivers/acpi/hmat/Makefile
>  create mode 100644 drivers/acpi/hmat/hmat.c

Re: BUG: assuming atomic context at kernel/seccomp.c:LINE

2019-02-20 Thread Kees Cook

On Wed, Feb 20, 2019 at 2:00 AM Daniel Borkmann  wrote:
>
> On 02/20/2019 10:32 AM, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:abf446c90405 Add linux-next specific files for 20190220
> > git tree:   linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=17f250d8c0
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=463cb576ac40e350
> > dashboard link: https://syzkaller.appspot.com/bug?extid=8bf19ee2aa580de7a2a7
> > compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> >
> > Unfortunately, I don't have any reproducer for this crash yet.
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+8bf19ee2aa580de7a...@syzkaller.appspotmail.com
> >
> > BUG: assuming atomic context at kernel/seccomp.c:271
> > in_atomic(): 0, irqs_disabled(): 0, pid: 12803, name: syz-executor.5
> > no locks held by syz-executor.5/12803.
> > CPU: 1 PID: 12803 Comm: syz-executor.5 Not tainted 5.0.0-rc7-next-20190220 
> > #39
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> > Google 01/01/2011
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:77 [inline]
> >  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
> >  __cant_sleep kernel/sched/core.c:6218 [inline]
> >  __cant_sleep.cold+0xa3/0xbb kernel/sched/core.c:6195
> >  seccomp_run_filters kernel/seccomp.c:271 [inline]
> >  __seccomp_filter+0x12b/0x12b0 kernel/seccomp.c:801
> >  __secure_computing+0x101/0x360 kernel/seccomp.c:932
> >  syscall_trace_enter+0x5bf/0xe10 arch/x86/entry/common.c:120
> >  do_syscall_64+0x479/0x610 arch/x86/entry/common.c:280
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> False positive; bpf-next only. Pushing this out in a bit:
>
> From d56547070162a105ff666f3324e558fa6492aedd Mon Sep 17 00:00:00 2001
> From: Daniel Borkmann 
> Date: Wed, 20 Feb 2019 10:51:17 +0100
> Subject: [PATCH bpf-next] bpf, seccomp: fix false positive preemption splat 
> for
>  cbpf->ebpf progs
>
> In 568f196756ad ("bpf: check that BPF programs run with preemption disabled")
> a check was added for BPF_PROG_RUN() that for every invocation preemption is
> disabled to not break eBPF assumptions (e.g. per-cpu map). Of course this does
> not count for seccomp because only cBPF -> eBPF is loaded here and it does not
> make use of any functionality that would require this assertion. Fix this 
> false
> positive by adding and using __BPF_PROG_RUN() variant that does not have the
> cant_sleep(); check.
>
> Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption 
> disabled")
> Reported-by: syzbot+8bf19ee2aa580de7a...@syzkaller.appspotmail.com
> Signed-off-by: Daniel Borkmann 

Acked-by: Kees Cook 

-Kees

> ---
>  include/linux/filter.h | 9 -
>  kernel/seccomp.c   | 2 +-
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index f32b3ec..2f3e29a 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -533,7 +533,14 @@ struct sk_filter {
> struct bpf_prog *prog;
>  };
>
> -#define BPF_PROG_RUN(filter, ctx)  ({ cant_sleep(); 
> (*(filter)->bpf_func)(ctx, (filter)->insnsi); })
> +#define bpf_prog_run__non_preempt(prog, ctx)   \
> +   ({ cant_sleep(); __BPF_PROG_RUN(prog, ctx); })
> +/* Native eBPF or cBPF -> eBPF transitions. Preemption must be disabled. */
> +#define BPF_PROG_RUN(prog, ctx)\
> +   bpf_prog_run__non_preempt(prog, ctx)
> +/* cBPF -> eBPF only, but not for native eBPF. */
> +#define __BPF_PROG_RUN(prog, ctx)  \
> +   (*(prog)->bpf_func)(ctx, (prog)->insnsi)
>
>  #define BPF_SKB_CB_LEN QDISC_CB_PRIV_LEN
>
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index e815781..826d4e4 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -268,7 +268,7 @@ static u32 seccomp_run_filters(const struct seccomp_data 
> *sd,
>  * value always takes priority (ignoring the DATA).
>  */
> for (; f; f = f->prev) {
> -   u32 cur_ret = BPF_PROG_RUN(f->prog, sd);
> +   u32 cur_ret = __BPF_PROG_RUN(f->prog, sd);
>
> if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret)) {
> ret = cur_ret;
> --
> 2.9.5



-- 
Kees Cook
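
The macro split in the patch quoted above can be modelled in plain C. This is a userspace sketch under stated assumptions, not the kernel definitions: `cant_sleep_model()`, `struct prog_model`, `RUN_CHECKED()` and `RUN_UNCHECKED()` are hypothetical names. It shows the pattern of keeping a raw "run" macro and layering the context check on top of it, so that a caller known not to need the check can use the raw variant.

```c
#include <stddef.h>

/* Counts how often the (modelled) atomic-context check ran. */
static int check_count;

/* Hypothetical stand-in for the kernel's cant_sleep() debug check. */
static void cant_sleep_model(void)
{
	check_count++;
}

/* Hypothetical stand-in for a loaded program: just a callable. */
struct prog_model {
	unsigned int (*func)(const void *ctx);
};

/* Raw variant: run the program with no context check. */
#define RUN_UNCHECKED(prog, ctx)	((prog)->func(ctx))

/* Checked variant: perform the check first, then run the raw variant. */
#define RUN_CHECKED(prog, ctx)	(cant_sleep_model(), RUN_UNCHECKED(prog, ctx))

/* Example program that always returns 1. */
static unsigned int always_one(const void *ctx)
{
	(void)ctx;
	return 1;
}
```

Both variants return the program's result; only the checked one bumps `check_count`, which mirrors why switching the seccomp call site to the raw variant silences the splat without changing the filter result.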

Re: [PATCH] iio: cros_ec_accel_legacy: Mark expected switch fall-throughs

2019-02-20 Thread Kees Cook

On Mon, Oct 8, 2018 at 10:24 AM Gustavo A. R. Silva
 wrote:
>
> In preparation to enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.
>
> Addresses-Coverity-ID: 1397962 ("Missing break in switch")
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  drivers/iio/accel/cros_ec_accel_legacy.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/iio/accel/cros_ec_accel_legacy.c 
> b/drivers/iio/accel/cros_ec_accel_legacy.c
> index 063e89e..d609654 100644
> --- a/drivers/iio/accel/cros_ec_accel_legacy.c
> +++ b/drivers/iio/accel/cros_ec_accel_legacy.c
> @@ -385,8 +385,10 @@ static int cros_ec_accel_legacy_probe(struct 
> platform_device *pdev)
> switch (i) {
> case X:
> ec_accel_channels[X].scan_index = Y;
> +   /* fall through */
> case Y:
> ec_accel_channels[Y].scan_index = X;
> +   /* fall through */
> case Z:
> ec_accel_channels[Z].scan_index = Z;
> }

Shouldn't these actually be "break;"s ? It seems like the loop is
stepping through X, Y, and Z. The _result_ is accidentally the same:

X: set X, Y, and Z
Y: set Y and Z
Z: set Z

result: X, Y, and Z are set correctly. But the code is technically wrong.
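
The observation above can be checked with a small userspace model. This is a sketch, assuming the loop simply steps `i` through X, Y and Z as the mail suggests; `assign_fallthrough()` and `assign_break()` are hypothetical names, and the array stands in for the driver's `scan_index` fields.

```c
enum axis { X, Y, Z, MAX_AXIS };

/* Model of the patched loop: each case falls through to the next. */
static void assign_fallthrough(int scan_index[MAX_AXIS])
{
	int i;

	for (i = X; i < MAX_AXIS; i++) {
		switch (i) {
		case X:
			scan_index[X] = Y;
			/* fall through */
		case Y:
			scan_index[Y] = X;
			/* fall through */
		case Z:
			scan_index[Z] = Z;
		}
	}
}

/* Model of the suggested alternative: each case handles one axis only. */
static void assign_break(int scan_index[MAX_AXIS])
{
	int i;

	for (i = X; i < MAX_AXIS; i++) {
		switch (i) {
		case X:
			scan_index[X] = Y;
			break;
		case Y:
			scan_index[Y] = X;
			break;
		case Z:
			scan_index[Z] = Z;
			break;
		}
	}
}
```

Both functions leave the array in the same final state, because every assignment is idempotent. The fall-through version just repeats some of them.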


-- 
Kees Cook

[PATCH v2 1/2] soc: bcm: bcm2835-pm: Fix PM_IMAGE_PERI power domain support.

2019-02-20 Thread Eric Anholt

We don't have ASB master/slave regs for this domain, so just skip that
step.

Signed-off-by: Eric Anholt 
Fixes: 670c672608a1 ("soc: bcm: bcm2835-pm: Add support for power domains under 
a new binding.")
---
 drivers/soc/bcm/bcm2835-power.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/soc/bcm/bcm2835-power.c b/drivers/soc/bcm/bcm2835-power.c
index 48412957ec7a..4a1b99b773c0 100644
--- a/drivers/soc/bcm/bcm2835-power.c
+++ b/drivers/soc/bcm/bcm2835-power.c
@@ -150,7 +150,12 @@ struct bcm2835_power {
 
 static int bcm2835_asb_enable(struct bcm2835_power *power, u32 reg)
 {
-   u64 start = ktime_get_ns();
+   u64 start;
+
+   if (!reg)
+   return 0;
+
+   start = ktime_get_ns();
 
/* Enable the module's async AXI bridges. */
ASB_WRITE(reg, ASB_READ(reg) & ~ASB_REQ_STOP);
@@ -165,7 +170,12 @@ static int bcm2835_asb_enable(struct bcm2835_power *power, 
u32 reg)
 
 static int bcm2835_asb_disable(struct bcm2835_power *power, u32 reg)
 {
-   u64 start = ktime_get_ns();
+   u64 start;
+
+   if (!reg)
+   return 0;
+
+   start = ktime_get_ns();
 
/* Enable the module's async AXI bridges. */
ASB_WRITE(reg, ASB_READ(reg) | ASB_REQ_STOP);
-- 
2.20.1

Re: [PATCH 2/2] soc: bcm: bcm2835-pm: Fix error paths of initialization.

2019-02-20 Thread Eric Anholt

Florian Fainelli  writes:

> On 2/13/19 2:33 PM, Stefan Wahren wrote:
>> 
>>> On 13 February 2019 at 19:28, Eric Anholt wrote:
>>>
>>>
>>> Stefan Wahren  writes:
>>>
>>>> Hi Eric,
>>>>
>>>> On 13.02.19 at 01:33, Eric Anholt wrote:
>>>>> The clock driver may probe after ours and so we need to pass the
>>>>> -EPROBE_DEFER out.  Fix the other error path while we're here.
>>>>>
>>>>> Signed-off-by: Eric Anholt
>>>>> Fixes: 670c672608a1 ("soc: bcm: bcm2835-pm: Add support for power domains
>>>>> under a new binding.")
>>>>> ---
>>>>>  drivers/soc/bcm/bcm2835-power.c | 30 +-
>>>>>  1 file changed, 25 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/soc/bcm/bcm2835-power.c
>>>>> b/drivers/soc/bcm/bcm2835-power.c
>>>>> index 4a1b99b773c0..11f9469423f7 100644
>>>>> --- a/drivers/soc/bcm/bcm2835-power.c
>>>>> +++ b/drivers/soc/bcm/bcm2835-power.c
>>>>> @@ -485,7 +485,7 @@ static int bcm2835_power_pd_power_off(struct
>>>>> generic_pm_domain *domain)
>>>>>   }
>>>>>  }
>>>>>
>>>>> -static void
>>>>> +static int
>>>>>  bcm2835_init_power_domain(struct bcm2835_power *power,
>>>>> int pd_xlate_index, const char *name)
>>>>>  {
>>>>> @@ -493,6 +493,12 @@ bcm2835_init_power_domain(struct bcm2835_power
>>>>> *power,
>>>>>   struct bcm2835_power_domain *dom = &power->domains[pd_xlate_index];
>>>>>
>>>>>   dom->clk = devm_clk_get(dev->parent, name);
>>>>> + if (IS_ERR(dom->clk)) {
>>>>> + int ret = PTR_ERR(dom->clk);
>>>>> +
>>>>> + if (ret == -EPROBE_DEFER)
>>>>> + return ret;
>>>> Is it safe to proceed in the other error cases?
>>>> It would even be more consistent with clk_prepare_enable() to print an
>>>> error here.
>>>
>>> Yes, not all domains have a clk, so we want to ignore the other error.
>> 
>> But shouldn't we set dom->clk to NULL instead of keeping the error
>> pointer? AFAIK clk_prepare_enable is aware of NULL instead of error
>> pointer.
>
> If the clock is really optional, then yes, this should be the way to go.

Sigh, error pointers.  Fixed, sending a v2.



[PATCH v2 2/2] soc: bcm: bcm2835-pm: Fix error paths of initialization.

2019-02-20 Thread Eric Anholt

The clock driver may probe after ours and so we need to pass the
-EPROBE_DEFER out.  Fix the other error path while we're here.

v2: Use dom->name instead of dom->gov as the flag for initialized
domains, since we aren't setting up a governor.  Make sure to
clear ->clk when no clk is present in the DT.
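
The error handling described above can be modelled in userspace C. This is a sketch under stated assumptions, not the driver code: the `model_*` helpers imitate the kernel's error-pointer convention (an errno encoded in the top 4095 values of the address space), and `struct domain_model` and `init_domain_model()` are hypothetical names. The value 517 is the kernel-internal EPROBE_DEFER, which userspace `errno.h` does not define.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO		4095
#define MODEL_EPROBE_DEFER	517	/* kernel-internal errno value */

/* Encode a negative errno as a pointer (models ERR_PTR()). */
static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Decode the errno from an error pointer (models PTR_ERR()). */
static long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* True if p lies in the reserved error range (models IS_ERR()). */
static int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical stand-in for a power domain with an optional clock. */
struct domain_model {
	void *clk;
};

/*
 * Propagate a deferred probe, but treat any other lookup error as
 * "this domain has no clock" and store NULL instead of an error pointer.
 */
static int init_domain_model(struct domain_model *dom, void *lookup_result)
{
	dom->clk = lookup_result;
	if (model_is_err(dom->clk)) {
		long ret = model_ptr_err(dom->clk);

		if (ret == -MODEL_EPROBE_DEFER)
			return (int)ret;

		dom->clk = NULL;
	}
	return 0;
}
```

Storing NULL rather than the error pointer matters because later users of the field only need to handle one "no clock" value.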

Signed-off-by: Eric Anholt 
Fixes: 670c672608a1 ("soc: bcm: bcm2835-pm: Add support for power domains under 
a new binding.")
---
 drivers/soc/bcm/bcm2835-power.c | 35 -
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/drivers/soc/bcm/bcm2835-power.c b/drivers/soc/bcm/bcm2835-power.c
index 4a1b99b773c0..241c4ed80899 100644
--- a/drivers/soc/bcm/bcm2835-power.c
+++ b/drivers/soc/bcm/bcm2835-power.c
@@ -485,7 +485,7 @@ static int bcm2835_power_pd_power_off(struct 
generic_pm_domain *domain)
}
 }
 
-static void
+static int
 bcm2835_init_power_domain(struct bcm2835_power *power,
  int pd_xlate_index, const char *name)
 {
@@ -493,6 +493,17 @@ bcm2835_init_power_domain(struct bcm2835_power *power,
struct bcm2835_power_domain *dom = &power->domains[pd_xlate_index];
 
dom->clk = devm_clk_get(dev->parent, name);
+   if (IS_ERR(dom->clk)) {
+   int ret = PTR_ERR(dom->clk);
+
+   if (ret == -EPROBE_DEFER)
+   return ret;
+
+   /* Some domains don't have a clk, so make sure that we
+* don't deref an error pointer later.
+*/
+   dom->clk = NULL;
+   }
 
dom->base.name = name;
dom->base.power_on = bcm2835_power_pd_power_on;
@@ -505,6 +516,8 @@ bcm2835_init_power_domain(struct bcm2835_power *power,
pm_genpd_init(&dom->base, NULL, true);
 
power->pd_xlate.domains[pd_xlate_index] = &dom->base;
+
+   return 0;
 }
 
 /** bcm2835_reset_reset - Resets a block that has a reset line in the
@@ -602,7 +615,7 @@ static int bcm2835_power_probe(struct platform_device *pdev)
{ BCM2835_POWER_DOMAIN_IMAGE_PERI, BCM2835_POWER_DOMAIN_CAM0 },
{ BCM2835_POWER_DOMAIN_IMAGE_PERI, BCM2835_POWER_DOMAIN_CAM1 },
};
-   int ret, i;
+   int ret = 0, i;
u32 id;
 
power = devm_kzalloc(dev, sizeof(*power), GFP_KERNEL);
@@ -629,8 +642,11 @@ static int bcm2835_power_probe(struct platform_device 
*pdev)
 
power->pd_xlate.num_domains = ARRAY_SIZE(power_domain_names);
 
-   for (i = 0; i < ARRAY_SIZE(power_domain_names); i++)
-   bcm2835_init_power_domain(power, i, power_domain_names[i]);
+   for (i = 0; i < ARRAY_SIZE(power_domain_names); i++) {
+   ret = bcm2835_init_power_domain(power, i, 
power_domain_names[i]);
+   if (ret)
+   goto fail;
+   }
 
for (i = 0; i < ARRAY_SIZE(domain_deps); i++) {

pm_genpd_add_subdomain(&power->domains[domain_deps[i].parent].base,
@@ -644,12 +660,21 @@ static int bcm2835_power_probe(struct platform_device 
*pdev)
 
ret = devm_reset_controller_register(dev, &power->reset);
if (ret)
-   return ret;
+   goto fail;
 
of_genpd_add_provider_onecell(dev->parent->of_node, &power->pd_xlate);
 
dev_info(dev, "Broadcom BCM2835 power domains driver");
return 0;
+
+fail:
+   for (i = 0; i < ARRAY_SIZE(power_domain_names); i++) {
+   struct generic_pm_domain *dom = &power->domains[i].base;
+
+   if (dom->name)
+   pm_genpd_remove(dom);
+   }
+   return ret;
 }
 
 static int bcm2835_power_remove(struct platform_device *pdev)
-- 
2.20.1

Re: [PATCH] intel_th: Mark expected switch fall-throughs

2019-02-20 Thread Gustavo A. R. Silva

Hi all,

Friendly ping:

Who can take this?

Thanks
--
Gustavo

On 2/12/19 3:43 PM, Gustavo A. R. Silva wrote:
> In preparation to enabling -Wimplicit-fallthrough, mark switch
> cases where we are expecting to fall through.
> 
> This patch fixes the following warnings:
> 
> drivers/hwtracing/intel_th/sth.c: In function ‘sth_stm_packet’:
> drivers/hwtracing/intel_th/sth.c:86:7: warning: this statement may fall 
> through [-Wimplicit-fallthrough=]
>reg += 4;
>^~~~
> drivers/hwtracing/intel_th/sth.c:87:2: note: here
>   case STP_PACKET_XSYNC:
>   ^~~~
> drivers/hwtracing/intel_th/sth.c:88:7: warning: this statement may fall 
> through [-Wimplicit-fallthrough=]
>reg += 8;
>^~~~
> drivers/hwtracing/intel_th/sth.c:89:2: note: here
>   case STP_PACKET_TRIG:
>   ^~~~
> 
> Warning level 3 was used: -Wimplicit-fallthrough=3
> 
> This patch is part of the ongoing efforts to enable
> -Wimplicit-fallthrough.
> 
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  drivers/hwtracing/intel_th/sth.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/hwtracing/intel_th/sth.c 
> b/drivers/hwtracing/intel_th/sth.c
> index 4b7ae47789d2..3a1f4e650378 100644
> --- a/drivers/hwtracing/intel_th/sth.c
> +++ b/drivers/hwtracing/intel_th/sth.c
> @@ -84,8 +84,12 @@ static ssize_t notrace sth_stm_packet(struct stm_data 
> *stm_data,
>   /* Global packets (GERR, XSYNC, TRIG) are sent with register writes */
>   case STP_PACKET_GERR:
>   reg += 4;
> + /* fall through */
> +
>   case STP_PACKET_XSYNC:
>   reg += 8;
> + /* fall through */
> +
>   case STP_PACKET_TRIG:
>   if (flags & STP_PACKET_TIMESTAMPED)
>   reg += 4;
>

[PATCH] net: dsa: add missing phy address offset

2019-02-20 Thread Marcel Reichmuth

When the PHYs do not start at address 0, as on the mv88e6341, the wrong
PHY address is used and therefore the slave ports cannot be
initialized. This patch adds the proper offset to the PHY address.
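
The address calculation can be sketched as follows. This is an illustrative model only: `struct port_model` and `port_phy_addr()` are hypothetical names, and the field names merely mirror the ones the patch uses.

```c
/* Hypothetical stand-in for a switch port. */
struct port_model {
	unsigned int index;		/* port number on the switch */
	unsigned int phy_base_addr;	/* MDIO address of the first PHY */
};

/* MDIO address of the internal PHY that belongs to this port. */
static unsigned int port_phy_addr(const struct port_model *dp)
{
	return dp->phy_base_addr + dp->index;
}
```

With a base of 0 the result equals the port index, which is the old behaviour. With a non-zero base (for example an assumed 0x10), port 2 maps to address 0x12.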

Signed-off-by: Marcel Reichmuth 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 3 +++
 include/net/dsa.h| 1 +
 net/dsa/slave.c  | 3 ++-
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 12fd7ce3f1ff..0ca649f784d2 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2198,12 +2198,15 @@ static int mv88e6xxx_setup_upstream_port(struct 
mv88e6xxx_chip *chip, int port)
 static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
 {
struct dsa_switch *ds = chip->ds;
+   struct dsa_port *dp = &ds->ports[port];
int err;
u16 reg;
 
chip->ports[port].chip = chip;
chip->ports[port].port = port;
 
+   dp->phy_base_addr = chip->info->phy_base_addr;
+
/* MAC Forcing register: don't force link, speed, duplex or flow control
 * state to any particular values on physical ports, but force the CPU
 * port and all DSA ports to their maximum bandwidth and full duplex.
diff --git a/include/net/dsa.h b/include/net/dsa.h
index b3eefe8e18fd..f9c9dc1f6d21 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -196,6 +196,7 @@ struct dsa_port {
 
struct dsa_switch   *ds;
unsigned intindex;
+   unsigned intphy_base_addr;
const char  *name;
const struct dsa_port   *cpu_dp;
struct device_node  *dn;
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index a1c9fe155057..4f67dff34a3b 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1221,7 +1221,8 @@ static int dsa_slave_phy_setup(struct net_device 
*slave_dev)
/* We could not connect to a designated PHY or SFP, so use the
 * switch internal MDIO bus instead
 */
-   ret = dsa_slave_phy_connect(slave_dev, dp->index);
+   ret = dsa_slave_phy_connect(slave_dev, dp->phy_base_addr +
+   dp->index);
if (ret) {
netdev_err(slave_dev,
   "failed to connect to port %d: %d\n",
-- 
2.11.0

Re: [PATCH v15 15/15] tracing: Add hist trigger action 'expected fail' test case

2019-02-20 Thread Tom Zanussi

Hi Steve,

On Wed, 2019-02-20 at 12:56 -0500, Steven Rostedt wrote:
> On Wed, 20 Feb 2019 11:38:22 -0600
> Tom Zanussi  wrote:
> 
> > Hi Steve,
> > 
> > On Wed, 2019-02-20 at 12:17 -0500, Steven Rostedt wrote:
> > > On Wed, 13 Feb 2019 17:42:55 -0600
> > > Tom Zanussi  wrote:
> > >   
> > > > From: Tom Zanussi 
> > > > 
> > > > Add a test case verifying that basic action combinations fail
> > > > as
> > > > expected.
> > > >   
> > > 
> > > Hi Tom,
> > > 
> > > This test appears to fail:
> > > 
> > > # echo
> > > 'hist:keys=comm:onmatch(sched.sched_wakeup).save(comm,prio)'  
> > > > > /sys/kernel/debug/tracing/events/sched/sched_waking/trigger  
> > > 
> > > -bash: echo: write error: Invalid argument
> > > 
> > > # cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist 
> > > 
> > > ERROR: action parsing: Handler doesn't support action: save
> > >   Last command:
> > > keys=comm:onmatch(sched.sched_wakeup).save(comm,prio)
> > > 
> > > 
> > > Is the "save" feature implemented here? It's in the README too.
> > > Should
> > > it be removed?
> > >   
> > 
> > The "save" feature is implemented, but it's not currently supported
> > with onmatch(), which is why it fails, and is used in the xfail
> > test,
> > since it's expected to.  So, in this case, the command fails, which
> > means the xfail test actually passed.  ;-)
> > 
> > There are other tests in the inter-event testcases that use save()
> > but
> > with onmax() and onchange(), and they pass.
> 
> So the test needs to pass on failure?
> 
> Because, it shouldn't be flagged as a failure in the test suite.
> 

As far as I understand it (there's no other xfail test in the testsuite
to compare it to), the test output is correct: here we get the expected
fail, XFAIL, and not the FAIL that any test, xfail or normal, would
produce if it actually failed:

tools/testing/selftests/ftrace# ./ftracetest test.d/trigger/
=== Ftrace unit tests ===
[1] event trigger - test inter-event histogram trigger expected fail actions	[XFAIL]
[2] event trigger - test extended error support	[PASS]

And here the summary shows none failed, while we did have one expected
xfail, but that's what was expected, and not a failure:

# of passed:  31
# of failed:  0
# of unresolved:  0
# of untested:  0
# of unsupported:  0
# of xfailed:  1
# of undefined(test bug):  0

If that's not correct, I'll fix it but at this point I'm not sure what
the output should be if not that.
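
To make the distinction concrete, here is a minimal sketch in C of how a
runner could tally results so that an expected failure is reported on its
own line without failing the run. It is not the ftracetest implementation
(which is a shell script); the enum, struct and function names are
hypothetical and only mirror the summary lines above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes; one per line of the summary above. */
enum test_result {
	T_PASS,
	T_FAIL,
	T_UNRESOLVED,
	T_UNTESTED,
	T_UNSUPPORTED,
	T_XFAIL,
	T_RESULT_MAX
};

struct tally {
	unsigned int count[T_RESULT_MAX];
};

/* Record one finished test case. */
static void tally_add(struct tally *t, enum test_result r)
{
	t->count[r]++;
}

/*
 * Assumption for this sketch: only a real FAIL fails the run.
 * An XFAIL is counted separately and is not treated as a failure.
 */
static bool run_failed(const struct tally *t)
{
	return t->count[T_FAIL] > 0;
}
```

With 31 passes and one xfail recorded, run_failed() stays false, which
matches the summary shown above; it only turns true once a real FAIL is
added.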

Thanks,

Tom

> -- Steve
> 
> > 
> > Hope that explains things in this case,
> > 
> > Tom
> > 
> > > -- Steve
> > >   
> > > > Signed-off-by: Tom Zanussi 
> > > > ---
> > > >  .../inter-event/trigger-action-hist-xfail.tc   | 30
> > > > ++
> > > >  1 file changed, 30 insertions(+)
> > > >  create mode 100644
> > > > tools/testing/selftests/ftrace/test.d/trigger/inter-
> > > > event/trigger-
> > > > action-hist-xfail.tc
> > > > 
> > > > diff --git
> > > > a/tools/testing/selftests/ftrace/test.d/trigger/inter-
> > > > event/trigger-action-hist-xfail.tc
> > > > b/tools/testing/selftests/ftrace/test.d/trigger/inter-
> > > > event/trigger-action-hist-xfail.tc
> > > > new file mode 100644
> > > > index ..1221240f8cf6
> > > > --- /dev/null
> > > > +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-
> > > > event/trigger-action-hist-xfail.tc
> > > > @@ -0,0 +1,30 @@
> > > > +#!/bin/sh
> > > > +# SPDX-License-Identifier: GPL-2.0
> > > > +# description: event trigger - test inter-event histogram
> > > > trigger
> > > > expected fail actions
> > > > +
> > > > +fail() { #msg
> > > > +echo $1
> > > > +exit_fail
> > > > +}
> > > > +
> > > > +if [ ! -f set_event ]; then
> > > > +echo "event tracing is not supported"
> > > > +exit_unsupported
> > > > +fi
> > > > +
> > > > +if [ ! -f snapshot ]; then
> > > > +echo "snapshot is not supported"
> > > > +exit_unsupported
> > > > +fi
> > > > +
> > > > +grep -q "snapshot()" README || exit_unsupported # version
> > > > issue
> > > > +
> > > > +echo "Test expected snapshot action failure"
> > > > +
> > > > +echo 'hist:keys=comm:onmatch(sched.sched_wakeup).snapshot()'
> > > > >>
> > > > /sys/kernel/debug/tracing/events/sched/sched_waking/trigger &&
> > > > exit_fail
> > > > +
> > > > +echo "Test expected save action failure"
> > > > +
> > > > +echo
> > > > 'hist:keys=comm:onmatch(sched.sched_wakeup).save(comm,prio)'  
> > > > > > /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
> > > > > > &&  
> > > > 
> > > > exit_fail
> > > > +
> > > > +exit_xfail  
> > > 
> > >   
> 
>

Re: [PATCH] crypto: ccree - fix missing break in switch statement

2019-02-20 Thread Gustavo A. R. Silva

Hi all,

Friendly ping:

Who can take this?

Thanks
--
Gustavo

On 2/11/19 12:31 PM, Gustavo A. R. Silva wrote:
> Add missing break statement in order to prevent the code from falling
> through to case S_DIN_to_DES.
> 
> This bug was found thanks to the ongoing efforts to enable
> -Wimplicit-fallthrough.
> 
> Fixes: 63ee04c8b491 ("crypto: ccree - add skcipher support")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  drivers/crypto/ccree/cc_cipher.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/crypto/ccree/cc_cipher.c 
> b/drivers/crypto/ccree/cc_cipher.c
> index 5e3361a363b5..d9c17078517b 100644
> --- a/drivers/crypto/ccree/cc_cipher.c
> +++ b/drivers/crypto/ccree/cc_cipher.c
> @@ -80,6 +80,7 @@ static int validate_keys_sizes(struct cc_cipher_ctx *ctx_p, 
> u32 size)
>   default:
>   break;
>   }
> + break;
>   case S_DIN_to_DES:
>   if (size == DES3_EDE_KEY_SIZE || size == DES_KEY_SIZE)
>   return 0;
>

Re: [RFC 0/5] RCU fixes for rcu_assign_pointer usage

2019-02-20 Thread Joel Fernandes

On Wed, Feb 20, 2019 at 08:42:43AM -0800, Paul E. McKenney wrote:
> On Tue, Feb 19, 2019 at 08:11:36PM -0800, Joel Fernandes wrote:
> > On Tue, Feb 19, 2019 at 8:08 PM Joel Fernandes (Google)
> >  wrote:
> > >
> > > These patches fix various RCU API usage issues found due to sparse errors 
> > > as a
> > > result of the recent check to add rcu_check_sparse() to 
> > > rcu_assign_pointer().
> > >
> > > This is very early RFC stage, and is only build tested. I am also only 
> > > sending
> > > to the RCU group for initial review before sending to LKML. Thanks for 
> > > any feedback!
> > >
> > > There are still more usages that cause errors such as rbtree which I am
> > > looking into.
> > 
> > Looks like it got sent to LKML anyway, ;-) That's Ok since it is
> > prefixed as RFC.
> 
> As is only right and proper.  ;-)
> 
> I don't see an immediate problem with them, but it would be good to get
> the relevant developers and maintainers on CC for the next version.  I
> cannot claim to know that code very well.

Definitely will CC them next time, sorry about that. I'll stop being so shy
but I have some scars that are still healing ;-)

 - Joel

[PATCH v2] x86/asm: Pin sensitive CR4 bits

2019-02-20 Thread Kees Cook

Several recent exploits have used direct calls to the native_write_cr4()
function to disable SMEP and SMAP before then continuing their exploits
using userspace memory access. This pins bits of cr4 so that they cannot
be changed through a common function. This is not intended to be general
ROP protection (which would require CFI to defend against properly), but
rather a way to avoid trivial direct function calling (or CFI bypassing
via a matching function prototype) as seen in:

https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html
(https://github.com/xairy/kernel-exploits/tree/master/CVE-2017-7308)

The goals of this change:
 - pin specific bits (SMEP, SMAP, and UMIP) when writing cr4.
 - avoid setting the bits too early (they must become pinned only after
   first being used).
 - pinning mask needs to be read-only during normal runtime.
 - pinning needs to be rechecked after set to avoid jumps into the middle
   of the function.

Using __ro_after_init on the mask is done so it can't be first disabled
with a malicious write. And since it becomes read-only, we must avoid
writing to it later (hence the check for bits already having been set
instead of unconditionally writing to the mask).

The use of volatile is done to force the compiler to perform a full reload
of the mask after setting cr4 (to protect against just jumping into the
function past where the masking happens; we must check that the mask was
applied after we do the set). Due to how this function can be built by the
compiler (especially due to the removal of frame pointers), jumping into
the middle of the function frequently doesn't require stack manipulation
to construct a stack frame (there may only be a retq without pops, which is
sufficient for use with exploits like timer overwrites mentioned above).

For example, without the recheck, the function may appear as:

   native_write_cr4:
  mov [pin], %rbx
  or  %rbx, %rdi
   1: mov %rdi, %cr4
  retq

The masking "or" could be trivially bypassed by just calling to label "1"
instead of "native_write_cr4". (CFI will force calls to only be able to
call into native_write_cr4, but CFI and CET are uncommon currently.)
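
The pin-and-recheck idea can be sketched in plain userspace C. This is
only a model under stated assumptions: fake_cr4 stands in for the real
register, pin_mask for the pinned bits, and the skip_mask argument
simulates a caller that enters the function past the masking "or". None
of these names come from the patch.

```c
#include <stdbool.h>

/* Stand-ins for the hardware register and the pin mask (assumptions). */
static unsigned long fake_cr4;
static volatile unsigned long pin_mask;

/* Illustrative bit values for the sketch. */
#define FAKE_SMEP (1UL << 20)
#define FAKE_SMAP (1UL << 21)

/* Models the raw "mov %rdi, %cr4" that a jump to label 1 would reach. */
static void raw_write(unsigned long val)
{
	fake_cr4 = val;
}

/* Returns true if a bypass of the masking step was detected and repaired. */
static bool pinned_write(unsigned long val, bool skip_mask)
{
	bool bypassed = false;

	if (skip_mask)		/* simulate entering past the masking "or" */
		goto write;
again:
	val |= pin_mask;
write:
	raw_write(val);
	/* Re-read the mask after the write; volatile forces a fresh load. */
	if ((val & pin_mask) != pin_mask) {
		bypassed = true;
		goto again;
	}
	return bypassed;
}
```

Because the check sits after the write, a caller that skipped the mask
still ends up with the pinned bits set: the first write goes through
unmasked, the recheck notices, and the function restarts from the top.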

Signed-off-by: Kees Cook 
---
v2: fix think-o in cr4_pin recheck (Jann Horn)
---
 arch/x86/include/asm/special_insns.h | 11 +++
 arch/x86/kernel/cpu/common.c | 12 +++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/special_insns.h 
b/arch/x86/include/asm/special_insns.h
index 43c029cdc3fe..4c26004ed5d4 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -72,9 +72,20 @@ static inline unsigned long native_read_cr4(void)
return val;
 }
 
+extern volatile unsigned long cr4_pin;
+
 static inline void native_write_cr4(unsigned long val)
 {
+again:
+   val |= cr4_pin;
asm volatile("mov %0,%%cr4": : "r" (val), "m" (__force_order));
+   /*
+* If the MOV above was used directly as a ROP gadget we can
+* notice the lack of pinned bits in "val" and start the function
+* from the beginning to gain the cr4_pin bits for sure.
+*/
+   if (WARN_ONCE((val & cr4_pin) != cr4_pin, "cr4 bypass attempt?!\n"))
+   goto again;
 }
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cb28e98a0659..7e0ea4470f8e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -312,10 +312,16 @@ static __init int setup_disable_smep(char *arg)
 }
 __setup("nosmep", setup_disable_smep);
 
+volatile unsigned long cr4_pin __ro_after_init;
+EXPORT_SYMBOL_GPL(cr4_pin);
+
 static __always_inline void setup_smep(struct cpuinfo_x86 *c)
 {
-   if (cpu_has(c, X86_FEATURE_SMEP))
+   if (cpu_has(c, X86_FEATURE_SMEP)) {
+   if (!(cr4_pin & X86_CR4_SMEP))
+   cr4_pin |= X86_CR4_SMEP;
cr4_set_bits(X86_CR4_SMEP);
+   }
 }
 
 static __init int setup_disable_smap(char *arg)
@@ -334,6 +340,8 @@ static __always_inline void setup_smap(struct cpuinfo_x86 
*c)
 
if (cpu_has(c, X86_FEATURE_SMAP)) {
 #ifdef CONFIG_X86_SMAP
+   if (!(cr4_pin & X86_CR4_SMAP))
+   cr4_pin |= X86_CR4_SMAP;
cr4_set_bits(X86_CR4_SMAP);
 #else
cr4_clear_bits(X86_CR4_SMAP);
@@ -351,6 +359,8 @@ static __always_inline void setup_umip(struct cpuinfo_x86 
*c)
if (!cpu_has(c, X86_FEATURE_UMIP))
goto out;
 
+   if (!(cr4_pin & X86_CR4_UMIP))
+   cr4_pin |= X86_CR4_UMIP;
cr4_set_bits(X86_CR4_UMIP);
 
pr_info_once("x86/cpu: User Mode Instruction Prevention (UMIP) 
activated\n");
-- 
2.17.1


-- 
Kees Cook

Re: [RFC 1/5] net: rtnetlink: Fix incorrect RCU API usage

2019-02-20 Thread Joel Fernandes

On Wed, Feb 20, 2019 at 08:40:34AM -0800, Paul E. McKenney wrote:
> On Tue, Feb 19, 2019 at 11:08:23PM -0500, Joel Fernandes (Google) wrote:
> > From: Joel Fernandes 
> > 
> > rtnl_register_internal() and rtnl_unregister_all() try to directly
> > dereference an RCU protected pointer outside an RCU read side section.
> > While this is Ok to do since a lock is held, let us use the correct
> > API to avoid programmer bugs in the future.
> > 
> > This also fixes sparse warnings arising from not using RCU API.
> > 
> > net/core/rtnetlink.c:332:13: warning: incorrect type in assignment
> > (different address spaces) net/core/rtnetlink.c:332:13:expected
> > struct rtnl_link **tab net/core/rtnetlink.c:332:13:got struct
> > rtnl_link *[noderef] *
> > 
> > Signed-off-by: Joel Fernandes 
> 
> First, thank you for doing this!

No problem, it is my pleasure. It is just good to see these warnings/errors
show up (which I didn't anticipate when I first wrote the check) so we can
harden the kernel more fwiw.

> I was going to complain that these were update-side accesses, but it
> looks like rtnl_dereference() already handles both readers and updaters.
> 
> So looks good to me, but the maintainers of course have the final word.

Thanks!
Also my confidence level is a bit less for patches 4/5 and 5/5, could
you share your thoughts on those? The scheduler code seems to use
rcu_assign_pointer() in those where it seems a WRITE_ONCE() would just suffice.
In fact, in some cases I replaced with smp_store_release() just to be safe.
Speaking of which, do you feel those are legit uses of rcu_assign_pointer()
or would you expect rcu_assign_pointer() to be used only for RCU protected
pointers? I am hoping it is the latter since that is what the sparse check
expects (an RCU protected pointer being assigned to).
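
As a loose userspace analogy for the question above (not the kernel's
implementation), C11 atomics can show the difference between a store that
publishes an initialised object and a plain once-only store. The release
store roughly plays the role of rcu_assign_pointer() or
smp_store_release(), the relaxed store roughly that of WRITE_ONCE(). The
struct and function names are made up for this sketch.

```c
#include <stdatomic.h>
#include <stddef.h>

struct item {
	int value;
};

static _Atomic(struct item *) shared;

/*
 * Release store: the write to p->value is ordered before the pointer
 * becomes visible to a reader that uses an acquire load.
 */
static void publish_release(struct item *p, int value)
{
	p->value = value;
	atomic_store_explicit(&shared, p, memory_order_release);
}

/*
 * Relaxed store: the pointer write itself is atomic, but earlier writes
 * are not ordered against it. Enough when nothing needs publishing,
 * for example when storing NULL.
 */
static void store_relaxed(struct item *p)
{
	atomic_store_explicit(&shared, p, memory_order_relaxed);
}

/* Reader side: pairs with the release store above. */
static struct item *read_acquire(void)
{
	return atomic_load_explicit(&shared, memory_order_acquire);
}
```

The design question in the mail is then whether a given store publishes
freshly initialised data to concurrent readers (release semantics needed)
or merely updates a pointer whose target is already visible.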

 - Joel


> 
> > ---
> >  net/core/rtnetlink.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > index 5ea1bed08ede..98be4b4818a9 100644
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -188,7 +188,7 @@ static int rtnl_register_internal(struct module *owner,
> > msgindex = rtm_msgindex(msgtype);
> >  
> > rtnl_lock();
> > -   tab = rtnl_msg_handlers[protocol];
> > +   tab = rtnl_dereference(rtnl_msg_handlers[protocol]);
> > if (tab == NULL) {
> > tab = kcalloc(RTM_NR_MSGTYPES, sizeof(void *), GFP_KERNEL);
> > if (!tab)
> > @@ -329,7 +329,7 @@ void rtnl_unregister_all(int protocol)
> > BUG_ON(protocol < 0 || protocol > RTNL_FAMILY_MAX);
> >  
> > rtnl_lock();
> > -   tab = rtnl_msg_handlers[protocol];
> > +   tab = rtnl_dereference(rtnl_msg_handlers[protocol]);
> > if (!tab) {
> > rtnl_unlock();
> > return;
> > -- 
> > 2.21.0.rc0.258.g878e2cd30e-goog
> > 
>

Re: [PATCH] kasan: turn off asan-stack for clang-8 and earlier

2019-02-20 Thread Nick Desaulniers

+ Evgenii

On Wed, Feb 20, 2019 at 9:36 AM Arnd Bergmann  wrote:
>
> On Wed, Feb 20, 2019 at 6:00 PM Andrey Ryabinin  
> wrote:
> > On 2/20/19 5:51 PM, Arnd Bergmann wrote:
> > > On Wed, Feb 20, 2019 at 3:45 PM Andrey Konovalov  
> > > wrote:
> > > I would have to do some more research, but I expect several hundred
> > > patches before we get to a clean randconfig build with a broken
> > > compiler.
> >
> > Manually maintaining asan-stack parameter for the sake of one broken 
> > compiler isn't a great idea either.
> >
> > Couple alternative suggestions:
> >
> > 1) If we can't fix the problem or the cost of fixing is too high, maybe 
> > just hide it? Disable -Wframe-larger-than on pre clang-9 compilers.
> >
> > 2) Fallback cflags. The idea is to try to compile every file with 
> > "-mllvm -asan-stack=1 -Wframe-larger-than=2048 -Werror" at first,
> >  and fallback to "-mllvm -asan-stack=0" if failed. So it would be something 
> > similar to $(call cc-option, -mllvm -asan-stack=1 -Wframe-larger-than=2048 
> > -Werror, -mllvm -asan-stack=0)
> >  except that "cc-option" tries options only once on some code example while 
> >  we need to try options on every file that we actually compile.
> >  Honestly, I'm not sure that it's worthy to hack Kbuild engine for that 
> > particular use-case.
>
> My original plan was to put this under CONFIG_KASAN_EXTRA to allow you
> to still enable it in older compilers, but you just removed that option ;-)
>
> Maybe bringing it back would be a compromise? That way it's hidden from
> all the build testing bots (because of the !CONFIG_COMPILE_TEST dependency),
> but anyone who really wants it can still have the option, and set
> CONFIG_FRAME_WARN
> to whichever value they like.
>
>  Arnd

I like Evgenii's idea:
https://bugs.llvm.org/show_bug.cgi?id=38809#c10

Even though something like that wouldn't make the clang-8 train, I
think it's ok.

While I myself share Arnd's goal of driving compiler warnings to zero,
in general I'd prefer not to disable warning-producing-features or
disable warnings outright for cases where we have some ideas of
changes we can make to the compiler.  There's probably a list now of
false warnings produced by old versions of Clang from bugs in Clang
that we fixed.  I'm not interested in additionally trying to work
around those somehow in kernel sources.

Qian previously pointed out that most drivers don't produce this
warning under KASAN+Clang.  While 114 is a lot, what are the chances
that someone NEEDS a KASAN+Clang build to compile warning free and
happen to include one of these problematic drivers?  And if there is a
chance they do observe the warning, are we doing a disservice by
disabling the feature (-asan-stack=1) outright for the whole kernel,
or disabling the warning (`-Wframe-larger-than=`) which can flag
issues unrelated to KASAN?

To Evgenii's idea, I vote that the compiler is incorrect here, and we
shouldn't start turning things off.  Evgenii, do you have some sense
of how to tune the inliner as you described?
-- 
Thanks,
~Nick Desaulniers

Re: [PATCH] iio: mma8452: mark expected switch fall-through

2019-02-20 Thread Gustavo A. R. Silva




On 2/20/19 11:21 AM, Gustavo A. R. Silva wrote:
> 
> 
> On 2/20/19 6:17 AM, Jonathan Cameron wrote:
>> On Mon, 11 Feb 2019 16:23:18 -0600
>> "Gustavo A. R. Silva"  wrote:
>>
>>> In preparation to enabling -Wimplicit-fallthrough, mark switch
>>> cases where we are expecting to fall through.
>>>
>>> This patch fixes the following warning:
>>>
>>> drivers/iio/accel/mma8452.c: In function ‘mma8452_probe’:
>>> drivers/iio/accel/mma8452.c:1581:6: warning: this statement may fall 
>>> through [-Wimplicit-fallthrough=]
>>>if (ret == data->chip_info->chip_id)
>>>   ^
>>> drivers/iio/accel/mma8452.c:1584:2: note: here
>>>   default:
>>>   ^~~
>>>
>>> Warning level 3 was used: -Wimplicit-fallthrough=3
>>>
>>> Notice that, in this particular case, the code comment is modified
>>> in accordance with what GCC is expecting to find.
>>>
>>> This patch is part of the ongoing efforts to enable
>>> -Wimplicit-fallthrough.
>>>
>>> Signed-off-by: Gustavo A. R. Silva 
>> I know Peter probably won't like this, as it doesn't
>> read as well, with the else dropped, but I'm going to take
>> it as we have had a lot of bugs caught by this code and this
>> is generating a false positive.
>>
>> Applied to the togreg branch of iio.git and pushed out as testing
>> for the autobuilders to play with it.
>>
> 
> Thanks, Jonathan.
> 

BTW, Jonathan, I wonder if you can apply this one too:

https://lore.kernel.org/patchwork/patch/996804/

Thanks
--
Gustavo

Re: [PATCH v2] driver: platform: Support parsing GpioInt 0 in platform_get_irq()

2019-02-20 Thread Brian Norris

Hi,

On Mon, Feb 11, 2019 at 11:01:12AM -0800, egran...@chromium.org wrote:
> From: Enrico Granata 
> 
> ACPI 5 added support for GpioInt resources as a way to provide
> information about interrupts mediated via a GPIO controller.
> 
> Several device buses (e.g. SPI, I2C) have support for retrieving
> an IRQ specified via this type of resource, and providing it
> directly to the driver as an IRQ number.
> 
> This is not currently done for the platform drivers, as platform_get_irq()
> does not try to parse GpioInt() resources. This requires drivers to
> either have to support only one possible IRQ resource, or to have code
> in place to try both as a failsafe.
> 
> While there is a possibility of ambiguity for devices that expose
> multiple IRQs, it is easy and feasible to support the common case
> of devices that only expose one IRQ which would be of either type
> depending on the underlying system's architecture.
> 
> This commit adds support for parsing a GpioInt resource in order
> to fulfill a request for the index 0 IRQ for a platform device.
> 
> Signed-off-by: Enrico Granata 
> ---
> Changes in v2:
>  - only support IRQ index 0
> 
>  drivers/base/platform.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 1c958eb33ef4d..0d3611cd1b3bc 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -127,7 +127,20 @@ int platform_get_irq(struct platform_device *dev, 
> unsigned int num)
>   irqd_set_trigger_type(irqd, r->flags & IORESOURCE_BITS);
>   }
>  
> - return r ? r->start : -ENXIO;
> + if (r)
> + return r->start;
> +
> + /*
> +  * For the index 0 interrupt, allow falling back to GpioInt
> +  * resources. While a device could have both Interrupt and GpioInt
> +  * resources, making this fallback ambiguous, in many common cases
> +  * the device will only expose one IRQ, and this fallback
> +  * allows a common code path across either kind of resource.
> +  */
> + if (num == 0 && has_acpi_companion(&dev->dev))
> + return acpi_dev_gpio_irq_get(ACPI_COMPANION(&dev->dev), num);

For ACPI devices, this changes the return code for a missing interrupt
0 from ENXIO to ENOENT, because acpi_dev_gpio_irq_get() uses ENOENT
instead of ENXIO. While ENXIO isn't exactly documented as the *specific*
error code for a missing interrupt in platform_get_irq(), there are
definitely drivers out there that are looking specifically for ENXIO
(grepping the tree finds several Rockchip platform drivers and a few
ethernet drivers at a minimum). And it also incidentally broke some
usage of the very driver you were trying to support
(drivers/platform/chrome/cros_ec_lpc.c).

I suspect a good strategy here would be to check
acpi_dev_gpio_irq_get()'s return codes here with something like:

	if (ret > 0 || ret == -EPROBE_DEFER)
		return ret;
	return -ENXIO;

Although, the gpiolib functions embedded in there also can return EIO,
so maybe something like this is better?

	if (ret == -ENOENT || ret == 0)
		return -ENXIO;
	return ret;

I'm kinda unsure what to do with error codes besides PROBE_DEFER or
"missing", since most users don't really have it in their mind that
platform_get_irq() can fail with EIO or similar.
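
For illustration, the second variant can be written as a small standalone
helper and checked in userspace. This is a sketch only: the helper name
is hypothetical, and EPROBE_DEFER is defined locally because it is a
kernel-internal code that userspace <errno.h> does not provide (517 is
its value in include/linux/errno.h).

```c
#include <errno.h>

/* Kernel-internal error code, not available in userspace headers. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/*
 * Hypothetical helper: ret is whatever the GpioInt lookup returned.
 * "Not found" (-ENOENT) and an invalid IRQ number (0) are both folded
 * into -ENXIO, the code existing callers already test for. Everything
 * else (a valid IRQ, -EPROBE_DEFER, -EIO, ...) is passed through.
 */
static int gpio_irq_ret_to_platform(int ret)
{
	if (ret == -ENOENT || ret == 0)
		return -ENXIO;
	return ret;
}
```

The trade-off against the first variant is visible here: other errors
such as -EIO reach the caller unchanged instead of being reported as a
missing interrupt.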

Brian

> +
> + return -ENXIO;
>  #endif
>  }
>  EXPORT_SYMBOL_GPL(platform_get_irq);
> -- 
> 2.20.1.791.gb4d0f1c61a-goog
>

Re: xen/evtchn and forced threaded irq

2019-02-20 Thread Julien Grall


Hi,

On 20/02/2019 17:07, Boris Ostrovsky wrote:

On 2/20/19 9:15 AM, Julien Grall wrote:

Hi Boris,

Thank you for your answer.

On 20/02/2019 00:02, Boris Ostrovsky wrote:

On Tue, Feb 19, 2019 at 05:31:10PM +, Julien Grall wrote:

Hi all,

I have been looking at using Linux RT in Dom0. Once the guest is
started,
the console is ending to have a lot of warning (see trace below).

After some investigation, this is because the irq handler will now
be threaded.
I can reproduce the same error with the vanilla Linux when passing
the option
'threadirqs' on the command line (the trace below is from 5.0.0-rc7
that has
not RT support).

FWIW, the interrupt for port 6 is used to for the guest to
communicate with
xenstore.

  From my understanding, this is happening because the interrupt
handler is now
run in a thread. So we can have the following happening.

     Interrupt context        | Interrupt thread
                              |
     receive interrupt port 6 |
     clear the evtchn port    |
     set IRQF_RUNTHREAD       |
     kick interrupt thread    |
                              |    clear IRQF_RUNTHREAD
                              |    call evtchn_interrupt
     receive interrupt port 6 |
     clear the evtchn port    |
     set IRQF_RUNTHREAD       |
     kick interrupt thread    |
                              |    disable interrupt port 6
                              |    evtchn->enabled = false
                              |    []
                              |
                              |    *** Handling the second interrupt ***
                              |    clear IRQF_RUNTHREAD
                              |    call evtchn_interrupt
                              |    WARN(...)
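
The interleaving in the diagram can be replayed deterministically with a
toy model in C. This is a single-threaded sketch under simplifying
assumptions, not the evtchn driver: the struct and function names are
invented, and each function stands for one step of the diagram.

```c
#include <stdbool.h>

/* Toy model of one event channel port; all names are illustrative. */
struct port_model {
	bool run_thread;	/* the "run thread" flag from the diagram */
	bool enabled;		/* stands in for evtchn->enabled */
	int warnings;		/* times the handler saw a disabled port */
};

/* Hard interrupt context: only marks the thread as runnable. */
static void hard_irq(struct port_model *p)
{
	p->run_thread = true;
}

/* Interrupt thread, first step: consume the flag. */
static void thread_begin(struct port_model *p)
{
	p->run_thread = false;
}

/* Interrupt thread, second step: warn if already disabled, then disable. */
static void thread_handler(struct port_model *p)
{
	if (!p->enabled)
		p->warnings++;
	p->enabled = false;
}
```

Calling hard_irq(), thread_begin(), hard_irq(), thread_handler() in that
order leaves the flag set while the port is already disabled, so the next
thread_begin() and thread_handler() pair hits the warning path.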

I am not entirely sure how to fix this. I have two solutions in mind:

1) Prevent the interrupt handler to be threaded. We would also need to
switch from spin_lock to raw_spin_lock as the former may sleep on
RT-Linux.

2) Remove the warning


I think access to evtchn->enabled is racy so (with or without the
warning) we can't use it reliably.


Thinking about it, it would not be the only issue. The ring is sized
to contain only one instance of the same event. So if you receive
twice the event, you may overflow the ring.


Hm... That's another argument in favor of "unthreading" the handler.


I first thought it would be possible to unthread it. However, 
wake_up_interruptible is using a spin_lock. On RT spin_lock can sleep, so this 
cannot be used in an interrupt context.


So I think "unthreading" the handler is not an option here.







Another alternative could be to queue the irq if !evtchn->enabled and
handle it in evtchn_write() (which is where irq is supposed to be
re-enabled).

What do you mean by queue? Is it queueing in the ring?



No, I was thinking about having a new structure for deferred interrupts.


Hmmm, I am not entirely sure what would be the structure here. Could you expand 
your thinking?


Cheers,

--
Julien Grall

Re: [PATCH V19 5/7] i2c: tegra: Add DMA support

2019-02-20 Thread Dmitry Osipenko

20.02.2019 21:02, Jon Hunter wrote:
> 
> On 12/02/2019 19:06, Sowjanya Komatineni wrote:
>> This patch adds DMA support for Tegra I2C.
>>
>> Tegra I2C TX and RX FIFO depth is 8 words. PIO mode is used for
>> transfer size of the max FIFO depth and DMA mode is used for
>> transfer size higher than max FIFO depth to save CPU overhead.
>>
>> PIO mode needs full intervention of CPU to fill or empty FIFO's
>> and also need to service multiple data requests interrupt for the
>> same transaction. This adds delay between data bytes of the same
>> transfer when CPU is fully loaded and some slave devices has
>> internal timeout for no bus activity and stops transaction to
>> avoid bus hang. DMA mode is helpful in such cases.
>>
>> DMA mode is also helpful for Large transfers during downloading or
>> uploading FW over I2C to some external devices.
>>
>> Tegra210 and prior Tegra chips use APBDMA driver which is replaced
>> with GPCDMA on Tegra186 and Tegra194.
>> This patch uses has_apb_dma flag in hw_feature to differentiate
>> DMA driver change between Tegra chipset.
>>
>> APBDMA driver is registered from module-init level and this patch
>> also has a change to register I2C driver at module-init level
>> rather than subsys-init to avoid deferring I2C probe till APBDMA
>> driver is registered.
>>
>> Acked-by: Thierry Reding 
>> Reviewed-by: Dmitry Osipenko 
>> Tested-by: Dmitry Osipenko 
>> Signed-off-by: Sowjanya Komatineni 
> 
> ...
> 
>> +static int tegra_i2c_init_dma(struct tegra_i2c_dev *i2c_dev)
>> +{
>> +struct dma_chan *chan;
>> +u32 *dma_buf;
>> +dma_addr_t dma_phys;
>> +int err;
>> +
>> +if (!IS_ENABLED(CONFIG_TEGRA20_APB_DMA) ||
>> +!i2c_dev->hw->has_apb_dma) {
>> +err = -ENODEV;
>> +goto err_out;
>> +}
>> +
>> +chan = dma_request_slave_channel_reason(i2c_dev->dev, "rx");
>> +if (IS_ERR(chan)) {
>> +err = PTR_ERR(chan);
>> +goto err_out;
>> +}
>> +
>> +i2c_dev->rx_dma_chan = chan;
>> +
>> +chan = dma_request_slave_channel_reason(i2c_dev->dev, "tx");
>> +if (IS_ERR(chan)) {
>> +err = PTR_ERR(chan);
>> +goto err_out;
>> +}
>> +
>> +i2c_dev->tx_dma_chan = chan;
>> +
>> +dma_buf = dma_alloc_coherent(i2c_dev->dev, i2c_dev->dma_buf_size,
>> + &dma_phys, GFP_KERNEL | __GFP_NOWARN);
>> +if (!dma_buf) {
>> +dev_err(i2c_dev->dev, "failed to allocate the DMA buffer\n");
>> +err = -ENOMEM;
>> +goto err_out;
>> +}
>> +
>> +i2c_dev->dma_buf = dma_buf;
>> +i2c_dev->dma_phys = dma_phys;
>> +return 0;
>> +
>> +err_out:
>> +tegra_i2c_release_dma(i2c_dev);
>> +if (err != -EPROBE_DEFER) {
>> +dev_err(i2c_dev->dev, "cannot use DMA: %d\n", err);
>> +dev_err(i2c_dev->dev, "fallbacking to PIO\n");
>> +return 0;
>> +}
> I think that the above should be a dev_dbg print or re-worked in someway
> because now for Tegra194 which does not have an APB DMA I see ...
> 
> [   6.093234] ERR KERN tegra-i2c 31c.i2c: cannot use DMA: -19
> [   6.096847] ERR KERN tegra-i2c 31c.i2c: falling back to PIO
> 
> Given that the APB DMA is not supported for Tegra186/Tegra194, there is
> no point in printing these error messages. Now it looks like something
> is wrong but really it is not :-(

Jon, patches are welcome ;)

Re: [PATCH V19 5/7] i2c: tegra: Add DMA support

2019-02-20 Thread Jon Hunter



On 12/02/2019 19:06, Sowjanya Komatineni wrote:
> This patch adds DMA support for Tegra I2C.
> 
> Tegra I2C TX and RX FIFO depth is 8 words. PIO mode is used for
> transfer size of the max FIFO depth and DMA mode is used for
> transfer size higher than max FIFO depth to save CPU overhead.
> 
> PIO mode needs full intervention of CPU to fill or empty FIFO's
> and also need to service multiple data requests interrupt for the
> same transaction. This adds delay between data bytes of the same
> transfer when CPU is fully loaded and some slave devices has
> internal timeout for no bus activity and stops transaction to
> avoid bus hang. DMA mode is helpful in such cases.
> 
> DMA mode is also helpful for Large transfers during downloading or
> uploading FW over I2C to some external devices.
> 
> Tegra210 and prior Tegra chips use APBDMA driver which is replaced
> with GPCDMA on Tegra186 and Tegra194.
> This patch uses has_apb_dma flag in hw_feature to differentiate
> DMA driver change between Tegra chipset.
> 
> APBDMA driver is registered from module-init level and this patch
> also has a change to register I2C driver at module-init level
> rather than subsys-init to avoid deferring I2C probe till APBDMA
> driver is registered.
> 
> Acked-by: Thierry Reding 
> Reviewed-by: Dmitry Osipenko 
> Tested-by: Dmitry Osipenko 
> Signed-off-by: Sowjanya Komatineni 

...

> +static int tegra_i2c_init_dma(struct tegra_i2c_dev *i2c_dev)
> +{
> + struct dma_chan *chan;
> + u32 *dma_buf;
> + dma_addr_t dma_phys;
> + int err;
> +
> + if (!IS_ENABLED(CONFIG_TEGRA20_APB_DMA) ||
> + !i2c_dev->hw->has_apb_dma) {
> + err = -ENODEV;
> + goto err_out;
> + }
> +
> + chan = dma_request_slave_channel_reason(i2c_dev->dev, "rx");
> + if (IS_ERR(chan)) {
> + err = PTR_ERR(chan);
> + goto err_out;
> + }
> +
> + i2c_dev->rx_dma_chan = chan;
> +
> + chan = dma_request_slave_channel_reason(i2c_dev->dev, "tx");
> + if (IS_ERR(chan)) {
> + err = PTR_ERR(chan);
> + goto err_out;
> + }
> +
> + i2c_dev->tx_dma_chan = chan;
> +
> + dma_buf = dma_alloc_coherent(i2c_dev->dev, i2c_dev->dma_buf_size,
> +  &dma_phys, GFP_KERNEL | __GFP_NOWARN);
> + if (!dma_buf) {
> + dev_err(i2c_dev->dev, "failed to allocate the DMA buffer\n");
> + err = -ENOMEM;
> + goto err_out;
> + }
> +
> + i2c_dev->dma_buf = dma_buf;
> + i2c_dev->dma_phys = dma_phys;
> + return 0;
> +
> +err_out:
> + tegra_i2c_release_dma(i2c_dev);
> + if (err != -EPROBE_DEFER) {
> + dev_err(i2c_dev->dev, "cannot use DMA: %d\n", err);
> + dev_err(i2c_dev->dev, "fallbacking to PIO\n");
> + return 0;
> + }
I think that the above should be a dev_dbg print or re-worked in some way
because now for Tegra194 which does not have an APB DMA I see ...

[   6.093234] ERR KERN tegra-i2c 31c.i2c: cannot use DMA: -19
[   6.096847] ERR KERN tegra-i2c 31c.i2c: falling back to PIO

Given that the APB DMA is not supported for Tegra186/Tegra194, there is
no point in printing these error messages. Now it looks like something
is wrong but really it is not :-(
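
One possible re-work, sketched here in userspace C under stated
assumptions, is to pick the message severity from the error code so that
an expected "no APB DMA on this chip" (-ENODEV, the -19 in the log above)
is only reported at debug level. The enum and function names are
hypothetical and not part of the driver.

```c
#include <errno.h>

/* Hypothetical severity levels for this sketch. */
enum msg_level {
	LVL_DEBUG,
	LVL_ERROR
};

/*
 * Assumption: missing DMA support is an expected condition on some
 * chips, so -ENODEV is only worth a debug message; any other reason
 * for falling back to PIO is still reported as an error.
 */
static enum msg_level dma_fallback_msg_level(int err)
{
	return err == -ENODEV ? LVL_DEBUG : LVL_ERROR;
}
```

The fallback to PIO itself would stay unchanged; only the noise level of
the message differs.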

Cheers
Jon

-- 
nvpublic

Re: [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it

2019-02-20 Thread Ira Weiny

On Wed, Feb 20, 2019 at 07:19:30AM -0800, Christoph Hellwig wrote:
> On Tue, Feb 19, 2019 at 09:30:33PM -0800, ira.we...@intel.com wrote:
> > From: Ira Weiny 
> > 
> > Resending these as I had only 1 minor comment which I believe we have 
> > covered
> > in this series.  I was anticipating these going through the mm tree as they
> > depend on a cleanup patch there and the IB changes are very minor.  But they
> > could just as well go through the IB tree.
> > 
> > NOTE: This series depends on my clean up patch to remove the write parameter
> > from gup_fast_permitted()[1]
> > 
> > HFI1, qib, and mthca, use get_user_pages_fast() due to its performance
> > advantages.  These pages can be held for a significant time.  But
> > get_user_pages_fast() does not protect against mapping of FS DAX pages.
> 
> This I don't get - if you do lock down long term mappings performance
> of the actual get_user_pages call shouldn't matter to start with.
> 
> What do I miss?

A couple of points.

First "longterm" is a relative thing and at this point is probably a misnomer.
This is really flagging a pin which is going to be given to hardware and can't
move.  I've thought of a couple of alternative names but I think we have to
settle on if we are going to use FL_LAYOUT or something else to solve the
"longterm" problem.  Then I think we can change the flag to a better name.

Second, It depends on how often you are registering memory.  I have spoken with
some RDMA users who consider MR in the performance path...  For the overall
application performance.  I don't have the numbers as the tests for HFI1 were
done a long time ago.  But there was a significant advantage.  Some of which is
probably due to the fact that you don't have to hold mmap_sem.

Finally, architecturally I think it would be good for everyone to use *_fast.
There are patches submitted to the RDMA list which would allow the use of
*_fast (they rework the use of mmap_sem), and as soon as they are accepted
I'll submit a patch to convert the RDMA core as well.  Also on this point,
others are looking to use *_fast.[2]

As an aside, Jason pointed out in my previous submission that *_fast and
*_unlocked look very much the same.  I agree and I think further cleanup will
be coming.  But I'm focused on getting the final solution for DAX at the
moment.

Ira

Re: [PATCH] x86/nmi: ratelimit unknown nmi logs

2019-02-20 Thread Olof Johansson

On Wed, Feb 20, 2019 at 12:59 AM Peter Zijlstra  wrote:
>
> On Tue, Feb 19, 2019 at 05:48:36PM -0800, Olof Johansson wrote:
> > Getting notified of unknown NMIs is obviously important, but getting
> > notified on every single one, especially on larger systems with slow
> > (serial) console causes more harm than good when it's a known noisy
> > non-relevant event.
> >
> > So, let's ratelimit to avoid locking up the system.
>
> What kind of bonghit broken crap system is that?
>
> That is; this _really_ should not happen, and this is a bandaid, not
> fixing the cause.

Oh, I agree -- this shouldn't happen, and it's being debugged and fixed.

So, I'm not looking at this as a bandaid to the real problem, but
there's also no reason to DoS the system with printk when it does
occur. If you want to configure the system to panic on unknown NMI
there are already hooks for it.

I'm obviously happy to carry local patches for this, since it's a
temporary problem. But yet again, I don't see a reason to have the
kernel run off the rails for this condition.

-Olof
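
For readers following along, the general idea of rate limiting a log message
can be sketched in plain userspace C as below. This is only an illustration
under stated assumptions: struct ratelimit, ratelimit_ok() and the time units
are hypothetical names chosen for this sketch. It is not the code from the
patch, which is not quoted in this reply, and it is not the kernel's own
ratelimit implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace illustration; not the kernel's implementation. */
struct ratelimit {
    uint64_t interval;   /* window length, in caller-defined time units */
    unsigned burst;      /* messages allowed per window */
    uint64_t begin;      /* start of the current window */
    unsigned printed;    /* messages let through in this window */
    unsigned missed;     /* messages suppressed in this window */
};

/* Return true if a message at time 'now' may be emitted. */
static bool ratelimit_ok(struct ratelimit *rl, uint64_t now)
{
    if (now - rl->begin >= rl->interval) {
        /* A new window has started: reset the counters. */
        rl->begin = now;
        rl->printed = 0;
        rl->missed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    rl->missed++;
    return false;
}
```

A caller would wrap the noisy message in "if (ratelimit_ok(&rl, now))", so
the event is still reported, but at most 'burst' times per window.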

Re: [PATCH v15 15/15] tracing: Add hist trigger action 'expected fail' test case

2019-02-20 Thread Steven Rostedt

On Wed, 20 Feb 2019 11:38:22 -0600
Tom Zanussi  wrote:

> Hi Steve,
> 
> On Wed, 2019-02-20 at 12:17 -0500, Steven Rostedt wrote:
> > On Wed, 13 Feb 2019 17:42:55 -0600
> > Tom Zanussi  wrote:
> >   
> > > From: Tom Zanussi 
> > > 
> > > Add a test case verifying that basic action combinations fail as
> > > expected.
> > >   
> > 
> > Hi Tom,
> > 
> > This test appears to fail:
> > 
> > # echo 'hist:keys=comm:onmatch(sched.sched_wakeup).save(comm,prio)' >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
> > -bash: echo: write error: Invalid argument
> > 
> > # cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist 
> > 
> > ERROR: action parsing: Handler doesn't support action: save
> >   Last command: keys=comm:onmatch(sched.sched_wakeup).save(comm,prio)
> > 
> > 
> > Is the "save" feature implemented here? It's in the README too.
> > Should it be removed?
> >   
> 
> The "save" feature is implemented, but it's not currently supported
> with onmatch(), which is why it fails, and is used in the xfail test,
> since it's expected to.  So, in this case, the command fails, which
> means the xfail test actually passed.  ;-)
> 
> There are other tests in the inter-event testcases that use save() but
> with onmax() and onchange(), and they pass.

So the test needs to pass on failure?

Because it shouldn't be flagged as a failure in the test suite.

-- Steve

> 
> Hope that explains things in this case,
> 
> Tom
> 
> > -- Steve
> >   
> > > Signed-off-by: Tom Zanussi 
> > > ---
> > >  .../inter-event/trigger-action-hist-xfail.tc   | 30
> > > ++
> > >  1 file changed, 30 insertions(+)
> > >  create mode 100644
> > > tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-
> > > action-hist-xfail.tc
> > > 
> > > diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-
> > > event/trigger-action-hist-xfail.tc
> > > b/tools/testing/selftests/ftrace/test.d/trigger/inter-
> > > event/trigger-action-hist-xfail.tc
> > > new file mode 100644
> > > index ..1221240f8cf6
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-
> > > event/trigger-action-hist-xfail.tc
> > > @@ -0,0 +1,30 @@
> > > +#!/bin/sh
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# description: event trigger - test inter-event histogram trigger expected fail actions
> > > +
> > > +fail() { #msg
> > > +echo $1
> > > +exit_fail
> > > +}
> > > +
> > > +if [ ! -f set_event ]; then
> > > +echo "event tracing is not supported"
> > > +exit_unsupported
> > > +fi
> > > +
> > > +if [ ! -f snapshot ]; then
> > > +echo "snapshot is not supported"
> > > +exit_unsupported
> > > +fi
> > > +
> > > +grep -q "snapshot()" README || exit_unsupported # version issue
> > > +
> > > +echo "Test expected snapshot action failure"
> > > +
> > > +echo 'hist:keys=comm:onmatch(sched.sched_wakeup).snapshot()' >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger && exit_fail
> > > +
> > > +echo "Test expected save action failure"
> > > +
> > > +echo 'hist:keys=comm:onmatch(sched.sched_wakeup).save(comm,prio)' >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger && exit_fail
> > > +
> > > +exit_xfail  
> > 
> >

Re: [PATCH v2] sched/topology: fix kernel crash when a CPU is hotplugged in a memoryless node

2019-02-20 Thread Laurent Vivier

On 20/02/2019 18:08, Peter Zijlstra wrote:
> On Wed, Feb 20, 2019 at 05:55:20PM +0100, Laurent Vivier wrote:
>> index 3f35ba1d8fde..372278605f0d 100644
>> --- a/kernel/sched/topology.c
>> +++ b/kernel/sched/topology.c
>> @@ -1651,6 +1651,7 @@ void sched_init_numa(void)
>>   */
>>  tl[i++] = (struct sched_domain_topology_level){
>>  .mask = sd_numa_mask,
>> +.flags = SDTL_OVERLAP,
> 
> This makes no sense whatsoever. The NUMA identity node should not have
> overlap with other domains.
> 
> Are you sure this is not because of the utterly broken powerpc nonsense
> where they move CPUs between nodes?

No, I'm not sure. This is why I've Cc'd the powerpc folks. My conclusion is
only based on the before/after changes.

I've tested some patches from powerpc ML, but they don't fix this problem:
  powerpc/numa: Perform full re-add of CPU for PRRN/VPHN topology update
  powerpc/pseries: Perform full re-add of CPU for topology update
post-migration

So the only reason I can see for a corrupted sched_group list is that the
sched_domain_span() function doesn't return a correct cpumask for the
domain once a new CPU is added.

Thanks,
Laurent

Re: [PATCH] iwlwifi: mvm: Use div64_s64 instead of do_div in iwl_mvm_debug_range_resp

2019-02-20 Thread Nathan Chancellor

On Wed, Feb 20, 2019 at 11:51:34AM +0100, Arnd Bergmann wrote:
> On Tue, Feb 19, 2019 at 7:22 PM Nathan Chancellor
>  wrote:
> >
> 
> > diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c 
> > b/drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c
> > index e9822a3ec373..92b22250eb7d 100644
> > --- a/drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c
> > +++ b/drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c
> > @@ -462,7 +462,7 @@ static void iwl_mvm_debug_range_resp(struct iwl_mvm 
> > *mvm, u8 index,
> >  {
> > s64 rtt_avg = res->ftm.rtt_avg * 100;
> >
> > -   do_div(rtt_avg, );
> > +   div64_s64(rtt_avg, );
> 
> This is wrong: div64_s64 does not modify its argument like do_div(), but
> it returns the result instead. You also don't want to divide by a 64-bit
> value when the second argument is a small constant.
> 
> I think the correct way should be
> 
>s64 rtt_avg = div_s64(res->ftm.rtt_avg * 100, );
> 
> If you know that the value is positive, using unsigned types
> and div_u64() would be slightly faster.
> 
>   Arnd

Thanks for the review and explanation, Arnd.

Luca, could you drop this version so I can resend it?

Nathan
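
To make the difference Arnd describes concrete, here is a small userspace
sketch. The helpers sketch_do_div() and sketch_div_s64() are hypothetical
stand-ins written for this note. They only mimic the calling conventions
described above (one overwrites its argument, the other returns the quotient)
and are not the kernel's do_div() or div_s64(). The divisor is left as a
parameter because the constant is not legible in the quoted diff.

```c
#include <stdint.h>

/*
 * Userspace stand-ins (hypothetical names) that mimic the calling
 * conventions discussed above. They are not the kernel helpers.
 */

/* Like do_div(): the quotient overwrites *n, the remainder is returned. */
static uint32_t sketch_do_div(uint64_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);

    *n /= base;
    return rem;
}

/* Like div_s64(): the dividend is untouched, the quotient is returned. */
static int64_t sketch_div_s64(int64_t dividend, int32_t divisor)
{
    return dividend / divisor;
}

/* The buggy pattern: the return value is dropped, so nothing changes. */
static int64_t scaled_wrong(int64_t value, int32_t divisor)
{
    int64_t v = value * 100;

    (void)sketch_div_s64(v, divisor);   /* result discarded */
    return v;                           /* still value * 100 */
}

/* The suggested pattern: use the returned quotient. */
static int64_t scaled_right(int64_t value, int32_t divisor)
{
    return sketch_div_s64(value * 100, divisor);
}
```

The point of the sketch is only the shape of the call: a helper that returns
its result has to be assigned, while a do_div()-style helper updates its
first argument in place.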

Re: [PATCH v5 0/3] drm/vc4: Add a load tracker

2019-02-20 Thread Eric Anholt

Paul Kocialkowski  writes:

> Hi,
>
> Here is a fourth iteration of the VC4 load tracking series, which was
> initially developed by Boris Brezillon and that I have now taken over.
>
> This new iteration takes into account comments from v3 and comes with a
> new approach for avoiding underrun reports when reconfiguring the
> pipeline. It is now based on detection instead of delaying the underrun
> interrupt unmasking.
>
> It can be tested with a dedicated IGT GPU Tools series:
>   VC4 load tracker testing

Series is:

Reviewed-by: Eric Anholt 

Thanks for persisting on this!

Re: [PATCH v2 1/3] libertas_tf: move hardware callbacks to a separate structure

2019-02-20 Thread Kalle Valo

Lubomir Rintel  wrote:

> We'll need to talk to the firmware to get a hardware address before
> device is registered with ieee80211 subsystem at the end of
> lbtf_add_card(). Hooking the callbacks after that is too late.
> 
> Signed-off-by: Lubomir Rintel 

3 patches applied to wireless-drivers-next.git, thanks.

be9d0d3fe139 libertas_tf: move hardware callbacks to a separate structure
baa0280f08c7 libertas_tf: don't defer firmware loading until start()
5d04b22b881d libertas_tf: get the MAC address before registering the device

-- 
https://patchwork.kernel.org/patch/10821819/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

Re: [PATCH][next] rtlwifi: rtl8192ce: fix typo, "PairwiseENcAlgorithm" -> "PairwiseEncAlgorithm"

2019-02-20 Thread Kalle Valo

Colin King  wrote:

> From: Colin Ian King 
> 
> There is an uppercase 'N' that should be a lowercase 'n', fix this.
> 
> Signed-off-by: Colin Ian King 

Patch applied to wireless-drivers-next.git, thanks.

0421dd4167ec rtlwifi: rtl8192ce: fix typo, "PairwiseENcAlgorithm" -> 
"PairwiseEncAlgorithm"

-- 
https://patchwork.kernel.org/patch/10821751/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

Applied "regulator: max77620: Add missing .owner field in regulator_desc" to the regulator tree

2019-02-20 Thread Mark Brown

The patch

   regulator: max77620: Add missing .owner field in regulator_desc

has been applied to the regulator tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

From 96173b8c8b1cbbc39436b1592f37255ee5e723cb Mon Sep 17 00:00:00 2001
From: Axel Lin 
Date: Wed, 20 Feb 2019 09:53:27 +0800
Subject: [PATCH] regulator: max77620: Add missing .owner field in
 regulator_desc

Add missing .owner field in regulator_desc, which is used for refcounting.

Signed-off-by: Axel Lin 
Signed-off-by: Mark Brown 
---
 drivers/regulator/max77620-regulator.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/regulator/max77620-regulator.c 
b/drivers/regulator/max77620-regulator.c
index cd93cf53e23c..1607ac673e44 100644
--- a/drivers/regulator/max77620-regulator.c
+++ b/drivers/regulator/max77620-regulator.c
@@ -690,6 +690,7 @@ static const struct regulator_ops max77620_regulator_ops = {
.active_discharge_mask = MAX77620_SD_CFG1_ADE_MASK, \
.active_discharge_reg = MAX77620_REG_##_id##_CFG, \
.type = REGULATOR_VOLTAGE,  \
+   .owner = THIS_MODULE,   \
},  \
}
 
@@ -721,6 +722,7 @@ static const struct regulator_ops max77620_regulator_ops = {
.active_discharge_mask = MAX77620_LDO_CFG2_ADE_MASK, \
.active_discharge_reg = MAX77620_REG_##_id##_CFG2, \
.type = REGULATOR_VOLTAGE,  \
+   .owner = THIS_MODULE,   \
},  \
}
 
-- 
2.20.1

Applied "regulator: twl6030: Use regulator_list_voltage_linear_range for twl6030ldo_ops" to the regulator tree

2019-02-20 Thread Mark Brown

The patch

   regulator: twl6030: Use regulator_list_voltage_linear_range for 
twl6030ldo_ops

has been applied to the regulator tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git 

Thanks,
Mark

From 4a43870ae166a1a0bbc44a7b5ee63653303827bb Mon Sep 17 00:00:00 2001
From: Axel Lin 
Date: Sun, 17 Feb 2019 21:48:08 +0800
Subject: [PATCH] regulator: twl6030: Use regulator_list_voltage_linear_range
 for twl6030ldo_ops

Use linear range to replace the twl6030ldo_list_voltage implementation.
With this change, the min_mV is not used and can be removed from
struct twlreg_info.

Signed-off-by: Axel Lin 
Signed-off-by: Mark Brown 
---
 drivers/regulator/twl6030-regulator.c | 73 ++-
 1 file changed, 27 insertions(+), 46 deletions(-)

diff --git a/drivers/regulator/twl6030-regulator.c 
b/drivers/regulator/twl6030-regulator.c
index 78b964539775..dcaa6512d760 100644
--- a/drivers/regulator/twl6030-regulator.c
+++ b/drivers/regulator/twl6030-regulator.c
@@ -31,9 +31,6 @@ struct twlreg_info {
/* twl resource ID, for resource control state machine */
u8  id;
 
-   /* chip constraints on regulator behavior */
-   u16 min_mV;
-
u8  flags;
 
/* used by regulator core */
@@ -252,27 +249,6 @@ static struct regulator_ops twl6030coresmps_ops = {
.get_voltage= twl6030coresmps_get_voltage,
 };
 
-static int twl6030ldo_list_voltage(struct regulator_dev *rdev, unsigned sel)
-{
-   struct twlreg_info *info = rdev_get_drvdata(rdev);
-
-   switch (sel) {
-   case 0:
-   return 0;
-   case 1 ... 24:
-   /* Linear mapping from 0001 to 00011000:
-* Absolute voltage value = 1.0 V + 0.1 V × (sel – 0001)
-*/
-   return (info->min_mV + 100 * (sel - 1)) * 1000;
-   case 25 ... 30:
-   return -EINVAL;
-   case 31:
-   return 275;
-   default:
-   return -EINVAL;
-   }
-}
-
 static int
 twl6030ldo_set_voltage_sel(struct regulator_dev *rdev, unsigned selector)
 {
@@ -291,7 +267,7 @@ static int twl6030ldo_get_voltage_sel(struct regulator_dev 
*rdev)
 }
 
 static struct regulator_ops twl6030ldo_ops = {
-   .list_voltage   = twl6030ldo_list_voltage,
+   .list_voltage   = regulator_list_voltage_linear_range,
 
.set_voltage_sel = twl6030ldo_set_voltage_sel,
.get_voltage_sel = twl6030ldo_get_voltage_sel,
@@ -513,6 +489,11 @@ static struct regulator_ops twlsmps_ops = {
 };
 
 /*--*/
+static const struct regulator_linear_range twl6030ldo_linear_range[] = {
+   REGULATOR_LINEAR_RANGE(0, 0, 0, 0),
+   REGULATOR_LINEAR_RANGE(100, 1, 24, 10),
+   REGULATOR_LINEAR_RANGE(275, 31, 31, 0),
+};
 
 #define TWL6030_ADJUSTABLE_SMPS(label) \
 static const struct twlreg_info TWL6030_INFO_##label = { \
@@ -525,28 +506,30 @@ static const struct twlreg_info TWL6030_INFO_##label = { \
}, \
}
 
-#define TWL6030_ADJUSTABLE_LDO(label, offset, min_mVolts) \
+#define TWL6030_ADJUSTABLE_LDO(label, offset) \
 static const struct twlreg_info TWL6030_INFO_##label = { \
.base = offset, \
-   .min_mV = min_mVolts, \
.desc = { \
.name = #label, \
.id = TWL6030_REG_##label, \
.n_voltages = 32, \
+   .linear_ranges = twl6030ldo_linear_range, \
+   .n_linear_ranges = ARRAY_SIZE(twl6030ldo_linear_range), \
.ops = _ops, \
.type = REGULATOR_VOLTAGE, \
.owner = THIS_MODULE, \
}, \
}
 
-#define TWL6032_ADJUSTABLE_LDO(label, offset, min_mVolts) \
+#define TWL6032_ADJUSTABLE_LDO(label, offset) \
 static const struct twlreg_info TWL6032_INFO_##label = { \
.base = offset, \
-   .min_mV = min_mVolts, \
.desc = { \
.name = #label, \
.id = TWL6032_REG_##label, \
.n_voltages = 32, \
+   .linear_ranges = twl6030ldo_linear_range, \
+   .n_linear_ranges =
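
The selector-to-voltage mapping that the removed twl6030ldo_list_voltage()
did by hand can be pictured as a table lookup. The following is a userspace
sketch only: struct lin_range and list_voltage_from_ranges() are hypothetical
names invented for this note, not the regulator core's types or its
regulator_list_voltage_linear_range() implementation, and -1 stands in for
the error code.

```c
#include <stddef.h>

/* Hypothetical table entry: selectors min_sel..max_sel map linearly. */
struct lin_range {
    int min_uV;          /* voltage at min_sel, in microvolts */
    unsigned min_sel;    /* first selector covered by this range */
    unsigned max_sel;    /* last selector covered by this range */
    int step_uV;         /* increment per selector step */
};

/* Return the voltage for 'sel', or -1 if no range covers it. */
static int list_voltage_from_ranges(const struct lin_range *r, size_t n,
                                    unsigned sel)
{
    for (size_t i = 0; i < n; i++) {
        if (sel >= r[i].min_sel && sel <= r[i].max_sel)
            return r[i].min_uV + (int)(sel - r[i].min_sel) * r[i].step_uV;
    }
    return -1;   /* a hole in the selector space, such as 25..30 above */
}
```

With a table whose middle entry encodes the "1.0 V + 0.1 V per step" rule
from the removed comment, selector 1 gives 1.0 V and selector 24 gives 3.3 V,
while a selector in a gap gets the error value, which matches the behaviour
of the old switch statement.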

Applied "regulator: tps65218.c: fix LS3 issues" to the regulator tree

2019-02-20 Thread Mark Brown

The patch

   regulator: tps65218.c: fix LS3 issues

has been applied to the regulator tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git 

Thanks,
Mark

From 71a64ba2031f4b67769618b9e35389906026130d Mon Sep 17 00:00:00 2001
From: Christian Hohnstaedt 
Date: Wed, 20 Feb 2019 09:15:50 +0100
Subject: [PATCH] regulator: tps65218.c: fix LS3 issues
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Fix list of valid LS3 currents from mA to µA
- Fix selection of min/max microAmps of LS3.
  Selecting one of the configured values as max value now really
  selects it instead of the next lower one

Signed-off-by: Christian Hohnstaedt 
Signed-off-by: Mark Brown 
---
 drivers/regulator/tps65218-regulator.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/regulator/tps65218-regulator.c 
b/drivers/regulator/tps65218-regulator.c
index 6209beee1018..5dd559eabc81 100644
--- a/drivers/regulator/tps65218-regulator.c
+++ b/drivers/regulator/tps65218-regulator.c
@@ -188,7 +188,8 @@ static struct regulator_ops tps65218_ldo1_dcdc34_ops = {
.set_suspend_disable= tps65218_pmic_set_suspend_disable,
 };
 
-static const int ls3_currents[] = { 100, 200, 500, 1000 };
+static const int ls3_currents[] = { 10, 20, 50, 100 };
+
 
 static int tps65218_pmic_set_input_current_lim(struct regulator_dev *dev,
   int lim_uA)
@@ -214,7 +215,7 @@ static int tps65218_pmic_set_current_limit(struct 
regulator_dev *dev,
unsigned int num_currents = ARRAY_SIZE(ls3_currents);
struct tps65218 *tps = rdev_get_drvdata(dev);
 
-   while (index < num_currents && ls3_currents[index] < max_uA)
+   while (index < num_currents && ls3_currents[index] <= max_uA)
index++;
 
index--;
-- 
2.20.1
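
The second point of the change log (a requested maximum that equals a
configured value should select that value, not the next lower one) comes down
to the comparison in the search loop. Here is a minimal userspace sketch of
that loop. pick_limit_index() is a hypothetical name used only here, and the
table is supplied by the caller, so no real LS3 values are assumed.

```c
#include <stddef.h>

/*
 * Return the index of the largest table entry that does not exceed
 * max_uA, or -1 if even the smallest entry is too large.
 * 'table' must be sorted in ascending order.
 */
static int pick_limit_index(const int *table, size_t n, int max_uA)
{
    size_t index = 0;

    /* '<=' (not '<') so that an exact match selects that entry. */
    while (index < n && table[index] <= max_uA)
        index++;

    return (int)index - 1;
}
```

With '<' in place of '<=', a request that exactly matches a table entry would
stop one step early and end up on the entry below it, which is the behaviour
the patch corrects.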

Applied "regulator: twl6030: Constify regulator_ops" to the regulator tree

2019-02-20 Thread Mark Brown

The patch

   regulator: twl6030: Constify regulator_ops

has been applied to the regulator tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git 

Thanks,
Mark

From 606640bbbe449d05d12b51b0500e6b535ec54987 Mon Sep 17 00:00:00 2001
From: Axel Lin 
Date: Sun, 17 Feb 2019 21:48:09 +0800
Subject: [PATCH] regulator: twl6030: Constify regulator_ops

Signed-off-by: Axel Lin 
Signed-off-by: Mark Brown 
---
 drivers/regulator/twl6030-regulator.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/regulator/twl6030-regulator.c 
b/drivers/regulator/twl6030-regulator.c
index dcaa6512d760..15f19df6bc5d 100644
--- a/drivers/regulator/twl6030-regulator.c
+++ b/drivers/regulator/twl6030-regulator.c
@@ -244,7 +244,7 @@ static int twl6030coresmps_get_voltage(struct regulator_dev 
*rdev)
return -ENODEV;
 }
 
-static struct regulator_ops twl6030coresmps_ops = {
+static const struct regulator_ops twl6030coresmps_ops = {
.set_voltage= twl6030coresmps_set_voltage,
.get_voltage= twl6030coresmps_get_voltage,
 };
@@ -266,7 +266,7 @@ static int twl6030ldo_get_voltage_sel(struct regulator_dev 
*rdev)
return vsel;
 }
 
-static struct regulator_ops twl6030ldo_ops = {
+static const struct regulator_ops twl6030ldo_ops = {
.list_voltage   = regulator_list_voltage_linear_range,
 
.set_voltage_sel = twl6030ldo_set_voltage_sel,
@@ -281,7 +281,7 @@ static struct regulator_ops twl6030ldo_ops = {
.get_status = twl6030reg_get_status,
 };
 
-static struct regulator_ops twl6030fixed_ops = {
+static const struct regulator_ops twl6030fixed_ops = {
.list_voltage   = regulator_list_voltage_linear,
 
.enable = twl6030reg_enable,
@@ -472,7 +472,7 @@ static int twl6030smps_get_voltage_sel(struct regulator_dev 
*rdev)
return twlreg_read(info, TWL_MODULE_PM_RECEIVER, VREG_VOLTAGE_SMPS);
 }
 
-static struct regulator_ops twlsmps_ops = {
+static const struct regulator_ops twlsmps_ops = {
.list_voltage   = twl6030smps_list_voltage,
.map_voltage= twl6030smps_map_voltage,
 
-- 
2.20.1

Applied "regulator: max77650: Add missing .owner field in regulator_desc" to the regulator tree

2019-02-20 Thread Mark Brown

The patch

   regulator: max77650: Add missing .owner field in regulator_desc

has been applied to the regulator tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git 

Thanks,
Mark

From 721efb504d28ce0bc704643ea2d952b9e87ed9f7 Mon Sep 17 00:00:00 2001
From: Axel Lin 
Date: Wed, 20 Feb 2019 09:53:28 +0800
Subject: [PATCH] regulator: max77650: Add missing .owner field in
 regulator_desc

Add missing .owner field in regulator_desc, which is used for refcounting.

Signed-off-by: Axel Lin 
Signed-off-by: Mark Brown 
---
 drivers/regulator/max77650-regulator.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/regulator/max77650-regulator.c 
b/drivers/regulator/max77650-regulator.c
index 5afb91400832..411912d5278b 100644
--- a/drivers/regulator/max77650-regulator.c
+++ b/drivers/regulator/max77650-regulator.c
@@ -319,6 +319,7 @@ static struct max77650_regulator_desc max77650_LDO_desc = {
.active_discharge_reg   = MAX77650_REG_CNFG_LDO_B,
.enable_time= 100,
.type   = REGULATOR_VOLTAGE,
+   .owner  = THIS_MODULE,
},
.regA   = MAX77650_REG_CNFG_LDO_A,
.regB   = MAX77650_REG_CNFG_LDO_B,
@@ -343,6 +344,7 @@ static struct max77650_regulator_desc max77650_SBB0_desc = {
.active_discharge_reg   = MAX77650_REG_CNFG_SBB0_B,
.enable_time= 100,
.type   = REGULATOR_VOLTAGE,
+   .owner  = THIS_MODULE,
},
.regA   = MAX77650_REG_CNFG_SBB0_A,
.regB   = MAX77650_REG_CNFG_SBB0_B,
@@ -367,6 +369,7 @@ static struct max77650_regulator_desc max77650_SBB1_desc = {
.active_discharge_reg   = MAX77650_REG_CNFG_SBB1_B,
.enable_time= 100,
.type   = REGULATOR_VOLTAGE,
+   .owner  = THIS_MODULE,
},
.regA   = MAX77650_REG_CNFG_SBB1_A,
.regB   = MAX77650_REG_CNFG_SBB1_B,
@@ -390,6 +393,7 @@ static struct max77650_regulator_desc max77651_SBB1_desc = {
.active_discharge_reg   = MAX77650_REG_CNFG_SBB1_B,
.enable_time= 100,
.type   = REGULATOR_VOLTAGE,
+   .owner  = THIS_MODULE,
},
.regA   = MAX77650_REG_CNFG_SBB1_A,
.regB   = MAX77650_REG_CNFG_SBB1_B,
@@ -414,6 +418,7 @@ static struct max77650_regulator_desc max77650_SBB2_desc = {
.active_discharge_reg   = MAX77650_REG_CNFG_SBB2_B,
.enable_time= 100,
.type   = REGULATOR_VOLTAGE,
+   .owner  = THIS_MODULE,
},
.regA   = MAX77650_REG_CNFG_SBB2_A,
.regB   = MAX77650_REG_CNFG_SBB2_B,
@@ -438,6 +443,7 @@ static struct max77650_regulator_desc max77651_SBB2_desc = {
.active_discharge_reg   = MAX77650_REG_CNFG_SBB2_B,
.enable_time= 100,
.type   = REGULATOR_VOLTAGE,
+   .owner  = THIS_MODULE,
},
.regA   = MAX77650_REG_CNFG_SBB2_A,
.regB   = MAX77650_REG_CNFG_SBB2_B,
-- 
2.20.1

Applied "ASoC: samsung: odroid: Fix of_node refcount unbalance" to the asoc tree

2019-02-20 Thread Mark Brown

The patch

   ASoC: samsung: odroid: Fix of_node refcount unbalance

has been applied to the asoc tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

Thanks,
Mark

From d832d2b246c516eacb2d0ba53ec17ed59c3cd62b Mon Sep 17 00:00:00 2001
From: Sylwester Nawrocki 
Date: Wed, 20 Feb 2019 12:06:07 +0100
Subject: [PATCH] ASoC: samsung: odroid: Fix of_node refcount unbalance

In odroid_audio_probe() some OF nodes are left without reference count
decrease after use. Fix it by ensuring required of_node_put() calls are done
before exiting probe.

Reported-by: Takashi Iwai 
Signed-off-by: Sylwester Nawrocki 
Signed-off-by: Mark Brown 
---
 sound/soc/samsung/odroid.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/sound/soc/samsung/odroid.c b/sound/soc/samsung/odroid.c
index bd2c5163dc7f..c3b0f6c612cb 100644
--- a/sound/soc/samsung/odroid.c
+++ b/sound/soc/samsung/odroid.c
@@ -257,27 +257,31 @@ static int odroid_audio_probe(struct platform_device 
*pdev)
ret = of_parse_phandle_with_args(cpu, "sound-dai",
 "#sound-dai-cells", i, );
if (ret < 0)
-   return ret;
+   break;
 
if (!args.np) {
dev_err(dev, "sound-dai property parse error: %d\n", 
ret);
-   return -EINVAL;
+   ret = -EINVAL;
+   break;
}
 
ret = snd_soc_get_dai_name(, >cpu_dai_name);
of_node_put(args.np);
 
if (ret < 0)
-   return ret;
+   break;
}
+   if (ret == 0)
+   cpu_dai = of_parse_phandle(cpu, "sound-dai", 0);
 
-   cpu_dai = of_parse_phandle(cpu, "sound-dai", 0);
of_node_put(cpu);
of_node_put(codec);
+   if (ret < 0)
+   return ret;
 
ret = snd_soc_of_get_dai_link_codecs(dev, codec, codec_link);
if (ret < 0)
-   goto err_put_codec_n;
+   goto err_put_cpu_dai;
 
/* Set capture capability only for boards with the MAX98090 CODEC */
if (codec_link->num_codecs > 1) {
@@ -288,7 +292,7 @@ static int odroid_audio_probe(struct platform_device *pdev)
priv->sclk_i2s = of_clk_get_by_name(cpu_dai, "i2s_opclk1");
if (IS_ERR(priv->sclk_i2s)) {
ret = PTR_ERR(priv->sclk_i2s);
-   goto err_put_codec_n;
+   goto err_put_cpu_dai;
}
 
priv->clk_i2s_bus = of_clk_get_by_name(cpu_dai, "iis");
@@ -310,7 +314,8 @@ static int odroid_audio_probe(struct platform_device *pdev)
clk_put(priv->clk_i2s_bus);
 err_put_sclk:
clk_put(priv->sclk_i2s);
-err_put_codec_n:
+err_put_cpu_dai:
+   of_node_put(cpu_dai);
snd_soc_of_put_dai_link_codecs(codec_link);
return ret;
 }
-- 
2.20.1
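
The pattern the patch moves to (leave the loop with 'break' so the shared
cleanup always runs, instead of returning from inside the loop) can be
sketched in userspace C as follows. struct node, node_get(), node_put() and
probe_sketch() are hypothetical stand-ins invented for this note. They model
a reference count only and are not the OF API.

```c
/* Hypothetical stand-in for a reference-counted node. */
struct node {
    int refs;
};

static struct node *node_get(struct node *n) { n->refs++; return n; }
static void node_put(struct node *n) { n->refs--; }

/*
 * Take references on two nodes, run 'steps' units of work of which the
 * one at index 'fail_at' fails (pass -1 for no failure), and drop both
 * references on every path. Returns 0 on success, -1 on failure.
 */
static int probe_sketch(struct node *cpu, struct node *codec,
                        int steps, int fail_at)
{
    int ret = 0;

    node_get(cpu);
    node_get(codec);

    for (int i = 0; i < steps; i++) {
        if (i == fail_at) {
            ret = -1;
            break;      /* not 'return': the puts below must still run */
        }
    }

    node_put(cpu);
    node_put(codec);

    return ret;
}
```

Whether or not a step fails, both counts are back at their starting value
when the function returns, which is the balance the fix restores.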

Re: [PATCH v4 2/2] media: cedrus: Add H264 decoding support

2019-02-20 Thread Jernej Škrabec

Hi!

I really wanted to do another review on the previous series but got distracted
by analyzing one particularly troublesome H264 sample. It still doesn't work
correctly, so I would ask you if you can test it with your stack (it might be
a userspace issue):

http://jernej.libreelec.tv/videos/problematic/test.mkv

Please take a look at my comments below.

Dne sreda, 20. februar 2019 ob 15:17:34 CET je Maxime Ripard napisal(a):
> Introduce some basic H264 decoding support in cedrus. So far, only the
> baseline profile videos have been tested, and some more advanced features
> used in higher profiles are not even implemented.

What is not yet implemented? Multi slice frame decoding, interlaced frames and 
decoding frames with width > 2048. Anything else?

> 
> Signed-off-by: Maxime Ripard 
> ---
>  drivers/staging/media/sunxi/cedrus/Makefile   |   3 +-
>  drivers/staging/media/sunxi/cedrus/cedrus.c   |  30 +-
>  drivers/staging/media/sunxi/cedrus/cedrus.h   |  38 +-
>  drivers/staging/media/sunxi/cedrus/cedrus_dec.c   |  13 +-
>  drivers/staging/media/sunxi/cedrus/cedrus_h264.c  | 584 +++-
>  drivers/staging/media/sunxi/cedrus/cedrus_hw.c|   4 +-
>  drivers/staging/media/sunxi/cedrus/cedrus_regs.h  |  91 ++-
>  drivers/staging/media/sunxi/cedrus/cedrus_video.c |   9 +-
>  8 files changed, 770 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> 
> diff --git a/drivers/staging/media/sunxi/cedrus/Makefile
> b/drivers/staging/media/sunxi/cedrus/Makefile index
> e9dc68b7bcb6..aaf141fc58b6 100644
> --- a/drivers/staging/media/sunxi/cedrus/Makefile
> +++ b/drivers/staging/media/sunxi/cedrus/Makefile
> @@ -1,3 +1,4 @@
>  obj-$(CONFIG_VIDEO_SUNXI_CEDRUS) += sunxi-cedrus.o
> 
> -sunxi-cedrus-y = cedrus.o cedrus_video.o cedrus_hw.o cedrus_dec.o cedrus_mpeg2.o
> +sunxi-cedrus-y = cedrus.o cedrus_video.o cedrus_hw.o cedrus_dec.o \
> +   cedrus_mpeg2.o cedrus_h264.o
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.c
> b/drivers/staging/media/sunxi/cedrus/cedrus.c index
> ff11cbeba205..c1607142d998 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus.c
> @@ -40,6 +40,35 @@ static const struct cedrus_control cedrus_controls[] = {
>   .codec  = CEDRUS_CODEC_MPEG2,
>   .required   = false,
>   },
> + {
> + .id = V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS,
> + .elem_size  = sizeof(struct v4l2_ctrl_h264_decode_param),
> + .codec  = CEDRUS_CODEC_H264,
> + .required   = true,
> + },
> + {
> + .id = V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS,
> + .elem_size  = sizeof(struct v4l2_ctrl_h264_slice_param),
> + .codec  = CEDRUS_CODEC_H264,
> + .required   = true,
> + },
> + {
> + .id = V4L2_CID_MPEG_VIDEO_H264_SPS,
> + .elem_size  = sizeof(struct v4l2_ctrl_h264_sps),
> + .codec  = CEDRUS_CODEC_H264,
> + .required   = true,
> + },
> + {
> + .id = V4L2_CID_MPEG_VIDEO_H264_PPS,
> + .elem_size  = sizeof(struct v4l2_ctrl_h264_pps),
> + .codec  = CEDRUS_CODEC_H264,
> + .required   = true,
> + },
> + {
> + .id = V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX,
> + .elem_size  = sizeof(struct v4l2_ctrl_h264_scaling_matrix),
> + .codec  = CEDRUS_CODEC_H264,
> + },
>  };
> 
>  #define CEDRUS_CONTROLS_COUNT ARRAY_SIZE(cedrus_controls)
> @@ -278,6 +307,7 @@ static int cedrus_probe(struct platform_device *pdev)
>   }
> 
>   dev->dec_ops[CEDRUS_CODEC_MPEG2] = &cedrus_dec_ops_mpeg2;
> + dev->dec_ops[CEDRUS_CODEC_H264] = &cedrus_dec_ops_h264;
> 
>   mutex_init(&dev->dev_mutex);
> 
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus.h b/drivers/staging/media/sunxi/cedrus/cedrus.h
> index 4aedd24a9848..8c64f9a27e9d 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus.h
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus.h
> @@ -30,7 +30,7 @@
> 
>  enum cedrus_codec {
>   CEDRUS_CODEC_MPEG2,
> -
> + CEDRUS_CODEC_H264,
>   CEDRUS_CODEC_LAST,
>  };
> 
> @@ -40,6 +40,12 @@ enum cedrus_irq_status {
>   CEDRUS_IRQ_OK,
>  };
> 
> +enum cedrus_h264_pic_type {
> + CEDRUS_H264_PIC_TYPE_FRAME  = 0,
> + CEDRUS_H264_PIC_TYPE_FIELD,
> + CEDRUS_H264_PIC_TYPE_MBAFF,
> +};
> +
>  struct cedrus_control {
>   u32 id;
>   u32 elem_size;
> @@ -47,6 +53,14 @@ struct cedrus_control {
>   unsigned char   required:1;
>  };
> 
> +struct cedrus_h264_run {
> + const struct v4l2_ctrl_h264_decode_param*decode_param;
> + const struct v4l2_ctrl_h264_pps *pps;
> + const

Re: [GIT PULL] mtd: Fixes for 5.0/5.0-rc8

2019-02-20 Thread pr-tracker-bot

The pull request you sent on Wed, 20 Feb 2019 08:56:53 +0100:

> git://git.infradead.org/linux-mtd.git tags/mtd/fixes-for-5.0-rc8

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/7d9d592caf8cc5d91f7923c5e717b69d0b1e246f

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

Re: [GIT PULL] sound fixes for 5.0

2019-02-20 Thread pr-tracker-bot

The pull request you sent on Wed, 20 Feb 2019 11:42:46 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git tags/sound-5.0

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/2137397c92aec3713fa10be3c9b830f9a1674e60

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

Re: [GIT PULL] GPIO fixes for the v5.0 series

2019-02-20 Thread pr-tracker-bot

The pull request you sent on Wed, 20 Feb 2019 09:03:22 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio.git 
> tags/gpio-v5.0-4

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/c828c2651b9a8184e1414fa0611d18b84d3847dd

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

Re: [GIT PULL] pin control fixes for v5.0

2019-02-20 Thread pr-tracker-bot

The pull request you sent on Wed, 20 Feb 2019 09:08:25 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl.git 
> tags/pinctrl-v5.0-3

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/fb83f15ef9dd984834bc60b380efbeffdf1ecc04

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

Re: [PATCH v6 4/6] powerpc/32: Add KASAN support

2019-02-20 Thread Christophe Leroy





On 19/02/2019 at 18:23, Christophe Leroy wrote:

This patch adds KASAN support for PPC32.

The KASAN shadow area is located between the vmalloc area and the
fixmap area.

KASAN_SHADOW_OFFSET is calculated in asm/kasan.h and extracted
by Makefile prepare rule via asm-offsets.h
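
As a rough illustration of what the awk expression in the Makefile hunk of this patch does (print the third field of the line whose second field is KASAN_SHADOW_OFFSET), here is a standalone C sketch. It is not kernel or Kbuild code: the helper name extract_define() is hypothetical, and the only assumption about include/generated/asm-offsets.h is the usual "#define NAME VALUE" line layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 * If the second whitespace-separated field of 'line' equals 'name',
 * copy the third field into 'out' and return 1; otherwise return 0.
 * This approximates: awk '{if ($2 == name) print $3;}'
 */
int extract_define(const char *line, const char *name,
		   char *out, size_t outlen)
{
	char f1[64], f2[64], f3[64];

	assert(outlen > 0);

	/* Require at least three fields: "#define", NAME, VALUE. */
	if (sscanf(line, "%63s %63s %63s", f1, f2, f3) != 3)
		return 0;
	if (strcmp(f2, name) != 0)
		return 0;

	snprintf(out, outlen, "%s", f3);
	return 1;
}
```

Calling it on a line such as "#define KASAN_SHADOW_OFFSET 0xe0000000" (the value is made up for the example) would copy "0xe0000000" into the output buffer, which is the string the prepare rule stores in the KASAN_SHADOW_OFFSET make variable.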

For modules, the shadow area is allocated at module_alloc().

Note that on book3s it will only work on the 603 because the other
ones use a hash table and therefore cannot share a single PTE table
covering the entire early KASAN shadow area.

Signed-off-by: Christophe Leroy 
---
  arch/powerpc/Kconfig  |   1 +
  arch/powerpc/Makefile |   7 ++
  arch/powerpc/include/asm/book3s/32/pgtable.h  |   2 +
  arch/powerpc/include/asm/highmem.h|  10 +-
  arch/powerpc/include/asm/kasan.h  |  23 
  arch/powerpc/include/asm/nohash/32/pgtable.h  |   2 +
  arch/powerpc/include/asm/setup.h  |   5 +
  arch/powerpc/kernel/Makefile  |   9 +-
  arch/powerpc/kernel/asm-offsets.c |   4 +
  arch/powerpc/kernel/head_32.S |   3 +
  arch/powerpc/kernel/head_40x.S|   3 +
  arch/powerpc/kernel/head_44x.S|   3 +
  arch/powerpc/kernel/head_8xx.S|   3 +
  arch/powerpc/kernel/head_fsl_booke.S  |   3 +
  arch/powerpc/kernel/setup-common.c|   2 +
  arch/powerpc/lib/Makefile |   8 ++
  arch/powerpc/mm/Makefile  |   1 +
  arch/powerpc/mm/kasan/Makefile|   5 +
  arch/powerpc/mm/kasan/kasan_init_32.c | 147 ++
  arch/powerpc/mm/mem.c |   4 +
  arch/powerpc/mm/ptdump/dump_linuxpagetables.c |   8 ++


@Daniel (and others), note that to apply properly, this requires my 
other patch, which moves the dumping files into an arch/powerpc/mm/ptdump/ 
subdir.


Christophe


  arch/powerpc/purgatory/Makefile   |   3 +
  arch/powerpc/xmon/Makefile|   1 +
  23 files changed, 253 insertions(+), 4 deletions(-)
  create mode 100644 arch/powerpc/mm/kasan/Makefile
  create mode 100644 arch/powerpc/mm/kasan/kasan_init_32.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 08908219fba9..850b06def84f 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -175,6 +175,7 @@ config PPC
select GENERIC_TIME_VSYSCALL
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_JUMP_LABEL
+   select HAVE_ARCH_KASAN  if PPC32
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index ac033341ed55..f0738099e31e 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -427,6 +427,13 @@ else
  endif
  endif
  
+ifdef CONFIG_KASAN

+prepare: kasan_prepare
+
+kasan_prepare: prepare0
+   $(eval KASAN_SHADOW_OFFSET = $(shell awk '{if ($$2 == "KASAN_SHADOW_OFFSET") print $$3;}' include/generated/asm-offsets.h))
+endif
+
  # Check toolchain versions:
  # - gcc-4.6 is the minimum kernel-wide version so nothing required.
  checkbin:
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 49d76adb9bc5..4543016f80ca 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -141,6 +141,8 @@ static inline bool pte_user(pte_t pte)
   */
  #ifdef CONFIG_HIGHMEM
  #define KVIRT_TOP PKMAP_BASE
+#elif defined(CONFIG_KASAN)
+#define KVIRT_TOP  KASAN_SHADOW_START
  #else
  #define KVIRT_TOP (0xfe000000UL)  /* for now, could be FIXMAP_BASE ? */
  #endif
diff --git a/arch/powerpc/include/asm/highmem.h b/arch/powerpc/include/asm/highmem.h
index a4b65b186ec6..483b90025bef 100644
--- a/arch/powerpc/include/asm/highmem.h
+++ b/arch/powerpc/include/asm/highmem.h
@@ -28,6 +28,7 @@
  #include 
  #include 
  #include 
+#include 
  
  extern pte_t *kmap_pte;

  extern pgprot_t kmap_prot;
@@ -50,10 +51,15 @@ extern pte_t *pkmap_page_table;
  #define PKMAP_ORDER   9
  #endif
  #define LAST_PKMAP(1 << PKMAP_ORDER)
+#ifdef CONFIG_KASAN
+#define PKMAP_TOP  KASAN_SHADOW_START
+#else
+#define PKMAP_TOP  FIXADDR_START
+#endif
  #ifndef CONFIG_PPC_4K_PAGES
-#define PKMAP_BASE (FIXADDR_START - PAGE_SIZE*(LAST_PKMAP + 1))
+#define PKMAP_BASE (PKMAP_TOP - PAGE_SIZE*(LAST_PKMAP + 1))
  #else
-#define PKMAP_BASE ((FIXADDR_START - PAGE_SIZE*(LAST_PKMAP + 1)) & PMD_MASK)
+#define PKMAP_BASE ((PKMAP_TOP - PAGE_SIZE*(LAST_PKMAP + 1)) & PMD_MASK)
  #endif
  #define LAST_PKMAP_MASK   (LAST_PKMAP-1)
  #define PKMAP_NR(virt)  ((virt-PKMAP_BASE) >> PAGE_SHIFT)
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 2efd0e42cfc9..0bc9148f5d87 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -12,4

Re: [PATCH] RDMA/mlx4: Spread completion vectors for proxy CQs

2019-02-20 Thread Håkon Bugge

> On 20 Feb 2019, at 18:14, Jason Gunthorpe  wrote:
> 
> On Tue, Feb 19, 2019 at 06:32:50PM +0100, Håkon Bugge wrote:
>>   Anyway, Jason mentioned in a private email that maybe we could use the
>>   new completion API or something? I am not familiar with that one
>>   (yet).
> 
> I was thinking of the stuff in core/cq.c - but it also doesn't have
> automatic comp_vector balancing. It is the logical place to put
> something like that though..
> 
> An API to manage a bundle of CPU affine CQ's is probably what most
> ULPs really need.. (it makes little sense to create a unique CQ for
> every QP)

ULPs behave way differently. E.g. RDS creates one tx and one rx CQ per QP.

As I wrote earlier, we do not have any modify_cq() that changes the comp_vector 
(EQ association). We can balance #CQ associated with the EQs, but we do not 
know their behaviour.

So, assume two completion EQs and four CQs. CQa and CQb are associated with the 
first EQ, the two others with the second EQ. That's the "best" we can do. But 
if CQa and CQb are the only ones generating events, all interrupt processing 
will land on a single CPU. If we could now modify CQa.comp_vector to be that 
of the second EQ, we could achieve balance. But I am not sure whether the 
drivers are able to do this at all.
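
To make the example concrete, here is a toy model in plain C. It is only a sketch: struct toy_cq and the helper names are invented for illustration and are not the mlx4 or RDMA core types, and toy_modify_comp_vector() stands for the modify_cq()-style operation that, as said above, does not exist today.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_EQS 2

/* Toy model only; not a real RDMA structure. */
struct toy_cq {
	int comp_vector;	/* index of the EQ this CQ is bound to */
	unsigned long events;	/* completion events this CQ generates */
};

/* Sum the events that land on each EQ. */
void toy_eq_load(const struct toy_cq *cqs, size_t n,
		 unsigned long load[NUM_EQS])
{
	for (int eq = 0; eq < NUM_EQS; eq++)
		load[eq] = 0;

	for (size_t i = 0; i < n; i++) {
		assert(cqs[i].comp_vector >= 0 &&
		       cqs[i].comp_vector < NUM_EQS);
		load[cqs[i].comp_vector] += cqs[i].events;
	}
}

/* Hypothetical operation: re-bind an existing CQ to another EQ. */
void toy_modify_comp_vector(struct toy_cq *cq, int new_vector)
{
	assert(new_vector >= 0 && new_vector < NUM_EQS);
	cq->comp_vector = new_vector;
}
```

With CQa and CQb on EQ 0 each generating 100 events and the other two CQs idle on EQ 1, the per-EQ load is 200/0 even though the CQ count is balanced 2/2. Moving CQa to EQ 1 gives 100/100, which is the balance that counting CQs alone cannot achieve.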

> alloc_bundle()

You mean alloc a bunch of CQs? How do you know their #cqes and cq_context?

Håkon

> get_cqn_for_flow(bundle)
> alloc_qp()
> destroy_qp()
> put_cqn_for_flow(bundle)
> destroy_bundle();
> 
> Let the core code balance the cqn's and allocate (shared) CQ
> resources.
> 
> Jason

Re: xarray reserve/release?

2019-02-20 Thread Jason Gunthorpe

On Wed, Feb 20, 2019 at 09:14:14AM -0800, Matthew Wilcox wrote:

> > void __xa_release(struct xarray *xa, unsigned long index)
> > {
> > XA_STATE(xas, xa, index);
> > void *curr;
> > 
> > curr = xas_load(&xas);
> > if (curr == XA_ZERO_ENTRY)
> > xas_store(&xas, NULL);
> > }
> > 
> > ?
> 
> I decided to instead remove the magic from xa_cmpxchg().  I used
> to prohibit any internal entry being passed to the regular API, but
> I recently changed that with 76b4e5299565 ("XArray: Permit storing
> 2-byte-aligned pointers").  Now that we can pass XA_ZERO_ENTRY, I
> think this all makes much more sense.

Except that for allocating arrays xa_cmpxchg and xa_store now do
different things with NULL. Not necessarily bad, but if you have this
ABI variation it should be mentioned in the kdoc comment.

This is a bit worrisome though:

curr = xas_load(&xas);
-   if (curr == XA_ZERO_ENTRY)
-   curr = NULL;
if (curr == old) {

It means any cmpxchg user has to care explicitly about the possibility
for true-NULL vs reserved. Seems like a difficult API.

What about writing it like this:

   if ((curr == XA_ZERO_ENTRY && old == NULL) || curr == old)

? I can't think of a use case to cmpxchg against real-null only.
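
Spelled out as a standalone toy (plain C, not the XArray code): TOY_ZERO_ENTRY is a stand-in for XA_ZERO_ENTRY, the toy_* names are made up for the example, and the slot is a single pointer rather than a real array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for XA_ZERO_ENTRY; the real one is an internal entry value. */
static char toy_zero_entry;
#define TOY_ZERO_ENTRY ((void *)&toy_zero_entry)

/* Suggested match rule: a reserved slot also matches old == NULL. */
bool toy_cmpxchg_matches(const void *curr, const void *old)
{
	return (curr == TOY_ZERO_ENTRY && old == NULL) || curr == old;
}

/*
 * Toy single-slot cmpxchg: store 'entry' if the slot matches 'old'.
 * Returns the raw previous slot value; what the real API should return
 * for a reserved slot is not settled in this thread.
 */
void *toy_cmpxchg(void **slot, void *old, void *entry)
{
	void *curr;

	assert(slot != NULL);
	curr = *slot;
	if (toy_cmpxchg_matches(curr, old))
		*slot = entry;
	return curr;
}
```

With this rule a caller doing cmpxchg(NULL -> ptr) succeeds on both an empty and a reserved slot, so it does not have to distinguish the two cases itself.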

And here:
xas_store(&xas, entry);
-   if (xa_track_free(xa))
+   if (xa_track_free(xa) && !old)
xas_clear_mark(&xas, XA_FREE_MARK);

Should this be

if (xa_track_free(xa) && entry && !old)

? I.e., we don't want to clear the XA_FREE_MARK if we just wrote NULL.

Also I would think !curr is clearer? I assume the point is to not pay
the price of xas_clear_mark if we already know the index stored is
marked?

> > Also, I wonder if xa_reserve() is better written as
> > 
> >xa_cmpxchg(xa, index, NULL, XA_ZERO_ENTRY)
> > 
> > Bit clearer what is going on..
> 
> Yes, I agree.  I've pushed a couple of new commits to
> http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/xarray

That looks really readable now that reserve and release are tidy
paired operations.

Thanks,
Jason

Re: [PATCH 0/7] libnvdimm/pfn: Fix section-alignment padding

2019-02-20 Thread Jeff Moyer

Dan Williams  writes:

> On Tue, Feb 12, 2019 at 1:37 PM Dan Williams  wrote:
>>
>> Lately Linux has encountered platforms that collide Persistent Memory
>> regions between each other, specifically cases where ->start_pad needed
>> to be non-zero. This led to commit ae86cbfef381 "libnvdimm, pfn: Pad
>> pfn namespaces relative to other regions". That commit allowed
>> namespaces to be mapped with devm_memremap_pages(). However dax
>> operations on those configurations currently fail if attempted within the
>> ->start_pad range because pmem_device->data_offset was still relative to
>> raw resource base not relative to the section aligned resource range
>> mapped by devm_memremap_pages().
>>
>> Luckily __bdev_dax_supported() caught these failures and simply disabled
>> dax. However, to fix this situation a non-backwards compatible change
>> needs to be made to the interpretation of the nd_pfn info-block.
>> ->start_pad needs to be accounted in ->map.map_offset (formerly
>> ->data_offset), and ->map.map_base (formerly ->phys_addr) needs to be
>> adjusted to the section aligned resource base used to establish
>> ->map.map (formerly ->virt_addr).
>>
>> See patch 7 "libnvdimm/pfn: Fix 'start_pad' implementation" for more
>> details, and the ndctl patch series "Improve support + testing for
>> labels + info-blocks" for the corresponding regression test.
>
> Hello valued reviewers, can I plead for a sanity check of at least
> "libnvdimm/pfn: Introduce super-block minimum version requirements"
> and "libnvdimm/pfn: Fix 'start_pad' implementation"? In particular
> Jeff / Johannes this has end user / distro impact in that users may
> lose access to namespaces that are upgraded to v1.3 info-blocks and
> then boot an old kernel. I did not see a way around that sharp edge.

Yes, I'll take a look.

Cheers,
Jeff

Re: [PATCH 02/10] powerpc/603: Store PGDIR physical address in a SPRG

2019-02-20 Thread Christophe Leroy





On 25/01/2019 at 13:34, Christophe Leroy wrote:

Use SPRN_SPRG5 to store the current thread PGDIR and
avoid reading thread_struct->pgdir at every TLB miss.


I'll send out v2 with an additional patch getting rid of SPRN_SPRG_RTAS, 
hence freeing SPRN_SPRG2, which I will use here instead of SPRN_SPRG5 so 
that all 6xx will benefit.


Christophe



Signed-off-by: Christophe Leroy 
---
  arch/powerpc/include/asm/reg.h  |  1 +
  arch/powerpc/kernel/cpu_setup_6xx.S |  4 
  arch/powerpc/kernel/head_32.S   | 28 
  3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 1c98ef1f2d5b..ba0ab1a1431b 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1169,6 +1169,7 @@
  #define SPRN_SPRG_SCRATCH1SPRN_SPRG1
  #define SPRN_SPRG_RTASSPRN_SPRG2
  #define SPRN_SPRG_603_LRU SPRN_SPRG4
+#define SPRN_SPRG_603_PGDIRSPRN_SPRG5
  #endif
  
  #ifdef CONFIG_40x

diff --git a/arch/powerpc/kernel/cpu_setup_6xx.S b/arch/powerpc/kernel/cpu_setup_6xx.S
index 8c069e96c478..4c91d1f640fe 100644
--- a/arch/powerpc/kernel/cpu_setup_6xx.S
+++ b/arch/powerpc/kernel/cpu_setup_6xx.S
@@ -24,6 +24,10 @@ BEGIN_MMU_FTR_SECTION
li  r10,0
mtspr   SPRN_SPRG_603_LRU,r10   /* init SW LRU tracking */
  END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_DTLB_SW_LRU)
+   lis r10, (swapper_pg_dir - PAGE_OFFSET)@h
+   ori r10, r10, (swapper_pg_dir - PAGE_OFFSET)@l
+   mtspr   SPRN_SPRG_603_PGDIR, r10
+
  BEGIN_FTR_SECTION
bl  __init_fpu_registers
  END_FTR_SECTION_IFCLR(CPU_FTR_FPU_UNAVAILABLE)
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index c2f564690778..dbd15e03952a 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -502,16 +502,15 @@ InstructionTLBMiss:
mfspr   r3,SPRN_IMISS
lis r1,PAGE_OFFSET@h/* check if kernel address */
cmplw   0,r1,r3
-   mfspr   r2,SPRN_SPRG_THREAD
+   mfspr   r2, SPRN_SPRG_603_PGDIR
li  r1,_PAGE_USER|_PAGE_PRESENT|_PAGE_EXEC /* low addresses tested 
as user */
-   lwz r2,PGDIR(r2)
bge-112f
mfspr   r2,SPRN_SRR1/* and MSR_PR bit from SRR1 */
rlwimi  r1,r2,32-12,29,29   /* shift MSR_PR to _PAGE_USER posn */
lis r2,swapper_pg_dir@ha/* if kernel address, use */
addir2,r2,swapper_pg_dir@l  /* kernel page table */
-112:   tophys(r2,r2)
-   rlwimi  r2,r3,12,20,29  /* insert top 10 bits of address */
+   tophys(r2,r2)
+112:   rlwimi  r2,r3,12,20,29  /* insert top 10 bits of address */
lwz r2,0(r2)/* get pmd entry */
rlwinm. r2,r2,0,0,19/* extract address of pte page */
beq-InstructionAddressInvalid   /* return if no mapping */
@@ -576,16 +575,15 @@ DataLoadTLBMiss:
mfspr   r3,SPRN_DMISS
lis r1,PAGE_OFFSET@h/* check if kernel address */
cmplw   0,r1,r3
-   mfspr   r2,SPRN_SPRG_THREAD
+   mfspr   r2, SPRN_SPRG_603_PGDIR
li  r1,_PAGE_USER|_PAGE_PRESENT /* low addresses tested as user */
-   lwz r2,PGDIR(r2)
bge-112f
mfspr   r2,SPRN_SRR1/* and MSR_PR bit from SRR1 */
rlwimi  r1,r2,32-12,29,29   /* shift MSR_PR to _PAGE_USER posn */
lis r2,swapper_pg_dir@ha/* if kernel address, use */
addir2,r2,swapper_pg_dir@l  /* kernel page table */
-112:   tophys(r2,r2)
-   rlwimi  r2,r3,12,20,29  /* insert top 10 bits of address */
+   tophys(r2,r2)
+112:   rlwimi  r2,r3,12,20,29  /* insert top 10 bits of address */
lwz r2,0(r2)/* get pmd entry */
rlwinm. r2,r2,0,0,19/* extract address of pte page */
beq-DataAddressInvalid  /* return if no mapping */
@@ -660,16 +658,15 @@ DataStoreTLBMiss:
mfspr   r3,SPRN_DMISS
lis r1,PAGE_OFFSET@h/* check if kernel address */
cmplw   0,r1,r3
-   mfspr   r2,SPRN_SPRG_THREAD
+   mfspr   r2, SPRN_SPRG_603_PGDIR
li  r1,_PAGE_RW|_PAGE_USER|_PAGE_PRESENT /* access flags */
-   lwz r2,PGDIR(r2)
bge-112f
mfspr   r2,SPRN_SRR1/* and MSR_PR bit from SRR1 */
rlwimi  r1,r2,32-12,29,29   /* shift MSR_PR to _PAGE_USER posn */
lis r2,swapper_pg_dir@ha/* if kernel address, use */
addir2,r2,swapper_pg_dir@l  /* kernel page table */
-112:   tophys(r2,r2)
-   rlwimi  r2,r3,12,20,29  /* insert top 10 bits of address */
+   tophys(r2,r2)
+112:   rlwimi  r2,r3,12,20,29  /* insert top 10 bits of address */
lwz r2,0(r2)/* get pmd entry */
rlwinm. r2,r2,0,0,19/* extract address of pte

Re: [PATCH v15 15/15] tracing: Add hist trigger action 'expected fail' test case

2019-02-20 Thread Tom Zanussi

Hi Steve,

On Wed, 2019-02-20 at 12:17 -0500, Steven Rostedt wrote:
> On Wed, 13 Feb 2019 17:42:55 -0600
> Tom Zanussi  wrote:
> 
> > From: Tom Zanussi 
> > 
> > Add a test case verifying that basic action combinations fail as
> > expected.
> > 
> 
> Hi Tom,
> 
> This test appears to fail:
> 
> # echo 'hist:keys=comm:onmatch(sched.sched_wakeup).save(comm,prio)'
> >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
> -bash: echo: write error: Invalid argument
> 
> # cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist 
> 
> ERROR: action parsing: Handler doesn't support action: save
>   Last command: keys=comm:onmatch(sched.sched_wakeup).save(comm,prio)
> 
> 
> Is the "save" feature implemented here? It's in the README too.
> Should
> it be removed?
> 

The "save" feature is implemented, but it's not currently supported
with onmatch(), which is why it fails, and is used in the xfail test,
since it's expected to.  So, in this case, the command fails, which
means the xfail test actually passed.  ;-)

There are other tests in the inter-event testcases that use save() but
with onmax() and onchange(), and they pass.

Hope that explains things in this case,

Tom

> -- Steve
> 
> > Signed-off-by: Tom Zanussi 
> > ---
> >  .../inter-event/trigger-action-hist-xfail.tc   | 30
> > ++
> >  1 file changed, 30 insertions(+)
> >  create mode 100644
> > tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-
> > action-hist-xfail.tc
> > 
> > diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-
> > event/trigger-action-hist-xfail.tc
> > b/tools/testing/selftests/ftrace/test.d/trigger/inter-
> > event/trigger-action-hist-xfail.tc
> > new file mode 100644
> > index 000000000000..1221240f8cf6
> > --- /dev/null
> > +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-
> > event/trigger-action-hist-xfail.tc
> > @@ -0,0 +1,30 @@
> > +#!/bin/sh
> > +# SPDX-License-Identifier: GPL-2.0
> > +# description: event trigger - test inter-event histogram trigger
> > expected fail actions
> > +
> > +fail() { #msg
> > +echo $1
> > +exit_fail
> > +}
> > +
> > +if [ ! -f set_event ]; then
> > +echo "event tracing is not supported"
> > +exit_unsupported
> > +fi
> > +
> > +if [ ! -f snapshot ]; then
> > +echo "snapshot is not supported"
> > +exit_unsupported
> > +fi
> > +
> > +grep -q "snapshot()" README || exit_unsupported # version issue
> > +
> > +echo "Test expected snapshot action failure"
> > +
> > +echo 'hist:keys=comm:onmatch(sched.sched_wakeup).snapshot()' >>
> > /sys/kernel/debug/tracing/events/sched/sched_waking/trigger &&
> > exit_fail
> > +
> > +echo "Test expected save action failure"
> > +
> > +echo 'hist:keys=comm:onmatch(sched.sched_wakeup).save(comm,prio)'
> > >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger &&
> > exit_fail
> > +
> > +exit_xfail
> 
>

Re: [PATCH] kasan: turn off asan-stack for clang-8 and earlier

2019-02-20 Thread Arnd Bergmann

On Wed, Feb 20, 2019 at 6:00 PM Andrey Ryabinin  wrote:
> On 2/20/19 5:51 PM, Arnd Bergmann wrote:
> > On Wed, Feb 20, 2019 at 3:45 PM Andrey Konovalov  
> > wrote:
> > I would have to do some more research, but I expect several hundred
> > patches before we get to a clean randconfig build with a broken
> > compiler.
>
> Manually maintaining asan-stack parameter for the sake of one broken compiler 
> isn't a great idea either.
>
> Couple alternative suggestions:
>
> 1) If we can't fix the problem or the cost of fixing is too high, maybe just
> hide it? Disable -Wframe-larger-than on pre clang-9 compilers.
>
> 2) Fallback cflags. The idea is to try to compile every file with "-mllvm
> -asan-stack=1 -Wframe-larger-than=2048 -Werror" at first,
>  and fall back to "-mllvm -asan-stack=0" if that fails. So it would be something
> similar to $(call cc-option, -mllvm -asan-stack=1 -Wframe-larger-than=2048
> -Werror, -mllvm -asan-stack=0)
>  except that "cc-option" tries the options only once on a sample piece of code, while
> we need to try the options on every file that we actually compile.
>  Honestly, I'm not sure that it's worth hacking the Kbuild engine for that
> particular use case.

My original plan was to put this under CONFIG_KASAN_EXTRA to allow you
to still enable it in older compilers, but you just removed that option ;-)

Maybe bringing it back would be a compromise? That way it's hidden from
all the build testing bots (because of the !CONFIG_COMPILE_TEST dependency),
but anyone who really wants it can still have the option, and set
CONFIG_FRAME_WARN to whichever value they like.

 Arnd

[PATCH v3 00/16] powerpc/32: Use BATs/LTLBs for STRICT_KERNEL_RWX

2019-02-20 Thread Christophe Leroy

The purpose of this series is to:
- use BATs with STRICT_KERNEL_RWX on book3s (See patch 13 for details.)
- use LTLBs with STRICT_KERNEL_RWX on 8xx (See patch 15 for a few details.)

v3:
- Reordered to avoid build failure due to setibat() not being used for several
  steps in the series. Now the patch using setibat() is next to the one adding
  setibat().
- Fixed mmu_mapin_ram() in patch 3 to return base in all cases, thanks Jonathan
  for the test.
- Fixed build failure on 8xx when CONFIG_PERF_EVENTS is set due to too many
  instructions in Exception 0x1200.
- Made 8M alignment for data the default on 8xx when STRICT_KERNEL_RWX is
  selected.
- Added patch 1 to not set an additional BAT on the Wii when requesting nobats.
  The only purpose of this patch is to be backported, as this function is
  removed later in the series.

v2:
- Fix patch 2 (was patch 3 in v1) based on feedback from Jonathan.
- Added support for 8xx with LTLBs.
- Added systematic population of pagetables for Abatron BDI.

Christophe Leroy (16):
  powerpc/wii: properly disable use of BATs when requested.
  powerpc/mm/32: add base address to mmu_mapin_ram()
  powerpc/mm/32s: rework mmu_mapin_ram()
  powerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.
  powerpc/32: always populate page tables for Abatron BDI.
  powerpc/wii: remove wii_mmu_mapin_mem2()
  powerpc/mm/32s: use _PAGE_EXEC in setbat()
  powerpc/32: add helper to write into segment registers
  powerpc/mmu: add is_strict_kernel_rwx() helper
  powerpc/kconfig: define PAGE_SHIFT inside Kconfig
  powerpc/kconfig: define CONFIG_DATA_SHIFT and CONFIG_ETEXT_SHIFT
  powerpc/mm/32s: add setibat() clearibat() and update_bats()
  powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX
  powerpc/kconfig: make _etext and data areas alignment configurable on
Book3s 32
  powerpc/8xx: don't disable large TLBs with CONFIG_STRICT_KERNEL_RWX
  powerpc/kconfig: make _etext and data areas alignment configurable on
8xx

 arch/powerpc/Kconfig  |  60 +
 arch/powerpc/include/asm/book3s/32/mmu-hash.h |   2 +
 arch/powerpc/include/asm/book3s/32/pgtable.h  |  11 ++
 arch/powerpc/include/asm/mmu.h|  11 ++
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h  |   3 +-
 arch/powerpc/include/asm/page.h   |  13 +-
 arch/powerpc/include/asm/reg.h|   5 +
 arch/powerpc/kernel/head_32.S |  35 +
 arch/powerpc/kernel/head_8xx.S|  54 ++--
 arch/powerpc/kernel/vmlinux.lds.S |   9 +-
 arch/powerpc/mm/40x_mmu.c |   2 +-
 arch/powerpc/mm/44x_mmu.c |   2 +-
 arch/powerpc/mm/8xx_mmu.c |  33 -
 arch/powerpc/mm/fsl_booke_mmu.c   |   2 +-
 arch/powerpc/mm/init_32.c |   6 +-
 arch/powerpc/mm/mmu_decl.h|  10 +-
 arch/powerpc/mm/pgtable_32.c  |  38 +++---
 arch/powerpc/mm/ppc_mmu_32.c  | 180 ++
 arch/powerpc/platforms/embedded6xx/wii.c  |  24 
 19 files changed, 390 insertions(+), 110 deletions(-)

-- 
2.13.3

[PATCH v3 05/16] powerpc/32: always populate page tables for Abatron BDI.

2019-02-20 Thread Christophe Leroy

When CONFIG_BDI_SWITCH is set, the page tables have to be populated
although large TLBs are used, because the BDI switch knows nothing
about those large TLBs which are handled directly in TLB miss logic.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/pgtable_32.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index fd665c32a1f7..94bd7d013557 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -261,7 +261,10 @@ void __init mapin_ram(void)
unsigned long top = base + reg->size;
 
base = mmu_mapin_ram(base, top);
-   __mapin_ram_chunk(base, top);
+   if (IS_ENABLED(CONFIG_BDI_SWITCH))
+   __mapin_ram_chunk(reg->base, top);
+   else
+   __mapin_ram_chunk(base, top);
}
 }
 
-- 
2.13.3

[PATCH v3 02/16] powerpc/mm/32: add base address to mmu_mapin_ram()

2019-02-20 Thread Christophe Leroy

For the time being, mmu_mapin_ram() always maps RAM from the beginning.
But some platforms like the Wii have to map a second block of RAM.

This patch adds to mmu_mapin_ram() the base address of the block.
At the moment, only base address 0 is supported.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/40x_mmu.c   | 2 +-
 arch/powerpc/mm/44x_mmu.c   | 2 +-
 arch/powerpc/mm/8xx_mmu.c   | 2 +-
 arch/powerpc/mm/fsl_booke_mmu.c | 2 +-
 arch/powerpc/mm/mmu_decl.h  | 2 +-
 arch/powerpc/mm/pgtable_32.c| 6 +++---
 arch/powerpc/mm/ppc_mmu_32.c| 2 +-
 7 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/mm/40x_mmu.c b/arch/powerpc/mm/40x_mmu.c
index 61ac468c87c6..b9cf6f8764b0 100644
--- a/arch/powerpc/mm/40x_mmu.c
+++ b/arch/powerpc/mm/40x_mmu.c
@@ -93,7 +93,7 @@ void __init MMU_init_hw(void)
 #define LARGE_PAGE_SIZE_16M(1<<24)
 #define LARGE_PAGE_SIZE_4M (1<<22)
 
-unsigned long __init mmu_mapin_ram(unsigned long top)
+unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
 {
unsigned long v, s, mapped;
phys_addr_t p;
diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c
index ea2b9af08a48..aad127acdbaa 100644
--- a/arch/powerpc/mm/44x_mmu.c
+++ b/arch/powerpc/mm/44x_mmu.c
@@ -170,7 +170,7 @@ void __init MMU_init_hw(void)
flush_instruction_cache();
 }
 
-unsigned long __init mmu_mapin_ram(unsigned long top)
+unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
 {
unsigned long addr;
unsigned long memstart = memstart_addr & ~(PPC_PIN_SIZE - 1);
diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
index e2c32bdb6023..46bc26ef71e9 100644
--- a/arch/powerpc/mm/8xx_mmu.c
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -99,7 +99,7 @@ static void __init mmu_patch_cmp_limit(s32 *site, unsigned long mapped)
modify_instruction_site(site, 0xffff, (unsigned long)__va(mapped) >> 16);
 }
 
-unsigned long __init mmu_mapin_ram(unsigned long top)
+unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
 {
unsigned long mapped;
 
diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c
index 080d49b26c3a..210cbc1faf63 100644
--- a/arch/powerpc/mm/fsl_booke_mmu.c
+++ b/arch/powerpc/mm/fsl_booke_mmu.c
@@ -221,7 +221,7 @@ unsigned long map_mem_in_cams(unsigned long ram, int max_cam_idx, bool dryrun)
 #error "LOWMEM_CAM_NUM must be less than NUM_TLBCAMS"
 #endif
 
-unsigned long __init mmu_mapin_ram(unsigned long top)
+unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
 {
return tlbcam_addrs[tlbcam_index - 1].limit - PAGE_OFFSET + 1;
 }
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index c4a717da65eb..61730023dde3 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -130,7 +130,7 @@ extern void wii_memory_fixups(void);
  */
 #ifdef CONFIG_PPC32
 extern void MMU_init_hw(void);
-extern unsigned long mmu_mapin_ram(unsigned long top);
+unsigned long mmu_mapin_ram(unsigned long base, unsigned long top);
 #endif
 
 #ifdef CONFIG_PPC_FSL_BOOK3E
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index ded71126ce4c..b4858818523f 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -258,15 +258,15 @@ void __init mapin_ram(void)
 
 #ifndef CONFIG_WII
top = total_lowmem;
-   s = mmu_mapin_ram(top);
+   s = mmu_mapin_ram(0, top);
__mapin_ram_chunk(s, top);
 #else
if (!wii_hole_size) {
-   s = mmu_mapin_ram(total_lowmem);
+   s = mmu_mapin_ram(0, total_lowmem);
__mapin_ram_chunk(s, total_lowmem);
} else {
top = wii_hole_start;
-   s = mmu_mapin_ram(top);
+   s = mmu_mapin_ram(0, top);
__mapin_ram_chunk(s, top);
 
top = memblock_end_of_DRAM();
diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
index 3f4193201ee7..b260ced065b4 100644
--- a/arch/powerpc/mm/ppc_mmu_32.c
+++ b/arch/powerpc/mm/ppc_mmu_32.c
@@ -73,7 +73,7 @@ unsigned long p_block_mapped(phys_addr_t pa)
return 0;
 }
 
-unsigned long __init mmu_mapin_ram(unsigned long top)
+unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
 {
unsigned long tot, bl, done;
unsigned long max_size = (256<<20);
-- 
2.13.3

[PATCH v3 13/16] powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX

2019-02-20 Thread Christophe Leroy

Today, STRICT_KERNEL_RWX is based on the use of regular pages
to map kernel pages.

On Book3s 32, it has three consequences:
- Using pages instead of BAT for mapping kernel linear memory severely
impacts performance.
- Exec protection is not effective because no-execute cannot be set at
page level (except on 603 which doesn't have hash tables)
- Write protection is not effective because PP bits do not provide RO
mode for kernel-only pages (except on 603 which handles it in software
via PAGE_DIRTY)

On the 603+, we have:
- Independent IBAT and DBAT allowing limitation of exec parts.
- NX bit can be set in segment registers to forbid execution on memory
mapped by pages.
- RO mode on DBATs even for kernel-only blocks.

On the 601, there is nothing much we can do other than warn the user
about it, because:
- BATs are common to instructions and data.
- BAT do not provide RO mode for kernel-only blocks.
- segment registers don't have the NX bit.

In order to use IBAT for exec protection, this patch:
- Aligns _etext to BAT block sizes (128kb)
- Set NX bit in kernel segment register (Except on vmalloc area when
CONFIG_MODULES is selected)
- Maps kernel text with IBATs.

In order to use DBAT for write protection, this patch:
- Aligns RW DATA to BAT block sizes (4M)
- Maps kernel RO area with write prohibited DBATs
- Maps remaining memory with remaining DBATs

Here is what we get with this patch on a 832x when activating
STRICT_KERNEL_RWX:

Symbols:
c0000000 T _stext
c0680000 R __start_rodata
c0680000 R _etext
c0800000 T __init_begin
c0800000 T _sinittext

~# cat /sys/kernel/debug/block_address_translation
---[ Instruction Block Address Translation ]---
0: 0xc0000000-0xc03fffff 0x00000000 Kernel EXEC coherent
1: 0xc0400000-0xc05fffff 0x00400000 Kernel EXEC coherent
2: 0xc0600000-0xc067ffff 0x00600000 Kernel EXEC coherent
3: -
4: -
5: -
6: -
7: -

---[ Data Block Address Translation ]---
0: 0xc0000000-0xc07fffff 0x00000000 Kernel RO coherent
1: 0xc0800000-0xc0ffffff 0x00800000 Kernel RW coherent
2: 0xc1000000-0xc1ffffff 0x01000000 Kernel RW coherent
3: 0xc2000000-0xc3ffffff 0x02000000 Kernel RW coherent
4: 0xc4000000-0xc7ffffff 0x04000000 Kernel RW coherent
5: 0xc8000000-0xcfffffff 0x08000000 Kernel RW coherent
6: 0xd0000000-0xdfffffff 0x10000000 Kernel RW coherent
7: -
7: -

~# cat /sys/kernel/debug/segment_registers
---[ User Segments ]---
0x00000000-0x0fffffff Kern key 1 User key 1 VSID 0xa085d0
0x10000000-0x1fffffff Kern key 1 User key 1 VSID 0xa086e1
0x20000000-0x2fffffff Kern key 1 User key 1 VSID 0xa087f2
0x30000000-0x3fffffff Kern key 1 User key 1 VSID 0xa08903
0x40000000-0x4fffffff Kern key 1 User key 1 VSID 0xa08a14
0x50000000-0x5fffffff Kern key 1 User key 1 VSID 0xa08b25
0x60000000-0x6fffffff Kern key 1 User key 1 VSID 0xa08c36
0x70000000-0x7fffffff Kern key 1 User key 1 VSID 0xa08d47
0x80000000-0x8fffffff Kern key 1 User key 1 VSID 0xa08e58
0x90000000-0x9fffffff Kern key 1 User key 1 VSID 0xa08f69
0xa0000000-0xafffffff Kern key 1 User key 1 VSID 0xa0907a
0xb0000000-0xbfffffff Kern key 1 User key 1 VSID 0xa0918b

---[ Kernel Segments ]---
0xc0000000-0xcfffffff Kern key 0 User key 1 No Exec VSID 0x000ccc
0xd0000000-0xdfffffff Kern key 0 User key 1 No Exec VSID 0x000ddd
0xe0000000-0xefffffff Kern key 0 User key 1 No Exec VSID 0x000eee
0xf0000000-0xffffffff Kern key 0 User key 1 No Exec VSID 0x000fff

Aligning _etext to 128kb allows mapping up to 32Mb of text with 8 IBATs:
16Mb + 8Mb + 4Mb + 2Mb + 1Mb + 512kb + 256kb + 128kb (+ 128kb) = 32Mb
(A 9th IBAT is unneeded as 32Mb would need only a single 32Mb block)

Aligning data to 4M allows mapping up to 512Mb of data with 8 DBATs:
16Mb + 8Mb + 4Mb + 4Mb + 32Mb + 64Mb + 128Mb + 256Mb = 512Mb
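
As a rough check of the arithmetic above: when a region starts on a boundary of its largest block and its size is a multiple of 128kb, the number of BATs it needs equals the number of set bits in its size. The sketch below is a standalone userspace illustration, not kernel code; bats_needed() and BAT_MIN_SHIFT are names invented here.

```c
#include <assert.h>
#include <stdint.h>

#define BAT_MIN_SHIFT 17u	/* 128kb, the smallest BAT block */

/*
 * Count the power-of-two blocks needed to cover [0, size), assuming size
 * is a multiple of 128kb: one block per set bit in size.
 */
static unsigned int bats_needed(uint32_t size)
{
	unsigned int n = 0;

	assert((size & ((UINT32_C(1) << BAT_MIN_SHIFT) - 1)) == 0);
	for (; size; size &= size - 1)	/* clear the lowest set bit */
		n++;
	return n;
}
```

For instance, a text size of 32Mb - 128kb has eight bits set (128kb through 16Mb), so it would need all eight IBATs, while exactly 32Mb needs a single block.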

Because some processors only have 4 BATs and because some targets need
DBATs for mapping other areas, the following patch will make the
_etext and data alignment configurable.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig |  2 +
 arch/powerpc/include/asm/book3s/32/pgtable.h | 11 
 arch/powerpc/mm/init_32.c|  4 +-
 arch/powerpc/mm/mmu_decl.h   |  8 +++
 arch/powerpc/mm/pgtable_32.c | 10 +++-
 arch/powerpc/mm/ppc_mmu_32.c | 87 ++--
 6 files changed, 112 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index edef40a2b446..640a7cfba9d0 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -727,11 +727,13 @@ config THREAD_SHIFT
 
 config ETEXT_SHIFT
int
+   default 17 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
default PPC_PAGE_SHIFT
 
 config DATA_SHIFT
int
default 24 if STRICT_KERNEL_RWX && PPC64
+   default 22 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
default PPC_PAGE_SHIFT
 
 config FORCE_MAX_ZONEORDER
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index

[PATCH v3 06/16] powerpc/wii: remove wii_mmu_mapin_mem2()

2019-02-20 Thread Christophe Leroy

wii_mmu_mapin_mem2() is not used anymore, remove it.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/platforms/embedded6xx/wii.c | 28 
 1 file changed, 28 deletions(-)

diff --git a/arch/powerpc/platforms/embedded6xx/wii.c b/arch/powerpc/platforms/embedded6xx/wii.c
index ac4ee88efc80..235fe81aa2b1 100644
--- a/arch/powerpc/platforms/embedded6xx/wii.c
+++ b/arch/powerpc/platforms/embedded6xx/wii.c
@@ -54,10 +54,6 @@
 static void __iomem *hw_ctrl;
 static void __iomem *hw_gpio;
 
-unsigned long wii_hole_start;
-unsigned long wii_hole_size;
-
-
 static int __init page_aligned(unsigned long x)
 {
return !(x & (PAGE_SIZE-1));
@@ -69,30 +65,6 @@ void __init wii_memory_fixups(void)
 
BUG_ON(memblock.memory.cnt != 2);
BUG_ON(!page_aligned(p[0].base) || !page_aligned(p[1].base));
-
-   /* determine hole */
-   wii_hole_start = ALIGN(p[0].base + p[0].size, PAGE_SIZE);
-   wii_hole_size = p[1].base - wii_hole_start;
-}
-
-unsigned long __init wii_mmu_mapin_mem2(unsigned long top)
-{
-   unsigned long delta, size, bl;
-   unsigned long max_size = (256<<20);
-
-   /* MEM2 64MB@0x10000000 */
-   delta = wii_hole_start + wii_hole_size;
-   size = top - delta;
-
-   if (__map_without_bats)
-   return delta;
-
-   for (bl = 128<<10; bl < max_size; bl <<= 1) {
-   if (bl * 2 > size)
-   break;
-   }
-   setbat(4, PAGE_OFFSET+delta, delta, bl, PAGE_KERNEL_X);
-   return delta + bl;
 }
 
 static void __noreturn wii_spin(void)
-- 
2.13.3

[PATCH v3 03/16] powerpc/mm/32s: rework mmu_mapin_ram()

2019-02-20 Thread Christophe Leroy

This patch reworks mmu_mapin_ram() to be more generic and map as many
blocks as possible. It now supports blocks not starting at address 0.

It scans the DBATs array to find free ones instead of forcing the use of
BAT2 and BAT3.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/ppc_mmu_32.c | 63 
 1 file changed, 41 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
index b260ced065b4..5fc59b195fef 100644
--- a/arch/powerpc/mm/ppc_mmu_32.c
+++ b/arch/powerpc/mm/ppc_mmu_32.c
@@ -73,39 +73,58 @@ unsigned long p_block_mapped(phys_addr_t pa)
return 0;
 }
 
+static int find_free_bat(void)
+{
+   int b;
+
+   if (cpu_has_feature(CPU_FTR_601)) {
+   for (b = 0; b < 4; b++) {
+   struct ppc_bat *bat = BATS[b];
+
+   if (!(bat[0].batl & 0x40))
+   return b;
+   }
+   } else {
+   int n = mmu_has_feature(MMU_FTR_USE_HIGH_BATS) ? 8 : 4;
+
+   for (b = 0; b < n; b++) {
+   struct ppc_bat *bat = BATS[b];
+
+   if (!(bat[1].batu & 3))
+   return b;
+   }
+   }
+   return -1;
+}
+
+static unsigned int block_size(unsigned long base, unsigned long top)
+{
+   unsigned int max_size = (cpu_has_feature(CPU_FTR_601) ? 8 : 256) << 20;
+   unsigned int base_shift = (ffs(base) - 1) & 31;
+   unsigned int block_shift = (fls(top - base) - 1) & 31;
+
+   return min3(max_size, 1U << base_shift, 1U << block_shift);
+}
+
 unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
 {
-   unsigned long tot, bl, done;
-   unsigned long max_size = (256<<20);
+   int idx;
 
if (__map_without_bats) {
printk(KERN_DEBUG "RAM mapped without BATs\n");
-   return 0;
+   return base;
}
 
-   /* Set up BAT2 and if necessary BAT3 to cover RAM. */
+   while ((idx = find_free_bat()) != -1 && base != top) {
+   unsigned int size = block_size(base, top);
 
-   /* Make sure we don't map a block larger than the
-  smallest alignment of the physical address. */
-   tot = top;
-   for (bl = 128<<10; bl < max_size; bl <<= 1) {
-   if (bl * 2 > tot)
+   if (size < 128 << 10)
break;
+   setbat(idx, PAGE_OFFSET + base, base, size, PAGE_KERNEL_X);
+   base += size;
}
 
-   setbat(2, PAGE_OFFSET, 0, bl, PAGE_KERNEL_X);
-   done = (unsigned long)bat_addrs[2].limit - PAGE_OFFSET + 1;
-   if ((done < tot) && !bat_addrs[3].limit) {
-   /* use BAT3 to cover a bit more */
-   tot -= done;
-   for (bl = 128<<10; bl < max_size; bl <<= 1)
-   if (bl * 2 > tot)
-   break;
-   setbat(3, PAGE_OFFSET+done, done, bl, PAGE_KERNEL_X);
-   done = (unsigned long)bat_addrs[3].limit - PAGE_OFFSET + 1;
-   }
-
-   return done;
+   return base;
 }
 
 /*
-- 
2.13.3
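
The block-splitting loop in the reworked mmu_mapin_ram() above can be approximated in userspace as follows. This is only a sketch under stated assumptions: every name is invented for illustration, the 601's 8M limit is ignored, the count of free BATs is passed in rather than discovered, and the alignment limit is taken from the lowest set bit of base, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_BLOCK (UINT32_C(128) << 10)	/* 128kb */
#define MAX_BLOCK (UINT32_C(256) << 20)	/* 256Mb */

/* Largest power of two <= x; returns 1 for x == 0. */
static uint32_t pow2_floor(uint32_t x)
{
	uint32_t p = 1;

	while (x >>= 1)
		p <<= 1;
	return p;
}

/* Largest power of two dividing x; effectively unlimited for x == 0. */
static uint32_t alignment_of(uint32_t x)
{
	return x ? (x & (~x + 1)) : (UINT32_C(1) << 31);
}

/* Size of the next block: bounded by remaining size, alignment, and max. */
static uint32_t next_block(uint32_t base, uint32_t top)
{
	uint32_t size = pow2_floor(top - base);
	uint32_t align = alignment_of(base);

	if (size > align)
		size = align;
	if (size > MAX_BLOCK)
		size = MAX_BLOCK;
	return size;
}

/*
 * Walk [base, top) block by block while free BATs remain.
 * Returns the first address left unmapped; *used gets the block count.
 */
static uint32_t map_with_blocks(uint32_t base, uint32_t top,
				unsigned int nfree, unsigned int *used)
{
	assert(base <= top);
	*used = 0;
	while (*used < nfree && base != top) {
		uint32_t size = next_block(base, top);

		if (size < MIN_BLOCK)
			break;	/* remainder is left to page mappings */
		base += size;
		(*used)++;
	}
	return base;
}
```

With 96Mb starting at 0 this yields a 64Mb block followed by a 32Mb block; whatever address is returned short of top would still have to be mapped with pages by the caller.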

[PATCH v3 09/16] powerpc/mmu: add is_strict_kernel_rwx() helper

2019-02-20 Thread Christophe Leroy

Add a helper, strict_kernel_rwx_enabled(), to know whether
STRICT_KERNEL_RWX is enabled.

This is based on the rodata_enabled flag, which is defined only
when CONFIG_STRICT_KERNEL_RWX is selected.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/mmu.h | 11 +++
 arch/powerpc/mm/init_32.c  |  4 +---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 6d22a8e78fe2..d34ad1657d7b 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -289,6 +289,17 @@ static inline u16 get_mm_addr_key(struct mm_struct *mm, unsigned long address)
 }
 #endif /* CONFIG_PPC_MEM_KEYS */
 
+#ifdef CONFIG_STRICT_KERNEL_RWX
+static inline bool strict_kernel_rwx_enabled(void)
+{
+   return rodata_enabled;
+}
+#else
+static inline bool strict_kernel_rwx_enabled(void)
+{
+   return false;
+}
+#endif
 #endif /* !__ASSEMBLY__ */
 
 /* The kernel use the constants below to index in the page sizes array.
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 3e59e5d64b01..ee5a430b9a18 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -108,12 +108,10 @@ static void __init MMU_setup(void)
__map_without_bats = 1;
__map_without_ltlbs = 1;
}
-#ifdef CONFIG_STRICT_KERNEL_RWX
-   if (rodata_enabled) {
+   if (strict_kernel_rwx_enabled()) {
__map_without_bats = 1;
__map_without_ltlbs = 1;
}
-#endif
 }
 
 /*
-- 
2.13.3

[PATCH v3 14/16] powerpc/kconfig: make _etext and data areas alignment configurable on Book3s 32

2019-02-20 Thread Christophe Leroy

Depending on the number of available BATs for mapping the different
kernel areas, it may be necessary to increase the alignment of _etext
and/or of data areas.

This patch allows the user to do so via Kconfig.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig | 32 ++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 640a7cfba9d0..20c4e3a62b90 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -725,16 +725,44 @@ config THREAD_SHIFT
  Used to define the stack size. The default is almost always what you
  want. Only change this if you know what you are doing.
 
+config ETEXT_SHIFT_BOOL
+   bool "Set custom etext alignment" if STRICT_KERNEL_RWX && PPC_BOOK3S_32
+   depends on ADVANCED_OPTIONS
+   help
+ This option allows you to set the kernel end of text alignment. When
+ RAM is mapped by blocks, the alignment needs to fit the size and
+ number of possible blocks. The default should be OK for most configs.
+
+ Say N here unless you know what you are doing.
+
 config ETEXT_SHIFT
-   int
+   int "_etext shift" if ETEXT_SHIFT_BOOL
+   range 17 28 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
default 17 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
default PPC_PAGE_SHIFT
+   help
+ On Book3S 32 (603+), IBATs are used to map kernel text.
+ Smaller is the alignment, greater is the number of necessary IBATs.
+
+config DATA_SHIFT_BOOL
+   bool "Set custom data alignment" if STRICT_KERNEL_RWX && PPC_BOOK3S_32
+   depends on ADVANCED_OPTIONS
+   help
+ This option allows you to set the kernel data alignment. When
+ RAM is mapped by blocks, the alignment needs to fit the size and
+ number of possible blocks. The default should be OK for most configs.
+
+ Say N here unless you know what you are doing.
 
 config DATA_SHIFT
-   int
+   int "Data shift" if DATA_SHIFT_BOOL
default 24 if STRICT_KERNEL_RWX && PPC64
+   range 17 28 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
default 22 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
default PPC_PAGE_SHIFT
+   help
+ On Book3S 32 (603+), DBATs are used to map kernel text and rodata RO.
+ Smaller is the alignment, greater is the number of necessary DBATs.
 
 config FORCE_MAX_ZONEORDER
int "Maximum zone order"
-- 
2.13.3

[PATCH v3 11/16] powerpc/kconfig: define CONFIG_DATA_SHIFT and CONFIG_ETEXT_SHIFT

2019-02-20 Thread Christophe Leroy

CONFIG_STRICT_KERNEL_RWX requires a special alignment
for DATA on some subarches. Today it is just defined
with an #ifdef in vmlinux.lds.S.

In order to get more flexibility, this patch moves the
definition of this alignment into Kconfig.

On some subarches, CONFIG_STRICT_KERNEL_RWX will
require a special alignment of _etext.

This patch also adds a configuration item for it in Kconfig.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig  | 9 +
 arch/powerpc/kernel/vmlinux.lds.S | 9 +++--
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 417e52a27f63..edef40a2b446 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -725,6 +725,15 @@ config THREAD_SHIFT
  Used to define the stack size. The default is almost always what you
  want. Only change this if you know what you are doing.
 
+config ETEXT_SHIFT
+   int
+   default PPC_PAGE_SHIFT
+
+config DATA_SHIFT
+   int
+   default 24 if STRICT_KERNEL_RWX && PPC64
+   default PPC_PAGE_SHIFT
+
 config FORCE_MAX_ZONEORDER
int "Maximum zone order"
range 8 9 if PPC64 && PPC_64K_PAGES
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index c3efb972c8c1..060a1acd7c6d 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -12,11 +12,8 @@
 #include 
 #include 
 
-#if defined(CONFIG_STRICT_KERNEL_RWX) && !defined(CONFIG_PPC32)
-#define STRICT_ALIGN_SIZE  (1 << 24)
-#else
-#define STRICT_ALIGN_SIZE  PAGE_SIZE
-#endif
+#define STRICT_ALIGN_SIZE  (1 << CONFIG_DATA_SHIFT)
+#define ETEXT_ALIGN_SIZE   (1 << CONFIG_ETEXT_SHIFT)
 
 ENTRY(_stext)
 
@@ -131,7 +128,7 @@ SECTIONS
 
} :kernel
 
-   . = ALIGN(PAGE_SIZE);
+   . = ALIGN(ETEXT_ALIGN_SIZE);
_etext = .;
PROVIDE32 (etext = .);
 
-- 
2.13.3

[PATCH v3 16/16] powerpc/kconfig: make _etext and data areas alignment configurable on 8xx

2019-02-20 Thread Christophe Leroy

On 8xx, large pages (512kb or 8M) are used to map kernel linear
memory. Aligning to 8M reduces TLB misses as only 8M pages are used
in that case. We make 8M the default for data.

This patch allows the user to set the alignment via Kconfig.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig   | 18 +++---
 arch/powerpc/kernel/head_8xx.S |  4 ++--
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c4d6c97d7699..cf30a8f522b9 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -726,7 +726,8 @@ config THREAD_SHIFT
  want. Only change this if you know what you are doing.
 
 config ETEXT_SHIFT_BOOL
-   bool "Set custom etext alignment" if STRICT_KERNEL_RWX && PPC_BOOK3S_32
+   bool "Set custom etext alignment" if STRICT_KERNEL_RWX && \
+(PPC_BOOK3S_32 || PPC_8xx)
depends on ADVANCED_OPTIONS
help
  This option allows you to set the kernel end of text alignment. When
@@ -738,6 +739,7 @@ config ETEXT_SHIFT_BOOL
 config ETEXT_SHIFT
int "_etext shift" if ETEXT_SHIFT_BOOL
range 17 28 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
+   range 19 23 if STRICT_KERNEL_RWX && PPC_8xx
default 17 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
default 19 if STRICT_KERNEL_RWX && PPC_8xx
default PPC_PAGE_SHIFT
@@ -745,8 +747,13 @@ config ETEXT_SHIFT
  On Book3S 32 (603+), IBATs are used to map kernel text.
  Smaller is the alignment, greater is the number of necessary IBATs.
 
+ On 8xx, large pages (512kb or 8M) are used to map kernel linear
+ memory. Aligning to 8M reduces TLB misses as only 8M pages are used
+ in that case.
+
 config DATA_SHIFT_BOOL
-   bool "Set custom data alignment" if STRICT_KERNEL_RWX && PPC_BOOK3S_32
+   bool "Set custom data alignment" if STRICT_KERNEL_RWX && \
+   (PPC_BOOK3S_32 || PPC_8xx)
depends on ADVANCED_OPTIONS
help
  This option allows you to set the kernel data alignment. When
@@ -759,13 +766,18 @@ config DATA_SHIFT
int "Data shift" if DATA_SHIFT_BOOL
default 24 if STRICT_KERNEL_RWX && PPC64
range 17 28 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
+   range 19 23 if STRICT_KERNEL_RWX && PPC_8xx
default 22 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
-   default 19 if STRICT_KERNEL_RWX && PPC_8xx
+   default 23 if STRICT_KERNEL_RWX && PPC_8xx
default PPC_PAGE_SHIFT
help
  On Book3S 32 (603+), DBATs are used to map kernel text and rodata RO.
  Smaller is the alignment, greater is the number of necessary DBATs.
 
+ On 8xx, large pages (512kb or 8M) are used to map kernel linear
+ memory. Aligning to 8M reduces TLB misses as only 8M pages are used
+ in that case.
+
 config FORCE_MAX_ZONEORDER
int "Maximum zone order"
range 8 9 if PPC64 && PPC_64K_PAGES
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 01ed8f3c95c8..63f1b7eec3f0 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -416,7 +416,7 @@ InstructionTLBMiss:
 #ifndef CONFIG_PIN_TLB_TEXT
 ITLBMissLinear:
mtcr    r11
-#ifdef CONFIG_STRICT_KERNEL_RWX
+#if defined(CONFIG_STRICT_KERNEL_RWX) && CONFIG_ETEXT_SHIFT < 23
patch_site  0f, patch__itlbmiss_linmem_top8
 
mfspr   r10, SPRN_SRR0
@@ -537,7 +537,7 @@ DTLBMissIMMR:
 DTLBMissLinear:
mtcr    r11
rlwinm  r10, r10, 20, 0x0f800000    /* 8xx supports max 256Mb RAM */
-#ifdef CONFIG_STRICT_KERNEL_RWX
+#if defined(CONFIG_STRICT_KERNEL_RWX) && CONFIG_DATA_SHIFT < 23
patch_site  0f, patch__dtlbmiss_romem_top8
 
 0: subis   r11, r10, (PAGE_OFFSET - 0x80000000)@ha
-- 
2.13.3

[PATCH v3 12/16] powerpc/mm/32s: add setibat() clearibat() and update_bats()

2019-02-20 Thread Christophe Leroy

setibat() and clearibat() allow IBATs to be manipulated independently
of DBATs.

update_bats() allows BATs to be updated after init. This is done
with the MMU off.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/mmu-hash.h |  2 ++
 arch/powerpc/kernel/head_32.S | 35 +++
 arch/powerpc/mm/ppc_mmu_32.c  | 32 
 3 files changed, 69 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/32/mmu-hash.h b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
index 0c261ba2c826..5cb588395fdc 100644
--- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
@@ -92,6 +92,8 @@ typedef struct {
unsigned long vdso_base;
 } mm_context_t;
 
+void update_bats(void);
+
 /* patch sites */
 extern s32 patch__hash_page_A0, patch__hash_page_A1, patch__hash_page_A2;
 extern s32 patch__hash_page_B, patch__hash_page_C;
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index c2f564690778..91b302b0797f 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -1104,6 +1104,41 @@ BEGIN_MMU_FTR_SECTION
 END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_HIGH_BATS)
blr
 
+_ENTRY(update_bats)
+   lis r4, 1f@h
+   ori r4, r4, 1f@l
+   tophys(r4, r4)
+   mfmsr   r6
+   mflr    r7
+   li  r3, MSR_KERNEL & ~(MSR_IR | MSR_DR)
+   rlwinm  r0, r6, 0, ~MSR_RI
+   rlwinm  r0, r0, 0, ~MSR_EE
+   mtmsr   r0
+   mtspr   SPRN_SRR0, r4
+   mtspr   SPRN_SRR1, r3
+   SYNC
+   RFI
+1: bl  clear_bats
+   lis r3, BATS@ha
+   addi    r3, r3, BATS@l
+   tophys(r3, r3)
+   LOAD_BAT(0, r3, r4, r5)
+   LOAD_BAT(1, r3, r4, r5)
+   LOAD_BAT(2, r3, r4, r5)
+   LOAD_BAT(3, r3, r4, r5)
+BEGIN_MMU_FTR_SECTION
+   LOAD_BAT(4, r3, r4, r5)
+   LOAD_BAT(5, r3, r4, r5)
+   LOAD_BAT(6, r3, r4, r5)
+   LOAD_BAT(7, r3, r4, r5)
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_HIGH_BATS)
+   li  r3, MSR_KERNEL & ~(MSR_IR | MSR_DR | MSR_RI)
+   mtmsr   r3
+   mtspr   SPRN_SRR0, r7
+   mtspr   SPRN_SRR1, r6
+   SYNC
+   RFI
+
 flush_tlbs:
lis r10, 0x40
 1: addic.  r10, r10, -0x1000
diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
index ff8580c6ab11..66f1319e8e20 100644
--- a/arch/powerpc/mm/ppc_mmu_32.c
+++ b/arch/powerpc/mm/ppc_mmu_32.c
@@ -106,6 +106,38 @@ static unsigned int block_size(unsigned long base, unsigned long top)
return min3(max_size, 1U << base_shift, 1U << block_shift);
 }
 
+/*
+ * Set up one of the IBAT (block address translation) register pairs.
+ * The parameters are not checked; in particular size must be a power
+ * of 2 between 128k and 256M.
+ * Only for 603+ ...
+ */
+static void setibat(int index, unsigned long virt, phys_addr_t phys,
+   unsigned int size, pgprot_t prot)
+{
+   unsigned int bl = (size >> 17) - 1;
+   int wimgxpp;
+   struct ppc_bat *bat = BATS[index];
+   unsigned long flags = pgprot_val(prot);
+
+   if (!cpu_has_feature(CPU_FTR_NEED_COHERENT))
+   flags &= ~_PAGE_COHERENT;
+
+   wimgxpp = (flags & _PAGE_COHERENT) | ((flags & _PAGE_EXEC) ? BPP_RX : BPP_XX);
+   bat[0].batu = virt | (bl << 2) | 2; /* Vs=1, Vp=0 */
+   bat[0].batl = BAT_PHYS_ADDR(phys) | wimgxpp;
+   if (flags & _PAGE_USER)
+   bat[0].batu |= 1;   /* Vp = 1 */
+}
+
+static void clearibat(int index)
+{
+   struct ppc_bat *bat = BATS[index];
+
+   bat[0].batu = 0;
+   bat[0].batl = 0;
+}
+
 unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
 {
int idx;
-- 
2.13.3

[PATCH v3 15/16] powerpc/8xx: don't disable large TLBs with CONFIG_STRICT_KERNEL_RWX

2019-02-20 Thread Christophe Leroy

This patch implements handling of STRICT_KERNEL_RWX with
large TLBs directly in the TLB miss handlers.

To do so, etext and sinittext are aligned on 512kB boundaries
and the miss handlers use 512kB pages instead of 8Mb pages for
addresses close to the boundaries.

It sets RO PP flags for addresses under sinittext.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig |  2 ++
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h |  3 +-
 arch/powerpc/kernel/head_8xx.S   | 54 +---
 arch/powerpc/mm/8xx_mmu.c| 31 +++-
 arch/powerpc/mm/init_32.c|  2 +-
 arch/powerpc/mm/mmu_decl.h   |  2 +-
 6 files changed, 78 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 20c4e3a62b90..c4d6c97d7699 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -739,6 +739,7 @@ config ETEXT_SHIFT
int "_etext shift" if ETEXT_SHIFT_BOOL
range 17 28 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
default 17 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
+   default 19 if STRICT_KERNEL_RWX && PPC_8xx
default PPC_PAGE_SHIFT
help
  On Book3S 32 (603+), IBATs are used to map kernel text.
@@ -759,6 +760,7 @@ config DATA_SHIFT
default 24 if STRICT_KERNEL_RWX && PPC64
range 17 28 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
default 22 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
+   default 19 if STRICT_KERNEL_RWX && PPC_8xx
default PPC_PAGE_SHIFT
help
  On Book3S 32 (603+), DBATs are used to map kernel text and rodata RO.
diff --git a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
index b0f764c827c0..0a1a3fc54e54 100644
--- a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
@@ -231,9 +231,10 @@ static inline unsigned int mmu_psize_to_shift(unsigned int mmu_psize)
 }
 
 /* patch sites */
-extern s32 patch__itlbmiss_linmem_top;
+extern s32 patch__itlbmiss_linmem_top, patch__itlbmiss_linmem_top8;
 extern s32 patch__dtlbmiss_linmem_top, patch__dtlbmiss_immr_jmp;
 extern s32 patch__fixupdar_linmem_top;
+extern s32 patch__dtlbmiss_romem_top, patch__dtlbmiss_romem_top8;
 
 extern s32 patch__itlbmiss_exit_1, patch__itlbmiss_exit_2;
 extern s32 patch__dtlbmiss_exit_1, patch__dtlbmiss_exit_2, patch__dtlbmiss_exit_3;
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 4a2e3ffdb5bb..01ed8f3c95c8 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -292,6 +292,17 @@ SystemCall:
  */
EXCEPTION(0x1000, SoftEmu, program_check_exception, EXC_XFER_STD)
 
+/* Called from DataStoreTLBMiss when perf TLB misses events are activated */
+#ifdef CONFIG_PERF_EVENTS
+   patch_site  0f, patch__dtlbmiss_perf
+0: lwz r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
+   addi    r10, r10, 1
+   stw r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
+   mfspr   r10, SPRN_SPRG_SCRATCH0
+   mfspr   r11, SPRN_SPRG_SCRATCH1
+   rfi
+#endif
+
. = 0x1100
 /*
  * For the MPC8xx, this is a software tablewalk to load the instruction
@@ -405,10 +416,20 @@ InstructionTLBMiss:
 #ifndef CONFIG_PIN_TLB_TEXT
 ITLBMissLinear:
mtcr    r11
+#ifdef CONFIG_STRICT_KERNEL_RWX
+   patch_site  0f, patch__itlbmiss_linmem_top8
+
+   mfspr   r10, SPRN_SRR0
+0: subis   r11, r10, (PAGE_OFFSET - 0x80000000)@ha
+   rlwinm  r11, r11, 4, MI_PS8MEG ^ MI_PS512K
+   ori r11, r11, MI_PS512K | MI_SVALID
+   rlwinm  r10, r10, 0, 0x0ff80000 /* 8xx supports max 256Mb RAM */
+#else
/* Set 8M byte page and mark it valid */
li  r11, MI_PS8MEG | MI_SVALID
-   mtspr   SPRN_MI_TWC, r11
rlwinm  r10, r10, 20, 0x0f800000    /* 8xx supports max 256Mb RAM */
+#endif
+   mtspr   SPRN_MI_TWC, r11
ori r10, r10, 0xf0 | MI_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
  _PAGE_PRESENT
mtspr   SPRN_MI_RPN, r10/* Update TLB entry */
@@ -494,16 +515,6 @@ DataStoreTLBMiss:
rfi
patch_site  0b, patch__dtlbmiss_exit_1
 
-#ifdef CONFIG_PERF_EVENTS
-   patch_site  0f, patch__dtlbmiss_perf
-0: lwz r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
-   addi    r10, r10, 1
-   stw r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
-   mfspr   r10, SPRN_SPRG_SCRATCH0
-   mfspr   r11, SPRN_SPRG_SCRATCH1
-   rfi
-#endif
-
 DTLBMissIMMR:
mtcr    r11
/* Set 512k byte guarded page and mark it valid */
@@ -525,10 +536,29 @@ DTLBMissIMMR:
 
 DTLBMissLinear:
mtcr    r11
+   rlwinm  r10, r10, 20, 0x0f800000    /* 8xx supports max 256Mb RAM */
+#ifdef CONFIG_STRICT_KERNEL_RWX
+   patch_site  0f, patch__dtlbmiss_romem_top8
+
+0: subis   r11, r10, (PAGE_OFFSET - 0x80000000)@ha
+   rlwinm  r11, r11,

[PATCH v3 07/16] powerpc/mm/32s: use _PAGE_EXEC in setbat()

2019-02-20 Thread Christophe Leroy

Do not set the IBAT when setbat() is called without _PAGE_EXEC.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/ppc_mmu_32.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
index 5fc59b195fef..ff8580c6ab11 100644
--- a/arch/powerpc/mm/ppc_mmu_32.c
+++ b/arch/powerpc/mm/ppc_mmu_32.c
@@ -131,6 +131,7 @@ unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
  * Set up one of the I/D BAT (block address translation) register pairs.
  * The parameters are not checked; in particular size must be a power
  * of 2 between 128k and 256M.
+ * On 603+, only set IBAT when _PAGE_EXEC is set
  */
 void __init setbat(int index, unsigned long virt, phys_addr_t phys,
   unsigned int size, pgprot_t prot)
@@ -157,11 +158,12 @@ void __init setbat(int index, unsigned long virt, phys_addr_t phys,
bat[1].batu |= 1;   /* Vp = 1 */
if (flags & _PAGE_GUARDED) {
/* G bit must be zero in IBATs */
-   bat[0].batu = bat[0].batl = 0;
-   } else {
-   /* make IBAT same as DBAT */
-   bat[0] = bat[1];
+   flags &= ~_PAGE_EXEC;
}
+   if (flags & _PAGE_EXEC)
+   bat[0] = bat[1];
+   else
+   bat[0].batu = bat[0].batl = 0;
} else {
/* 601 cpu */
if (bl > BL_8M)
-- 
2.13.3

[PATCH v3 04/16] powerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.

2019-02-20 Thread Christophe Leroy

Now that mmu_mapin_ram() is able to handle blocks other
than the one starting at 0, the Wii can use it for all
its blocks.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/pgtable_32.c | 25 +++--
 1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index b4858818523f..fd665c32a1f7 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -254,26 +254,15 @@ static void __init __mapin_ram_chunk(unsigned long offset, unsigned long top)
 
 void __init mapin_ram(void)
 {
-   unsigned long s, top;
-
-#ifndef CONFIG_WII
-   top = total_lowmem;
-   s = mmu_mapin_ram(0, top);
-   __mapin_ram_chunk(s, top);
-#else
-   if (!wii_hole_size) {
-   s = mmu_mapin_ram(0, total_lowmem);
-   __mapin_ram_chunk(s, total_lowmem);
-   } else {
-   top = wii_hole_start;
-   s = mmu_mapin_ram(0, top);
-   __mapin_ram_chunk(s, top);
+   struct memblock_region *reg;
+
+   for_each_memblock(memory, reg) {
+   unsigned long base = reg->base;
+   unsigned long top = base + reg->size;
 
-   top = memblock_end_of_DRAM();
-   s = wii_mmu_mapin_mem2(top);
-   __mapin_ram_chunk(s, top);
+   base = mmu_mapin_ram(base, top);
+   __mapin_ram_chunk(base, top);
}
-#endif
 }
 
 /* Scan the real Linux page tables and return a PTE pointer for
-- 
2.13.3

[PATCH v3 08/16] powerpc/32: add helper to write into segment registers

2019-02-20 Thread Christophe Leroy

This patch adds a helper which wraps the 'mtsrin' instruction
to write into segment registers.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/reg.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 1c98ef1f2d5b..a70cbaf5c26f 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1425,6 +1425,11 @@ static inline void msr_check_and_clear(unsigned long bits)
 #define mfsrin(v)  ({unsigned int rval; \
asm volatile("mfsrin %0,%1" : "=r" (rval) : "r" (v)); \
rval;})
+
+static inline void mtsrin(u32 val, u32 idx)
+{
+   asm volatile("mtsrin %0, %1" : : "r" (val), "r" (idx));
+}
 #endif
 
 #define proc_trap()asm volatile("trap")
-- 
2.13.3

[PATCH v3 10/16] powerpc/kconfig: define PAGE_SHIFT inside Kconfig

2019-02-20 Thread Christophe Leroy

This patch defines CONFIG_PPC_PAGE_SHIFT so that the
PAGE_SHIFT value can be used inside Kconfig.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig|  7 +++
 arch/powerpc/include/asm/page.h | 13 ++---
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 849b0d5ac3d1..417e52a27f63 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -708,6 +708,13 @@ config PPC_256K_PAGES
 
 endchoice
 
+config PPC_PAGE_SHIFT
+   int
+   default 18 if PPC_256K_PAGES
+   default 16 if PPC_64K_PAGES
+   default 14 if PPC_16K_PAGES
+   default 12
+
 config THREAD_SHIFT
int "Thread shift" if EXPERT
range 13 15
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index aa4497175bd3..ed870468ef6f 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -20,20 +20,11 @@
 
 /*
  * On regular PPC32 page size is 4K (but we support 4K/16K/64K/256K pages
- * on PPC44x). For PPC64 we support either 4K or 64K software
+ * on PPC44x and 4K/16K on 8xx). For PPC64 we support either 4K or 64K software
  * page size. When using 64K pages however, whether we are really supporting
  * 64K pages in HW or not is irrelevant to those definitions.
  */
-#if defined(CONFIG_PPC_256K_PAGES)
-#define PAGE_SHIFT 18
-#elif defined(CONFIG_PPC_64K_PAGES)
-#define PAGE_SHIFT 16
-#elif defined(CONFIG_PPC_16K_PAGES)
-#define PAGE_SHIFT 14
-#else
-#define PAGE_SHIFT 12
-#endif
-
+#define PAGE_SHIFT CONFIG_PPC_PAGE_SHIFT
 #define PAGE_SIZE  (ASM_CONST(1) << PAGE_SHIFT)
 
 #ifndef __ASSEMBLY__
-- 
2.13.3
